Applied Text Classification: Is Deep Learning Business Ready?
Examples from the HR Industry
/ Agenda
• Introduction to Experteer
• Limitations of job search
• Improving job search with ML: our current tech
• Using Deep Learning for text classification and benchmarking
• Summary and next steps
/ Introduction
A little bit about me
• Moved to Munich in 2006 from Sofia, Bulgaria;
• Studied Finance, Statistics and Econometrics at LMU Munich;
• Initial setup of the data science department at Experteer;
• Led a large-scale ML initiative to automate core processes;
• Heading Data Services at Experteer.
/ Experteer Data Services
• Provides ML services and data science for Experteer;
• Offers a range of ML APIs for external HR tech clients;
• Consulting services for applied machine learning across all industries;
• AI workshops for value chain optimization across all industries.
/ Introduction to Experteer
Europe’s executive career service
www.experteer.com
/ Traditional Job Search is Bad!
Full-text search is just not enough...
Searching for a “CEO” position on a job board returns “HR Manager” as the first result.
/ Taxonomy fixes full-text search limitations
Experteer delivers better search results with taxonomy filters.
Career Level Filter
/ Our Problem
Very complex, manual data processing with exponentially growing costs.
Status in 2014
• Highly customized, manual processing of job data;
• Team of 80-100 people;
• Hand-picking and classifying jobs (90% left out);
• Extensive job classification taxonomy:
– 19 functions
– 631 industries on 4 levels
– 8 career levels
– Location, education, company and subsidiary, salary, travel requirements
• 7 languages; 12 countries.
• Major asset: extremely good quality of the positions. More than 2 million hand-classified jobs in multiple languages.
JOB COST: 3€/JOB
/ Job Classification Example
Manager, App Store Program Management
Job Summary
Apple is seeking a Manager for the App Store Program Management Team. This role will lead a
team of engineering program managers responsible for end-to-end delivery of App Store features
across iOS, macOS, and tvOS platforms.
Key Qualifications
8+ years of professional experience in software program/project/product management
Proven experience in building and managing high performing teams and individuals
Proven track record in managing and deploying large, complex programs
Strong relationship management and facilitation skills both within diverse engineering teams and
cross functional organizations
Proven self-starter, who is pro-active and demonstrates creative and critical thinking abilities
Great attention to detail and organized
Excellent written and verbal communication skills
Overall, a highly driven, results-oriented problem solver who will drive programs to deliver value
quickly to our customers
Required Experience
- Understanding of mobile software development
- Understanding of server-based software development
- Knowledge of iTunes Connect, App Store, iOS technologies
Description
We are looking for a seasoned manager with a proven track record in program management. This
is not just a people manager role, you will need to have hands on experience managing software
releases. You will develop tools and processes to gain efficiencies in the build, development,
testing, and deployment lifecycle. The role requires a combination of program and release
management, strong engineering background, and ability to build collaborative relationships
across various teams in Apple. We are looking for someone who loves digging into details, building
teams, and driving operational efficiencies under demanding timeframes. You take responsibility;
you feel a personal stake in the product you ship; you communicate responsibilities and scope
clearly; you value integrity; you manage risk; you need to know how things work; you work for the
success of the entire PM team; you thrive in uncertainty and strive to bring order to it; you have
deep wisdom and judgement; you keep your eye on the ball; you build strong relationships; you are
aware of politics but do not get mired in them.
Education
BS/MS in Computer Science, Engineering or similar technical field
• “Building teams” is an indicator for a people manager;
• Education is an indicator for the function;
• Good indicator for the function selection;
• “Building and managing high performing teams” is a strong indication for a people manager career level;
• Indicates the industry: software companies;
• Indicator for at least people manager career level;
• Indicator for the function;
• “Manager” is a soft indication that has to be looked at in context. It could indicate management responsibilities, but it depends on the rest of the responsibilities.
Legend: Career Level, Function, Industry
/ Our Solution with Machine Learning
Step 1 – break down the whole value chain into small steps
• Break the whole process down into small steps and start with the low-hanging fruit;
• Linguistic rules: manage and create rules for extraction and classification;
• QC UI.
/ Our Classification Stack
Step 2: Build a classification pipeline
Pipeline components:
• Job DB
• Data cleaning
• Feature engineering
• Base model: ensemble of linear classifiers (SVM, NB) with (c)BOW, n-grams
• Linguistic rules: supplemental business logic rules to improve the model score
• Confidence evaluation
• Random QC check
• Re-train models with hand-checked data
• Create business logic rules (if necessary)
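To make the idea of a base model followed by a confidence evaluation more concrete, here is a minimal sketch. It is not the Experteer implementation: it swaps the SVM/NB ensemble for a single hand-written Naive Bayes classifier on word n-grams, and the training texts, the `route` helper and the 0.8 threshold are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Bag of word n-grams (unigrams up to n_max-grams), lowercased."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (stand-in for the ensemble)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feat_counts[label].update(ngrams(text))
        self.vocab = {f for c in self.feat_counts.values() for f in c}
        return self

    def predict_proba(self, text):
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for feat in ngrams(text):
                if feat in self.vocab:  # ignore features never seen in training
                    score += math.log((self.feat_counts[label][feat] + 1) / denom)
            log_scores[label] = score
        # Normalize the log scores into probabilities (log-sum-exp trick).
        top = max(log_scores.values())
        norm = sum(math.exp(s - top) for s in log_scores.values())
        return {label: math.exp(s - top) / norm for label, s in log_scores.items()}


def route(model, text, threshold=0.8):
    """Confidence evaluation: accept confident predictions, send the rest to manual QC."""
    probs = model.predict_proba(text)
    label = max(probs, key=probs.get)
    status = "auto" if probs[label] >= threshold else "manual_qc"
    return label, status


# Toy training data (hypothetical, not from the Experteer job database).
train = [
    ("head of engineering leading a team of managers", "Manager"),
    ("manager building and managing high performing teams", "Manager"),
    ("software engineer writing backend code", "Specialist"),
    ("data analyst preparing reports", "Specialist"),
]
model = NaiveBayes().fit(*zip(*train))
print(route(model, "manager managing teams"))
print(route(model, "completely unseen words"))
```

Jobs routed to `manual_qc` would be the ones a human checks, and the hand-checked labels can then feed the re-training step listed above.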
/ Goal: Half Cost/Double Jobs
We decreased our cost by more than 50% and increased our output 3x.
[Chart: Live Jobs (0 to 400,000) and Cost Change % (0% to 160%), 2010-01 to 2017-01]
SUCCESS!
Unit Cost
3 0€ per
Job
/ What is Next? DEEP NEURAL NETWORKS!
Benchmark a number of DNNs on our data!
• Traditional models with BoW and n-grams only partially capture the complexity of career levels;
• Deep Neural Networks have recently become very popular for text processing and NLP tasks;
• CNNs have been successfully applied to computer vision because of the compositional structure of an image;
• Texts have similar properties: characters combine to form words, n-grams, stems, phrases, sentences, etc.;
• CNN-based models achieve very good performance in laboratory practice. What about real-life business problems?
• To our knowledge, no one has used deep neural networks for job classification.
ALL LOOKS GOOD! LET'S TRY IT!
/ Datasets
Overview of the datasets
DATASET 1
622K jobs (title, description).
Average length: 2,203 characters.
Collected over 8 years.
Created by more than 300 people.
Includes data points from junior colleagues.
Career Level                   Number of Jobs
Specialist                     189,637
Senior Specialist              283,125
Manager                        109,758
Senior Manager                 26,815
Business Unit Leader           6,748
Managing Director SME          5,132
Managing Director Large Comp   386
DATASET 2
243K jobs (title, description).
Average length: 2,197 characters.
Collected over 6 years.
Created by 80 people.
Excludes junior colleagues.
Includes only jobs reviewed by the QC team.
Career Level                   Number of Jobs
Specialist                     57,844
Senior Specialist              142,771
Manager                        39,870
Senior Manager                 2,897
Business Unit Leader           375
Managing Director SME          152
Managing Director Large Comp   3
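Both datasets are heavily imbalanced, so it helps to know what accuracy a trivial "always predict the largest class" classifier would reach before reading the benchmark results. The sketch below computes that majority-class baseline from the counts in the tables above; the variable and function names are illustrative only.

```python
# Class counts copied from the two dataset tables above.
DATASET_1 = {
    "Specialist": 189_637,
    "Senior Specialist": 283_125,
    "Manager": 109_758,
    "Senior Manager": 26_815,
    "Business Unit Leader": 6_748,
    "Managing Director SME": 5_132,
    "Managing Director Large Comp": 386,
}
DATASET_2 = {
    "Specialist": 57_844,
    "Senior Specialist": 142_771,
    "Manager": 39_870,
    "Senior Manager": 2_897,
    "Business Unit Leader": 375,
    "Managing Director SME": 152,
    "Managing Director Large Comp": 3,
}


def majority_baseline(counts):
    """Return (largest class, accuracy of always predicting that class)."""
    total = sum(counts.values())
    label = max(counts, key=counts.get)
    return label, counts[label] / total


for name, counts in (("Dataset 1", DATASET_1), ("Dataset 2", DATASET_2)):
    label, acc = majority_baseline(counts)
    print(f"{name}: {sum(counts.values()):,} jobs, "
          f"always predicting '{label}' gives {acc:.1%} accuracy")
```

On these counts the baseline is roughly 46% for Dataset 1 and roughly 59% for Dataset 2, which is the floor any reported accuracy should be compared against.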
/ Tech Setup & Stack
Machines used for training
Machine 1: Nvidia Titan X 12 GB
Machine 2: Nvidia Quadro P6000 24 GB
256 GB RAM
24 x 3.0 GHz CPU
Software: CUDA + PyTorch
/ The Test: Career Level Classification
Example of how a human would read and classify a job for career level, using the job posting from the Job Classification Example:
Manager, App Store Program Management
• “Leading a team” is a soft indicator for a manager;
• “Building and managing high performing teams” is a strong indication for a people manager career level;
• Indicator for at least people manager career level;
• “Building teams” is an indicator for a manager;
• “Manager” is a soft indication that has to be looked at in context. It could indicate management responsibilities, but it depends on the rest of the responsibilities.
/ Starting with VDCNN
Very Deep Convolutional Neural Networks (Conneau et al. 2017)
MOTIVATION:
• NLP tasks are commonly approached with RNNs (particularly LSTMs) and CNNs;
• However, these architectures are rather shallow;
• State-of-the-art computer vision has pioneered DEEP CNNs and greatly profited from such models;
• Builds on “Character-level CNN for Text Classification” by Zhang et al. (2016), which outperforms traditional methods on similar datasets;
• VDCNN operates on character level: no data preprocessing or augmentation.
Conneau et al. 2017 - https://arxiv.org/pdf/1606.01781.pdf
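Operating on character level means the model never sees words, only a fixed-length sequence of character indices. The sketch below shows one way such an encoding could look. The alphabet, the padding and unknown indices, and the `max_len` of 1024 are assumptions for illustration and are not taken from our experiments.

```python
import string

# Assumed alphabet: lowercase letters, digits, punctuation, space and newline.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation + " \n"
PAD, UNK = 0, 1  # reserved indices for padding and unknown characters
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}


def encode(text, max_len=1024):
    """Map text to a fixed-length list of character indices.

    Longer texts are truncated, shorter ones are right-padded with PAD.
    Characters outside the alphabet (e.g. German umlauts here) become UNK.
    """
    ids = [CHAR_TO_ID.get(ch, UNK) for ch in text.lower()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


encoded = encode("Manager, App Store Program Management")
print(len(encoded), encoded[:10])
```

In a real setup for German job descriptions the alphabet would likely need to include umlauts and "ß" so that they are not collapsed into the unknown index.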
/ VDCNN Test 1: Career Level on 622K Dataset
Initial test to get a feeling for how well the model abstracts.
• No oversampling;
• All 7 classes are tested;
• 90/10 split.
VDCNN
Dataset Size     622K
Training Time    25 hours
GPU              Titan X 12 GB
Layers           29
Epochs           9
Accuracy         73.5%
[Figure: confusion matrix]
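A 90/10 split without oversampling can be as simple as shuffling once and cutting the list. The sketch below is one possible way to do it; whether our split was stratified is not stated on the slides, so the plain random shuffle and the fixed seed are assumptions.

```python
import random


def split_90_10(examples, seed=42):
    """Shuffle the examples reproducibly and split them 90% train / 10% test."""
    rng = random.Random(seed)  # local generator, does not touch global state
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.9)
    return shuffled[:cut], shuffled[cut:]


train_set, test_set = split_90_10(range(1000))
print(len(train_set), len(test_set))
```

With very rare classes (such as the 386 managing directors of large companies) a plain random split can leave only a handful of examples in the test set, which makes per-class results noisy.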
/ VDCNN Test 2: Smaller Dataset
We measure performance on our smaller dataset. Career levels 5, 6, 7 and 8 are combined into one class.
                 VDCNN (55 layers)    VDCNN (33 layers)
Dataset Size     243K                 243K
Training Time    28 hours, 39 mins    16 hours, 54 mins
GPU              P6000 24 GB          Titan X 12 GB
Layers           55                   33
Epochs           30                   30
Accuracy         86.68%               87.99%
Career Level                   Number of Jobs   Class
Specialist                     57,844           Class 1
Senior Specialist              142,771          Class 2
Manager                        39,870           Class 3
Senior Manager                 2,897            Class 4
Business Unit Leader           375              Class 4
Managing Director SME          152              Class 4
Managing Director Large Comp   3                Class 4
• A cleaner (but smaller) dataset.
• Grouping of classes 5-8 as a new class 4.
• Huge jump in performance.
• Still, very long training times.
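The regrouping into four classes is a plain label mapping. A small sketch, using the career level names and counts from the table on this slide (the names `GROUPING` and `regroup` are illustrative):

```python
from collections import Counter

# Career level -> training class, as shown in the table on this slide.
GROUPING = {
    "Specialist": 1,
    "Senior Specialist": 2,
    "Manager": 3,
    "Senior Manager": 4,
    "Business Unit Leader": 4,
    "Managing Director SME": 4,
    "Managing Director Large Comp": 4,
}

COUNTS_243K = {
    "Specialist": 57_844,
    "Senior Specialist": 142_771,
    "Manager": 39_870,
    "Senior Manager": 2_897,
    "Business Unit Leader": 375,
    "Managing Director SME": 152,
    "Managing Director Large Comp": 3,
}


def regroup(counts, grouping):
    """Sum the per-level counts into the grouped classes."""
    grouped = Counter()
    for level, n in counts.items():
        grouped[grouping[level]] += n
    return dict(grouped)


print(regroup(COUNTS_243K, GROUPING))
```

Even after merging, class 4 holds only 3,427 of roughly 244K jobs, so the new class is still rare.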
/ Deeper look into the confusion matrix
Compare the confusion matrix of both models
Confusion Matrix: 622K – 7 Class
Confusion Matrix: 243K – 4 Class
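For readers who want to build the same kind of comparison on their own predictions, a confusion matrix and per-class recall can be computed in a few lines. The labels below are toy values chosen for illustration, not our results.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in labels] for t in labels]


def per_class_recall(matrix, labels):
    """Share of each true class that was predicted correctly."""
    recall = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        recall[label] = matrix[i][i] / row_total if row_total else 0.0
    return recall


# Toy example with four classes (hypothetical predictions).
labels = [1, 2, 3, 4]
y_true = [1, 1, 2, 2, 2, 3, 3, 4]
y_pred = [1, 2, 2, 2, 2, 3, 2, 3]

matrix = confusion_matrix(y_true, y_pred, labels)
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(y_true)
print(matrix)
print(per_class_recall(matrix, labels), accuracy)
```

In the toy data the overall accuracy is 62.5% while the rarest class is never predicted correctly, which is exactly the kind of effect an accuracy number alone hides on imbalanced career level data.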
/ Add a Benchmark: FastText
Open-source library from Facebook for building scalable solutions for text representation and classification
https://arxiv.org/abs/1607.01759
Let’s include some benchmarking!
• Developed by Facebook AI Research;
• Released in 2016 to critical acclaim;
• Scales to hundreds of thousands of classes thanks to its hierarchical structure;
• Represents sentences as bag of words (BoW) and bag of n-grams (BoN);
• Shares information across classes: what one class learns about a word is shared with all others;
• Written in C++, so EXTREMELY FAST.
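Getting data into FastText mostly means writing one line per document, with the class encoded as a `__label__` prefix. The sketch below prepares such lines; the helper name and the sample job are made up for illustration, and the command in the comment is only an example of a possible invocation (the n-gram setting is an assumption, not our configuration).

```python
def to_fasttext_line(label, text):
    """Format one training example as '__label__<class> <text>' on a single line."""
    tag = "__label__" + label.replace(" ", "_")  # labels must not contain spaces
    clean = " ".join(text.split())               # collapse newlines and repeated spaces
    return f"{tag} {clean}"


# Hypothetical example job.
jobs = [("Senior Specialist", "Senior  Java\nDeveloper (m/f) for our Munich office")]
lines = [to_fasttext_line(label, text) for label, text in jobs]
print(lines[0])

# A file with such lines could then be passed to the fastText command line tool,
# for example (10 epochs as in our runs, bigrams as an assumed setting):
#   fasttext supervised -input train.txt -output model -epoch 10 -wordNgrams 2
```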
/ VDCNN vs FastText
Classifying for 7 Classes on the 622K Dataset
                 VDCNN               FastText
Dataset Size     622K (7 Classes)    622K (7 Classes)
Training Time    25 hours            5 minutes!!!
GPU              Titan X 12 GB       N.A.
Layers           29                  N.A.
Epochs           9                   10
Accuracy         73.5%               74.1%
FastText slightly outperforms VDCNN at only a fraction of the training time.
/ VDCNN vs FastText
Classifying for 4 Classes on the 243K Dataset
                 VDCNN                FastText
Dataset Size     243K (4 Classes)     243K (4 Classes)
Training Time    16 hours, 54 mins    2.5 minutes!!!
GPU              Titan X 12 GB        N.A.
Layers           33                   N.A.
Epochs           30                   10
Accuracy         87.99%               88.1%
FastText again slightly outperforms VDCNN at only a fraction of the training time.
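To put "a fraction of the training time" into numbers, the two comparisons can be reduced to a speed-up factor and an accuracy difference. The sketch below only restates the figures from the two comparison slides; the names are illustrative.

```python
def to_minutes(hours=0, minutes=0):
    """Convert a training time to minutes."""
    return hours * 60 + minutes


# (training time in minutes, accuracy in %) per model, from the comparison slides.
RUNS = {
    "622K, 7 classes": {
        "vdcnn": (to_minutes(hours=25), 73.5),
        "fasttext": (to_minutes(minutes=5), 74.1),
    },
    "243K, 4 classes": {
        "vdcnn": (to_minutes(hours=16, minutes=54), 87.99),
        "fasttext": (to_minutes(minutes=2.5), 88.1),
    },
}


def compare(run):
    """Speed-up of FastText over VDCNN and accuracy gain in percentage points."""
    (t_vdcnn, acc_vdcnn), (t_fast, acc_fast) = run["vdcnn"], run["fasttext"]
    return {"speedup": t_vdcnn / t_fast, "accuracy_gain": round(acc_fast - acc_vdcnn, 2)}


for name, run in RUNS.items():
    print(name, compare(run))
```

FastText trains about 300 times faster on the 622K dataset and about 400 times faster on the 243K dataset, while the accuracy differences stay below one percentage point. Note that the VDCNN times are GPU runs and the FastText times are CPU runs, so this is a wall-clock comparison, not a like-for-like hardware benchmark.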
/ HDLTex
Hierarchical Deep Learning for Text Classification
Why are we trying this?
• Specifically developed for datasets with a large corpus (similar to ours);
• Hierarchical classification can also be applied to career levels;
• Outperforms baseline classifiers.
Original paper: https://arxiv.org/abs/1709.08267
Repository: https://github.com/kk7nc/HDLTex
/ HDLTex
Setup of our experiment
• We train German word vectors on our 622K dataset using GloVe (https://github.com/stanfordnlp/GloVe);
• We train HDLTex with our 243K dataset;
• As we have very few observations in career level 8, we treat 7 and 8 as one class;
• Split 90/10;
• We combine our training data as follows:
Class     Career Level                                Class Feature                             Observations
Class 1   specialist + senior specialist              Career levels with no people management   200,615
Class 2   manager + senior manager                    Career levels with people management      42,767
Class 3   business unit leader + managing director    Career levels with P&L responsibility     530
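The two-level setup above can be expressed as a parent/child label hierarchy: a first classifier decides the parent class, and a second one picks the career level within that parent. The sketch below only builds the labels and checks the observation counts from the table; the dictionary and function names are illustrative and no model is trained here.

```python
from collections import Counter

# Parent class -> career levels, following the table on this slide.
HIERARCHY = {
    "no people management": ["Specialist", "Senior Specialist"],
    "people management": ["Manager", "Senior Manager"],
    "P&L responsibility": ["Business Unit Leader", "Managing Director"],
}
PARENT = {child: parent for parent, children in HIERARCHY.items() for child in children}

# 243K dataset counts, with the two managing director levels merged (152 + 3).
COUNTS = {
    "Specialist": 57_844,
    "Senior Specialist": 142_771,
    "Manager": 39_870,
    "Senior Manager": 2_897,
    "Business Unit Leader": 375,
    "Managing Director": 155,
}


def hierarchical_label(level):
    """Return the (level 1, level 2) label pair for a career level."""
    return PARENT[level], level


parent_counts = Counter()
for level, n in COUNTS.items():
    parent_counts[PARENT[level]] += n

print(hierarchical_label("Senior Manager"))
print(dict(parent_counts))
```

The summed counts reproduce the observations column of the table (200,615 / 42,767 / 530), which also shows how small the P&L class is compared to the other two.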
/ HDLTex (Layer 1 RNN, Layer 2 CNN)
Results from our experiment
                 HDLTex         FastText
Dataset Size     243K           243K
Training Time    10 hours       2.5 minutes!!!
GPU              P6000 24 GB    N.A.
Accuracy         86.8%          88.0%
[Figure: HDLTex confusion matrix]
Training a career level classifier with HDLTex is still not better than FastText and takes longer!
/ Summary
Let’s review what we have learned today
• Deep Learning is a major step forward in the classification of documents: both VDCNN and HDLTex outperform our best-practice linear classifier model;
• Plenty of academic literature and open-source implementations allow data scientists to start testing in a couple of hours;
• However, both deep neural network architectures require long training times, even on powerful GPUs, which makes experimentation hard;
• FastText outperforms all models and can be trained in minutes on a desktop CPU, which allows for easy MVPs and testing;
• Business owners interested in rapid prototyping should definitely explore FastText for text classification before jumping on DNNs.
/ Next Steps
Where we will invest time and effort in the next 2 months
• Further tests with HDLTex, especially for the classification of industries (2-level hierarchy);
• Benchmark FastText against every process where we use linear classifiers and deploy to production;
• Benchmark Deep Pyramid Convolutional Neural Networks (Johnson and Zhang, 2017) against FastText/VDCNN;
• Analyze predictions from FastText, HDLTex and VDCNN and explore opportunities for model stacking.
/ Thank you for your attention
Special thanks to our Data Scientist Viet Nguyen, who made all of this possible.
Alexander Chukovski
alexander.chukovski@experteer.com
/ Questions from the Audience
Question 1: Why is FastText so fast compared to Deep Learning?
It is written in C++, and compiled executables are faster than scripting languages.
A hierarchical softmax reduces computation time.
Training is fast because it builds on proven basic concepts of NLP: bag of words and bag of n-grams.
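To illustrate why a hierarchical softmax saves time: the FastText paper describes a softmax organized as a Huffman tree, so frequent classes sit close to the root and need only a few binary decisions instead of a score for every class. The sketch below builds such a tree from the 622K class counts and reports the depth per class. It is a toy illustration of the principle, not FastText's code, and whether the option was enabled in our runs is not stated on the slides.

```python
import heapq
from itertools import count


def huffman_depths(freqs):
    """Return {label: depth in the Huffman tree} for the given label frequencies."""
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(f, next(tie), {label: 0}) for label, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every contained label one level deeper.
        merged = {label: depth + 1 for label, depth in {**left, **right}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


CLASS_COUNTS = {
    "Specialist": 189_637,
    "Senior Specialist": 283_125,
    "Manager": 109_758,
    "Senior Manager": 26_815,
    "Business Unit Leader": 6_748,
    "Managing Director SME": 5_132,
    "Managing Director Large Comp": 386,
}

depths = huffman_depths(CLASS_COUNTS)
total = sum(CLASS_COUNTS.values())
avg_depth = sum(CLASS_COUNTS[c] * d for c, d in depths.items()) / total
print(depths)
print(f"average decisions per example: {avg_depth:.2f} vs. {len(CLASS_COUNTS)} class scores")
```

With only 7 classes the saving is small; the effect matters for label sets with many thousands of classes, where the number of decisions grows roughly with the logarithm of the class count.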
Question 2: How did you get up to speed with machine learning?
In the beginning we had a lot of help from an external firm, “Glanos” in Munich,
that helped us build our first ML models and create a production-ready solution.
Most of the research in Deep Learning is freely available as academic papers, and the community
is very fast in building GitHub repositories with the models.
Question 3: The machines that you have used are very powerful. Is this a cloud cluster?
No, we had test access to these machines for a short period of time.
Question 4: Can a normal person or a small company actually run Deep Learning?
This configuration seems expensive.
You can buy a Titan GTX GPU from Nvidia for about €900 on eBay.
CPU and RAM configuration is not that relevant for Deep Learning, although your RAM should
match your GPU RAM.
A normal desktop with a 12 GB GPU should be more than enough to replicate these experiments.
VDCNN only required 4 GB of GPU memory, so we did not fully utilize the GPU RAM.

Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 

Applied Deep Learning for Text Classification - Examples from the HR Industry

  • 1. Applied Text Classification: Is Deep Learning Business Ready? Examples from the HR Industry
  • 2. / Agenda • Introduction to Experteer • Limitations of job search • Improving job search with ML: our current tech • Using Deep Learning for text classification and benchmarking • Summary and next steps
  • 3. / Introduction A little bit about me • Moved to Munich in 2006 from Sofia, Bulgaria; • Studied Finance/Statistics/Econometrics at LMU Munich; • Initial setup of the data science department at Experteer; • Led a large-scale ML initiative to automate core processes; • Heading Data Services at Experteer.
  • 4. / Experteer Data Services • Provides ML Services and Data Science for Experteer; • Offers a range of ML APIs for external HR Tech clients; • Consulting services for applied machine learning across all industries; • AI Workshops for value chain optimization across all industries.
  • 5. / Introduction to Experteer Europe’s executive career service www.experteer.com
  • 6. / Traditional Job Search is Bad! Full-text search is just not enough... Searching for a "CEO" position on a job board returns "HR Manager" as the first result.
  • 7. / Taxonomy fixes full-text search limitations Experteer delivers better search results with taxonomy filters. Career Level Filter
  • 8. / Our Problem Very complex, manual data processing with exponential costs. State 2014 • Highly customized, manual processing of job data; • Team of 80-100 people; • Hand-picking and classifying jobs (90% left out); • Extensive job classification taxonomy: 19 Functions; 631 Industries on 4 Levels; 8 Career Levels; location, education, company and subsidiary, salary, travel requirements; • 7 Languages; 12 Countries. • Major asset: extremely good quality of the positions. 2 million+ hand-classified jobs in multiple languages. JOB COST: 3€/JOB
  • 9. / Job Classification Example Manager, App Store Program Management Job Summary Apple is seeking a Manager for the App Store Program Management Team. This role will lead a team of engineering program managers responsible for end-to-end delivery of App Store features across iOS, macOS, and tvOS platforms. Key Qualifications 8+ years of professional experience in software program/project/product management Proven experience in building and managing high performing teams and individuals Proven track record in managing and deploying large, complex programs Strong relationship management and facilitation skills both within diverse engineering teams and cross functional organizations Proven self-starter, who is pro-active and demonstrates creative and critical thinking abilities Great attention to detail and organized Excellent written and verbal communication skills Overall, a highly driven, results-oriented problem solver who will drive programs to deliver value quickly to our customers Required Experience - Understanding of mobile software development - Understanding of server-based software development - Knowledge of iTunes Connect, App Store, iOS technologies Description We are looking for a seasoned manager with a proven track record in program management. This is not just a people manager role, you will need to have hands on experience managing software releases. You will develop tools and processes to gain efficiencies in the build, development, testing, and deployment lifecycle. The role requires a combination of program and release management, strong engineering background, and ability to build collaborative relationships across various teams in Apple. We are looking for someone who loves digging into details, building teams, and driving operational efficiencies under demanding timeframes.
You take responsibility; you feel a personal stake in the product you ship; you communicate responsibilities and scope clearly; you value integrity; you manage risk; you need to know how things work; you work for the success of the entire PM team; you thrive in uncertainty and strive to bring order to it; you have deep wisdom and judgement; you keep your eye on the ball; you build strong relationships; you are aware of politics but do not get mired in them. Education BS/MS in Computer Science, Engineering or similar technical field. Annotations on the job text (each tagged as Career Level, Function or Industry): • Building teams: indicator for people manager; • Education is an indicator for the function; • Good indicator for the function selection; • "building and managing high performing teams": a strong indication of a people manager career level; • Indicates the industry: software companies; • Indicator for at least people manager career level; • Indicator for the function; • "Manager" is a soft indication that has to be read in context. It could indicate management responsibilities, but it depends on the rest of the responsibilities.
  • 10. / Our Solution with Machine Learning Step 1: break the whole value chain down into small steps • Break the whole process down into small steps and start with the low-hanging fruit. Linguistic Rules: manage/create rules for extraction/classification. QC UI
  • 11. / Our Classification Stack Step 2: Build a classification pipeline. Linguistic Rules: supplemental business logic rules to improve the model score. Base Model: ensemble of linear classifiers (SVM, NB) with (c)BOW, n-grams. Data cleaning. Confidence Evaluation. Random QC Check. Re-train models with hand-checked data. Create business logic rules (if necessary). Feature Engineering. Job DB.
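To make the "Base Model" box more concrete, here is a minimal sketch of one possible ensemble member: a multinomial Naive Bayes classifier over word unigrams and bigrams, written in plain Python. It is an illustration only, not Experteer's production model; the class name `NaiveBayes`, the helper `ngrams`, the Laplace smoothing and the four toy training sentences are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return lower-cased word n-grams for n = 1..n_max (bag of n-grams)."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngrams(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for feat in ngrams(text):
                if feat in self.vocab:  # ignore n-grams never seen in training
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration only.
train = [
    ("lead a team of engineers", "Manager"),
    ("building and managing high performing teams", "Manager"),
    ("develop software features", "Specialist"),
    ("write and test code", "Specialist"),
]
model = NaiveBayes().fit([t for t, _ in train], [l for _, l in train])
prediction = model.predict("managing a team")
```

In a real ensemble the scores of several such classifiers (for example an SVM and a Naive Bayes model) would be combined before the rule layer and the confidence evaluation are applied.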
  • 12. / Goal: Half Cost/Double Jobs We managed to decrease our cost by more than 50% and increased the output by 3x. SUCCESS! [Chart: Live Jobs (0 to 400,000) and Cost Change % (0% to 160%), quarterly from 2010-01 to 2017-01.] Unit Cost 3 0€ per Job
  • 13. / What is Next? DEEP NEURAL NETWORKS! Benchmark a bunch of DNNs on our data! • Traditional models with BoW and n-grams only partially capture the complexity of career levels; • Deep Neural Networks have recently become very popular for text processing and NLP tasks; • CNNs have been successfully applied to computer vision because of the compositional structure of an image; • Texts have similar properties: characters combine to form words, n-grams, stems, phrases, sentences, etc.; • CNN-based models achieve very good performance in laboratory settings, but what about real-life business problems? • To our knowledge, no one has used deep neural networks for job classification. ALL LOOKS GOOD! LET'S TRY IT!
  • 14. / Datasets Overview of the datasets. DATASET 1: 622K jobs (title, description). Average length: 2,203 characters. Collected over 8 years. Created by more than 300 people. Includes data points from junior colleagues. Jobs per career level: Specialist 189,637; Senior Specialist 283,125; Manager 109,758; Senior Manager 26,815; Business Unit Leader 6,748; Managing Director SME 5,132; Managing Director Large Comp 386. DATASET 2: 243K jobs (title, description). Average length: 2,197 characters. Collected over 6 years. Created by 80 people. Excludes junior colleagues. Only includes jobs reviewed by the QC team. Jobs per career level: Specialist 57,844; Senior Specialist 142,771; Manager 39,870; Senior Manager 2,897; Business Unit Leader 375; Managing Director SME 152; Managing Director Large Comp 3.
  • 15. / Tech Setup & Stack Machines used for training (Machine 1, Machine 2): Nvidia Titan X 12 GB; NVIDIA Quadro P6000 24 GB; 256 GB RAM; 24 x 3.0 GHz CPU. Software: CUDA + PyTorch.
  • 16. / The Test: Career Level Classification Example of how a human would read and classify a job for career level. Manager, App Store Program Management Job Summary Apple is seeking a Manager for the App Store Program Management Team. This role will lead a team of engineering program managers responsible for end-to-end delivery of App Store features across iOS, macOS, and tvOS platforms. Key Qualifications 8+ years of professional experience in software program/project/product management Proven experience in building and managing high performing teams and individuals Proven track record in managing and deploying large, complex programs Strong relationship management and facilitation skills both within diverse engineering teams and cross functional organizations Proven self-starter, who is pro-active and demonstrates creative and critical thinking abilities Great attention to detail and organized Excellent written and verbal communication skills Overall, a highly driven, results-oriented problem solver who will drive programs to deliver value quickly to our customers Required Experience - Understanding of mobile software development - Understanding of server-based software development - Knowledge of iTunes Connect, App Store, iOS technologies Description We are looking for a seasoned manager with a proven track record in program management. This is not just a people manager role, you will need to have hands on experience managing software releases. You will develop tools and processes to gain efficiencies in the build, development, testing, and deployment lifecycle. The role requires a combination of program and release management, strong engineering background, and ability to build collaborative relationships across various teams in Apple. We are looking for someone who loves digging into details, building teams, and driving operational efficiencies under demanding timeframes.
You take responsibility; you feel a personal stake in the product you ship; you communicate responsibilities and scope clearly; you value integrity; you manage risk; you need to know how things work; you work for the success of the entire PM team; you thrive in uncertainty and strive to bring order to it; you have deep wisdom and judgement; you keep your eye on the ball; you build strong relationships; you are aware of politics but do not get mired in them. Education BS/MS in Computer Science, Engineering or similar technical field. Annotations on the job text: • "Leading a team" is a soft indicator for a manager; • "building and managing high performing teams": a strong indication of a people manager career level; • Indicator for at least people manager career level; • "Building teams": indicator for manager; • "Manager" is a soft indication that has to be read in context. It could indicate management responsibilities, but it depends on the rest of the responsibilities.
  • 17. / Starting with VDCNN Very Deep Convolutional Neural Networks (Conneau et al. 2017) MOTIVATION: • NLP tasks are commonly approached with RNNs (particularly LSTMs) and CNNs; • However, these architectures are rather shallow; • State-of-the-art computer vision has pioneered DEEP CNNs and greatly profited from such models; • Builds on "Character-level CNN for Text Classification" by Zhang et al. (2016), which outperforms traditional methods on similar datasets; • VDCNN operates at character level: no data preprocessing or augmentation. Conneau et al. 2017 - https://arxiv.org/pdf/1606.01781.pdf
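Operating at character level means each job text is fed to the network as a fixed-length sequence of character ids rather than as words. The sketch below shows one way such an encoding could look. The alphabet, the padding/unknown ids and the length of 1024 are assumptions chosen for illustration; the slides do not state which values were used in the experiments.

```python
# Assumed alphabet for illustration; real setups may add umlauts etc.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 -,;.!?:'\"/|_@#$%^&*~+=<>()[]{}"
PAD, UNK = 0, 1  # reserved ids: padding and out-of-alphabet characters
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}


def encode(text, max_len=1024):
    """Map a text to exactly max_len character ids (truncate, then pad)."""
    ids = [CHAR_TO_ID.get(ch, UNK) for ch in text.lower()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


encoded = encode("Manager, App Store Program Management", max_len=64)
```

With an average job length of roughly 2,200 characters, any fixed length truncates some descriptions, so the choice of `max_len` trades coverage of the text against memory and training time.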
  • 18. / VDCNN Test 1: Career Level on 622K Dataset Initial test to get a feeling for how well the model abstracts. • No oversampling; • All 7 classes are tested; • 90/10 split. VDCNN: Dataset Size 622K; Training Time 25 hours; GPU Titan X 12 GB; Layers 29; Epochs 9; Accuracy 73.5%. [Figure: VDCNN confusion matrix.]
  • 19. / VDCNN Test 2: Smaller Dataset We measure the performance on our smaller dataset. Classes 5, 6, 7 and 8 are combined. VDCNN run 1: Dataset Size 243K; Training Time 28 hours, 39 mins; GPU P6000 24 GB; Layers 55; Epochs 30; Accuracy 86.68%. VDCNN run 2: Dataset Size 243K; Training Time 16 hours, 54 mins; GPU Titan X 12 GB; Layers 33; Epochs 30; Accuracy 87.99%. Jobs per career level: Specialist 57,844 (Class 1); Senior Specialist 142,771 (Class 2); Manager 39,870 (Class 3); Senior Manager 2,897 (Class 4); Business Unit Leader 375; Managing Director SME 152; Managing Director Large Comp 3. • A cleaner (but smaller) dataset. • Grouping of classes 5-8 as a new class 4. • Huge jump in performance. • Still, very long training times.
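The class grouping can be checked with a few lines of Python. The sketch assumes, based on our reading of the slide, that Senior Manager and all higher career levels are merged into class 4; the function name `group_classes` is hypothetical. It also computes the share of the largest class, which is the accuracy a trivial "always predict the majority class" baseline would reach on the full dataset and is a useful reference point for the reported accuracies.

```python
# Job counts per career level in the 243K dataset, taken from the slide.
DATASET_2 = {
    "Specialist": 57_844,
    "Senior Specialist": 142_771,
    "Manager": 39_870,
    "Senior Manager": 2_897,
    "Business Unit Leader": 375,
    "Managing Director SME": 152,
    "Managing Director Large Comp": 3,
}
FIRST_THREE = ["Specialist", "Senior Specialist", "Manager"]


def group_classes(counts):
    """Keep classes 1-3 as they are; merge every higher level into class 4.

    Assumption: class 4 = Senior Manager plus all levels above it.
    """
    grouped = {}
    for level, n in counts.items():
        key = FIRST_THREE.index(level) + 1 if level in FIRST_THREE else 4
        grouped[key] = grouped.get(key, 0) + n
    return grouped


grouped = group_classes(DATASET_2)
total = sum(grouped.values())
majority_share = max(grouped.values()) / total  # about 0.585 (Senior Specialist)
```

Even after merging, class 4 holds only 3,427 of 243,912 jobs, so overall accuracy is dominated by the three large classes.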
  • 20. / Deeper look into the confusion matrix Compare the confusion matrices of both models. [Figures: Confusion Matrix: 622K, 7 classes; Confusion Matrix: 243K, 4 classes.]
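Since the confusion matrix figures are not reproduced in this transcript, the following sketch shows how such a matrix, the overall accuracy and the per-class recall are computed from predictions. The labels in `y_true` and `y_pred` are invented toy values, not results from the experiments.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall_per_class(matrix):
    """Per true class: fraction of its samples that were predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Toy labels for four classes (0..3), invented for illustration.
y_true = [0, 0, 0, 1, 1, 1, 1, 2, 2, 3]
y_pred = [0, 0, 1, 1, 1, 1, 0, 2, 1, 2]
cm = confusion_matrix(y_true, y_pred, n_classes=4)
acc = accuracy(cm)
recalls = recall_per_class(cm)
```

Per-class recall matters here because the datasets are heavily imbalanced: a model can reach high overall accuracy while rarely recognizing the small senior classes.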
  • 21. / Add a Benchmark: FastText Facebook's open-source library for building scalable solutions for text representation and classification https://arxiv.org/abs/1607.01759 Let's include some benchmarking! • Developed by Facebook AI Research; • Released in 2016 to critical acclaim; • Scales to 100Ks of classes due to its hierarchical structure; • Represents sentences as bag of words (BoW) and bag of n-grams (BoN); • Shares information across classes: what one class learns about a word is shared with all; • Written in C++, so EXTREMELY FAST.
  • 22. / VDCNN vs FastText Classifying for 7 Classes on the 622K Dataset. FastText: Dataset Size 622K (7 Classes); Training Time 5 minutes!!!; GPU N.A.; Layers N.A.; Epochs 10; Accuracy 74.1%. VDCNN: Dataset Size 622K (7 Classes); Training Time 25 hours; GPU Titan X 12 GB; Layers 29; Epochs 9; Accuracy 73.5%. FastText slightly outperforms VDCNN at only a fraction of the training time.
  • 23. / VDCNN vs FastText Classifying for 4 Classes on the 243K Dataset. FastText: Dataset Size 243K (4 Classes); Training Time 2.5 minutes!!!; GPU N.A.; Layers N.A.; Epochs 10; Accuracy 88.1%. VDCNN: Dataset Size 243K (4 Classes); Training Time 16 hours, 54 mins; GPU Titan X 12 GB; Layers 33; Epochs 30; Accuracy 87.99%. FastText again slightly outperforms VDCNN at only a fraction of the training time.
  • 24. / HDLTex Hierarchical Deep Learning for Text Classification Why are we trying this? • Specifically developed for datasets with a large corpus (similar to ours); • Hierarchical classification can also be applied to career levels; • Outperforms baseline classifiers. Original paper: https://arxiv.org/abs/1709.08267 Repository: https://github.com/kk7nc/HDLTex
  • 25. / HDLTex Setup of our experiment • We train German word vectors on our 622K Dataset using https://github.com/stanfordnlp/GloVe • We train HDLTex with our 243K Dataset; • As we have very few observations in career level 8, we treat 7 and 8 as one class; • Split 90/10; • We combine our training data as follows: Class 1: specialist + senior specialist; career levels with no people management; 200,615 observations. Class 2: manager + senior manager; career levels with people management; 42,767 observations. Class 3: business unit leader + managing director; career levels with P&L responsibility; 530 observations.
  • 26. / HDLTex (Layer 1 RNN, Layer 2 CNN) Results from our experiment. HDLTex: Dataset Size 243K; Training Time 10 hours; GPU P6000 24 GB; Accuracy 86.8%. FastText: Dataset Size 243K; Training Time 2.5 minutes!!!; GPU N.A.; Accuracy 88.0%. [Figure: HDLTex confusion matrix.] Training a career level classifier with HDLTex is still not better than FastText and takes longer!
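The two-level label structure described above can be sketched as follows: every job gets a parent class (level 1) and its original career level (level 2), and prediction first picks the parent and then asks the child model that belongs to it. The names `PARENT_OF`, `hierarchical_label` and `predict_hierarchical` are hypothetical, and the models passed in are plain callables standing in for trained networks; this is not the HDLTex code itself.

```python
from collections import Counter

# Level-1 parent class for each career level, following the slide's grouping.
PARENT_OF = {
    "Specialist": "no people management",
    "Senior Specialist": "no people management",
    "Manager": "people management",
    "Senior Manager": "people management",
    "Business Unit Leader": "P&L responsibility",
    "Managing Director SME": "P&L responsibility",
    "Managing Director Large Comp": "P&L responsibility",
}

# Job counts per career level in the 243K dataset, taken from the slides.
DATASET_2 = {
    "Specialist": 57_844,
    "Senior Specialist": 142_771,
    "Manager": 39_870,
    "Senior Manager": 2_897,
    "Business Unit Leader": 375,
    "Managing Director SME": 152,
    "Managing Director Large Comp": 3,
}


def hierarchical_label(career_level):
    """Return the (level-1 parent, level-2 child) label pair for a job."""
    return PARENT_OF[career_level], career_level


def predict_hierarchical(text, parent_model, child_models):
    """Route the text: predict the parent first, then use that parent's child model."""
    parent = parent_model(text)
    return parent, child_models[parent](text)


parent_counts = Counter()
for level, n in DATASET_2.items():
    parent, _ = hierarchical_label(level)
    parent_counts[parent] += n
```

Summing the career level counts per parent reproduces the observation counts on the slide (200,615, 42,767 and 530), which confirms the grouping.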
  • 27. / Summary Let's review what we have learned today • Deep Learning is a major step forward in the classification of documents: both VDCNN and HDLTex outperform our best-practice linear classifier model; • Plenty of academic literature and open-source implementations allow data scientists to start testing in a couple of hours; • However, both deep neural network architectures require long training times, even on powerful GPUs, which makes experimentation hard; • FastText outperforms all models and can be trained in minutes on a desktop CPU, which allows for easy MVPs and testing; • Business owners interested in rapid prototyping should definitely explore FastText for text classification before jumping on DNNs.
  • 28. / Next Steps Where we will invest time and effort in the next 2 months • Further tests with HDLTex, especially for the classification of industries (2-level hierarchy); • Benchmark FastText against every process where we use linear classifiers and deploy to production; • Benchmark Deep Pyramid Convolutional Neural Networks (Johnson and Zhang, 2017) against FastText/VDCNN; • Analyze predictions from FastText, HDLTex and VDCNN and explore opportunities for model stacking.
  • 29. / Thank you for your attention Special thanks to our Data Scientist Viet Nguyen, who made all of this possible. Alexander Chukovski alexander.chukovski@experteer.com
  • 30. / Questions from the Audience Question 1: Why is FastText so fast compared to Deep Learning? It is written in C++, and compiled executables are faster than scripting languages. A hierarchical softmax keeps computation times short. Training is fast because it relies on proven basic concepts of NLP: bag of words and bag of n-grams. Question 2: How did you get up to speed with machine learning? In the beginning we had a lot of help from an external firm, "Glanos" in Munich, which helped us build our first ML models and create a production-ready solution. Most of the research in Deep Learning is freely available as academic papers, and the community is very fast in building GitHub repositories with the models. Question 3: The machines that you have used are very powerful. Is this a cloud cluster? No, we had test access to these machines for a short period of time. Question 4: Can a normal person or a small company actually run Deep Learning? This configuration seems expensive. You can buy a Titan GTX GPU from Nvidia for about €900 on eBay. The CPU and RAM configuration is not that relevant for Deep Learning, although your RAM should match your GPU RAM. A normal desktop with a 12 GB GPU should be more than enough to replicate these experiments. VDCNN only required 4 GB of GPU memory, so we did not fully utilize the GPU RAM.
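The answer to Question 1 mentions hierarchical softmax. As a rough illustration of why it helps, the sketch below builds a Huffman tree over class frequencies and reports how many binary decisions are needed to reach each class, compared with scoring every class in a flat softmax. The function `huffman_depths` is a hypothetical helper written for this example, and the slides do not say which loss setting was used in the benchmark, so treat this as background rather than a description of the experiments.

```python
import heapq

# Job counts per career level in the 622K dataset, taken from the slides.
DATASET_1 = {
    "Specialist": 189_637,
    "Senior Specialist": 283_125,
    "Manager": 109_758,
    "Senior Manager": 26_815,
    "Business Unit Leader": 6_748,
    "Managing Director SME": 5_132,
    "Managing Director Large Comp": 386,
}


def huffman_depths(frequencies):
    """Return the depth of every label in a Huffman tree built from its frequency."""
    # Heap items: (weight, unique tie-breaker, {label: depth so far}).
    heap = [(w, i, {label: 0}) for i, (label, w) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes all of their labels one level deeper.
        merged = {label: depth + 1 for label, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


depths = huffman_depths(DATASET_1)
total = sum(DATASET_1.values())
# Average number of binary decisions per training example.
expected_depth = sum(DATASET_1[label] * d for label, d in depths.items()) / total
```

Frequent classes sit near the root, so most examples need only one or two binary decisions. With just seven career levels the saving is small; the benefit grows with the number of classes, since the tree depth grows roughly logarithmically while a flat softmax has to score every class.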