Text classification is a major part of our data processing stack. We have successfully automated a large part of our complex document classification processes with traditional linear classifiers based on n-grams, bag of words, linguistic rules and extensive feature engineering. In this presentation we benchmark a couple of deep learning algorithms, Very Deep Convolutional Neural Networks (VDCNN) and Hierarchical Deep Learning for Text Classification (HDLTex), on the classification of career levels for jobs. In parallel, we compare the deep learning results to FastText, a text classification framework developed and open-sourced by Facebook, which yields the best results.
2. / Agenda
• Introduction to Experteer
• Limitations of job search
• Improving job search with ML – our current tech
• Using Deep Learning for text classification and benchmarking
• Summary and next steps
3. / Introduction
A little bit about me
• Moved to Munich in 2006 from Sofia, Bulgaria;
• Studied Finance/Statistics/Econometrics at LMU Munich;
• Initial setup of the data science department at Experteer;
• Led a large-scale ML initiative to automate core processes;
• Heading Data Services at Experteer.
4. / Experteer Data Services
• Provides ML services and data science for Experteer;
• Offers a range of ML APIs for external HR tech clients;
• Consulting services for applied machine learning across all industries;
• AI workshops for value chain optimization across all industries.
5. / Introduction to Experteer
Europe’s executive
career service
www.experteer.com
6. / Traditional Job Search is Bad!
Full-text search is just not enough...
Searching for a “CEO” position on a job board... returns “HR Manager” as the first result.
7. / Taxonomy fixes full-text search limitations
Experteer delivers better search results with taxonomy filters.
Career Level Filter
8. / Our Problem
A very complex, manual data processing workflow with exponential costs.
Status in 2014
• Highly customized and manual processing of job data;
• Team of 80-100 people;
• Hand-picking and classifying jobs (90% left out);
• Extensive job classification taxonomy:
– 19 Functions
– 631 Industries on 4 Levels
– 8 Career Levels
– Location, education, company and subsidiary, salary, travel requirements
• 7 languages; 12 countries.
• Major asset: extremely good quality of the positions. 2 million+ hand-classified jobs in multiple languages.
JOB COST: 3€/JOB
9. / Job Classification Example
Manager, App Store Program Management
Job Summary
Apple is seeking a Manager for the App Store Program Management Team. This role will lead a
team of engineering program managers responsible for end-to-end delivery of App Store features
across iOS, macOS, and tvOS platforms.
Key Qualifications
8+ years of professional experience in software program/project/product management
Proven experience in building and managing high performing teams and individuals
Proven track record in managing and deploying large, complex programs
Strong relationship management and facilitation skills both within diverse engineering teams and
cross functional organizations
Proven self-starter, who is pro-active and demonstrates creative and critical thinking abilities
Great attention to detail and organized
Excellent written and verbal communication skills
Overall, a highly driven, results-oriented, problem solver who will drive programs to deliver value
quickly to our customers
Required Experience
- Understanding of mobile software development
- Understanding of server-based software development
- Knowledge of iTunes Connect, App Store, iOS technologies
Description
We are looking for a seasoned manager with a proven track record in program management. This
is not just a people manager role, you will need to have hands on experience managing software
releases. You will develop tools and processes to gain efficiencies in the build, development,
testing, and deployment lifecycle. The role requires a combination of program and release
management, strong engineering background, and ability to build collaborative relationships
across various teams in Apple. We are looking for someone who loves digging into details, building
teams, and driving operational efficiencies under demanding timeframes. You take responsibility;
you feel a personal stake in the product you ship; you communicate responsibilities and scope
clearly; you value integrity; you manage risk; you need to know how things work; you work for the
success of the entire PM team; you thrive in uncertainty and strive to bring order to it; you have
deep wisdom and judgement; you keep your eye on the ball; you build strong relationships; you are
aware of politics but do not get mired in them.
Education
BS/MS in Computer Science, Engineering or similar technical field
• Building teams – indicator for people manager;
• Education is an indicator for the function;
• Good indicator for the function selection;
• “building and managing high performing teams” – this is a strong indication for a people manager career level;
• Indicates the industry – software companies;
• Indicator for at least people manager career level;
• Indicator for the function;
• “Manager” is a soft indication and has to be read in context. It could indicate management responsibilities, but it depends on the rest of the responsibilities.
Annotation legend: Career Level, Function, Industry
10. / Our Solution with Machine Learning
Step 1 – break down the whole value chain into small steps
• Break the whole process down into small steps and start with the low-hanging fruit
Linguistic Rules – manage/create rules for extraction/classification QC UI
11. / Our Classification Stack
Step 2: Build a classification pipeline
Pipeline components (from the diagram):
• Job DB
• Data cleaning
• Feature Engineering
• Base Model: ensemble of linear classifiers (SVM, NB) with (c)BOW, n-grams
• Linguistic Rules: supplemental business logic rules to improve the model score
• Confidence Evaluation
• Random QC Check
• Re-train models with hand-checked data
• Create business logic rules (if necessary)
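The confidence evaluation and random QC steps can be sketched roughly as follows. This is only an illustration of the idea, not our production code: the function name `route_prediction`, the 0.8 threshold and the 5% sampling rate are all assumptions made for the example.

```python
import random

def route_prediction(confidence, threshold=0.8, qc_rate=0.05, rng=None):
    """Decide what happens to one model prediction.

    threshold and qc_rate are illustrative assumptions, not values from the talk.
    """
    rng = rng or random.Random()
    if confidence < threshold:
        # Low confidence: a human classifies the job by hand.
        return "manual_review"
    if rng.random() < qc_rate:
        # Confident prediction that is sampled for a random quality check.
        return "random_qc"
    # Confident prediction that is accepted without human involvement.
    return "auto_accept"

rng = random.Random(42)
confidences = [0.95, 0.55, 0.81]
routes = [route_prediction(c, rng=rng) for c in confidences]
```

Jobs that end up in `manual_review` or `random_qc` would produce the hand-checked data that feeds the re-training step.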
12. / Goal: Half Cost/Double Jobs
We decreased our cost by more than 50% and increased the output 3x.
[Chart: Live Jobs (axis 0 to 400,000) and Cost Change % (axis 0% to 160%), quarterly from 2010-01 to 2017-01. Callouts: “SUCCESS!” and “Unit Cost 3 0€ per Job”.]
13. / What is Next? DEEP NEURAL NETWORKS!
Benchmark a bunch of DNNs on our data!
• Traditional models with BoW and n-grams only partially capture the complexity of career levels;
• Deep Neural Networks have recently become very popular for text processing and NLP tasks;
• CNNs have been successfully applied to computer vision because of the compositional structure of an image;
• Texts have similar properties: characters combine to form words, n-grams, stems, phrases, sentences, etc.;
• CNN-based models achieve very good performance in laboratory settings. What about real-life business problems?
• To our knowledge, no one has used deep neural networks for job classification.
ALL LOOKS GOOD! LET'S TRY IT!
14. / Datasets
Overview of the datasets
DATASET 1
622K jobs (title, description).
Average length: 2203 characters.
Collected over 8 years.
Created by more than 300 people.
Includes data points from junior colleagues.
Career Level                    Number of Jobs
Specialist                      189,637
Senior Specialist               283,125
Manager                         109,758
Senior Manager                  26,815
Business Unit Leader            6,748
Managing Director SME           5,132
Managing Director Large Comp    386
DATASET 2
243K jobs (title, description).
Average length: 2197 characters.
Collected over 6 years.
Created by 80 people.
Excludes junior colleagues.
Only includes jobs reviewed by the QC team.
Career Level                    Number of Jobs
Specialist                      57,844
Senior Specialist               142,771
Manager                         39,870
Senior Manager                  2,897
Business Unit Leader            375
Managing Director SME           152
Managing Director Large Comp    3
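Both datasets are heavily imbalanced, so accuracy figures are easier to judge against a majority-class baseline. The short sketch below computes that baseline from the counts in the two tables; the helper names are made up for this example.

```python
dataset_1 = {
    "Specialist": 189_637,
    "Senior Specialist": 283_125,
    "Manager": 109_758,
    "Senior Manager": 26_815,
    "Business Unit Leader": 6_748,
    "Managing Director SME": 5_132,
    "Managing Director Large Comp": 386,
}
dataset_2 = {
    "Specialist": 57_844,
    "Senior Specialist": 142_771,
    "Manager": 39_870,
    "Senior Manager": 2_897,
    "Business Unit Leader": 375,
    "Managing Director SME": 152,
    "Managing Director Large Comp": 3,
}

def class_shares(counts):
    """Share of each class in the dataset."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def majority_baseline(counts):
    """Accuracy of a classifier that always predicts the largest class."""
    return max(counts.values()) / sum(counts.values())

for name, counts in (("Dataset 1", dataset_1), ("Dataset 2", dataset_2)):
    print(name, "total:", sum(counts.values()),
          "majority baseline: {:.1%}".format(majority_baseline(counts)))
```

Always predicting "Senior Specialist" would already be right for roughly 45.5% of Dataset 1 and 58.5% of Dataset 2.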
15. / Tech Setup & Stack
Machines used for training
Machine 1: Nvidia Titan X 12 GB
Machine 2: NVIDIA Quadro P6000 24 GB
256 GB RAM
24 x 3.0 GHz CPU
CUDA + PyTorch
16. / The Test: Career Level Classification
Example of how a human would read and classify a job for career level
Manager, App Store Program Management
Job Summary
Apple is seeking a Manager for the App Store Program Management Team. This role will lead a
team of engineering program managers responsible for end-to-end delivery of App Store features
across iOS, macOS, and tvOS platforms.
Key Qualifications
8+ years of professional experience in software program/project/product management
Proven experience in building and managing high performing teams and individuals
Proven track record in managing and deploying large, complex programs
Strong relationship management and facilitation skills both within diverse engineering teams and
cross functional organizations
Proven self-starter, who is pro-active and demonstrates creative and critical thinking abilities
Great attention to detail and organized
Excellent written and verbal communication skills
Overall, a highly driven, results-oriented, problem solver who will drive programs to deliver value
quickly to our customers
Required Experience
- Understanding of mobile software development
- Understanding of server-based software development
- Knowledge of iTunes Connect, App Store, iOS technologies
Description
We are looking for a seasoned manager with a proven track record in program management. This
is not just a people manager role, you will need to have hands on experience managing software
releases. You will develop tools and processes to gain efficiencies in the build, development,
testing, and deployment lifecycle. The role requires a combination of program and release
management, strong engineering background, and ability to build collaborative relationships
across various teams in Apple. We are looking for someone who loves digging into details, building
teams, and driving operational efficiencies under demanding timeframes. You take responsibility;
you feel a personal stake in the product you ship; you communicate responsibilities and scope
clearly; you value integrity; you manage risk; you need to know how things work; you work for the
success of the entire PM team; you thrive in uncertainty and strive to bring order to it; you have
deep wisdom and judgement; you keep your eye on the ball; you build strong relationships; you are
aware of politics but do not get mired in them.
Education
BS/MS in Computer Science, Engineering or similar technical field
• “Leading a team” is a soft indicator for a manager;
• “building and managing high performing teams” – this is a strong indication for a people manager career level;
• Indicator for at least people manager career level;
• “Building teams” – indicator for manager;
• “Manager” is a soft indication and has to be read in context. It could indicate management responsibilities, but it depends on the rest of the responsibilities.
17. / Starting with VDCNN
Very Deep Convolutional Neural Networks (Conneau et al. 2017)
MOTIVATION:
• NLP tasks are commonly approached with RNNs (particularly LSTMs) and CNNs;
• However, these architectures are rather shallow;
• State-of-the-art computer vision has pioneered DEEP CNNs and greatly profited from such models;
• Builds on “Character-level CNN for Text Classification” by Zhang et al. (2016), which outperforms traditional methods on similar datasets;
• VDCNN operates at the character level: no data preprocessing or augmentation.
Conneau et al. 2017 - https://arxiv.org/pdf/1606.01781.pdf
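To make "operates at the character level" concrete, here is a minimal sketch of how a job text can be turned into a fixed-length sequence of character indices before it reaches the network. The alphabet, the reserved padding and unknown indices, and the default length of 1024 are assumptions for illustration, not settings taken from our experiments.

```python
# Illustrative alphabet (an assumption); the papers define their own character sets.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/\\|_@#$%^&*~`+=<>()[]{} "
PAD, UNKNOWN = 0, 1  # reserved indices for padding and out-of-alphabet characters
CHAR_TO_INDEX = {ch: i + 2 for i, ch in enumerate(ALPHABET)}

def encode(text, max_len=1024):
    """Map text to a fixed-length list of integer character indices."""
    ids = [CHAR_TO_INDEX.get(ch, UNKNOWN) for ch in text.lower()[:max_len]]
    # Pad short texts on the right so every example has the same length.
    return ids + [PAD] * (max_len - len(ids))

encoded = encode("Manager, App Store Program Management", max_len=48)
```

With a fixed input length, longer descriptions get truncated, which is worth keeping in mind given that our jobs average about 2,200 characters.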
18. / VDCNN Test 1: Career Level on 622K Dataset
Initial test to get a feeling of how well the model abstracts.
• No oversampling;
• All 7 classes are tested;
• 90/10 split.
VDCNN results:
Dataset Size     622K
Training Time    25 hours
GPU              Titan X 12 GB
Layers           29
Epochs           9
Accuracy         73.5%
[Figure: VDCNN confusion matrix]
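A 90/10 split without oversampling can be reproduced with a few lines. This is a plain random split; the fixed seed and the function name are assumptions, and the toy `jobs` list only stands in for the real data.

```python
import random

def split_90_10(examples, seed=0):
    """Shuffle the examples and return (train, held_out) with a 90/10 ratio."""
    shuffled = list(examples)
    # A fixed seed (an assumption) keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 9 // 10
    return shuffled[:cut], shuffled[cut:]

# Toy stand-in for (text, career level) pairs.
jobs = [("job text %d" % i, "Specialist") for i in range(1000)]
train, held_out = split_90_10(jobs)
```

Because no oversampling is applied, very rare classes may end up with only a handful of held-out examples.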
19. / VDCNN Test 2: Smaller Dataset
We measure the performance on our smaller dataset. Career levels 5, 6, 7 and 8 are combined into one class.
VDCNN results:
                 Run 1                Run 2
Dataset Size     243K                 243K
Training Time    28 hours, 39 mins    16 hours, 54 mins
GPU              P6000 24 GB          Titan X 12 GB
Layers           55                   33
Epochs           30                   30
Accuracy         86.68%               87.99%
Career Level                    Number of Jobs    Class
Specialist                      57,844            Class 1
Senior Specialist               142,771           Class 2
Manager                         39,870            Class 3
Senior Manager                  2,897             Class 4
Business Unit Leader            375               Class 4
Managing Director SME           152               Class 4
Managing Director Large Comp    3                 Class 4
• A cleaner (but smaller) dataset.
• Grouping of career levels 5-8 as a new class 4.
• Huge jump in performance.
• Still, very long training times.
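The grouping of the four senior career levels into one class is a simple label mapping. The sketch below applies it to the Dataset 2 counts; the dictionary names are made up for the example.

```python
# Career level name -> class used in the 4-class experiment.
CLASS_OF_LEVEL = {
    "Specialist": 1,
    "Senior Specialist": 2,
    "Manager": 3,
    "Senior Manager": 4,
    "Business Unit Leader": 4,
    "Managing Director SME": 4,
    "Managing Director Large Comp": 4,
}

level_counts = {
    "Specialist": 57_844,
    "Senior Specialist": 142_771,
    "Manager": 39_870,
    "Senior Manager": 2_897,
    "Business Unit Leader": 375,
    "Managing Director SME": 152,
    "Managing Director Large Comp": 3,
}

# Sum the job counts per merged class.
merged_counts = {}
for level, n in level_counts.items():
    cls = CLASS_OF_LEVEL[level]
    merged_counts[cls] = merged_counts.get(cls, 0) + n
```

Even after merging, class 4 holds only 3,427 jobs, so it stays by far the smallest class.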
20. / Deeper look into the confusion matrix
Compare the confusion matrix of both models
Confusion Matrix: 622K – 7 Class
Confusion Matrix: 243K – 4 Class
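For readers who want to build the same kind of view for their own models, a confusion matrix and per-class recall need only a few lines. The labels and predictions below are toy values invented for the example, not our results.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in labels] for t in labels]

def recall_per_class(matrix, labels):
    """Share of each true class that was predicted correctly."""
    return {
        label: (row[i] / sum(row) if sum(row) else 0.0)
        for i, (label, row) in enumerate(zip(labels, matrix))
    }

labels = ["Specialist", "Senior Specialist", "Manager"]
# Toy data, made up for illustration.
y_true = ["Specialist", "Specialist", "Senior Specialist", "Manager", "Manager", "Manager"]
y_pred = ["Specialist", "Senior Specialist", "Senior Specialist", "Manager", "Senior Specialist", "Manager"]

matrix = confusion_matrix(y_true, y_pred, labels)
recall = recall_per_class(matrix, labels)
```

Per-class recall is more informative than overall accuracy here, because the rare senior classes barely move the accuracy figure.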
21. / Add a Benchmark: FastText
Facebook's open-source library for building scalable solutions for text representation and classification
https://arxiv.org/abs/1607.01759
Let’s include some benchmarking!
• Developed by Facebook AI Research;
• Released in 2016 to critical acclaim;
• Scalable across 100Ks of classes due to its hierarchical structure;
• Represents sentences as bag of words (BoW) and bag of n-grams (BoN);
• Shares information across classes: what one class learns about a word is shared with all others;
• Written in C++, so EXTREMELY FAST.
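FastText reads training data as plain text, one example per line, with the class given as a `__label__` prefix. A minimal sketch of preparing job texts in that format could look like this; the helper name and the two example jobs are made up.

```python
def to_fasttext_line(label, text):
    """Format one example as '__label__<label> <text>' on a single line."""
    tag = "__label__" + label.strip().lower().replace(" ", "_")
    # Collapse all whitespace, including newlines, so each job stays on one line.
    return tag + " " + " ".join(text.split())

# Toy examples, invented for illustration.
examples = [
    ("Senior Specialist", "Software Engineer\nBuild backend services."),
    ("Manager", "Manager, App Store Program Management"),
]
lines = [to_fasttext_line(label, text) for label, text in examples]
```

Written to a file, these lines can be passed to the `fasttext` Python package, for example `fasttext.train_supervised(input="jobs.train", epoch=10, wordNgrams=2)`. The 10 epochs match our runs; the file name and the `wordNgrams=2` setting are assumptions.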
22. / VDCNN vs FastText
Classifying for 7 Classes on the 622K Dataset
                 FastText            VDCNN
Dataset Size     622K (7 Classes)    622K (7 Classes)
Training Time    5 minutes !!!       25 hours
GPU              N.A.                Titan X 12 GB
Layers           N.A.                29
Epochs           10                  9
Accuracy         74.1%               73.5%
FastText slightly outperforms VDCNN at only a fraction of the training time.
23. / VDCNN vs FastText
Classifying for 4 Classes on the 243K Dataset
                 FastText            VDCNN
Dataset Size     243K (4 Classes)    243K (4 Classes)
Training Time    2.5 minutes !!!     16 hours, 54 mins
GPU              N.A.                Titan X 12 GB
Layers           N.A.                33
Epochs           10                  30
Accuracy         88.1%               87.99%
FastText again slightly outperforms VDCNN at only a fraction of the training time.
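To put "a fraction of the training time" into numbers, the reported wall-clock times from both comparisons can be turned into ratios. The helper below is only arithmetic on the figures from the slides.

```python
def minutes(hours=0, mins=0.0):
    """Convert a duration to minutes."""
    return hours * 60 + mins

# (VDCNN training time, FastText training time) in minutes, as reported above.
runs = {
    "622K, 7 classes": (minutes(hours=25), minutes(mins=5)),
    "243K, 4 classes": (minutes(hours=16, mins=54), minutes(mins=2.5)),
}

speedups = {name: vdcnn / fasttext for name, (vdcnn, fasttext) in runs.items()}
for name, factor in speedups.items():
    print("{}: FastText trained about {:.0f}x faster".format(name, factor))
```

That is roughly 300x and 400x. The comparison is not like for like, since VDCNN ran on a GPU and FastText on a CPU, and the number of epochs differs.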
24. / HDLTex
Hierarchical Deep Learning for Text Classification
Why are we trying this?
• Specifically developed for datasets with large corpora (similar to ours);
• Hierarchical classification can also be applied to career levels;
• Outperforms baseline classifiers.
Original paper: https://arxiv.org/abs/1709.08267
Repository: https://github.com/kk7nc/HDLTex
25. / HDLTex
Setup of our experiment
• We train German word vectors on our 622K Dataset using https://github.com/stanfordnlp/GloVe
• We train HDLTex with our 243K Dataset;
• As we have very few observations in career level 8, we treat 7 and 8 as one class;
• Split 90/10;
• We combine our training data as follows:
Class      Career Level                                Class Feature                              Observations
Class 1    specialist + senior specialist              Career levels with no people management    200,615
Class 2    manager + senior manager                    Career levels with people management       42,767
Class 3    business unit leader + managing director    Career levels with P&L responsibility      530
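The grouping above can be written as a two-level label per job: a parent class for the first level and a career level for the second. The sketch assumes that the second level separates the original career levels within each parent class, which the slides do not spell out; the dictionary names are made up.

```python
# Career level -> (parent class, child class). Levels 7 and 8 share one child class.
HIERARCHY = {
    "Specialist": ("no people management", "Specialist"),
    "Senior Specialist": ("no people management", "Senior Specialist"),
    "Manager": ("people management", "Manager"),
    "Senior Manager": ("people management", "Senior Manager"),
    "Business Unit Leader": ("P&L responsibility", "Business Unit Leader"),
    "Managing Director SME": ("P&L responsibility", "Managing Director"),
    "Managing Director Large Comp": ("P&L responsibility", "Managing Director"),
}

level_counts = {
    "Specialist": 57_844,
    "Senior Specialist": 142_771,
    "Manager": 39_870,
    "Senior Manager": 2_897,
    "Business Unit Leader": 375,
    "Managing Director SME": 152,
    "Managing Director Large Comp": 3,
}

parent_counts, child_counts = {}, {}
for level, n in level_counts.items():
    parent, child = HIERARCHY[level]
    parent_counts[parent] = parent_counts.get(parent, 0) + n
    child_counts[(parent, child)] = child_counts.get((parent, child), 0) + n
```

The parent counts reproduce the observations column of the table: 200,615, 42,767 and 530.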
26. / HDLTex (Layer 1 RNN, Layer 2 CNN)
Results from our experiment
                 HDLTex          FastText
Dataset Size     243K            243K
Training Time    10 hours        2.5 minutes !!!
GPU              P6000 24 GB     N.A.
Accuracy         86.8%           88.0%
[Figure: HDLTex confusion matrix]
Training a career level classifier with HDLTex is still not better than FastText and takes longer!
27. / Summary
Let’s review what we have learned today
• Deep Learning is a major step forward in the classification of documents: both VDCNN and HDLTex outperform our best-practice linear classifier model;
• Plenty of academic literature and open-source implementations allow data scientists to start testing in a couple of hours;
• However, both deep neural network architectures require long training times, even on powerful GPUs, which makes experimentation hard;
• FastText outperforms all models and can be trained in minutes on a desktop CPU, which allows for easy MVPs and testing;
• Business owners interested in rapid prototyping should definitely explore FastText for text classification before jumping on DNNs.
28. / Next Steps
Where we will invest time and effort in the next 2 months
• Further tests with HDLTex, especially for the classification of industries (2-level hierarchy);
• Benchmark FastText in every process where we use linear classifiers and deploy it to production;
• Benchmark Deep Pyramid Convolutional Neural Networks (Johnson and Zhang, 2017) against FastText/VDCNN;
• Analyze predictions from FastText, HDLTex and VDCNN and explore opportunities for model stacking.
29. / Thank you for your attention
Special thanks to our Data Scientist Viet Nguyen, who made all of this possible.
Alexander Chukovski
alexander.chukovski@experteer.com
30. / Questions from the Audience
Question 1: Why is FastText so fast compared to Deep Learning?
It is written in C++; compiled executables are faster than scripting languages.
A hierarchical softmax reduces the computation needed over the classes.
Training is fast because it builds on simple, proven NLP concepts: bag of words and bag of n-grams.
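The hierarchical softmax point can be illustrated with a Huffman tree over the class frequencies: frequent classes sit close to the root, so most predictions need only a few binary decisions. The sketch below builds such a tree for the Dataset 1 counts. It illustrates the idea only and is not FastText's actual code.

```python
import heapq
from itertools import count

def huffman_depths(freqs):
    """Return {label: depth} for a Huffman tree built from label frequencies."""
    tiebreak = count()  # unique counter so the heap never has to compare dicts
    heap = [(f, next(tiebreak), {label: 0}) for label, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every label in them one level deeper.
        merged = {label: depth + 1 for label, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

class_counts = {
    "Specialist": 189_637,
    "Senior Specialist": 283_125,
    "Manager": 109_758,
    "Senior Manager": 26_815,
    "Business Unit Leader": 6_748,
    "Managing Director SME": 5_132,
    "Managing Director Large Comp": 386,
}

depths = huffman_depths(class_counts)
total = sum(class_counts.values())
# Average number of binary decisions per example, weighted by class frequency.
average_depth = sum(class_counts[c] * depths[c] for c in class_counts) / total
```

With only seven classes the saving is small (under two decisions on average instead of seven scores), but the same structure is what lets the approach scale to very large label sets.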
Question 2: How did you get up to speed with machine learning?
In the beginning we had a lot of help from an external firm in Munich, “Glanos”,
which helped us build our first ML models and create a production-ready solution.
Most of the research in Deep Learning is freely available as academic papers, and the community
is very fast in building GitHub repositories with the models.
Question 3: The machines that you have used are very powerful. Is this a cloud cluster?
No, we had test access to these machines for a short period of time.
Question 4: Can a normal person or a small company actually run Deep Learning?
This configuration seems expensive.
You can buy a Titan GTX GPU from Nvidia for about €900 on eBay.
CPU and RAM configuration is not that relevant for Deep Learning, although your RAM should
match your GPU RAM.
A normal desktop with a 12 GB GPU should be more than enough to replicate these experiments.
VDCNN only required 4 GB of GPU memory, so we did not fully utilize the GPU RAM.