A Matching Approach Based on Term Clusters for eRecruitment

ICDM 2016 - July 13-17, 2016
• Kemal Can Kara – Kariyer.net
• Aşkın Karakaş – Kariyer.net
• Gülşen Bal – Kariyer.net
• Fatmagül Süzen – Kariyer.net
• Tunga Güngör – Boğaziçi University, Computer Engineering Department

PROJECT SUPPORTERS
• Kariyer.net: the first and largest online recruitment website in Turkey.
• TUBITAK: The Scientific and Technological Research Council of Turkey, with Project Number 3130841.

AGENDA
• ABOUT Kariyer.net
• INTRODUCTION
• COMPILING TERM LEXICONS (TERM EXTRACTION)
– IDENTIFYING SENTENCES
– CREATING PATTERN RULES
– MORPHOLOGICAL ANALYSIS/DISAMBIGUATION
– DOMAIN RELEVANCE
– DOMAIN CONSENSUS
– LEXICAL COHESION
• FINDING RELATIONS BETWEEN TERMS / TERM
CLUSTERING
• MATCHING OF RESUMES – JOB ADVERTISEMENT
– BOOLEAN WEIGHTING
– TERM CLUSTERS

Every Month on Kariyer.net
• 8.00.000 job applications
• 17.000 new job ads
• 8.500.000 visitors
• 250.000 new resumes
• ~7 job applications and ~20 job detail page views every second

PROBLEM
• Examining hundreds of resumes to find the most appropriate candidate is very time-consuming, because most of the information lies in free-text areas such as job experience.
• Both employers and candidates (especially candidates) struggle to express what they need or what they have.
• Most candidate filtering is done by HR employees using only a small number of given terms.

COMPILING TERM LEXICONS
• Finding Common Sentence Structure
1. We need to identify sentences in order to divide job advertisements into smaller parts, but not all sentences end with a punctuation mark. Instead, each sentence is a long phrase that emphasizes one qualification.
2. We also need to define the types of words and word groups that we accept as terms. For example, both «mvc and asp» and «yazılması ve incelenmesi (to analyze and develop)» fit the extracted rule «T and T», so the pattern alone cannot tell technical terms from ordinary verb phrases.

• Defining Term Places in Sentences & Generating a Lexicon of SpecialWords / EndingWords
– T + and/or + T + {specialWords} + {EndingWords}
– T, T, T + {EndingWords}
– T(3) + {specialWords} + {EndingWords}
– T, T(2) + and + T + {EndingWords}
– T(3) + {specialWords} + ….. + {EndingWords}
– T + {specialWords} + (T, T) + {specialWords} + {EndingWords}
T: term; T(2): term composed of two words; T(3): term composed of three words
specialWords: {konusunda (about), konularında (in the field of), üzerinde (upon), ...}
EndingWords: {tecrübeli (experienced in), bilgi sahibi (having knowledge of), üzerinde çalışmış (hands-on experience with), …}
After processing 20 positions, we obtained 25.196 terms.
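A minimal sketch of how one of these pattern rules might be applied to a single qualification phrase. The word lists hold only the example entries shown above, and the function name `extract_terms` and the sample phrases are hypothetical; this is an illustration under those assumptions, not the project's extractor.

```python
import re

# Only the example entries from the slide; the real lexicons are larger.
SPECIAL_WORDS = ["konusunda", "konularında", "üzerinde"]
ENDING_WORDS = ["tecrübeli", "bilgi sahibi", "üzerinde çalışmış"]
CONJUNCTIONS = ["ve", "veya"]  # Turkish "and", "or"


def extract_terms(phrase: str) -> list[str]:
    """Return candidate terms from one phrase, or [] if no ending word matches."""
    text = phrase.strip().rstrip(".")
    # 1. The phrase must end with an ending word (try the longest first).
    for ending in sorted(ENDING_WORDS, key=len, reverse=True):
        if text.endswith(ending):
            text = text[: -len(ending)].strip()
            break
    else:
        return []
    # 2. Drop an optional special word that precedes the ending word.
    for special in SPECIAL_WORDS:
        if text.endswith(special):
            text = text[: -len(special)].strip()
            break
    # 3. Split what is left on commas and conjunctions to fill the T slots.
    conj = "|".join(CONJUNCTIONS)
    parts = re.split(rf"\s*,\s*|\s+(?:{conj})\s+", text)
    return [p.strip() for p in parts if p.strip()]


print(extract_terms("mvc ve asp konusunda tecrübeli"))
print(extract_terms("C#, ASP.NET, SQL konularında bilgi sahibi"))
```

The sketch assumes lowercase marker words and does not enforce the T(2)/T(3) word counts; it only shows how the ending word anchors the phrase and how the remaining text is split into candidate terms.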

• Morphological Analysis
We apply these rules to 21 selected positions (those with the largest numbers of job ads and terms) and find 25.196 terms.
Then we select some of the terms from each domain and confirm that they are terms.
Next, we obtain their morphological analysis representations (using the analyzer developed at Boğaziçi University).
Finally, we define these representations as rules to eliminate unnecessary terms.
Selected positions: Software engineer, Accounting specialist, Mechanical engineer, Architect, Electrical engineer, Production engineer, Graphic designer, Lawyer, Electrical and electronic engineer, Project engineer, Business analyst, Quality engineer, Planning engineer, Financier, Interior architect, Research and development engineer, Environmental engineer, Technical service engineer, Project manager and industrial engineer
Example candidate terms and the analyzer output for each:
Term                                                Morphological analysis
öğrenmeyi seven (eager to learn)                    öğrenmeyi[Unknown] sev[Verb]+[Pos]-YAn[Adj+PresPart]
Teste dayalı geliştirme (test-driven development)   test+Noun+A3sg+Pnon; dayalı+Adj; geliş+Verb^DB+Verb+Caus+Neg+Imp+A2sg
Erkek (male)                                        [Noun]+[A3sg]+[Pnon]+[Nom]
yazılım (software)                                  [Unknown]
Java J2EE                                           Java[Noun]+[A3sg]+[Pnon]+[Nom] J2EE[Unknown]
Yazılım Mühendisliği (software engineering)         Yazılım[Unknown] Mühendisliği[Unknown]
ASP.NET                                             [Unknown]
….

We use morphological analysis to define the types of words and word groups that we accept as terms, because both «mvc and asp» and «yazılması ve incelenmesi (to analyze and develop)» match the extracted rule «T and T».
- Software
  • [Unknown]
  • [Noun]+[A3sg]+[Pnon]+[Nom]
  • [Noun]+[Acro]+[A3sg]+[Pnon]+[Nom]
  • [Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]
  • [Noun]+[A3sg]+[Pnon]+[Nom], [Noun]+[A3sg]+[Pnon]+[Nom]
  • [Unknown], [Unknown]
- Finance expert
  • [Unknown]
  • [Noun]+[A3sg]+[Pnon]+[Nom]
  • [Noun]+[Acro]+[A3sg]+[Pnon]+[Nom]
  • [Adj]
  • [Noun]+[A3sg]+[Pnon]+[Nom], [Noun]+[A3sg]+SH[P3sg]+[Nom]
  • [Noun]+lAr[A3pl]+[Pnon]+[Nom]
  • [Adj], [Noun]+lAr[A3pl]+[Pnon]+[Nom]
  • [Noun]+[A3sg]+[Pnon]+[Nom], [Noun]+[A3sg]+[Pnon]+[Nom]
  • [Verb]+[Pos]-mA[Noun+Inf2]+[A3sg]+[Pnon]+[Nom]
  • [Noun]+[Prop]+[A3sg]+[Pnon]+[Nom], [Unknown]
• Example of Morphological Analysis Representations

Term                      Morphological Analysis
ASP                       ASP[Noun]+[Acro]+[A3sg]+[Pnon]+[Nom]
C                         C[Noun]+[Acro]+[A3sg]+[Pnon]+[Nom]
Teste dayalı geliştirme   Teste: test+Noun+A3sg+Pnon+Dat; Dayalı: dayalı+Adj; Geliştirme: geliş+Verb^DB+Verb+Caus+Neg+Imp+A2sg
Yazılım Mühendisliği      Yazılım[Unknown] Mühendisliği[Unknown]
Uygulama analizi          Uygulama:[Noun]+[A3sg]+[Pnon]+[Nom] analiz:[Noun]+[A3sg]+[Pnon]+YH[Acc]

Rule                                                      Rule Meaning
[Unknown]                                                 Unknown word
[Noun]+[A3sg]+[Pnon]+[Nom]                                Noun
[Noun]+[Acro]+[A3sg]+[Pnon]+[Nom]                         Acronym Nominative
[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]                         Proper Noun Nominative
[Noun]+[A3sg]+[Pnon]+[Nom], [Noun]+[A3sg]+[Pnon]+[Nom]    Singular Noun Nominative, Singular Noun Nominative
[Unknown], [Unknown]                                      Unknown word, Unknown word
After the morphological analysis step, we eliminate more than 50% of the terms extracted with the sentence-structure rules (25.196 → 10.314 across the 21 largest position domains).
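The filtering step can be sketched as a lookup of each candidate's tag sequence in the set of accepted rules. The rule set below is the Software list from the slides; the analyses are hard-coded from the examples because the Boğaziçi analyzer is not called here, and `keep_term` is a hypothetical helper name.

```python
NOUN = "[Noun]+[A3sg]+[Pnon]+[Nom]"

# Accepted tag sequences for the Software domain, copied from the slide.
ALLOWED_SOFTWARE_RULES = {
    ("[Unknown]",),
    (NOUN,),
    ("[Noun]+[Acro]+[A3sg]+[Pnon]+[Nom]",),
    ("[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]",),
    (NOUN, NOUN),
    ("[Unknown]", "[Unknown]"),
}


def keep_term(tags: list[str]) -> bool:
    """A candidate survives only if its whole tag sequence is an accepted rule."""
    return tuple(tags) in ALLOWED_SOFTWARE_RULES


# Candidate terms with one tag string per word (stems omitted for brevity).
candidates = {
    "ASP": ["[Noun]+[Acro]+[A3sg]+[Pnon]+[Nom]"],
    "Yazılım Mühendisliği": ["[Unknown]", "[Unknown]"],
    "Erkek": [NOUN],
    "öğrenmeyi seven": ["[Unknown]", "[Verb]+[Pos]-YAn[Adj+PresPart]"],
}

kept = [term for term, tags in candidates.items() if keep_term(tags)]
print(kept)
```

In this toy run «öğrenmeyi seven» is dropped, while «Erkek» (male) passes because it is a plain noun. A purely morphological filter cannot tell whether a surviving term belongs to the position's domain.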

• But we still don't know which terms actually belong to the domain of that position.
• Second, some terms still come out of the analyzer as «Unknown».

• Which terms belong to which position domain?
• 1- Domain Relevance (DR): measures how specific the term t is to the domain D_i
• Determining the domain of terms
$$DR_{D_i}(t) = \frac{P(t \mid D_i)}{\max_j P(t \mid D_j)} = \frac{\mathrm{freq}(t, D_i)}{\max_j \mathrm{freq}(t, D_j)}$$
where
P(t|D_i) = the probability that term t occurs in domain D_i.
freq(t, D_i) = (number of occurrences of term t in domain D_i) / (number of occurrences of all terms in domain D_i)
max_j freq(t, D_j) = the largest value of freq(t, D_j) over all domains D_j, where each freq(t, D_j) is normalized by the number of occurrences of all terms in D_j.
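A small sketch of the Domain Relevance computation as defined above. The two toy term lists are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter


def relative_freq(term: str, domain_terms: list[str]) -> float:
    """freq(t, D): occurrences of t divided by all term occurrences in D."""
    if not domain_terms:
        return 0.0
    return Counter(domain_terms)[term] / len(domain_terms)


def domain_relevance(term: str, domain: str, corpora: dict[str, list[str]]) -> float:
    """DR_D(t) = freq(t, D) / max_j freq(t, D_j)."""
    freqs = {name: relative_freq(term, terms) for name, terms in corpora.items()}
    top = max(freqs.values())
    return freqs[domain] / top if top > 0 else 0.0


# Invented toy data: every extracted term occurrence, grouped by domain.
corpora = {
    "software": ["java", "xml", "java", "autocad"],
    "architect": ["autocad", "autocad", "ofis", "java"],
}

print(domain_relevance("java", "software", corpora))   # java is most frequent here
print(domain_relevance("java", "architect", corpora))  # half as frequent as in software
```

A term scores 1.0 in the domain where its relative frequency is highest and proportionally less elsewhere.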

• 2- Domain Consensus (DC): measures how evenly the use of a term is distributed across a domain
$$DC_{D_i}(t) = -\sum_{d_k \in D_i} P(t \mid d_k) \log P(t \mid d_k) = -\sum_{d_k \in D_i} \mathrm{norm\_freq}(t, d_k) \log(\mathrm{norm\_freq}(t, d_k))$$
where
d_k = a document (job advertisement) in the domain D_i.
P(t|d_k) = the probability that term t occurs in document d_k, estimated by the normalized frequency norm_freq(t, d_k).
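A sketch of the Domain Consensus sum. The slides do not spell out the normalization, so this example assumes the normalized frequency of t in a document is its count there divided by its total count over the domain's documents, and it uses the natural logarithm. The toy documents are invented.

```python
import math


def domain_consensus(term: str, documents: list[list[str]]) -> float:
    """Entropy of the term's distribution over the documents of one domain."""
    counts = [doc.count(term) for doc in documents]
    total = sum(counts)
    if total == 0:
        return 0.0
    dc = 0.0
    for count in counts:
        if count > 0:  # a zero count contributes nothing to the sum
            p = count / total  # assumed normalized frequency of t in this document
            dc -= p * math.log(p)
    return dc


# Invented toy domain: each document is the list of terms found in one job ad.
documents = [["java", "xml"], ["java"], ["java", "css"], ["css"]]

print(domain_consensus("java", documents))  # spread evenly over three ads
print(domain_consensus("xml", documents))   # concentrated in a single ad
```

Under this assumption a term used evenly in many advertisements gets a high score, and a term confined to one advertisement gets zero.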

• 3- Lexical Cohesion (LC): determines whether the words of the term t occur in the documents separately or together.
$$LC_{D_i}(t) = \frac{n \cdot \mathrm{freq}(t, D_i) \cdot \log(\mathrm{freq}(t, D_i))}{\sum_{w_j} \mathrm{freq}(w_j, D_i)}$$
where
n = the number of words in the term t
freq(t, D_i) = the frequency (probability) of term t in the domain D_i
w_j = the j-th word of the term, 1 <= j <= n
freq(w_j, D_i) = how many times the word w_j occurs in the domain D_i.
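A sketch of the Lexical Cohesion formula. It assumes raw occurrence counts for both the term and its words, and the natural logarithm; with relative frequencies below 1 the logarithm would be negative. The tokenized toy documents and helper names are hypothetical.

```python
import math


def count_phrase(tokens: list[str], words: list[str]) -> int:
    """Count how often the word sequence appears contiguously in the tokens."""
    n = len(words)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words)


def lexical_cohesion(term: str, documents: list[list[str]]) -> float:
    """LC = n * freq(t) * log(freq(t)) / sum_j freq(w_j), using raw counts."""
    words = term.split()
    n = len(words)
    term_count = sum(count_phrase(doc, words) for doc in documents)
    word_total = sum(doc.count(w) for doc in documents for w in words)
    if term_count == 0 or word_total == 0:
        return 0.0
    return n * term_count * math.log(term_count) / word_total


# Invented toy domain: tokenized job advertisement texts.
documents = [
    ["sql", "server", "and", "sql"],
    ["sql", "server"],
    ["server", "admin", "sql", "server"],
]

print(lexical_cohesion("sql server", documents))
```

Here «sql server» occurs 3 times as a phrase while its words occur 8 times in total, so the score is 2 · 3 · ln 3 / 8. The score grows when the words mostly appear together rather than on their own.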

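• Determining the domain of terms & Results
$$DomainResult(T, D) = \alpha_1 DR + \alpha_2 DC + \alpha_3 LC, \qquad \alpha_1 = \alpha_2 = \alpha_3 = 1/3$$
where T denotes a term and D denotes the domain.
Domain: Software   Domain: Mechanical Engineering   Domain: Architect
Xml                Autocad                          Autocad
AJAX               aktif (active)                   detay (detail)
jquery             Teknik (technical)               3D
MVC                yangın (fire)                    Ofis (office)
CSS                imalat (manufacturing)           max
Java               soğutma (cooling)                3DMax
NET                Hvac                             Otel (hotel)
Json               AVM (shopping mall)              Mimar (architect)
WCF                MS                               AVM (shopping mall)
.Net               mekanik (mechanical)             şantiye (construction site)
ASP                Havalandırma (ventilation)       Projesi (project)
SOAP               Çok (many)                       proje (project)
ASP.Net            sıhhi (sanitary)                 ev (house)
SVN                Tesisat (plumbing)               SketchUp
..                 ..                               ..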
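The combined score can be sketched as a weighted sum with equal weights, followed by a ranking. The three component scores per term below are invented placeholders, not values from the project.

```python
def domain_result(dr: float, dc: float, lc: float,
                  weights: tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """DomainResult = a1*DR + a2*DC + a3*LC, with equal weights by default."""
    a1, a2, a3 = weights
    return a1 * dr + a2 * dc + a3 * lc


# Hypothetical (DR, DC, LC) scores for two candidate terms in one domain.
scores = {
    "xml": (1.0, 2.0, 0.9),
    "erkek": (0.2, 1.5, 0.0),
}

ranked = sorted(scores, key=lambda term: domain_result(*scores[term]), reverse=True)
print(ranked)
```

DR lies between 0 and 1 while DC and LC are unbounded, so with equal weights the latter two can dominate the sum unless the scores are rescaled first.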

Creation of Term Clusters
• How many times each term is used in job advertisements
• How many job advertisements contain the term
• Which terms are used together, and how often
Length   Terms
7        ajax, asp.net, CSS, HTML, Javascript, XML, Web
6        asp.net, C#, Javascript, SQL, XML, Web
5        .NET, asp.net, C#, Web, SQL
5        afnetworking, CoreData, CoreGraphics, CoreLocation, QuartzCore
5        ajax, CSS, HTML, Jquery, JavaScript
3        Amazon AWS, Bamboo, Microsoft Azure
3        Hibernate, J2EE, Spring
3        Cassandra, Hbase, Hadoop
3        JSP, struts, servlet
2        java, Oracle
2        MongoDB, noSQL
2        MS Visio, Ms Project
2        android, ios
Table: Example of term clusters with a frequency of 30% in the software domain
Within each group, every pair of terms is also used together with a frequency of 30%.
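One way to read the pairwise 30% condition is that a cluster is a group in which every pair of terms co-occurs often enough. The sketch below assumes the co-occurrence frequency of a pair is the share of advertisements containing both terms among those containing at least one of them, and it grows clusters greedily. Both choices, the toy advertisements, and all function names are assumptions.

```python
from itertools import combinations


def cooccurrence_ratio(a: str, b: str, ads: list[set[str]]) -> float:
    """Ads containing both terms divided by ads containing either (assumed measure)."""
    both = sum(1 for ad in ads if a in ad and b in ad)
    either = sum(1 for ad in ads if a in ad or b in ad)
    return both / either if either else 0.0


def is_cluster(terms: list[str], ads: list[set[str]], threshold: float = 0.30) -> bool:
    """True if every pair of terms in the group meets the threshold."""
    return all(cooccurrence_ratio(a, b, ads) >= threshold
               for a, b in combinations(terms, 2))


def grow_clusters(ads: list[set[str]], threshold: float = 0.30) -> list[frozenset[str]]:
    """Greedy sketch: start from each term and add terms that keep all pairs valid."""
    vocab = sorted(set().union(*ads))
    clusters: list[frozenset[str]] = []
    for seed in vocab:
        cluster = [seed]
        for cand in vocab:
            if cand not in cluster and is_cluster(cluster + [cand], ads, threshold):
                cluster.append(cand)
        key = frozenset(cluster)
        if len(cluster) > 1 and key not in clusters:
            clusters.append(key)
    return clusters


# Invented toy data: the set of lexicon terms found in each job advertisement.
ads = [
    {"hibernate", "j2ee", "spring"},
    {"hibernate", "j2ee", "spring"},
    {"hibernate", "spring"},
    {"mongodb", "nosql"},
    {"mongodb", "nosql", "spring"},
]

for cluster in grow_clusters(ads):
    print(len(cluster), sorted(cluster))
```

The greedy pass depends on term order and is not guaranteed to find every maximal group; a full clique search would be needed for that.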

• Visualization of a Term Cluster

• Visualization of a Candidate Resume
Job Experiences:
Application development using Visual basic, MS Access, SQL, PL/SQL; study of the details of GSM technology; study of the Turkcell network structure: organization, equipment, technology, problem-solving strategies, software used.
Projects Worked On:
Web-Based School Project: I worked as a software specialist. A layered architecture was used in this ASP.NET and SQL Server based project. Active Report was used for all critical reports, including report cards, and Stored Procedures were used for performance gains.
Weekly Course Schedule (Time Table): A desktop application was written using C# and SQL Server.
Skills Gained: Active Reports, SQL Server, Stored Procedures, Three Tier Applications

Matching of Resumes and Job Advertisements
• We use the well-known cosine similarity measure to evaluate the similarity between resumes and job advertisements under 5 different term-based methods.
$$\mathrm{similarity} = \cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$$

Calculating Similarity with Cosine
• 1- Boolean Weighting
Term Lexicon                    Job Advertisement Vector   Resume Vector
.NET                            1                          0
C#                              1                          0
ASP.NET                         0                          0
Web                             1                          0
JAVA                            0                          1
Hibernate                       0                          0
XML                             1                          1
MS SQL Server                   1                          0
PL/SQL                          0                          1
iOS                             0                          0
Ireport                         0                          0
J2EE                            0                          1
J2ME                            0                          1
.. (all terms in this domain)   ..                         ..
Table: An example resume and job advertisement vectors
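A worked sketch of Boolean weighting with cosine similarity, restricted to the 13 lexicon rows listed in the example table (the full lexicon has more terms, so real scores would differ). The helper names are hypothetical.

```python
import math

# The 13 lexicon rows shown in the example table.
LEXICON = [".NET", "C#", "ASP.NET", "Web", "JAVA", "Hibernate", "XML",
           "MS SQL Server", "PL/SQL", "iOS", "Ireport", "J2EE", "J2ME"]

job_terms = {".NET", "C#", "Web", "XML", "MS SQL Server"}
resume_terms = {"JAVA", "XML", "PL/SQL", "J2EE", "J2ME"}


def boolean_vector(present: set[str], lexicon: list[str]) -> list[int]:
    """1 if the lexicon term occurs in the document, 0 otherwise."""
    return [1 if term in present else 0 for term in lexicon]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


job_vector = boolean_vector(job_terms, LEXICON)
resume_vector = boolean_vector(resume_terms, LEXICON)
boolean_score = cosine(job_vector, resume_vector)
print(round(boolean_score, 4))
```

Only XML is shared, and each vector has five non-zero entries, so the score is 1 / (√5 · √5) = 0.2. Related but non-identical terms such as Web and J2EE contribute nothing under Boolean weighting.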

Calculating Similarity with Cosine
• 2- Assigning vector values with term clusters
• We use the relationships between terms.
• The system assumes that terms in the same group are related.
The job advertisement and resume start from the same Boolean vectors as in the Boolean weighting example. Term relations used for this example:
TermID   Term   Term2   Weight2
199535   J2EE   Web     465
200422   J2ME   Web     28
202196   java   Web     1756
Resulting vectors (excerpt). The resume's Web entry changes from 0 to 0.45 because the resume contains terms related to Web:
Term Lexicon   Job Advertisement Vector   Resume Vector
ASP.NET        0                          0
Web            1                          0.45
JAVA           0                          1
Hibernate      0                          0
XML            1                          1
Methods:
(1-01): 1. Boolean Weighting
(1-W): 2. Assigning vector values with term clusters
• Experiments
ID Method JA_ID ResumeID Result Index
1 1-01 112 106890930 0.3651483 1
2 1-01 112 102318815 0.2637521 2
3 1-01 112 110299507 0.2581988 3
4 1-01 112 9463976 0.2581988 4
5 1-01 112 11777808 0.2306328 5
6 1-01 156 215689 0.3481553 1
7 1-01 156 102318815 0.3405026 2
8 1-01 156 2563345 0.3333333 3
9 1-01 156 108766850 0.3333333 4
10 1-01 156 105445360 0.3187883 5
11 1-01 163 102318815 0.4865336 1
12 1-01 163 108766850 0.4082482 2
13 1-01 163 8624239 0.3644054 3
14 1-01 163 215689 0.3553345 4
15 1-01 163 11777808 0.3403516 5
16 1-W 112 1307469 0.5207212 1
17 1-W 112 308945 0.3693846 2
18 1-W 112 106890930 0.3651483 3
19 1-W 112 7134122 0.3132371 4
20 1-W 112 102318815 0.2637521 5
21 1-W 156 7134122 0.4235080 1
22 1-W 156 215689 0.3481553 2
23 1-W 156 102318815 0.3405026 3
24 1-W 156 2563345 0.3333333 4
25 1-W 156 108766850 0.3333333 5
26 1-W 163 102318815 0.4865336 1
27 1-W 163 108766850 0.4082482 2
28 1-W 163 8624239 0.3644054 3
29 1-W 163 215689 0.3553345 4
30 1-W 163 11777808 0.3403516 5
10 job advertisements and 100 resumes were selected randomly.

• Measuring the accuracy of our system & Defining the Gold Standard
• We selected 3 job advertisements.
• We asked an HR specialist in the company to select the 5 most suitable resumes for each.
Job Ad.   Resume 1    Resume 2   Resume 3   Resume 4    Resume 5
112       1321224     163083     41816      106890930   308945
156       41816       2563345    215689     102318815   106890930
163       102318815   41816      168821     215689      11777808
Table 6. Human Resource Specialist Matching Results
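The slides do not state how system output is scored against this gold standard. One simple possibility, used here purely as an assumed stand-in, is the share of the HR specialist's five resumes that also appear in each method's top five, averaged over the three job advertisements. The IDs are copied from the experiment and gold-standard tables; `mean_overlap_at_5` is a hypothetical name.

```python
# HR specialist's picks per job advertisement (gold standard table).
gold = {
    112: [1321224, 163083, 41816, 106890930, 308945],
    156: [41816, 2563345, 215689, 102318815, 106890930],
    163: [102318815, 41816, 168821, 215689, 11777808],
}

# Top-5 resumes per method and job advertisement (experiments table).
system = {
    "1-01": {
        112: [106890930, 102318815, 110299507, 9463976, 11777808],
        156: [215689, 102318815, 2563345, 108766850, 105445360],
        163: [102318815, 108766850, 8624239, 215689, 11777808],
    },
    "1-W": {
        112: [1307469, 308945, 106890930, 7134122, 102318815],
        156: [7134122, 215689, 102318815, 2563345, 108766850],
        163: [102318815, 108766850, 8624239, 215689, 11777808],
    },
}


def mean_overlap_at_5(ranked: dict[int, list[int]], gold: dict[int, list[int]]) -> float:
    """Average fraction of gold resumes found in the system's top five."""
    shares = [len(set(ranked[job]) & set(gold[job])) / len(gold[job]) for job in gold]
    return sum(shares) / len(shares)


for method, ranked in system.items():
    print(method, round(mean_overlap_at_5(ranked, gold), 4))
```

Under this assumed measure the term-cluster method recovers 8 of the 15 gold resumes against 7 for Boolean weighting; the difference comes entirely from job advertisement 112.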

• Domain Expert’s Results
Job Ad No CV1 CV2 CV3 CV4 CV5
112 109264052 100250475 10002710 105347779 41816
156 7839140 162336 105347779 41816 11777808
163 2563345 7839140
189 10002710 105347779 41816 1321224 162336
212 2563345
465 7839140 162336 41816 105347779 215689
480 10002710 100250475
695 2563345 7839140 10002710 41816 162336
884 41816 162336 2563345
3442 10002710 105347779 41816 162336 109264052
Method Similarity
1-01 0.36
1-W 0.42

Conclusions
• The proposed system is the first system of this kind that works in Turkish.
• The system extracts terms from job advertisements, creates a lexicon of terms, and then finds the relationships between them.
• The proposed system then matches resumes to job advertisements using different methods.
• Based on the performance results, the matching method that uses term clusters gives better results.
• Thus, we can say that analysing the relationships between terms brings the system closer to finding the appropriate match.
