DATA MINING AND STATISTICAL ANALYSIS SOLUTIONS
Skills demand analysis based on the data from
online HR websites: Using web scraping and text
mining applications: IT Sector
Habet Madoyan
Vahe Movsisyan
Sunday, July 03, 2016
The analysis is funded by the research grant from American University of Armenia.
Presented at:
IX International School-Seminar. Town of Tsakhkadzor, Republic of Armenia
Methodology:
Overview
Datamotus LLC 2
Introduction
In recent years, online job ads have become a popular job-search channel, which is
why the research community is increasingly experimenting with
detailed breakdowns of online job ads to study labor market dynamics.
It is estimated that in the USA 60-70 percent of job openings are now posted
on the Internet. However, these job ads are biased toward industries and
occupations that seek high-skilled, “white-collar” workers.
Introduction
Job seekers, employers, students, researchers, policymakers, higher education
institutions, career advisors, and curriculum developers now view online job ads
data as a practical source to explore the nature of today’s dynamic labor market.
Online job ads can show the relative demand for different types of skills and levels
of education. The real-time nature of job ads data also allows for the early
detection of labor demand trends, which gives job seekers, employers, and
policymakers a forward-looking analytical tool.
Real-time labor market indicators can be particularly useful in aligning education
and training curricula with workforce needs in emerging or rapidly changing
industries, such as healthcare and information technology.
Job ads provide an incomplete picture of labor
demand
Online job ads data strongly correlate with job
openings data
Web Scraping
Text Mining
Synopsis of the study
• Develop an algorithm for web scraping job announcement
data (careercenter.am)
• Text mining and parsing algorithms to structure job
announcements
• Algorithms to assess and track vacancy rates by:
• Industry
• Job role
• Specific skills
What was done
• Around 20,000 posts were scraped from the web.
• Posts arrive in a rough, unstructured form; an algorithm was
developed to structure them.
A variable for each “section”
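A minimal sketch of how an unstructured post could be split into one variable per section. The section headers and the sample post below are hypothetical placeholders; the actual labels used on careercenter.am and the study's own parsing rules are not shown in the slides.

```python
import re

# Hypothetical section headers; the real labels on the site may differ.
SECTION_HEADERS = [
    "TITLE",
    "JOB DESCRIPTION",
    "REQUIRED QUALIFICATIONS",
    "APPLICATION DEADLINE",
]

def split_into_sections(post_text):
    """Return a dict mapping each known header to the text that follows it."""
    alternatives = "|".join(re.escape(h) for h in SECTION_HEADERS)
    pattern = r"^(" + alternatives + r"):[ \t]*"
    # With one capturing group, re.split yields
    # [preamble, header, body, header, body, ...].
    parts = re.split(pattern, post_text, flags=re.MULTILINE)
    sections = {}
    for header, body in zip(parts[1::2], parts[2::2]):
        # Collapse line breaks and repeated spaces inside a section.
        sections[header] = " ".join(body.split())
    return sections

# Made-up post, for illustration only.
sample_post = """TITLE: Java Software Developer
JOB DESCRIPTION: Develop and maintain
server-side applications.
REQUIRED QUALIFICATIONS: Java, JavaScript, SQL
APPLICATION DEADLINE: 15 March 2016
"""

record = split_into_sections(sample_post)
print(record["TITLE"])
```

Each key of `record` then plays the role of one variable (column) in the structured data set.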
Total vacancy rate (Careercenter) and Official Labor
Demand (2004 - 2016 Q1)
[Figure: quarterly series, 2004Q1-2016Q1. Left scale: Total jobs (Careercenter). Right scale: Job Demand (NSS).]
Correlation = 0.76
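A short sketch of how a correlation between two quarterly series can be computed. The slide does not say which coefficient was used; Pearson's is assumed here, and the numbers are made up for illustration and are not the study's data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Made-up quarterly counts (NOT the study's data).
scraped_posts = [120, 150, 170, 160, 210, 240]
official_demand = [900, 1000, 1200, 1100, 1500, 1400]

r = pearson(scraped_posts, official_demand)
print(round(r, 2))
```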
Job Market Overview
IT sector
ICT sector and overall economy
[Figure: annual series, 2005-2015. Left scale: Average yearly wage in Transport and Communication sector / Average yearly wage in RA. Right scale: Weight of Transport and Communication sector (including IT sector) in GDP, in %.]
Total vacancy and IT sector vacancy rates (Careercenter,
2004-2016)
[Figure: quarterly series, 2004Q1-2016Q1. Left scale: Non IT Jobs (Careercenter). Right scale: IT Jobs (Careercenter).]
Correlation = 0.81
Hard Skills in IT
Sector
Time series: Annual demand for top 5 programming languages
[Figure: annual series, 2004-2015, one line each for C++, JavaScript, Java, C#, PHP.]
Time series: Annual demand for top 5 programming languages
(parabolic trend)
[Figure: polynomial trend lines, 2004-2015, one each for C++, JavaScript, Java, C#, PHP.]
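A rough sketch of how a parabolic (second-order polynomial) trend can be fitted by least squares. The slides only show the fitted curves, so the fitting routine below is an assumption about how such a trend line is typically obtained, and the demand values are invented for illustration.

```python
def fit_parabola(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c; returns (a, b, c)."""
    # Power sums needed by the 3x3 normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]               # s[k] = sum of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix; columns correspond to the unknowns a, b, c.
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [v - factor * p for v, p in zip(m[row], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

years = list(range(2004, 2016))
x = [year - 2004 for year in years]   # shift years to 0..11 to keep the sums small
# Invented annual counts for one language (NOT the study's data).
demand = [5, 8, 14, 22, 30, 28, 35, 50, 70, 95, 120, 150]

a, b, c = fit_parabola(x, demand)
trend = [a * xi ** 2 + b * xi + c for xi in x]
```

A positive `a` indicates accelerating demand over the period; a negative `a` indicates demand that is levelling off or declining.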
Analyzing demand for
programming languages using
association rules
Arules
• Association rule mining is used to analyse the
co-occurrence of programming languages in a job post
• The R packages “arules” and “arulesViz” are used for
the analysis
• Analysis is done for IT jobs only
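Before co-occurrence can be analysed, each post has to be reduced to the set of languages it mentions. The sketch below shows one way this might be done in Python; the alias table is a hypothetical illustration (the tokens `cplusplus` and `csharp` mirror those that appear in the rules table later in the deck), not the study's actual dictionary.

```python
import re

# Hypothetical alias table: surface form -> normalized token.
ALIASES = {
    "c++": "cplusplus",
    "c#": "csharp",
    "javascript": "javascript",
    "java": "java",
    "php": "php",
    "python": "python",
}

def languages_in_post(text):
    """Return the set of normalized language tokens mentioned in a post."""
    lowered = text.lower()
    found = set()
    for surface, token in ALIASES.items():
        # Lookarounds replace \b, which does not work next to '+' or '#',
        # and stop "java" from matching inside "javascript".
        pattern = r"(?<![a-z0-9])" + re.escape(surface) + r"(?![a-z0-9+#])"
        if re.search(pattern, lowered):
            found.add(token)
    return found

print(languages_in_post("Experience with C++, C# and JavaScript"))
```

The resulting sets, one per post, are the "transactions" that an association rule miner expects as input.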
Association rules: Measures of rule
interestingness
Suppose we have the rule: IF {A} => {B}
Measure 1
$\mathrm{Support} = P(A \cap B)$
Measure 2
$\mathrm{Confidence} = P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}$
Measure 3
$\mathrm{Lift} = \dfrac{P(B \mid A)}{P(B)} = \dfrac{P(A \cap B)}{P(A)} \cdot \dfrac{1}{P(B)}$
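The three measures can be computed directly from their definitions by treating each probability as a relative frequency over posts. The sketch below uses a made-up set of five posts; the function name is illustrative and is not part of the arules package.

```python
def rule_measures(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent => consequent.

    Assumes the antecedent and the consequent each occur in at least one
    transaction (otherwise a division by zero occurs).
    """
    a, b = set(antecedent), set(consequent)
    n = len(transactions)
    p_a = sum(a <= t for t in transactions) / n          # P(A)
    p_b = sum(b <= t for t in transactions) / n          # P(B)
    p_ab = sum((a | b) <= t for t in transactions) / n   # P(A and B)
    confidence = p_ab / p_a                              # P(B | A)
    lift = confidence / p_b
    return p_ab, confidence, lift

# Made-up posts, each reduced to the set of languages it mentions.
posts = [
    {"java", "javascript"},
    {"java", "javascript", "php"},
    {"java"},
    {"php", "javascript"},
    {"csharp"},
]

support, confidence, lift = rule_measures(posts, {"java"}, {"javascript"})
```

Here {java} appears in 3 of 5 posts and {java, javascript} in 2 of 5, so the support is 0.4 and the confidence is 2/3. A lift above 1 means the two languages appear together more often than they would if they were independent.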
Visualizing the rules
Association Mining for
Programming languages: C++
• A set of association rules is generated for the top 20 programming languages.
• Rules are filtered with a minimum support of 0.01 and a minimum confidence of 0.1.
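To illustrate what the support and confidence thresholds do, the sketch below enumerates every rule with one or two items on the left-hand side and keeps those that pass both cut-offs. It is a brute-force stand-in written for clarity, not the Apriori implementation that arules provides, and the sample posts are invented.

```python
from itertools import combinations

def mine_rules(transactions, min_support=0.01, min_confidence=0.1, max_lhs=2):
    """Return (lhs, rhs, support, confidence) for all rules passing the thresholds."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def freq(itemset):
        # Share of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    rules = []
    for size in range(1, max_lhs + 1):
        for lhs in combinations(items, size):
            lhs_set = frozenset(lhs)
            p_lhs = freq(lhs_set)
            if p_lhs == 0:
                continue  # the left-hand side never occurs
            for rhs in items:
                if rhs in lhs_set:
                    continue
                support = freq(lhs_set | {rhs})
                confidence = support / p_lhs
                if support >= min_support and confidence >= min_confidence:
                    rules.append((lhs, rhs, support, confidence))
    return rules

# Made-up posts, each reduced to the set of languages it mentions.
posts = [
    {"java", "javascript"},
    {"java", "javascript", "php"},
    {"java"},
    {"php", "javascript"},
    {"csharp"},
]

# Higher thresholds than in the study, because this toy data set is tiny.
rules = mine_rules(posts, min_support=0.2, min_confidence=0.5)
for lhs, rhs, support, confidence in rules:
    print(set(lhs), "=>", {rhs}, round(support, 2), round(confidence, 2))
```

Rules with one item and with two items on the left can then be inspected separately by filtering on `len(lhs)`.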
Two items on the left
One item on the left
Association Mining for
Programming languages: Java
Rules visualization:
Java (all rules)
Rules Visualization:
Javascript
Job Title Analysis
IT Job Titles Frequency
Most popular Job Titles (2004Q1-2016Q1) Percentage
software developer/engineer 18.29%
quality assurance engineer 5.42%
java software developer 4.98%
system administrator 4.00%
web developer 3.66%
.net developer 2.94%
php developer 2.33%
graphic designer 1.89%
ios developer 1.31%
android developer 1.26%
deep submicron 0.98%
database developer 0.96%
support specialist 0.96%
database administrator 0.92%
technical support 0.89%
technical writer 0.83%
support engineer 0.80%
application developer 0.72%
design engineer 0.72%
r&d engineer 0.68%
team leader 0.67%
frontend developer 0.55%
monitoring evaluation 0.52%
information security 0.50%
senior r&d 0.50%
Total (titles listed above) 57.29%
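A small sketch of how such a frequency table can be produced once titles are extracted. The normalization step (lower-casing and trimming) is an assumption; the slides do not describe how the study merged title variants, and the titles below are made up.

```python
from collections import Counter

def title_shares(titles):
    """Return (title, percentage) pairs, most frequent first."""
    counts = Counter(t.strip().lower() for t in titles)
    total = sum(counts.values())
    return [(title, 100.0 * n / total) for title, n in counts.most_common()]

# Made-up titles, for illustration only.
titles = [
    "Software Developer",
    "software developer ",
    "QA Engineer",
    "Web Developer",
]

shares = title_shares(titles)
for title, pct in shares:
    print(f"{title} {pct:.2f}%")
```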
Software developer/engineer
[Figure: annual time series, 2004-2015.]
Quality assurance engineer
[Figure: annual time series for quality.assurance.engineer, 2004-2015.]
Java software developer
[Figure: annual time series for java.software.developer, 2004-2015.]
System administrator
[Figure: annual time series for system.administrator, 2004-2015.]
Web developer
[Figure: annual time series for web.developer, 2004-2015.]
IT Job Titles vs Programming
languages
Job Title => Programming language confidence
{software developer/engineer} => {csharp} 0.33
{software developer/engineer} => {java} 0.30
{software developer/engineer} => {javascript} 0.20
{software developer/engineer} => {asp} 0.20
{software developer/engineer} => {php} 0.12
{software developer/engineer} => {j} 0.12
{software developer/engineer} => {tcl} 0.09
{software developer/engineer} => {python} 0.07
{software developer/engineer} => {cplusplus} 0.06
{software developer/engineer} => {ruby} 0.03
{software developer/engineer} => {visual.basic} 0.02
{software developer/engineer} => {verilog} 0.02
{quality assurance engineer} => {java} 0.27
{quality assurance engineer} => {shell} 0.25
{quality assurance engineer} => {perl} 0.22
{quality assurance engineer} => {python} 0.14
{quality assurance engineer} => {tcl} 0.12
{quality assurance engineer} => {bash} 0.04
{quality assurance engineer} => {verilog} 0.04
{java software developer} => {java} 0.98
{java software developer} => {javascript} 0.47
{java software developer} => {j} 0.39
{java software developer} => {shell} 0.11
{java software developer} => {ruby} 0.05
{system administrator} => {perl} 0.09
{system administrator} => {shell} 0.09
{system administrator} => {bash} 0.03
{system administrator} => {pl.sql} 0.02
{web developer} => {javascript} 0.76
{web developer} => {php} 0.57
{web developer} => {asp} 0.36
{web developer} => {csharp} 0.27
{web developer} => {ruby} 0.02
{.net developer} => {asp} 0.82
{.net developer} => {csharp} 0.80
{.net developer} => {javascript} 0.42
{.net developer} => {visual.basic} 0.03
{php developer} => {php} 1.00
{php developer} => {javascript} 0.71
{php developer} => {ruby} 0.08
{php developer} => {python} 0.07
Next Steps:
• Develop a machine learning algorithm to classify job ads by sector,
• Develop state-of-the-art text mining and topic modeling algorithms to
predict demand for skills, professions and job roles,
• Create an interactive web dashboard (using R Shiny) to help:
• Potential job seekers
• Potential employees
• Policy makers
• Universities
Thank You For Your Attention!

IT Skills Analysis

  • 1. DATA MINING AND STATISTICAL ANALYSIS SOLUTIONS Skills demand analysis based on the data from online HR websites: Using web scraping and text mining applications: IT Sector Habet Madoyan Vahe Movsisyan Sunday, July 03, 2016 The analysis is funded by the research grant from American University of Armenia. Presented at: IX International School-Seminar. Town of Tsakhkadzor, Republic of Armenia
  • 3. Introduction In recent years online job ads became a popular job-search model, that’s why the research community is increasingly experimenting with the detailed breakdown of online job ads to study labor market dynamics. It is estimated that in USA 60-70 percent of job openings are now posted on the Internet. However these job ads are biased toward industries and occupations that seek high-skilled, “white-collar” workers.
  • 4. Introduction Job seekers, employers, students, researchers, policymakers, higher education institutions, career advisors, and curriculum developers now view online job ads data as a practical source to explore the nature of today’s dynamic of labor market. Online job ads can show the relative demand for different types of skills and levels of education. The real-time nature of job ads data also allows for the early detection of labor demand trends, which gives job seekers, employers, and policymakers a forward-looking analytical tool. Real-time labor market indicators can be particularly useful in aligning education and training curricula with workforce needs in emerging or rapidly changing industries, such as healthcare and information technology, etc.
  • 5. Job ads provide an incomplete picture of labor demand Online job ads data strongly correlate with job openings data
  • 6.
  • 8. Synopsys of the study • Develop an algorithm for web scrapping job announcement data (careercenter.am) • Text mining and parsing algorithms to structure job announcements • Algorithms to assess and track vacancy rates by: • Industry • Job role • Specific skills
  • 9. What was done • Around 20,000 posts are scrapped from the web, • Posts come in rough, unstructured way. Algorithm is developed to structure them.
  • 10. A variable for each “section”
  • 11. Total vacancy rate (Careercenter) and Official Labor Demand (2004-2016 I Quarter) Datamotus LLC 11 500 1000 1500 2000 2500 3000 100 150 200 250 300 350 400 450 500 550 600 2004Q1 2004Q2 2004Q3 2004Q4 2005Q1 2005Q2 2005Q3 2005Q4 2006Q1 2006Q2 2006Q3 2006Q4 2007Q1 2007Q2 2007Q3 2007Q4 2008Q1 2008Q2 2008Q3 2008Q4 2009Q1 2009Q2 2009Q3 2009Q4 2010Q1 2010Q2 2010Q3 2010Q4 2011Q1 2011Q2 2011Q3 2011Q4 2012Q1 2012Q2 2012Q3 2012Q4 2013Q1 2013Q2 2013Q3 2013Q4 2014Q1 2014Q2 2014Q3 2014Q4 2015Q1 2015Q2 2015Q3 2015Q4 2016Q1 Total jobs (Careercenter) Job Demand (NSS, right scale) Correlation=0.76
  • 12. Job Market Overview: IT sector
  • 13. ICT sector and overall economy [Chart: annual series, 2005-2015. Series: Average yearly wage in Transport and Communication sector / Average yearly wage in RA; Weight of Transport and Communication sector (including IT sector) in GDP (right scale, in %)]
  • 14. Total vacancy and IT sector vacancy rates (Careercenter, 2004-2016) [Chart: quarterly series, 2004Q1-2016Q1. Series: Non IT Jobs (Careercenter); IT Jobs (Careercenter, right scale). Correlation = 0.81]
  • 15. Hard Skills in IT Sector
  • 16. Time series: Annual demand for top 5 programming languages [Chart: annual series, 2004-2015. Series: C++, Javascript, Java, C#, PHP]
  • 17. Time series: Annual demand for top 5 programming languages (parabolic trend) [Chart: second-order polynomial trend lines, 2004-2015. Series: C++, Javascript, Java, C#, PHP]
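A parabolic trend is a least-squares fit of y = a + b·t + c·t² to the annual counts. The sketch below shows one way to compute it in plain Python by solving the normal equations; the function name is hypothetical, the chart's trend lines were presumably produced by a charting tool rather than by this code, and the series in the usage example is made up.

```python
def fit_quadratic(ts, ys):
    """Least-squares fit of y = a + b*t + c*t^2; returns (a, b, c)."""
    if len(ts) != len(ys) or len(ts) < 3:
        raise ValueError("need at least 3 points of equal-length series")
    # Power sums for the 3x3 normal equations.
    s = [sum(t ** k for t in ts) for k in range(5)]
    r = [sum(y * t ** k for t, y in zip(ts, ys)) for k in range(3)]
    # Augmented matrix [A | r] with A[i][j] = sum of t^(i+j).
    aug = [[s[i + j] for j in range(3)] + [r[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("points do not determine a unique parabola")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(3):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * p for x, p in zip(aug[row], aug[col])]
    a, b, c = (aug[i][3] / aug[i][i] for i in range(3))
    return a, b, c


if __name__ == "__main__":
    # Illustrative annual counts only; years are shifted so that 2004 -> t = 0,
    # which keeps the power sums small and the system well conditioned.
    years = list(range(2004, 2010))
    counts = [12, 15, 22, 30, 45, 61]
    a, b, c = fit_quadratic([y - years[0] for y in years], counts)
    print(a, b, c)
```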
  • 18. Analyzing demand for programming languages using association rules
  • 19. Arules
      • Association rules mining is used to analyse the co-occurrence of programming languages in a job post
      • The R packages “arules” and “arulesViz” are used for the analysis
      • Analysis is done for IT jobs only
  • 20. Association rules: Measures of rule interestingness. Suppose we have the rule IF {A} => {B}.
      • Measure 1: Support $= P(A \cap B)$
      • Measure 2: Confidence $= P(B \mid A) = P(A \cap B) / P(A)$
      • Measure 3: Lift $= \dfrac{P(B \mid A)}{P(B)} = \dfrac{P(A \cap B)}{P(A)\,P(B)}$
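The three measures can be computed directly from counts over a list of transactions, where each transaction is the set of programming languages found in one job post. This is a small Python sketch of the definitions, not the arules implementation; the function name and the toy transactions are assumptions for illustration.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    a = frozenset(antecedent)
    b = frozenset(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)         # posts containing A
    n_b = sum(1 for t in transactions if b <= t)         # posts containing B
    n_ab = sum(1 for t in transactions if (a | b) <= t)  # posts containing A and B
    if n == 0 or n_a == 0 or n_b == 0:
        raise ValueError("measures are undefined when A or B never occurs")
    support = n_ab / n               # P(A and B)
    confidence = n_ab / n_a          # P(B | A)
    lift = confidence / (n_b / n)    # P(B | A) / P(B)
    return support, confidence, lift


if __name__ == "__main__":
    # Toy transactions, made up for illustration.
    posts = [
        {"java", "javascript"},
        {"java"},
        {"php", "javascript"},
        {"java", "javascript", "php"},
    ]
    print(rule_measures(posts, {"java"}, {"javascript"}))
```

A lift below 1 means the two items co-occur less often than they would if they were independent; a lift above 1 means they co-occur more often.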
  • 22. Association Mining for Programming languages: C++
      • A set of association rules is generated for the top 20 programming languages.
      • Rules are filtered with a minimum support of 0.01 and a minimum confidence of 0.1.
      [Figure panels: Two items on the left; One item on the left]
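Generating rules with one or two items on the left-hand side and filtering them by minimum support and confidence can be sketched as below. This is a brute-force Python enumeration written for illustration, whereas the study used the R package arules (which implements the Apriori algorithm); the function name and toy data are assumptions, and the default thresholds are the ones stated on the slide.

```python
from itertools import combinations


def mine_rules(transactions, min_support=0.01, min_confidence=0.1, max_lhs=2):
    """Enumerate rules lhs => rhs (single-item rhs) that pass both thresholds."""
    if not transactions:
        return []
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def freq(itemset):
        # Share of transactions that contain every item in itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for size in range(1, max_lhs + 1):
        for lhs in combinations(items, size):
            lhs_set = frozenset(lhs)
            p_lhs = freq(lhs_set)
            if p_lhs == 0:
                continue
            for rhs in items:
                if rhs in lhs_set:
                    continue
                support = freq(lhs_set | {rhs})
                confidence = support / p_lhs
                if support >= min_support and confidence >= min_confidence:
                    rules.append((lhs, rhs, support, confidence))
    return rules


if __name__ == "__main__":
    # Toy transactions, made up for illustration.
    posts = [
        {"java", "javascript"},
        {"java"},
        {"php", "javascript"},
        {"java", "javascript", "php"},
    ]
    for lhs, rhs, support, confidence in mine_rules(posts):
        print(set(lhs), "=>", {rhs}, round(support, 2), round(confidence, 2))
```

The brute-force loop is fine for around 20 items; for larger item sets an Apriori-style search that prunes infrequent itemsets would be the usual choice.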
  • 23. Association Mining for Programming languages: Java
  • 24. Rules visualization: Java (all rules)
  • 27. IT Job Titles Frequency: Most popular job titles (2004Q1-2016Q1)
      Job title                      Percentage
      software developer/engineer    18.29%
      quality assurance engineer      5.42%
      java software developer         4.98%
      system administrator            4.00%
      web developer                   3.66%
      .net developer                  2.94%
      php developer                   2.33%
      graphic designer                1.89%
      ios developer                   1.31%
      android developer               1.26%
      deep submicron                  0.98%
      database developer              0.96%
      support specialist              0.96%
      database administrator          0.92%
      technical support               0.89%
      technical writer                0.83%
      support engineer                0.80%
      application developer           0.72%
      design engineer                 0.72%
      r&d engineer                    0.68%
      team leader                     0.67%
      frontend developer              0.55%
      monitoring evaluation           0.52%
      information security            0.50%
      senior r&d                      0.50%
      Total                          57.29%
  • 28. Software developer/engineer [Chart: annual series, 2004-2015]
  • 29. Quality assurance engineer [Chart: annual series, 2004-2015]
  • 30. Java software developer [Chart: annual series, 2004-2015]
  • 31. System administrator [Chart: annual series, 2004-2015]
  • 32. Web developer [Chart: annual series, 2004-2015]
  • 33. IT Job Titles vs Programming languages
      Job title => Programming language                  Confidence
      {software developer/engineer} => {csharp}          0.33
      {software developer/engineer} => {java}            0.30
      {software developer/engineer} => {javascript}      0.20
      {software developer/engineer} => {asp}             0.20
      {software developer/engineer} => {php}             0.12
      {software developer/engineer} => {j}               0.12
      {software developer/engineer} => {tcl}             0.09
      {software developer/engineer} => {python}          0.07
      {software developer/engineer} => {cplusplus}       0.06
      {software developer/engineer} => {ruby}            0.03
      {software developer/engineer} => {visual.basic}    0.02
      {software developer/engineer} => {verilog}         0.02
      {quality assurance engineer} => {java}             0.27
      {quality assurance engineer} => {shell}            0.25
      {quality assurance engineer} => {perl}             0.22
      {quality assurance engineer} => {python}           0.14
      {quality assurance engineer} => {tcl}              0.12
      {quality assurance engineer} => {bash}             0.04
      {quality assurance engineer} => {verilog}          0.04
      {java software developer} => {java}                0.98
      {java software developer} => {javascript}          0.47
      {java software developer} => {j}                   0.39
      {java software developer} => {shell}               0.11
      {java software developer} => {ruby}                0.05
      {system administrator} => {perl}                   0.09
      {system administrator} => {shell}                  0.09
      {system administrator} => {bash}                   0.03
      {system administrator} => {pl.sql}                 0.02
      {web developer} => {javascript}                    0.76
      {web developer} => {php}                           0.57
      {web developer} => {asp}                           0.36
      {web developer} => {csharp}                        0.27
      {web developer} => {ruby}                          0.02
      {.net developer} => {asp}                          0.82
      {.net developer} => {csharp}                       0.80
      {.net developer} => {javascript}                   0.42
      {.net developer} => {visual.basic}                 0.03
      {php developer} => {php}                           1.00
      {php developer} => {javascript}                    0.71
      {php developer} => {ruby}                          0.08
      {php developer} => {python}                        0.07
  • 34. Next Steps:
      • Develop a machine learning algorithm to classify job ads by sector
      • Develop state-of-the-art text mining and topic modeling algorithms to predict demand for skills, professions, and job roles
      • Create an interactive web dashboard (using R Shiny) to help potential job seekers, potential employees, policy makers, and universities
  • 35. Thank You For Your Attention!