KDD 2019 IADSS Workshop - Research Updates from Usama Fayyad & Hamit Hamutcu
1. Licensed to Analyze?
Who Can Claim to be a Data Scientist?
Defining Roles, Standards and Assessing Skills in Data Science
Initiative for Analytics and Data Science Standards (IADSS)
August 5th | 2019
www.iadss.org
Usama Fayyad | IADSS co-founder; Open Insights, Chairman & CEO
Hamit Hamutcu | IADSS co-founder; Analytics Center & Smartcon, Co-founder & CEO
2. The number of analytics professionals is increasing at a high rate
http://blog.kaggle.com/2019/01/18/reviewing-2018-and-previewing-2019/
The number of Kaggle members gives insight into the rapid increase in the number of analytics professionals.
[Chart: Kaggle users, 2009-2018. Values shown: 4,466; 24,313; 70,980; 137,873; 240,933; 437,442; 589,552; 1,400,000; 2,500,000. CAGR 120% (1.55 M logged-in)]
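The 120% CAGR on the chart can be checked with a few lines of arithmetic. A minimal Python sketch, assuming the nine values shown are yearly member counts for 2010 through 2018 (the chart's axis starts at 2009, so this year mapping is an assumption):

```python
def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / periods) - 1."""
    return (end_value / start_value) ** (1 / periods) - 1


# Kaggle member counts read off the chart; the year mapping is an assumption.
members = {
    2010: 4_466,
    2011: 24_313,
    2012: 70_980,
    2013: 137_873,
    2014: 240_933,
    2015: 437_442,
    2016: 589_552,
    2017: 1_400_000,
    2018: 2_500_000,
}

first, last = min(members), max(members)
rate = cagr(members[first], members[last], last - first)
print(f"CAGR {first}-{last}: {rate:.1%}")
```

Over eight yearly periods this comes out at roughly 120%, in line with the slide. CAGR depends only on the first value, the last value and the number of periods, so intermediate swings such as the 2016-2017 jump do not affect it.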
3. Do you believe a pilot needs a license? Special training? Certification?
Who would you trust to pilot a plane you are riding on?
Would you rather ride with him? Or with them at the controls?
4. How about a surgeon performing surgery on you?
How do you know he or she is qualified to operate?
Bad things happen when we cannot define the necessary skills and knowledge…
5. Who is analyzing your data? Are they qualified?
Are you extracting the right value from your data assets?
What happens when the wrong outcomes are provided?
Do you think bad things can happen when people who are not qualified are analyzing your data?
In the new world, failure to use data properly likely means the failure of your business…
6. A quick keyword search on LinkedIn shows a large number of professionals who describe themselves as working in analytics-related fields
Estimated audience according to LinkedIn:
12,000,000+ global LinkedIn members (capability targeting)
1,600,000+ global LinkedIn members (title targeting)
7. Top 100 LinkedIn groups' member base: 2,300,000
… and a multitude of groups with growing memberships
Group name | Members
Big Data and Analytics | 322,677
Data Science Central | 270,944
Big Data, Analytics, Business Intelligence & Visualization Experts Community | 223,728
Big Data | Analytics | Strategy | Finance | Innovation | 221,666
Business Intelligence Professionals (BI, Big Data, Analytics, IoT) | 209,659
Business Analytics, Big Data, and Artificial Intelligence | 157,740
Data Mining, Statistics, Big Data, DataVisualization, and Data Science | 155,737
Python Community | 129,044
Microsoft Business Intelligence | 123,292
Change Consulting | Digital Transformation Data Analytics Security | 100,398
Big Data | 86,404
Machine Learning and Data Science | 75,363
Hadoop Users | 70,821
Business Analyst forum [BA forum] | 69,802
Big Data, Analytics, Hadoop, NoSQL & Cloud Computing | 69,761
TDWI: Analytics and Data Management Discussion Group | 68,326
Analytics and Artificial Intelligence (AI) in Marketing and Retail | 66,896
Python Professionals | 59,743
Data Warehouse - Big Data - Hadoop - Cloud - Data Science - ETL | 59,711
Data Scientists | 53,295
Big Data & Hadoop Professionals | 51,887
Data Warehousing (Business Intelligence, ETL) Professional's Group | 51,331
Business Intelligence | 47,370
8. How many Data Scientists are there in the world?
What do you think?
https://www.huffpost.com/entry/where-will-data-science-b_b_12375864
https://www.forbes.com/sites/louiscolumbus/2017/05/13/ibm-predicts-demand-for-data-scientists-will-soar-28-by-2020/
https://www.pwc.com/us/en/library/data-science-and-analytics.html
“There are between 1.5-3 million data scientists in the world” (2016), Anthony Goldbloom, Co-founder & CEO @Kaggle
• 200K-700K new graduates join the job market annually
• The number of jobs for all US data professionals will increase to 2,720,000 openings by 2020 [IBM].
• Annual demand for the fast-growing new roles of data scientists, developers, and engineers in the US will reach nearly 700,000 openings.
Really?
9. Despite the increasingly large numbers, there is still a data science skills shortage in the US, which was not unexpected…
https://expandedramblings.com/index.php/linkedin-job-statistics/
https://economicgraph.linkedin.com/research/LinkedIns-2017-US-Emerging-Jobs-Report
https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovation
https://economicgraph.linkedin.com/resources/linkedin-workforce-report-august-2018
In 2011, McKinsey forecast that the US could face a shortage of 150-190K people with deep analytical skills by 2018…
… as verified by the LinkedIn Workforce Report (August 2018): there is a shortage of 151K people with “data science skills”
[Chart: Top Emerging Jobs (2012-2017)]
10. PwC's “Data Science & Analytics” report does not mention a shortage, but rather a lack of skills in the existing workforce
https://www.pwc.com/us/en/library/data-science-and-analytics.html
An alternative view of the domain knowledge required for DS roles, offered by PwC:
12. In addition to the shortage, we see different backgrounds & skills for data scientists even at the same company…
13. It gets even more complicated when you look across different companies and sectors
14. In actual job postings, you see a wide variety of role definitions and expectations for the same job title
15. There are over 250 programs in the US that offer graduate degrees in Analytics or Data Science
https://analytics.ncsu.edu/?page_id=4184
16. Despite having similar names and objectives, the course offerings and approaches of these programs vary widely
Brigham Young University-Idaho, Data Science
http://www.byui.edu/catalog/#/programs/41PwqJ9RZ
CIT111 - Introduction to Databases
CS101 - Introduction to Programming
CS241 - Survey Object-Oriented Programming/Data Struct.
CS450 - Machine Learning and Data Mining
MATH325 - Intermediate Statistics
MATH425 - Applied Linear Regression
MATH488 - Statistical Consulting
+
CS335 - Data Wrangling, Exploration, and Visualization
MATH221A - Business Statistics
Electives + Project + Internship
Winona State University, Data Science
https://catalog.winona.edu/preview_program.php?catoid=21&poid=4333
DSCI 210 - Data Science
DSCI 310 - Data Summary and Visualization
DSCI 325 - Management of Structured Data
STAT 210 - Statistics
STAT 310 - Intermediate Statistics
STAT 360 - Regression Analysis
+
MATH 140 - Applied Calculus
CS 234-250 - Algorithms and Problem-Solving I-II
CS 385 - Applied Database Management Systems
DSCI 395-495 - Professional Skill Development & Communication
Electives + Project OR Internship
17. … which is also true for graduate programs
Carnegie Mellon University: Master of Computational Data Science
Columbia University: Master of Science in Data Science
18. … as well as online courses
Udacity | Intro to Data Science
udacity.com/course/intro-to-data-science--ud359
Course Content
LESSON 1: Introduction to Data Science
• Pi-Chaun (Data Scientist @ Google): What is Data Science?
• Gabor (Data Scientist @ Twitter): What is Data Science?
• Problems solved by data science.
LESSON 2: Data Wrangling
• What is Data Wrangling?
• Acquiring data.
• Common data formats.
LESSON 3: Data Analysis
• Statistical rigor.
• Kurt (Data Scientist @ Twitter): Why is Stats Useful?
• Introduction to normal distribution.
LESSON 4: Data Visualization
• Effective information visualization.
• An analysis of Napoleon's invasion of Russia!
• Don (Principal Data Scientist @ AT&T): Communicating Findings.
LESSON 5: MapReduce
• Introduction to Big Data and MapReduce.
• Learn the basics of MapReduce.
• Mapper.
Coursera | Intro to Data Science
coursera.org/specializations/data-analysis
LESSON 1: Data Management and Visualization
• Managing Data
• Visualizing Data
LESSON 2: Data Analysis Tools
• Hypothesis Testing and ANOVA
• Chi Square Test of Independence
• Pearson Correlation
• Exploring Statistical Interactions
LESSON 3: Regression Modeling in Practice
• Basics of Linear Regression
• Multiple Regression
• Logistic Regression
LESSON 4: Machine Learning for Data Analysis
• Decision Trees
• Random Forests
• Lasso Regression
• K-Means Cluster Analysis
19. The same confusion also exists in the recruitment process
“Most interviewers had me write pseudocode, in something like Python. Most asked me some product-specific questions, such as, "How would you use data to improve X feature on our website?" Some interviewers asked me to write SQL, in addition to or instead of pseudocode. Another question I was often asked was how to set up some kind of experiment, such as, "How would we design an experiment to see whether our new homepage is better?" or "How can we use data to improve search results?". One or two interviewers asked me algorithms questions (quicksort, etc) but not in very much depth.
Beyond that, there was little in common. The formats varied a lot. Some interviews were all-day affairs - back-to-back meetings with programmers all day - and others were just a quick meeting with a CTO. Some interviews had me filling whiteboards with code, while others just consisted of a face-to-face conversation. A few of the interviews involved some sort of social/culture component, ranging from formal interviews with non-technical people to happy hours.”
- A job searcher on Quora
20. There is a real and significant cost to employers in recruiting and matching the right person to the right job
Average cost of recruitment (fees only; internal costs not included): $35,000
• According to IBM, data science and analytics jobs remain open an average of 45 days, 5 days longer than the average for all positions.
• More than 2.7 million data science and analytics job openings by 2020
• Hiring a data scientist involves multiple rounds of interviews, often carried out by already scarce existing talent within an organization
• Average turnover ratio is higher for growing roles
https://www.forbes.com/sites/louiscolumbus/2017/05/13/ibm-predicts-demand-for-data-scientists-will-soar-28-by-2020/
https://www.shrm.org/about-shrm/press-room/press-releases/pages/human-capital-benchmarking-report.aspx
https://www.burtchworks.com/2019/03/11/how-long-do-data-scientists-analytics-pros-stay-at-their-jobs/
Hundreds of millions of USD wasted in an increasingly inefficient recruitment process
New hire rate (3 months):
Role | New hire rate
Data Scientist | 10-11%
ML Eng/Spec | 12%
Data Analysts | 6%
Statisticians | 3-4%
Software Eng | 4%
Sales Rep | 4%
Accountant | 2%
21. Setting industry standards would support the healthy growth of the analytics market
Background
▪ As the role of data and analytics expands very rapidly in creating new business models or changing existing ones, demand for analytics professionals is growing at an increasing rate
▪ Every company has a unique way of defining roles related to data analytics and big data technology
Challenges
▪ Wide variety of role definitions, expected hard/soft skills, and experience/career development plans.
▪ Lack of standards creates inefficiencies and difficulties for companies in position matching, leveraging analytics skills effectively, and retaining talent.
▪ It also makes it hard for professionals to understand what a position requires and to follow a career path.
Need
▪ A framework to understand the analytics profession landscape, how companies structure their analytics teams, and the most common job titles, roles, and corresponding skill-set requirements
22. IADSS aims to support the data analytics ecosystem by defining professional standards and suggesting ways to measure and assess relevant knowledge and skills
[Diagram: Job Titles, Roles; Knowledge and Skills Requirements; Assessment and Measurement; Industry Standards]
23. … with the involvement of all related parties
• The research initiative will focus on existing and emerging data analytics related roles within organizations, job requirements for such roles, and corresponding skill-sets.
• IADSS will then analyze and group findings to create a standardized list of data analytics roles, along with career paths and the profiles/skills professionals need to fulfill the requirements of these roles.
• IADSS will rely on the expertise of academicians, industry experts and professionals to ensure defined standards are academically sound and rooted in industry realities.
• This will be an ongoing initiative, continuously updated according to industry dynamics and emerging topics in data science.
• To create awareness, IADSS will work towards promoting standards globally.
24. Insights are generated via interviews, surveys and other sources with the participation of key profiles from industry & academia
Methods: 1-to-1 Interviews; Survey Research; 3rd party research and social media analysis; Conferences, Meet-ups & Workshops
- under the guidance of the IADSS Advisory Board & with the support of Community Partners
- IADSS aims to engage with the analytics community through conferences and events (KDD 2018, ICDM 2018, ODSC 2019, Metis Demystifying DS 07-2019)
25. Insight from our 1-1 interviews with data executives
“We've seen folks create a bunch of beautiful dashboards, and the cost of tools has gone down precipitously in the last 20 years, but that doesn't mean that you know what you're looking at or ensure it won't be misused and misrepresented. Same thing on the data science front. The most important thing is not being able to use an algorithm that you picked off a tool but to know how and why you're using it.”
“I just hired a data scientist and we started with about 60 applicants. The role was fairly well described and as a result I immediately eliminated 40 without a screening call… So I got down to about 20 for screening calls… down to 5 interviews onsite. At the end none of them were acceptable except for one. So from 60 to 1, this is a huge effort. After the screening, the ones that actually made it to interviews, almost all of them failed on the math questions. There are many of them strong in engineering but the math is rare.”
26. Insight from our 1-1 interviews with data executives
“You want to curb attrition, and that ends up affecting your decisions on recognizing and promoting people between levels, which might be inconsistent with actual skill sets and how they're progressing in their roles. But I don't see a solution for either because the market is so hot and they're getting bombarded with job offers. And that leads to a lot of frustration and cultural impact on the organization.”
“I take courses and certifications on platforms like Coursera as an expression of interest rather than expertise. It shows commitment to lifelong learning, which I think is really important for the Data Science community and participants.”
“I've been on a panel where the panelist next to me, who took a statistics course some time when she was at university, doesn't think math is important for Data Science.”
27. IADSS aims to drive the development & adoption of standards in the industry
• Publishing a report with proposed standards and an assessment/measurement framework
• Creating awareness and engagement within the Analytics Community
• Driving adoption of standardized skill-sets and roles by the industry through tools for employers, awareness, and education
• Collaborating with academia to provide input into program and curriculum development
• Updating standards regularly to keep up with changes in technology and the data analytics field
• Developing methodologies and tools to help organizations and professionals create a more efficient job market and more effective career development
Long-Term Objectives
Ubiquity of standardized roles, definitions, and skill requirements for employers, educators, and practitioners.
28. Some Concluding Thoughts
There is a real problem in the industry: we will lose trust and credibility if we do not define what a Data Scientist is and what to expect from the role…
• We need to come together as a community and think this through; standards can help a lot
• Everyone is confused…
o Employers don't know who to hire and who is qualified
o Educators don't know how to train properly for the real roles out there
o Candidates are confused and don't know what is expected of them
o Bad experiences lead to the steady decline of what will become the essential job of the 21st century
• Much effort and money are wasted filling ill-defined roles with unqualified people
• IADSS was created as an industry-wide initiative to address the confusion and de-mystify the roles of Data Scientists, Analysts, and Data Engineers/Professionals
• Join us: take the survey, volunteer, participate, and contribute to shaping this new field…
29. Some Questions to Answer in this Workshop
• Sharing early results from surveys
• How do we define Data Science?
• How many types of jobs do we end up outlining?
o Data Scientist (different types)
o Data Analyst
o Business Analyst
o Data Engineer
o ML Scientist
o ML Engineer
o BI Professional
• How do we help create assessments?
• How do we set industry standards that are accepted by the majority?
31. Research for Standards on Definitions of Analytics Roles, Skill-sets and Career: the survey looks into expected knowledge
Survey sections: Insights about analytics/data science team(s); Training, Development & Hiring
Job Titles
• Analytics Director
• Analytics Manager
• BI Analyst / Specialist
• BI Director
• Big Data Engineer
• Chief Data/Analytics Officer
• Data Analyst
• Data Architect
• Data Engineer
• Data Miner
• Data Modeler
• Data Science Director
• Data Scientist
• Machine Learning Engineer
• Machine Learning Scientist/Expert/Specialist
• Scientist / Researcher
Knowledge areas
• Education
• Data mining basics
• Science skills
• Engineering skills
• Business / soft skills
• Leadership related skills
• Business domain skills
• Tool skills
Join -> bit.ly/IADSSsurvey
32. Research participants come from a wide spectrum of industries and geographies
• More than 700 survey responses collected so far from professionals and data science/analytics/BI executives.
• We received insight from hundreds of organizations globally.
38. Quick Quiz
Which of these skills would you consider “must-have” for being a Data Scientist?
Databases, Statistics, BI & Advanced Analytics, Cloud Computing, General Computing, Big Data, Visualization, Data Transformation, Optimization, Programming, Domain Expertise
39. Initial insights for “Data Scientist” role
Rating scale: Not Relevant | Nice to Have | Should Have | Must Have
• Data mining basics: generating & interpreting basic statistical descriptions, basic visual descriptions of data, data cleaning, transformation, etc.
• Science skills*: statistics, optimization, predictive modelling, ML, NLP, etc.
• Engineering skills*: SW engineering, big data development and maintenance, DB and DWH development, etc.
• Business / soft skills: building a data-driven business narrative, cross-functional collaboration, etc.
• Leadership related skills
• Business domain skills
• Tool skills*: scripting languages, statistical programming languages, libraries for machine learning, NoSQL, etc.
* Unrelated individual skills decrease the average for the domain
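The footnote's point, that an unrelated individual skill pulls down its domain's average, can be illustrated with a small calculation. The sketch below assumes, hypothetically, that the four answer options are coded 1 to 4 and that a domain score is the unweighted mean of its skills' mean scores; the survey's actual scoring scheme is not stated on the slide, and the responses are made up.

```python
# Hypothetical coding of the four answer options (an assumption, not the
# survey's documented scheme).
SCALE = {"Not Relevant": 1, "Nice to Have": 2, "Should Have": 3, "Must Have": 4}


def skill_score(responses: list[str]) -> float:
    """Mean coded rating for one individual skill."""
    return sum(SCALE[r] for r in responses) / len(responses)


def domain_score(skills: dict[str, list[str]]) -> float:
    """Unweighted mean of the individual skill scores within one domain."""
    scores = [skill_score(r) for r in skills.values()]
    return sum(scores) / len(scores)


# Made-up answers from four respondents, for illustration only.
tool_skills = {
    "Statistical programming languages": ["Must Have", "Must Have", "Should Have", "Must Have"],
    "Cluster resource managers": ["Not Relevant", "Nice to Have", "Not Relevant", "Not Relevant"],
}

core_only = {k: v for k, v in tool_skills.items() if k != "Cluster resource managers"}
print(domain_score(core_only))    # 3.75
print(domain_score(tool_skills))  # 2.5
```

Under these assumptions, adding one skill that most respondents consider irrelevant drops the domain score from 3.75 to 2.5, even though the core skill is still rated close to "Must Have".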
41. Tool Skills: Tool Families
Initial insights for “Data Scientist” role; some domains in detail
Tool family | Score
Experience on a specific tool | 3.14
Scripting languages for data science | 3.28
Statistical programming languages | 3.33
General purpose programming languages | 2.05
Libraries for machine learning/numerical computing | 2.95
Developer tools | 2.40
SQL, understanding of relational databases | 3.14
Enterprise relational DBMS | 2.40
NoSQL | 2.05
Open-source relational DBMS | 2.30
Integration, ETL, Automation, Design tools | 1.86
General knowledge on distributed storage/computing frameworks | 2.16
Knowledge of specific | 1.74
Distributed databases, data warehousing, and storage | 1.95
Cluster resource managers, other big data technologies | 1.50
Enterprise advanced analytics/data mining software | 2.23
Reporting and visualization software | 2.42
Office productivity, spreadsheet tools | 2.95
Experience with cloud computing platforms | 2.28
Cloud-based data warehousing | 1.98
Other specific cloud-computing services | 1.68
Specific knowledge of *NIX systems, shell scripting | 1.90
Virtualization, containerization, etc. | 1.95
42. An alternative must-have analysis: Automatically extracted prototypical skill-sets from professionals' responses
Identified prototype | Keywords/highlights
Database | Enterprise/Open-Source RDBMS, SQL, NoSQL, Integration and ETL Tools
Basic Analysis & Data Prep. | Data Cleaning/Summarization, Generating and Interpreting Basic Visual/Statistical Descriptions from Data
Cloud & Big Data | AWS/Azure/Google Cloud, Hadoop/Spark, Flume/Storm/Flink, HBase/Cassandra, Hive/Parquet, Docker/Kubernetes, *NIX, General Purpose Programming Languages
Statistical Analysis | Statistics, Data Transformation, Statistical Programming Languages, Predictive Modeling
Leadership & Management | Executive/Peer Leadership, Project Management, Task Management
Reporting & Cross-team Work | Insight Generation/Presentation, Cross-functional Collaboration, Office Productivity Tools, Reporting & Visualization Tools
Programming (General) | CS Fundamentals, Software Engineering Skills, DB & DWH Development
Subject-Matter Expertise | Understanding and Experience of Applying Analytics in Spec. Domain/Industry, Knowledge of KPIs
ML (Engineering) | Scripting Languages for Data Science, ML Libraries, Jupyter/Zeppelin Environments
ML (Theory) | Neural Networks/Deep Learning, Predictive Modeling, Optimization, NLP/Time Series/Image & Audio & Signal Processing/Bioinformatics, Research Background
* Top skills of the prototypical skill-sets extracted with Latent Dirichlet Allocation on self-declared “must-have” skills
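The footnote names Latent Dirichlet Allocation (LDA) as the extraction method. As a rough sketch of the idea, and not the IADSS pipeline (whose settings are not given on the slide), the code below treats each respondent's set of must-have skills as a "document" and fits LDA with a small collapsed Gibbs sampler in plain Python. The hyperparameters (`alpha`, `beta`, `n_iter`), the helper names and the toy responses are all hypothetical. Each topic's highest-probability skills play the role of a prototypical skill-set.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of documents; each is a list of tokens (here, the skills one
    respondent marked as must-have).
    Returns (vocab, phi), where phi[k][v] is the smoothed probability of
    skill vocab[v] under topic (prototype) k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)

    # Count tables: topic usage per document, skill usage per topic, topic totals.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_vocab for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        z.append(z_d)

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_vocab * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # Record the resampled topic and restore the counts.
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + n_vocab * beta) for v in range(n_vocab)]
        for k in topics
    ]
    return vocab, phi


def top_skills(vocab, phi, n=3):
    """Highest-probability skills per topic, i.e. the 'prototype' keywords."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]] for row in phi]


# Hypothetical must-have answers from six respondents.
responses = [
    ["SQL", "NoSQL", "ETL tools"],
    ["SQL", "ETL tools", "Enterprise RDBMS"],
    ["NoSQL", "SQL", "Enterprise RDBMS"],
    ["Statistics", "Predictive modeling", "R"],
    ["Statistics", "R", "Data transformation"],
    ["Predictive modeling", "Statistics", "Data transformation"],
]

vocab, phi = lda_gibbs(responses, n_topics=2)
for k, skills in enumerate(top_skills(vocab, phi)):
    print(f"Prototype {k}: {', '.join(skills)}")
```

With toy data like this, one would hope to see database-oriented skills separate from statistics-oriented ones, but with so few responses the split can vary with the seed and hyperparameters. A real analysis would also need to choose the number of prototypes (ten are listed on the slide) and check that the topics are stable across runs.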
44. A deeper look into variance: 3 types of Data Scientists
The composition of the skill-sets varies greatly, even across respondents holding the same job title
• Estimated 41% of Data Scientists (mostly Stats & ML Engineering)
• Estimated 37% of Data Scientists (mostly Stats & ML Engineering, lower expertise)
• Estimated 22% of Data Scientists
[Chart panels: DS-1, DS-2, DS-3]
45. IADSS Blog: Research Updates & Stories & Analysis & Career Development
Check our Blog & Be a Guest Writer or Subscribe
IADSS.org/blog
46. Please follow and engage with the initiative through our online channels (LinkedIn, Twitter, YouTube, and Website)
iadss.org
bit.ly/IADSStwitter
bit.ly/IADSSlinkedin
bit.ly/IADSSyoutube
Twitter.com/IADSSglobal
Take the survey: bit.ly/IADSSsurvey