Gender, Education, Skills, and Compensation in US Data Science
COLLEEN M. FARRELLY
Kaggle 2017 US Data Scientist Sample
The sample includes 4197 data scientists who identified the US as their country.
The factors examined are gender, machine learning knowledge, education, and job title.
Visual explorations and a series of machine learning models were used to explore how these factors
impact compensation levels.
Of these, 1544 provided compensation data, which allowed salaries to be compared across demographic
factors; this subsample was examined with machine learning models.
Demographics of US Data Scientists
US data scientists tend to be male and to have been in the field for less
than 5 years, though some have been in the field for more than 10 years.
Very few data scientists identify as LGBTQ in the US, despite
increasing levels of openness about this identity.
Education of US Data Scientists
Most data scientists in the US (67%) have advanced education.
Common majors include math/stat, engineering, and computer science, though other
sciences are well-represented.
Many data scientists come from well-educated families, where parents have obtained
at least a Bachelor’s degree; 45% come from families with a Master’s degree or higher.
Importance of Different Factors in Job Considerations
Diversity is not as important a consideration as language used, salary offered, impact potential, and job industry.
Allocation of Time on Data Science Projects
A lot of time is spent on gathering data, and this is a potential bottleneck in data science projects.
Education and Machine Learning Knowledge
Those who are able to innovate new algorithms place the highest relative value on education; they comprise
12% of the US data scientist population.
Those who know how to run code or tune parameters place the lowest relative value on education and comprise
19% of data scientists.
About 40% can explain algorithms to someone without technical knowledge, a crucial skill in data science positions.
Skill Disparity between Male and Female Data Scientists
Males are more likely to be able to innovate than females (13% vs. 9%). They
are also more likely to be able to make code faster or code from scratch (31% vs. 23%).
Females are more likely to only have enough knowledge to tune parameters or
run a library (25% vs. 17%).
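The percentages above can be compared either as percentage-point gaps or as ratios, and the two views read differently. The sketch below computes both from the figures on the slide; the dictionary keys and the `gap` helper are illustrative names, not part of the original analysis.

```python
# (male %, female %) at each skill level, as reported on the slide.
skill_levels = {
    "innovate": (13, 9),
    "faster_or_from_scratch": (31, 23),
    "tune_or_run_library_only": (17, 25),
}


def gap(male_pct, female_pct):
    """Return (percentage-point difference, male-to-female ratio)."""
    return male_pct - female_pct, male_pct / female_pct


gaps = {level: gap(*pcts) for level, pcts in skill_levels.items()}

for level, (diff, ratio) in gaps.items():
    print(f"{level}: {diff:+d} points, male/female ratio {ratio:.2f}")
```

A ratio above 1 means the level is more common among males, below 1 more common among females. The slide gives no group sizes, so this says nothing about statistical significance.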
Titles and Skills
Data scientist is the most common title (38%), but data scientists
account for only 29% of those who can innovate.
Researchers make up only 19% of titles but a whopping 40% of
those who can innovate.
Analysts make up 17% of titles but only 3% of those who can
innovate algorithms and only 9% of those who can explain the
algorithms to someone non-technical.
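One way to read these figures is as a representation ratio: a title's share among innovators divided by its share of all titles. The sketch below uses only the percentages quoted above; the function and variable names are illustrative assumptions, not the author's code.

```python
# Shares in percent, as reported on the slide.
title_share = {"Data scientist": 38, "Researcher": 19, "Analyst": 17}
innovator_share = {"Data scientist": 29, "Researcher": 40, "Analyst": 3}


def representation_ratio(group_share, overall_share):
    """Ratio above 1: over-represented in the group; below 1: under-represented."""
    return group_share / overall_share


ratios = {
    title: representation_ratio(innovator_share[title], title_share[title])
    for title in title_share
}

for title, ratio in ratios.items():
    print(f"{title}: {ratio:.2f}")
```

On these numbers, researchers appear among innovators at roughly twice their share of titles, while analysts appear at well under a quarter of theirs.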
Education and Skills
Many more doctoral-level data scientists are able to innovate (24%) than bachelor-level (6%) or master-level data scientists (9%).
Bachelor-level data scientists are more likely to only know how to run a library (16%) than master-level (9%) or doctoral-level (5%) data scientists.
Compensation by Skill: Innovation Pays
Compensation by Education and Gender
Finishing college is essential. A professional or doctoral degree is worth the time and effort, as well.
Gender Compensation Disparities and Compensation by Fields of Study
Females earn considerably less than males and LGBTQ individuals.
Engineering provides the most compensation, while humanities provides the least.
IT folks tend to earn less than those in fields of
math/physics/engineering/computer science.
Predictive Modeling of Compensation
Analyses were performed on the 1522 data scientists who provided
compensation information along with all predictors; 22 of the 1544
who reported compensation were missing predictor information.
Several models were run to predict compensation using a
Tweedie distribution: random forest, conditional inference
trees, LASSO, extreme learning machines, evolved trees,
and MARS.
All models yielded similar performance (~3-10% of variance
accounted for).
Age, tenure, and industry were the largest predictors of
compensation.
Major, gender, education, and algorithm understanding
level do play a minor role in compensation, though.
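The slides do not say which Tweedie power parameter was used or how the share of variance accounted for was measured. As a rough illustration only, the sketch below computes the mean Tweedie deviance for a power p between 1 and 2 and a deviance-explained score (1 minus model deviance over null-model deviance), which is a common analogue of R² for such models. The value p = 1.5, the toy compensation values, and the predictions are all made-up assumptions.

```python
def tweedie_unit_deviance(y, mu, p=1.5):
    """Tweedie unit deviance for 1 < p < 2, with y >= 0 and mu > 0."""
    if not 1 < p < 2:
        raise ValueError("this sketch only covers 1 < p < 2")
    if y < 0 or mu <= 0:
        raise ValueError("need y >= 0 and mu > 0")
    return 2 * (
        y ** (2 - p) / ((1 - p) * (2 - p))
        - y * mu ** (1 - p) / (1 - p)
        + mu ** (2 - p) / (2 - p)
    )


def mean_deviance(ys, mus, p=1.5):
    """Average unit deviance over paired observations and predictions."""
    return sum(tweedie_unit_deviance(y, m, p) for y, m in zip(ys, mus)) / len(ys)


def deviance_explained(ys, mus, p=1.5):
    """1 - D(model) / D(null), where the null model predicts the mean of ys."""
    null_mu = sum(ys) / len(ys)
    null_dev = mean_deviance(ys, [null_mu] * len(ys), p)
    return 1 - mean_deviance(ys, mus, p) / null_dev


# Hypothetical compensation values (thousands of USD) and model predictions.
toy_y = [0.0, 40.0, 85.0, 120.0, 200.0]
toy_pred = [60.0, 70.0, 90.0, 100.0, 125.0]

print(f"deviance explained on toy data: {deviance_explained(toy_y, toy_pred):.3f}")
```

A score of 0 means the model does no better than predicting the overall mean, which is the baseline against which the low figures reported above would be read.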
Conclusions
Skills vary widely according to education, gender, and role.
Different skills are associated with different pay, as well as with different views of the value of
education as a path to data science.
Tenure, age, and industry play a large role in compensation, but these factors are difficult to
change for data scientists entering the field and studying at university.
Addressing the educational and gender disparities in skill level may be a way to even the
playing field by equipping new data scientists with the most valuable skills and knowledge
levels sought in the field.

Gender, Education, Skills, and Compensation in US Data Scientists

  • 1. Gender, Education, Skills, and Compensation in US Data Science COLLEEN M. FARRELLY
  • 2. Kaggle 2017 US Data Scientist Sample The sample includes 4197 data scientists who identified the US as their country. The factors examined focus on gender, machine learning knowledge, education, and job title. Visual explorations and a series of machine learning models were run to explore how these factors impact compensation levels. 1544 respondents provided compensation data to compare salaries by different demographic factors, and this subsample was examined through machine learning models.
  • 3. Demographics of US Data Scientists US data scientists tend to be male and in the field for less than 5 years, though some have been in the field for more than 10 years. Very few data scientists identify as LGBTQ in the US, despite increasing levels of openness about this identity.
  • 4. Education of US Data Scientists Most data scientists in the US (67%) have advanced education. Common majors include math/stat, engineering, and computer science, though other sciences are well-represented. Many data scientists come from well-educated families, where parents have obtained at least a Bachelor’s degree; 45% come from families with a Master’s degree or higher.
  • 5. Importance of Different Factors in Job Considerations Diversity is not as important a consideration as language used, salary offered, impact potential, and job industry.
  • 6. Allocation of Time on Data Science Projects A lot of time is spent on gathering data, and this is a potential bottleneck in data science projects.
  • 7. Education and Machine Learning Knowledge Those who are able to innovate new algorithms place the highest relative value on education; they comprise 12% of the US data scientist population. Those who only know how to run code or tune parameters place the lowest relative value on education and comprise 19% of data scientists. About 40% can explain the algorithms to someone without technical knowledge, a crucial skill in data science positions.
  • 8. Skill Disparity between Male and Female Data Scientists Males are more likely than females to be able to innovate new algorithms (13% vs. 9%). They are also more likely to be able to make code faster or write it from scratch (31% vs. 23%). Females are more likely to have only enough knowledge to tune parameters or run a library (25% vs. 17%).
  • 9. Titles and Skills Data scientist is the most common title (38%) but accounts for only 29% of those who can innovate. Researchers make up only 19% of titles but a whopping 40% of those who can innovate. Analysts make up 17% of titles but only 3% of those who can innovate algorithms and only 9% of those who can explain the algorithms to someone non-technical.
  • 10. Education and Skills Many more doctoral-level data scientists are able to innovate (24%) than bachelor-level (6%) or master-level data scientists (9%). Bachelor-level data scientists are more likely to only know how to run a library (16%) than master-level (9%) or doctoral-level (5%) data scientists.
  • 11. Compensation by Skill: Innovation Pays
  • 12. Compensation by Education and Gender Finishing college is essential. A professional or doctoral degree is worth the time and effort, as well.
  • 13. Gender Compensation Disparities and Compensation by Fields of Study Females earn quite a bit less than males and LGBTQ individuals. Engineering provides the most compensation, while the humanities provide the least. IT folks tend to earn less than those in the fields of math/physics/engineering/computer science.
  • 14. Predictive Modeling of Compensation Analyses were performed on the 1522 data scientists who provided compensation information along with all predictors; 22 individuals were missing predictor information. Several models were run to predict compensation using a Tweedie distribution: random forest, conditional inference trees, LASSO, extreme learning machines, evolved trees, and MARS. All models yielded similar performance (~3-10% of variance accounted for). Age, tenure, and industry were the largest predictors of compensation. Major, gender, education, and algorithm understanding level do play a minor role in compensation, though.
  • 15. Conclusions Skills vary widely according to education, gender, and role. Different skills are associated with different pay, as well as with different valuations of education as a path to data science. Tenure, age, and industry play a large role in compensation, but these factors are difficult to change for data scientists entering the field and studying at university. Addressing the educational and gender disparities in skill level may be a way to even the playing field by equipping new data scientists with the most valuable skills and knowledge levels sought in the field.
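
The title figures on slide 9 are easier to compare as representation ratios: a title's share among those who can innovate, divided by its share of all titles. The following is a minimal sketch using only the rounded percentages quoted on the slide. The names `title_shares` and `representation_ratio` are illustrative and are not part of the original analysis.

```python
# Percentages quoted on slide 9: (share of all titles, share of innovators).
# These are rounded figures from the slide, not the underlying survey counts.
title_shares = {
    "Data scientist": (38, 29),
    "Researcher": (19, 40),
    "Analyst": (17, 3),
}


def representation_ratio(share_of_titles, share_of_group):
    """Return share_of_group / share_of_titles.

    A value of 1.0 means the title is represented in the group in proportion
    to its overall frequency; above 1.0 is over-represented, below 1.0 is
    under-represented.
    """
    if share_of_titles <= 0:
        raise ValueError("share_of_titles must be positive")
    return share_of_group / share_of_titles


# Hypothetical helper output: one rounded ratio per title.
ratios = {
    title: round(representation_ratio(all_share, innovator_share), 2)
    for title, (all_share, innovator_share) in title_shares.items()
}

for title, ratio in ratios.items():
    print(f"{title}: {ratio:.2f}x")
```

Under these rounded inputs, researchers come out at roughly twice their proportional share of innovators, data scientists at about three quarters, and analysts at under one fifth. Because the inputs are rounded to whole percentages, the ratios should be read as approximate.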