THE OPEN SOURCE DATA SCIENCE MASTERS 
(THE DIY DATA SCIENTIST) 
Clare Corthell 
Data Scientist at Mattermark 
@clarecorthell 
www.datasciencemasters.org
Deal Intelligence Platform 
interface to live data about private companies
TODAY 
• What a Data Scientist does 
• Paths to becoming a Data Scientist 
• Where to start 
• Navigating a path 
• Why you should run toward hard things
WHAT DOES A DATA SCIENTIST DO? 
Data Scientists turn data into knowledge 
by answering the right questions 
Which is also predicated on asking 
the right questions
HOW DO I BECOME A DATA SCIENTIST? 
the answer you don’t want… 
There’s no paved road, no one way
PATHS 
1. Get a Classic Masters from an accredited University 
<Warning> I have yet to see one that’s better than the OSDSM 
2. Attend a Bootcamp or Academy 
• Zipfian Academy (SF) 
• Insight Data Science Fellows (Palo Alto, NYC) 
• Data Science Retreat (Berlin) 
3. Self-Taught 
• The Open Source Data Science Masters
THEORY & APPLICATION 
or, why universities haven’t figured this out yet 
Universities don’t focus on “Data Science” because it’s tightly 
bound to application. 
Universities develop theory. 
Businesses develop applications. 
The two exist symbiotically - they do need each other. 
The goals are simply very different.
• Math 
• Computing 
• Algorithms 
• Distributed Computing 
• Databases 
• Data Mining 
• Machine Learning 
• Graph Theory 
• Natural Language Processing 
• Analysis 
• Visualization 
• Python (language & libraries) 
The Open Source Data Science Masters
bit.ly/dsmasters
The internet helps me curate - hence Open Source
(that’s a lot)
CLARE’S PATH 
Previously Product Designer, front end dev 
Transcript bit.ly/corthelldata 
6 months of study 
Data Scientist & Machine Learning Developer at Mattermark
My team builds domain-specific systems 
for classification, recommendation, prediction, 
crawling, fact extraction, and more 
languages: Python, SQL
machine learning: scikit-learn
data manipulation: pandas, NumPy, matplotlib, NLTK
design: HTML/CSS/JS
1. Get a goal 
2. Get a plan 
3. Get mentorship 
4. Get a project
1. Get a goal 
What kind of “Data Scientist” do you want to be? 
Explore the different roles 
Pick something that sparks your interest 
Find out what those people do on a daily basis
Rachel Schutt, Doing Data Science
Analyzing the Analyzers, O’Reilly
2. Get a plan 
Figure out what skills you need to be minimally effective 
Design a Curriculum (fork the OSDSM!) 
Plan a schedule of study
Dave Holtz 
Airbnb
3. Get mentorship 
Talk to people on Twitter
Ask to buy them coffee 
(with a specific need or question in hand) 
Get informational interviews 
(a lost art; they can turn into real interviews, but are low-pressure)
4. Get a question 
(make it a small question - don’t set yourself up for failure) 
Project: use real-world data to answer a question
Who do iguana owners connect to on Twitter?
Work on a real business problem 
Help a non-profit* with data they don’t understand 
What channels of marketing are working for us? 
*Orgs that coordinate working with NGOs: Bayes Impact, DataKind
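A small question like "What channels of marketing are working for us?" can often be answered with a few lines of Python. The sketch below is only an illustration: the channel names, the visitor and signup counts, and the helper `conversion_by_channel` are all hypothetical, invented for this example and not taken from any real dataset.

```python
from collections import defaultdict

# Hypothetical sample data: (marketing channel, visitors, signups).
# These numbers are invented for illustration only.
rows = [
    ("email", 400, 32),
    ("twitter", 1000, 20),
    ("search", 600, 30),
    ("email", 100, 8),
]


def conversion_by_channel(rows):
    """Return {channel: signups / visitors}, summing rows that share a channel."""
    visitors = defaultdict(int)
    signups = defaultdict(int)
    for channel, n_visitors, n_signups in rows:
        visitors[channel] += n_visitors
        signups[channel] += n_signups
    # Skip channels with no visitors to avoid dividing by zero.
    return {c: signups[c] / visitors[c] for c in visitors if visitors[c] > 0}


rates = conversion_by_channel(rows)

# Print channels from highest to lowest conversion rate.
for channel, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{channel}: {rate:.1%}")
```

Keeping the question this small means the whole project fits in one function, and swapping in real data only changes `rows`.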
Let’s talk about where this perfect plan 
gets really incredibly difficult 
(Let’s start with a tautology)
HARD THINGS ARE HARD 
Hard things are hard because there are no easy answers or recipes. 
They are hard because your emotions are at odds with your logic. 
They are hard because you don’t know the answer and you cannot 
ask for help without showing weakness. 
Ben Horowitz 
The Hard Thing about Hard Things
When something scares you 
run like hell right into it. 
The hardest things are things people avoid the most. 
That’s your marginal advantage. 
Maybe that’s why there aren’t enough Data Scientists. 
You will figure it out. 
It’s about ego management and problem solving.
RUN TOWARD HARD THINGS 
Choosing what you want to do 
and what to work on 
Not knowing everything 
Being overwhelmed 
Time Management 
Math 
Coding
Not knowing everything 
Being overwhelmed 
There are a million things you could learn and work on. 
That’s overwhelming. But you can’t afford to get overwhelmed. 
You won’t know everything. 
It’s impractical and impossible to know everything. 
Learn to say “I don’t know.” 
FYI: programmers don’t read books.
They reference them as needed.
Time Management 
How do I do all of this in a reasonable amount of time? 
- You don’t. 
- Be rigorous. 
Ask yourself: 
Will this directly help me achieve my goal? 
Refine your goals, focus your work. 
Don’t switch tasks. 
Focus on one thing at a time.
Why is time management so hard? 
We’re used to other people telling us what to do:
Teachers 
Managers 
Parents
CODING IS HARD.
a hint for those new to programming 
google 
stackoverflow + problem
why code?
HUMANS SHOULD BE HUMANS 
AND 
COMPUTERS SHOULD BE COMPUTERS. 
You must code. 
Because automation. 
And no, there is no shortcut.
YOUR ADVANTAGE 
Self-study in Data Science is hard. 
But what you spend in energy and commitment 
to self-teaching is returned to you in: 
• Choice of professional focus 
• Respect from potential employers for managing yourself. You want to work with people who will respect and recognize that.
• Skills that are tough to get from a university or employer 
• A path with no gatekeepers - no one will stop you.
Take the first step.
1. Learn to code in Python. 
2. Take Intro to Data Science (UW) 
3. Go get a coffee 
4. Ask one question
i ♥ questions 
datasciencemasters.org 
clare@mattermark.com 
@clarecorthell

DBA Basics: Getting Started with Performance Tuning.pdf
 

The Open Source Data Science Masters: How to Become a Self-Taught Data Scientist

  • 1. THE OPEN SOURCE DATA SCIENCE MASTERS (THE DIY DATA SCIENTIST) Clare Corthell Data Scientist at Mattermark @clarecorthell www.datasciencemasters.org
  • 2. Deal Intelligence Platform interface to live data about private companies
  • 3. TODAY • What a Data Scientist does • Paths to becoming a Data Scientist • Where to start • Navigating a path • Why you should run toward hard things
  • 4. WHAT DOES A DATA SCIENTIST DO? Data Scientists turn data into knowledge by answering the right questions, which is also predicated on asking the right questions.
  • 5. HOW DO I BECOME A DATA SCIENTIST? The answer you don’t want: there’s no paved road, no one way.
  • 6. PATHS
      1. Get a Classic Masters from an accredited University (<Warning> I have yet to see one that’s better than the OSDSM)
      2. Attend a Bootcamp or Academy
          • Zipfian Academy (SF)
          • Insight Data Science Fellows (Palo Alto, NYC)
          • Data Science Retreat (Berlin)
      3. Self-Taught
          • The Open Source Data Science Masters
  • 7. THEORY & APPLICATION or, why universities haven’t figured this out yet Universities don’t focus on “Data Science” because it’s tightly bound to application. Universities develop theory. Businesses develop applications. The two exist symbiotically - they do need each other. The goals are simply very different.
  • 8. The Open Source Data Science Masters (bit.ly/dsmasters): Math, Computing, Algorithms, Distributed Computing, Databases, Data Mining, Machine Learning, Graph Theory, Natural Language Processing, Analysis, Visualization, Python (language & libraries). The internet helps me curate - hence Open Source.
  • 10. CLARE’S PATH Previously Product Designer, front end dev. Transcript: bit.ly/corthelldata. 6 months of study. Now Data Scientist & Machine Learning Developer at Mattermark. My team builds domain-specific systems for classification, recommendation, prediction, crawling, fact extraction, and more. Skills: languages (Python, SQL); machine learning (Scikit Learn); data manipulation (Pandas, Numpy, matplotlib, NLTK); design (html/css/js).
  • 11. 1. Get a goal 2. Get a plan 3. Get mentorship 4. Get a project
  • 12. 1. Get a goal: What kind of “Data Scientist” do you want to be? Explore the different roles. Pick something that sparks your interest. Find out what those people do on a daily basis.
  • 13. Rachel Schutt, Doing Data Science
  • 15. 2. Get a plan: Figure out what skills you need to be minimally effective. Design a curriculum (fork the OSDSM!). Plan a schedule of study.
  • 17. 3. Get mentorship: Talk to people on twitter. Ask to buy them coffee (with a specific need or question in hand). Get informational interviews (a lost art; they can turn into real interviews, but are low-pressure).
  • 18. 4. Get a question (make it a small question - don’t set yourself up for failure). Project: Use real-world data to answer a question. Who do iguana owners connect to on twitter? Work on a real business problem. Help a non-profit* with data they don’t understand. What channels of marketing are working for us? *Orgs that coordinate working with NGOs: Bayes Impact, DataKind
  • 19. Let’s talk about where this perfect plan gets really, incredibly difficult. (Let’s start with a tautology.)
  • 20. HARD THINGS ARE HARD “Hard things are hard because there are no easy answers or recipes. They are hard because your emotions are at odds with your logic. They are hard because you don’t know the answer and you cannot ask for help without showing weakness.” Ben Horowitz, The Hard Thing About Hard Things
  • 21. When something scares you, run like hell right into it. The hardest things are the things people avoid the most. That’s your marginal advantage. Maybe that’s why there aren’t enough Data Scientists. You will figure it out. It’s about ego management and problem solving.
  • 22. RUN TOWARD HARD THINGS: Choosing what you want to do and what to work on; Not knowing everything; Being overwhelmed; Time Management; Math; Coding
  • 23. Not knowing everything / Being overwhelmed: There are a million things you could learn and work on. That’s overwhelming. But you can’t afford to get overwhelmed. You won’t know everything. It’s impractical and impossible to know everything. Learn to say “I don’t know.” FYI: Programmers don’t read books. They reference them as needed.
  • 24. Time Management: How do I do all of this in a reasonable amount of time? - You don’t. - Be rigorous. Ask yourself: Will this directly help me achieve my goal? Refine your goals, focus your work. Don’t switch tasks. Focus on one thing at a time.
  • 25. Why is time management so hard? We’re used to other people telling us what to do: teachers, managers, parents.
  • 27. A hint for those new to programming: google “stackoverflow + problem”
  • 29. HUMANS SHOULD BE HUMANS AND COMPUTERS SHOULD BE COMPUTERS. You must code. Because automation. And no, there is no shortcut.
  • 30. YOUR ADVANTAGE Self-study in Data Science is hard. But what you spend in energy and commitment to self-teaching is returned to you in:
      • Choice of professional focus
      • Respect from potential employers for managing yourself. You want to work with people who will respect and recognize that.
      • Skills that are tough to get from a university or employer
      • A path with no gatekeepers - no one will stop you.
  • 31. Take the first step.
  • 32. 1. Learn to code in Python. 2. Take Intro to Data Science (UW). 3. Go get a coffee. 4. Ask one question.
  • 33. i ♥ questions datasciencemasters.org clare@mattermark.com @clarecorthell