A Hitchhiker’s Guide to Data Science
Sudeep Das
Senior Machine Learning Researcher
@datamusing
My Journey
Ph. D. Astrophysics
Cosmic Microwave Background
Gravitational Lensing
Beats Music
Core Recommendation Systems Group
What do I do?
The Grand Innovation Workflow
● Identify Problem
○ Understand what is important to the business
○ Deep data dives
○ Visualizations
○ Communicate to stakeholders
○ Sometimes top-down, sometimes ground-up idea generation
● Prepare Data
○ Slice/dice/massage data
○ Work with data teams to ensure data integrity
○ Make sure data tables/feeds that you need are stood up
○ Offline/online data integrity
● Build Models
○ Prototype features
○ Modeling extremes: from out-of-the-box Logistic Regression and GBMs to adapting an emergent idea from a recent paper!
○ Set up offline training pipeline
○ Monitor offline metrics
● Implement in Production
○ Design the experiment/hypothesis/cell structure
○ Integrate your models with the production systems (code review, load testing)
○ Hook up with the testing platform
● Test Hypotheses
○ Read results of experiments to determine significance
○ Slice and dice the online data to determine if your test affected the intended audience
○ If results are flat, rinse and repeat!
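As a rough illustration of "read results of experiments to determine significance", here is a minimal sketch of a two-sided two-proportion z-test between a control cell and a treatment cell. The function name, the counts, and the choice of this particular test are assumptions made for illustration; the talk does not say which test or platform is used.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between two cells."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both cells convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: control cell A vs. treatment cell B.
z, p_value = two_proportion_z_test(conv_a=480, n_a=10_000, conv_b=540, n_b=10_000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A p-value above the chosen threshold is the "results are flat" case: the observed lift is not distinguishable from noise at this sample size.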
In some companies, this is a data scientist
In some other companies, this is a data scientist
Yet in some other companies, this is a data scientist
At Netflix, this is broadly what I do
Tools of the trade
SQL, Spark (Scala), PySpark, Python-Pandas, Hive, AWS-S3
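To make the SQL-style slicing concrete, here is a small sketch that breaks an outcome down by test cell and device. It uses Python's built-in `sqlite3` as a stand-in for a warehouse engine such as Hive; the `events` table, its columns, and the rows are hypothetical.

```python
import sqlite3

# Hypothetical event log: one row per member, with test cell, device, and outcome.
rows = [
    ("control", "tv", 1), ("control", "tv", 0), ("control", "mobile", 0),
    ("control", "mobile", 0), ("treatment", "tv", 1), ("treatment", "tv", 1),
    ("treatment", "mobile", 0), ("treatment", "mobile", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (cell TEXT, device TEXT, converted INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

# Slice the outcome by cell and device to see where the effect shows up.
query = """
    SELECT cell, device, COUNT(*) AS members, AVG(converted) AS conversion_rate
    FROM events
    GROUP BY cell, device
    ORDER BY cell, device
"""
summary = conn.execute(query).fetchall()
for cell, device, members, rate in summary:
    print(f"{cell:<10} {device:<7} n={members} rate={rate:.2f}")
conn.close()
```

The same `GROUP BY` pattern carries over to Hive or Spark SQL; only the connection and table names would change.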
Matplotlib, Tableau, Vega, Plotly, custom javascript (d3)
Hive, S3, APIs in Flask/Django/Java
Python, scikit-learn, Jupyter notebooks, TensorFlow/Keras, XGBoost, SparkML/Scala, Zeppelin ...
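One example of an offline metric a modeling pipeline might monitor is ROC AUC. The sketch below computes it in plain Python as the fraction of positive/negative pairs that the model ranks correctly. The labels and scores are hypothetical, and in practice a library routine would normally replace this quadratic loop.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Count pairwise wins; ties count as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical held-out labels and model scores.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.5]
auc = roc_auc(labels, scores)
print(f"offline AUC = {auc:.3f}")
```

Tracking a number like this on a fixed held-out set after every retraining run gives a cheap early warning before anything reaches an online test.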
Docker, company-specific platforms
Java, Scala, in some cases Python, company specific
Types of Problems
● Personalization
● Search
● Object recognition
● Voice/speech recognition
● Pattern recognition
● Natural Language Processing
● Trend prediction
● Segmentation/clustering
● Dynamic Pricing
● Optimization
● Outlier Detection
At Netflix, we do a bit of everything
Emergent Trends
● Probabilistic Graphical Models - Bayes Nets
● Deep Learning
● Causal Inference
● (Deep) Reinforcement Learning
What academia prepares you for
● Perseverance
● Ability to pick up new technical skills
● Presentation skills
● Some quantitative visualization skills
● Ability to distil technical research in related areas and adapt it to the problem at hand
● If you are from a quantitative and experimental field:
○ Mathematical abilities
○ Knowledge of Basic Statistics - error analysis, experiment design
○ Some exposure to parameter estimation and Bayesian inference
○ Some ability to write code
○ Some exposure to general machine learning
● Learning from failure: Most A/B tests fail - so do experiments in academia
● Writing papers/ technical blogs etc.
What academia doesn’t prepare you for
● Being a good listener
● Asking questions
● Understanding and articulating the business value of your technical pursuit
● Writing clean, maintainable code with documentation and unit tests
● Ability to collaborate across teams and cultures - cross-functionally
● Admitting that “Good enough” is better than perfect
● Coping with quick project timelines
● Documenting, sharing, getting early input on projects
● Dealing with live, large, and exceptionally dirty datasets.
● Understanding that research in industry is results driven and not publication driven.
● Stepping out of your focus area and seeing your problem in the bigger context of where your company is headed.
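For the point above about clean, maintainable code with documentation and unit tests, here is a minimal sketch of what that can look like: one small documented function plus pytest-style tests written as plain functions with bare asserts. The function `normalize_scores` and its behavior are hypothetical and chosen only to keep the example short.

```python
def normalize_scores(scores):
    """Scale a list of scores with a positive sum so that they sum to 1.

    Raises ValueError on empty or all-zero input, so bad data fails
    with a clear message instead of an obscure ZeroDivisionError.
    """
    total = sum(scores)
    if not scores or total == 0:
        raise ValueError("scores must be non-empty and have a positive sum")
    return [s / total for s in scores]


# Pytest-style tests: plain functions with bare asserts.
def test_preserves_proportions():
    assert normalize_scores([1, 3]) == [0.25, 0.75]


def test_sums_to_one():
    assert abs(sum(normalize_scores([2, 3, 5])) - 1.0) < 1e-12


def test_rejects_all_zero_input():
    try:
        normalize_scores([0, 0])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for all-zero input")
```

Saved in a file whose name starts with `test_`, these functions would be collected by a test runner such as pytest; they can also be called directly.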
Marketing Yourself
Landing the First Job!
● Fill in your basic skills gaps
○ Databases, SQL, Spark familiarity
○ Data Structures, Algo/CS 101
○ Get really strong in one language - highly recommend Python - pandas, scikit ecosystem
○ Good coding practices - documentation, modular code, unit tests
● Amp up your ML knowledge
○ Your friends: online courses and open datasets!
○ Do mini projects on ML, esp. Deep Learning, Reinforcement Learning. Get creative!
○ Get a rock solid foundation in basic stats.
● Create an online presence
○ Kaggle competitions
○ GitHub repo so recruiters can look at your code.
○ Put your hobby projects online
○ Write a blog post on something new you learned
○ Follow/contribute to Stack Overflow
● Improve soft skills
○ Identify weaknesses in communication skills and work on them.
○ Pick up speaking engagements at meetups, at your university, and at conferences such as PyData
○ Do collaborative projects with people who are also transitioning
● Interview prep
○ Practise whiteboarding and collaborative coding on CoderPad
○ Standard resources like the book Cracking the Coding Interview, and Glassdoor
○ Go for some “dry run” interviews.
○ Do background research on the company - be inquisitive, ask questions
Keep at it!
@datamusing

Academia to Data Science - A Hitchhiker's Guide

  • 1. A Hitchhiker’s Guide to Data Science. Sudeep Das, Senior Machine Learning Researcher, @datamusing
  • 3. Ph.D. in Astrophysics: Cosmic Microwave Background, Gravitational Lensing
  • 5. What do I do?
  • 6. The Grand Innovation Workflow: Identify Problem, Prepare Data, Build Models, Implement in Production, Test Hypotheses
      ● Identify Problem
          ○ Understand what is important to the business
          ○ Deep data dives
          ○ Visualizations
          ○ Communicate to stakeholders
          ○ Idea generation: sometimes top down, sometimes ground up
      ● Prepare Data
          ○ Slice/dice/massage data
          ○ Work with data teams to ensure data integrity
          ○ Make sure the data tables/feeds that you need are stood up
          ○ Offline/online data integrity
      ● Build Models
          ○ Prototype features
          ○ Modeling extremes: from out-of-the-box Logistic Regression and GBMs to adapting an emergent idea from a recent paper!
          ○ Set up offline training pipeline
          ○ Monitor offline metrics
      ● Implement in Production / Test Hypotheses
          ○ Design the experiment/hypothesis/cell structure
          ○ Integrate your models with the production systems (code review, load testing)
          ○ Hook up with the testing platform
          ○ Read results of experiments to determine significance
          ○ Slice and dice the online data to determine if your test affected the intended audience
          ○ If results are flat, rinse and repeat!
  • 7. Same workflow as slide 6, captioned: In some companies, this is a data scientist
  • 8. Same workflow as slide 6, captioned: In some other companies, this is a data scientist
  • 9. Same workflow as slide 6, captioned: Yet in some other companies, this is a data scientist
  • 10. Same workflow as slide 6, captioned: At Netflix, this is broadly what I do
  • 11. Tools of the trade
  • 12. Same workflow as slide 6, annotated with tools: SQL, Spark (Scala), PySpark, Python-Pandas, Hive, AWS-S3
  • 13. Same workflow as slide 6, annotated with tools: Matplotlib, Tableau, Vega, Plotly, custom JavaScript (D3)
  • 14. Same workflow as slide 6, annotated with tools: Hive, S3, APIs in Flask/Django/Java
  • 15. Same workflow as slide 6, annotated with tools: Python, scikit-learn, Jupyter notebooks, TensorFlow/Keras, XGBoost, SparkML/Scala, Zeppelin ...
  • 16. Same workflow as slide 6, annotated with tools: Docker, company-specific platforms
  • 17. Same workflow as slide 6, annotated with tools: Java, Scala, in some cases Python, company specific
  • 19. At Netflix, we do a bit of everything:
      ● Personalization
      ● Search
      ● Object recognition
      ● Voice/speech recognition
      ● Pattern recognition
      ● Natural Language Processing
      ● Trend prediction
      ● Segmentation/clustering
      ● Dynamic Pricing
      ● Optimization
      ● Outlier Detection
  • 21. Probabilistic Graphical Models - Bayes Nets; Deep Learning; Causal Inference; (Deep) Reinforcement Learning
  • 23.
      ● Perseverance
      ● Ability to pick up new technical skills
      ● Presentation skills
      ● Some quantitative visualization skills
      ● Ability to distil technical research in related areas and adapt it to the problem at hand
      ● If you are from a quantitative and experimental field:
          ○ Mathematical abilities
          ○ Knowledge of basic statistics - error analysis, experiment design
          ○ Some exposure to parameter estimation and Bayesian inference
          ○ Some ability to write code
          ○ Some exposure to general machine learning
      ● Learning from failure: most A/B tests fail - so do experiments in academia
      ● Writing papers, technical blogs, etc.
  • 24. What academia doesn’t prepare you for
  • 25.
      ● Being a good listener
      ● Asking questions
      ● Understanding and articulating the business value of your technical pursuit
      ● Writing clean, maintainable code with documentation and unit tests
      ● Ability to collaborate across teams and cultures - cross-functionally
      ● Admitting that “good enough” is better than perfect
      ● Coping with quick project timelines
      ● Documenting, sharing, getting early input on projects
      ● Dealing with live, large, and exceptionally dirty datasets
      ● Understanding that research in industry is results driven and not publication driven
      ● Stepping out of your focus area and seeing your problem in the bigger context of where your company is headed
  • 27. Landing the First Job!
      ● Fill in your basic skills gaps
          ○ Databases, SQL, Spark familiarity
          ○ Data Structures
          ○ Algo/CS 101
          ○ Get really strong in one language - highly recommend Python - pandas, scikit ecosystem
          ○ Good coding practices - documentation, modular code, unit tests
      ● Amp up your ML knowledge
          ○ Your friends: online courses and open datasets!
          ○ Do mini projects on ML, esp. Deep Learning, Reinforcement Learning. Get creative!
          ○ Get a rock solid foundation in basic stats.
          ○ Kaggle competitions
      ● Create an online presence
          ○ GitHub repo so recruiters can look at your code.
          ○ Put your hobby projects online
          ○ Write a blog post on something new you learned
          ○ Follow/contribute to Stack Overflow
      ● Improve soft skills
          ○ Identify weaknesses in communication skills and work on them.
          ○ Pick up speaking engagements at meetups, at your university, and at conferences such as PyData
          ○ Do collaborative projects with people who are also transitioning
      ● Interview prep
          ○ Practise whiteboarding, collaborative coding on CoderPad
          ○ Standard books like Cracking the Coding Interview, Glassdoor
          ○ Go for some “dry run” interviews.
          ○ Do background research on the company - be inquisitive, ask questions
      ● Keep at it!
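
The workflow slides mention reading the results of experiments "to determine significance", and the basic-statistics items point at the same skill. As a rough illustration only, here is a minimal sketch of one common way to do that for a two-cell A/B test: a two-proportion z-test under a normal approximation. The talk does not say which test is used in practice; the function name `two_proportion_z_test` and the example counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    conv_a / n_a: conversions and users in the control cell (hypothetical inputs).
    conv_b / n_b: conversions and users in the treatment cell.
    Assumes samples are large enough for the normal approximation to hold.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both cells share one rate.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; the rates are degenerate")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 100/1000 conversions in control vs 150/1000 in treatment.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value only says the difference is unlikely under the null hypothesis; the slides' follow-up step of slicing the online data to check that the test reached the intended audience still applies.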