(video summary at: https://conversionhotel.com/session/keynote-2019-data-science-demystified/)
In the conversion optimization industry, most data analytics people are used to working with analytics software to generate insights for, and report the results of, optimization and validation efforts. A large group of people at #CH2019 have this job. They have been told that the future of analytics is in data science. But when exactly does what they do count as data science? Do you need to be a coder to call yourself a data scientist, do you need an official degree, or is having some technical implementation skills enough?
I recently bumped into Emily Robinson at a CRO conference in the US and liked her stage appearance. I learned that she is writing the book “Build A Career in Data Science” with Jacqueline Nolis, to be published in early 2020. She currently works at DataCamp as a Data Scientist on the growth team, where she built their experimentation analytics system. Previously, she was a Data Scientist at Etsy working with their search team to design, implement, and analyze experiments on the ranking algorithm, UI changes, and new features.
She regularly gives talks on A/B testing, R programming, and data science career advice at conferences and meetups. I thought Emily would be the perfect person to demystify data science for us and explain how to take the first steps toward building a career in it.
Enjoy her talk,
Ton Wesseling
Founder & host of The Conference formerly known as Conversion Hotel
15. Benefit of programming
Accessibility
Web APIs
SQL Databases
Historical Data
httr
DBI
dbplyr
Slide from https://www.youtube.com/watch?v=KreEV7p6dnU by Chris Cardillo
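The slide names R packages (httr for web APIs, DBI and dbplyr for SQL databases). As a rough Python analogue of the same idea, the sketch below uses the standard-library sqlite3 module to query historical data directly instead of exporting it by hand. The orders table, its columns, and the sample rows are hypothetical and exist only for illustration.

```python
import sqlite3


def monthly_revenue(conn):
    """Return (month, total revenue) rows, oldest month first."""
    query = """
        SELECT substr(order_date, 1, 7) AS month, SUM(amount) AS revenue
        FROM orders
        GROUP BY month
        ORDER BY month
    """
    return conn.execute(query).fetchall()


# Hypothetical in-memory database standing in for a real company database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2019-09-03", 20.0), ("2019-09-17", 35.5), ("2019-10-01", 12.0)],
)

rows = monthly_revenue(conn)
for month, revenue in rows:
    print(month, revenue)
```

Because the query lives in code, the same aggregation can be rerun on new data without repeating any manual steps.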
16. Code away repetitive tasks
Code around obstacles
Limit Human Error
Benefit of programming
Efficiency
Slide from https://www.youtube.com/watch?v=KreEV7p6dnU by Chris Cardillo
18. Code away repetitive tasks
Code around obstacles
Limit Human Error
Benefits of programming
Accessibility
Efficiency
Collaboration
Web APIs
SQL Databases
Historical Data
Increased Shareability
Communicable
Processes
Dependable Replicability
httr
DBI
dbplyr
Slide from https://www.youtube.com/watch?v=KreEV7p6dnU by Chris Cardillo
19. Mathematics & statistics
1. What techniques exist
• I need to group customers together -> I should try clustering
2. How to apply them
• How to do a k-means clustering in R/Python
3. How to choose which to try
• What clustering method will work best?
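As an illustration of step 2, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The customer data (orders per year, average order value) and the choice of k=2 are made up for the example. In practice you would more likely reach for a library implementation and scale the features first.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Cluster points into k groups; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centers


# Hypothetical customers: (orders per year, average order value).
customers = [(1, 20), (2, 25), (3, 22), (40, 300), (42, 310), (45, 290)]
labels, centers = kmeans(customers, k=2)
print(labels)
```

Knowing how to run this covers step 2 only. Deciding whether k-means suits the data, and how many clusters to ask for, is the judgment described in step 3.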
21. Statistics: going beyond the numbers
Who are the worst batters?
http://varianceexplained.org/r/credible_intervals_baseball/
Who are the best batters?
22. Statistics: going beyond the numbers
http://varianceexplained.org/r/credible_intervals_baseball/
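One way to see why raw numbers mislead here: a batter with 4 hits in 5 at-bats has a higher raw average than any seasoned professional. A minimal sketch of one remedy, assuming a Beta prior over batting averages, shrinks each raw average toward a typical value in proportion to how little evidence there is. The prior parameters and the three players below are invented for illustration and are not taken from the linked post.

```python
# Hypothetical Beta prior: prior mean is 80 / (80 + 220), about 0.267.
ALPHA0, BETA0 = 80.0, 220.0


def shrunken_average(hits, at_bats, alpha0=ALPHA0, beta0=BETA0):
    """Posterior mean batting average under a Beta-Binomial model."""
    return (hits + alpha0) / (at_bats + alpha0 + beta0)


# Made-up players: (hits, at-bats).
players = {
    "lucky_rookie": (4, 5),
    "steady_veteran": (900, 3000),
    "unlucky_rookie": (0, 5),
}

for name, (hits, at_bats) in players.items():
    raw = hits / at_bats
    adjusted = shrunken_average(hits, at_bats)
    print(f"{name}: raw={raw:.3f} shrunken={adjusted:.3f}")
```

With these made-up numbers the rookie tops the raw ranking, while the veteran tops the shrunken one because 3000 at-bats outweigh the prior and 5 do not.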
23. Business question: How can we split our customers into different groups to market to?
Data science question: How can we run a clustering algorithm to segment customer data?
Data science answer: A k-means clustering found 3 distinct groups
Business answer: Here are 3 types of customers: new, high spending, commercial
Domain knowledge
- Renee Teate, @BecomingDataSci
Skills:
• Communication
• Empathy
• Understanding your data (where it lives, built-in assumptions, edge cases)
25. Analytics
Pulled from Airbnb Careers
• Define and evaluate key metrics
• Develop dashboards
• Communicate analyses
• Comfortable in SQL
• Industry experience
26. Algorithms
From Airbnb Careers
• Deep Learning techniques
• Natural language processing
• Strong programming skills
• Developing ML models at scale in
27. Inference
From Airbnb Careers
• Run strategic analysis
• Design experiments
• Improve statistical methodology
• PhD in quantitative field
28. Three completely different job descriptions
From Airbnb Careers
Algorithms:
• Deep Learning techniques
• Natural language processing
• Strong programming skills
• Developing ML models at scale in
Analytics:
• Define and evaluate key metrics
• Develop dashboards
• Communicate analyses
• Comfortable in SQL
• Industry experience
Inference:
• Run strategic analysis
• Design experiments
• Improve statistical methodology
• PhD in quantitative field
34. Why?
“Don’t get stressed about keeping up with the cutting edge of the field … You should start by getting very comfortable transforming and visualizing data, programming with a wide variety of packages, and using statistical techniques like hypothesis tests, classification, and regression.”
- David Robinson, Data Insights Engineering Manager at Flatiron Health, Chapter 4
39. Tip 1: Include visualizations
https://hackernoon.com/more-than-a-million-pro-repeal-net-neutrality-comments-were-likely-faked-e9f0e3ed36a6
40. Tip 2: choose a topic you’re excited about
https://masalmon.eu/2018/01/01/sortinghat/
41. Tip 3: Limit your scope
https://kkulma.github.io/2017-08-13-friendships-among-top-r-twitterers/
42. Making progress
Inspired by bit.ly/drob-rstudio-2019
How I used to think about analyses (from less valuable to more valuable): Idea, Getting data, Cleaning, Exploratory, Modeling, Final result
How I think about analyses now (from less valuable to more valuable): Work only on your computer, Work online (GitHub, Blog, Kaggle)
46. The potential future of data scientists
From https://hbr.org/2018/08/what-data-scientists-really-do-according-to-35-data-scientists
“It wouldn’t surprise me if the [data scientist] title goes the way of the ‘webmaster’”
- Hilary Mason
47. Resources
• Day in the life of a data scientist webinar by David Robinson
• You’re not paid to model by Jacqueline Nolis
• Doing data science at Twitter by Robert Chang
• Succeeding as a data scientist in small companies/startups by Randy Au
• How to change careers and become a data scientist – one quant’s experience by Rachel Thomas
• What data scientists really do, according to 35 data scientists by Hugo Bowne-Anderson