Data Science on a Budget: 
Maximizing Insight and Impact 
Nicholas Arcolano, Ph.D. 
Senior Data Scientist 
@arcolano 
Photo by giuseppemilo / CC BY
A little background… 
• Spent 10 years at MIT Lincoln Laboratory 
working in ballistic missile defense and cyber 
security research 
• Areas of interest: statistics, machine learning, 
parallel computing, “big data” 
• Realized these things had been collectively 
re-branded as “data science” 
• Started calling myself a “data scientist” and 
joined a start-up 
Nicholas Arcolano – Data Science on a Budget – November 2014
What does a data scientist do? 
• Something that happens at the intersection of 
statistics, machine learning, and computer science 
• Usually involves data (typically lots of it) 
• Actually, this isn’t the most critical question to be 
worrying about 
A better question… 
• What does a data team do? 
• Basically, two things: 
1. Use data to help the rest of the company understand what our 
users are doing 
2. Help the rest of the company use this information to improve our 
product and our business 
The Company 
• Started in 2008 
• Based in Boston 
• About 50 people 
• 4-person data team 
Our Product 
• RunKeeper app for GPS and manual tracking of running, walking, 
cycling, other activities 
• Long-term fitness goals, training plans, and performance insights 
• iOS, Android, web, 3rd party devices 
The Data 
• 37 million users 
• 450 million fitness activities 
• 200 billion GPS points 
• 17 billion interactions and events
[Diagram labels: PRODUCT SYSTEMS, DATA, MARKETING, EXECUTIVE, BUSINESS DEVELOPMENT, USER EXPERIENCE, QUALITY ASSURANCE, SUPPORT] 
“DATA SCIENCE” 
• analytics and business intelligence 
• modeling and forecasting 
• data systems and archiving 
• user research and testing 
• data-driven features 
• data stories and visualizations
How can we accomplish all this, quickly and 
with a small team? 
It’s hard… but here are some steps to 
making it easier 
Step 1: Communicate. A lot. 
• You have a lot to learn about the rest of the company 
– Every part of the company has its own blend of tools, systems, processes, 
environments 
– Every part has data it understands and cares about 
– Every part knows things that affect the data that you won’t see— 
user interviews, support feedback, product bugs, system failures 
• You also have a lot to teach people 
– What data we have 
– What it can—and can’t—do 
– Empower people to “think with data” 
Step 1: Communicate. A lot. 
• Be patient—sometimes you 
have to say the same things 
many times 
• You may be the only one 
looking at certain data—if you 
see something, say something! 
Setting expectations 
• Anticipated impact of data exploration (things our data team will discover): 
– exciting new things 
– things we already knew 
• Actual impact of data exploration (things our data team will discover): 
– bugs, missing data, and bad data 
– things we already knew 
– exciting new things
Step 2: Move quickly but carefully. 
“Wisely and slow. They stumble that 
run fast.” 
– Friar Laurence, from 
Shakespeare’s Romeo and Juliet 
Step 2: Move quickly but carefully. 
• On moving fast… 
– Data science can work well in an agile framework 
– Make assumptions, but understand them 
– Don’t be afraid to provide caveats 
• On being cautious… 
– Bad analysis is worse than no analysis 
– Make time for data QA 
– Use common sense: if it seems too good (or bad) to be true, it usually is 
Step 3: Keep it simple. 
• Go for lots of small, quick wins 
• Learn and iterate 
• Resist the urge to show everyone 
how smart you are by doing 
something super complicated 
Step 3: Keep it simple. 
• Do the “stupid thing” first 
– It helps build understanding 
– It helps uncover issues with the data 
– It may turn out that you’re not even solving the right problem 
– It may actually work pretty well 
• When in doubt, favor a simpler method that you understand better 
over a more complex one 
– Easier to implement 
– Easier to debug 
– Easier to explain to others 
You don’t have to use all the data 
• Sometimes, using all the data is the right thing to do: 
SELECT COUNT(userid) FROM rk_user; 
• Sometimes, though, you can solve your problem entirely with a 
small data set 
• Benefits 
– Easier computation and data wrangling means faster results 
– “Curse of dimensionality” is a real thing 
– Mitigate bad assumptions (lack of stationarity, different product versions, 
changing environments, regional and seasonal effects, etc.) 
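To make the small-data point concrete, here is a minimal sketch of estimating a rate from a random sample instead of scanning every row. The population, the 30% "active" rate, and the `estimate_rate` helper are all hypothetical and made up for illustration; nothing here comes from RunKeeper data.

```python
import math
import random


def estimate_rate(population, sample_size, seed=0):
    """Estimate the fraction of True values from a simple random sample.

    Returns (estimate, standard_error) using the binomial approximation.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    p_hat = sum(sample) / sample_size
    std_err = math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat, std_err


# Hypothetical population: 1,000,000 users, roughly 30% flagged as active.
rng = random.Random(42)
population = [rng.random() < 0.30 for _ in range(1_000_000)]

true_rate = sum(population) / len(population)
p_hat, std_err = estimate_rate(population, sample_size=10_000)

print(f"full data: {true_rate:.4f}")
print(f"1% sample: {p_hat:.4f} +/- {1.96 * std_err:.4f}")
```

With a 1% sample the estimate typically lands within about one percentage point of the full-data answer, which is often enough to make a decision. A recent sample also sidesteps some of the stationarity problems listed above.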
Step 4: Use the right tools. 
• In any given scenario, the “right 
tool” is one of the following: 
– The tool you already know and are 
comfortable with 
– Something you don’t know but 
suspect would work really well 
– Something that doesn’t exist yet 
• It’s up to you to figure out which 
one it is 
Step 4: Use the right tools. 
[Figure: Languages and technologies I used during 10 years at my last job] 
[Figure: Languages and technologies I’ve used during 1 year at my current job] 
• Be comfortable using a variety of tools 
• Make time to learn new ones 
• Build your own tools for repeatable 
analysis—once you know it’s worth it 
• Open source: take advantage of the hard 
work of others, but make sure you 
understand what you’re using 
• Give back 
Step 4: Use the right tools. 
• Many of the same principles apply to your “analytical toolkit” 
• Try to learn when to stick with a well-worn approach and when 
to try something new 
• Be skeptical of the conventional wisdom 
– Just because a metric or analytical approach is common doesn’t mean it’s 
the right thing to do for your situation 
– Typical example: A/B testing 
Hypothesis testing (“A/B testing”) 
• USERS are split into two groups: 
– GROUP A (“Control”), 90% of users: standard flow, yielding # of successes, failures 
– GROUP B (“Treatment”), 10% of users: experimental flow, yielding # of successes, failures 
• The counts from both groups feed a test statistic 
• The test statistic drives the DECISION: “reject/accept null hypothesis” 
• “Null hypothesis”: treatment has no effect 
• “Alternate hypothesis”: treatment has some effect
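One common way to turn the success and failure counts into a test statistic is a two-proportion z-test. The sketch below is an illustration under that assumption; the slides do not say which test was used, and the counts and the `two_proportion_z_test` helper are hypothetical.

```python
import math


def two_proportion_z_test(succ_a, n_a, succ_b, n_b):
    """Two-sided z-test for a difference in success rates between two groups.

    Uses the pooled rate under the null hypothesis and a normal
    approximation, so it assumes reasonably large counts in both groups.
    Returns (z, p_value).
    """
    p_a = succ_a / n_a
    p_b = succ_b / n_b
    pooled = (succ_a + succ_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value for a standard normal: P(|Z| >= |z|)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts with a 90% / 10% split between control and treatment.
z, p_value = two_proportion_z_test(succ_a=9_000, n_a=90_000,
                                   succ_b=1_080, n_b=10_000)
alpha = 0.05  # significance level, chosen before looking at the data
print(f"z = {z:.2f}, p = {p_value:.4f}")
print("reject null hypothesis" if p_value < alpha
      else "fail to reject null hypothesis")
```

Note that a large p-value means only that the test failed to reject the null hypothesis; it is not evidence that the treatment has no effect.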
Thoughts about A/B testing 
• A/B testing is hard to do well 
– Need lots of data and good estimates of baseline rates to have a chance at significance 
– Need lots of data infrastructure to do it quickly on a large scale 
– Need to manage variables such as multiple testing, changes in product and environment, 
interactions between tests, subjects 
– Need to make sure tests align with high-level vision and learning goals 
• An A/B test can help with one very specific decision, but typically will not... 
– Help you understand how multiple different factors interact 
– Predict long-term reactions (the “taste test” phenomenon)—need longitudinal study 
– Always give you the answer you want—results may be null or inconclusive 
– Tell you anything of any value whatsoever if you did it wrong 
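To see why "lots of data and good estimates of baseline rates" matter, here is a rough sample-size sketch using the standard normal-approximation formula for comparing two proportions. It assumes two equal-sized groups (unlike a 90/10 split), a two-sided test, and a hypothetical 10% baseline rate; `users_per_group` is an illustrative helper, not something from the talk.

```python
import math
from statistics import NormalDist


def users_per_group(baseline, lift, alpha=0.05, power=0.8):
    """Approximate users needed in each of two equal-sized groups to detect
    an absolute change `lift` from `baseline` with a two-sided z-test."""
    p1, p2 = baseline, baseline + lift
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / lift ** 2)


# Hypothetical 10% baseline; smaller effects need far more users.
for lift in (0.02, 0.01, 0.005):
    n = users_per_group(0.10, lift)
    print(f"baseline 10%, lift {lift:.1%}: {n:,} users per group")
```

Halving the effect you want to detect roughly quadruples the users required, and a wrong guess at the baseline rate shifts the answer too.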
Thoughts about A/B testing 
Even when performed “correctly”, an A/B 
test may not tell you what you think it does
Step 5: Have faith and have fun 
• Don’t try to understand everything all at once—keep looking from multiple 
angles and trust that more understanding will come in time 
Step 5: Have faith and have fun 
• Working with data from millions of engaged users is awesome 
• Helping your company have a real impact on their lives is even 
more awesome 
• All the tools are available to do truly amazing things 
• Make sure everyone knows how much you love the data, and 
they will grow to love it too 
Things we’re still working on 
• Synthesizing knowledge and communicating results 
• Data-driven products and features 
• Analytics and instrumentation 
• Giving back (open source, blogging, tutorials, talks) 
Thanks for listening! Questions? 
nicholas.arcolano@runkeeper.com 
http://arcolano.com 
@arcolano 
http://www.runkeeper.com

Weitere ähnliche Inhalte

Was ist angesagt?

Data_Scientist_Position_Description
Data_Scientist_Position_DescriptionData_Scientist_Position_Description
Data_Scientist_Position_Description
Suman Banerjee
 

Was ist angesagt? (12)

Information Technology - Discover the Root Cause and Develop a solution throu...
Information Technology - Discover the Root Cause and Develop a solution throu...Information Technology - Discover the Root Cause and Develop a solution throu...
Information Technology - Discover the Root Cause and Develop a solution throu...
 
Data Scientists Are Analysts Are Also Software Engineers
Data Scientists Are Analysts Are Also Software EngineersData Scientists Are Analysts Are Also Software Engineers
Data Scientists Are Analysts Are Also Software Engineers
 
Hadoop Meets Scrum
Hadoop Meets ScrumHadoop Meets Scrum
Hadoop Meets Scrum
 
Data_Scientist_Position_Description
Data_Scientist_Position_DescriptionData_Scientist_Position_Description
Data_Scientist_Position_Description
 
2009 Resource Planning Summit Presentation Charles Howell
2009 Resource Planning Summit Presentation Charles Howell2009 Resource Planning Summit Presentation Charles Howell
2009 Resource Planning Summit Presentation Charles Howell
 
Be Data Informed Without Being a Data Scientist
Be Data Informed Without Being a Data ScientistBe Data Informed Without Being a Data Scientist
Be Data Informed Without Being a Data Scientist
 
Data Science: Past, Present, and Future
Data Science: Past, Present, and FutureData Science: Past, Present, and Future
Data Science: Past, Present, and Future
 
eHealth: Big Data, Sports Analysis & Clinical Records
eHealth: Big Data, Sports Analysis & Clinical Records eHealth: Big Data, Sports Analysis & Clinical Records
eHealth: Big Data, Sports Analysis & Clinical Records
 
Agile data science
Agile data scienceAgile data science
Agile data science
 
The Data Errors we Make by Sean Taylor at Big Data Spain 2017
The Data Errors we Make by Sean Taylor at Big Data Spain 2017The Data Errors we Make by Sean Taylor at Big Data Spain 2017
The Data Errors we Make by Sean Taylor at Big Data Spain 2017
 
Challenges of managing Data Science Project
Challenges of managing Data Science ProjectChallenges of managing Data Science Project
Challenges of managing Data Science Project
 
Lecture #01
Lecture #01Lecture #01
Lecture #01
 

Ähnlich wie Data Science on a Budget: Maximizing Insight and Impact - Nicholas Arcolano PhD

AMA Nebraska - SurveyMonkey (08-14)
AMA Nebraska  - SurveyMonkey (08-14)AMA Nebraska  - SurveyMonkey (08-14)
AMA Nebraska - SurveyMonkey (08-14)
Brent Chudoba
 
Baworld adapting to whats happening
Baworld adapting to whats happeningBaworld adapting to whats happening
Baworld adapting to whats happening
Dave Davis PMP, PgMP, PBA
 
Dennis Massie, OCLC, USA Come for the free analysis, stay for the community...
Dennis Massie, OCLC, USA   Come for the free analysis, stay for the community...Dennis Massie, OCLC, USA   Come for the free analysis, stay for the community...
Dennis Massie, OCLC, USA Come for the free analysis, stay for the community...
CTLes
 
Unit 1- Research Methods Worksheet
Unit 1- Research Methods Worksheet Unit 1- Research Methods Worksheet
Unit 1- Research Methods Worksheet
joshh12
 
Unit 1 research methods worksheet y11
Unit 1 research methods worksheet y11Unit 1 research methods worksheet y11
Unit 1 research methods worksheet y11
MattLumley
 

Ähnlich wie Data Science on a Budget: Maximizing Insight and Impact - Nicholas Arcolano PhD (20)

Research and Community Building with a Roadmap
Research and Community Building with a RoadmapResearch and Community Building with a Roadmap
Research and Community Building with a Roadmap
 
Decision making
Decision makingDecision making
Decision making
 
Managerial Decision-Making
Managerial Decision-MakingManagerial Decision-Making
Managerial Decision-Making
 
Managerial Decision Making
Managerial Decision MakingManagerial Decision Making
Managerial Decision Making
 
AMA Nebraska - SurveyMonkey (08-14)
AMA Nebraska  - SurveyMonkey (08-14)AMA Nebraska  - SurveyMonkey (08-14)
AMA Nebraska - SurveyMonkey (08-14)
 
Business Analytics Overview
Business Analytics OverviewBusiness Analytics Overview
Business Analytics Overview
 
Baworld adapting to whats happening
Baworld adapting to whats happeningBaworld adapting to whats happening
Baworld adapting to whats happening
 
Dennis Massie, OCLC, USA Come for the free analysis, stay for the community...
Dennis Massie, OCLC, USA   Come for the free analysis, stay for the community...Dennis Massie, OCLC, USA   Come for the free analysis, stay for the community...
Dennis Massie, OCLC, USA Come for the free analysis, stay for the community...
 
Turning Data into Infographics: An Interactive Workshop for Problem Solvers
Turning Data into Infographics: An Interactive Workshop for Problem SolversTurning Data into Infographics: An Interactive Workshop for Problem Solvers
Turning Data into Infographics: An Interactive Workshop for Problem Solvers
 
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talkNYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
 
Unit 1- Research Methods Worksheet
Unit 1- Research Methods Worksheet Unit 1- Research Methods Worksheet
Unit 1- Research Methods Worksheet
 
An Introduction to Monitoring & Evaluation
An Introduction to Monitoring & EvaluationAn Introduction to Monitoring & Evaluation
An Introduction to Monitoring & Evaluation
 
Program Evaluation Basics - Center for Nonprofit Success slides
Program Evaluation Basics - Center for Nonprofit Success slidesProgram Evaluation Basics - Center for Nonprofit Success slides
Program Evaluation Basics - Center for Nonprofit Success slides
 
Lesson 2 audience and research
Lesson 2 audience and researchLesson 2 audience and research
Lesson 2 audience and research
 
Guerrilla (or Agile) Evaluation for Learning
Guerrilla (or Agile) Evaluation for LearningGuerrilla (or Agile) Evaluation for Learning
Guerrilla (or Agile) Evaluation for Learning
 
Data and communication of research: incentives and disincentives
Data and communication of research: incentives and disincentivesData and communication of research: incentives and disincentives
Data and communication of research: incentives and disincentives
 
Learning Analytics Primer: Getting Started with Learning and Performance Anal...
Learning Analytics Primer: Getting Started with Learning and Performance Anal...Learning Analytics Primer: Getting Started with Learning and Performance Anal...
Learning Analytics Primer: Getting Started with Learning and Performance Anal...
 
Small Data Assessment and Action Research
Small Data Assessment and Action ResearchSmall Data Assessment and Action Research
Small Data Assessment and Action Research
 
Data Analytics: Better Decision, Better Business
Data Analytics: Better Decision, Better BusinessData Analytics: Better Decision, Better Business
Data Analytics: Better Decision, Better Business
 
Unit 1 research methods worksheet y11
Unit 1 research methods worksheet y11Unit 1 research methods worksheet y11
Unit 1 research methods worksheet y11
 

Mehr von freshdatabos

Big But Personal Data: How Human Behavior Bounds Privacy and What We Can We D...
Big But Personal Data: How Human Behavior Bounds Privacy and What We Can We D...Big But Personal Data: How Human Behavior Bounds Privacy and What We Can We D...
Big But Personal Data: How Human Behavior Bounds Privacy and What We Can We D...
freshdatabos
 
You Have the Data, Now What? (Chris Lynch) - 2014 Boston Data Festival -
 You Have the Data, Now What? (Chris Lynch) - 2014 Boston Data Festival - You Have the Data, Now What? (Chris Lynch) - 2014 Boston Data Festival -
You Have the Data, Now What? (Chris Lynch) - 2014 Boston Data Festival -
freshdatabos
 

Mehr von freshdatabos (9)

An introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using PythonAn introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using Python
 
Thinking in Data Workshop
Thinking in Data WorkshopThinking in Data Workshop
Thinking in Data Workshop
 
Big But Personal Data: How Human Behavior Bounds Privacy and What We Can We D...
Big But Personal Data: How Human Behavior Bounds Privacy and What We Can We D...Big But Personal Data: How Human Behavior Bounds Privacy and What We Can We D...
Big But Personal Data: How Human Behavior Bounds Privacy and What We Can We D...
 
Building Fast Applications for Streaming Data
Building Fast Applications for Streaming DataBuilding Fast Applications for Streaming Data
Building Fast Applications for Streaming Data
 
Visualizing Networks
Visualizing NetworksVisualizing Networks
Visualizing Networks
 
In Defense of Imprecision: Why Traditional Approaches to Data Visualization a...
In Defense of Imprecision: Why Traditional Approaches to Data Visualization a...In Defense of Imprecision: Why Traditional Approaches to Data Visualization a...
In Defense of Imprecision: Why Traditional Approaches to Data Visualization a...
 
Vector Space Word Representations - Rani Nelken PhD
Vector Space Word Representations - Rani Nelken PhDVector Space Word Representations - Rani Nelken PhD
Vector Space Word Representations - Rani Nelken PhD
 
Winning Data Science Competitions (Owen Zhang) - 2014 Boston Data Festival
Winning Data Science Competitions (Owen Zhang)  - 2014 Boston Data FestivalWinning Data Science Competitions (Owen Zhang)  - 2014 Boston Data Festival
Winning Data Science Competitions (Owen Zhang) - 2014 Boston Data Festival
 
You Have the Data, Now What? (Chris Lynch) - 2014 Boston Data Festival -
 You Have the Data, Now What? (Chris Lynch) - 2014 Boston Data Festival - You Have the Data, Now What? (Chris Lynch) - 2014 Boston Data Festival -
You Have the Data, Now What? (Chris Lynch) - 2014 Boston Data Festival -
 

Kürzlich hochgeladen

一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
wsppdmt
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
q6pzkpark
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
vexqp
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
vexqp
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 

Kürzlich hochgeladen (20)

一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdf
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 

Data Science on a Budget: Maximizing Insight and Impact - Nicholas Arcolano PhD

  • 1. Data Science on a Budget: Maximizing Insight and Impact Nicholas Arcolano, Ph.D. Senior Data Scientist @arcolano Photo by giuseppemilo / CC BY
  • 2. A little background… • Spent 10 years at MIT Lincoln Laboratory working in ballistic missile defense and cyber security research • Areas of interest: statistics, machine learning, parallel computing, “big data” • Realized these things had been collectively re-branded as “data science” • Started calling myself a “data scientist” and joined a start-up Nicholas Arcolano – Data Science on 2 a Budget – November 2014
  • 3. What does a data scientist do? Nicholas Arcolano – Data Science on 3 a Budget – November 2014
  • 4. What does a data scientist do? • Something that happens at the intersection of statistics, machine learning, and computer science • Usually involves data (typically lots of it) • Actually, this isn’t the most critical question to be worrying about Nicholas Arcolano – Data Science on 4 a Budget – November 2014
  • 5. A better question… • What does a data team do? • Basically, two things: 1. Use data to help the rest of the company understand what our users are doing 2. Help the rest of the company use this information to improve our product and our business Nicholas Arcolano – Data Science on 5 a Budget – November 2014
  • 6. The Company • Started in 2008 • Based in Boston • About 50 people • 4-person data team • 37 million users • 450 million fitness activities • 200 billion GPS points • 17 billion interactions and events Our Product The Data • RunKeeper app for GPS and manual tracking of running, walking, cycling, other activities • Long-term fitness goals, training plans, and performance insights • iOS, Android, web, 3rd party devices
  • 7. PRODUCT SYSTEMS DATA MARKETING EXECUTIVE BUSINESS DEVELOPMENT USER EXPERIENCE QUALITY ASSURANCE • analytics and business intelligence • modeling and forecasting • data systems and archiving • user research and testing • data-driven features • data stories and visualizations 7 SUPPORT “DATA SCIENCE”
  • 8. How can we accomplish all this, quickly and with a small team? It’s hard… but here are some steps to making it easier Nicholas Arcolano – Data Science on 8 a Budget – November 2014
  • 9. Step 1: Communicate. A lot. Nicholas Arcolano – Data Science on 9 a Budget – November 2014
  • 10. Step 1: Communicate. A lot. Nicholas Arcolano – Data Science on 10 a Budget – November 2014
  • 11. Step 1: Communicate. A lot. • You have a lot to learn about the rest of the company – Every part of the company has its own blend of tools, systems, processes, environments – Every part has data it understands and cares about – Every part knows things that affect the data that you won’t see— user interviews, support feedback, product bugs, system failures • You also have a lot to teach people – What data we have – What it can—and can’t—do – Empower people to “think with data” Nicholas Arcolano – Data Science on 11 a Budget – November 2014
  • 12. Step 1: Communicate. A lot. • Be patient—sometimes you have to say the same things many times • You may be the only one looking at certain data—if you see something, say something! Nicholas Arcolano – Data Science on 12 a Budget – November 2014
  • 13. Setting expectations Things our data team will discover exciting new things things we already knew Anticipated impact of data exploration: Things our data team will discover bugs, missing data, and bad data things we already knew exciting new things Actual impact of data exploration: Nicholas Arcolano – Data Science on 13 a Budget – November 2014
  • 14. Step 2: Move quickly but carefully. “Wisely and slow. They stumble that run fast.” – Friar Laurence, from Shakespeare’s Romeo and Juliet Nicholas Arcolano – Data Science on 14 a Budget – November 2014
  • 15. Step 2: Move quickly but carefully. • On moving fast… – Data science can work well in an agile framework – Make assumptions, but understand them – Don’t be afraid to provide caveats • On being cautious… – Bad analysis is worse than no analysis – Make time for data QA – Use common sense—if it seems to good (or bad) to be true, it usually is Nicholas Arcolano – Data Science on 15 a Budget – November 2014
  • 16. Step 3: Keep it simple. • Go for lots of small, quick wins • Learn and iterate • Resist the urge to show everyone how smart you are by doing something super complicated Nicholas Arcolano – Data Science on 16 a Budget – November 2014
  • 17. Step 3: Keep it simple. • Do the “stupid thing” first – It helps build understanding – It helps uncover issues with the data – It may turn out that you’re not even solving the right problem – It may actually work pretty well • When in doubt, favor a simpler method that you understand better over a more complex one – Easier to implement – Easier to debug – Easier to explain to others Nicholas Arcolano – Data Science on 17 a Budget – November 2014
  • 18. You don’t have to use all the data • Sometimes, using all the data is the right thing to do: SELECT COUNT(userid) FROM rk_user; • Sometimes, though, you can solve your problem entirely with a small data set • Benefits – Easier computation and data wrangling means faster results – “Curse of dimensionality” is a real thing – Mitigate bad assumptions (lack of stationarity, different product versions, changing environments, regional and seasonal effects, etc.)
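As a rough illustration of the "small data set" point on this slide, the sketch below estimates a rate from a simple random sample and attaches a normal-approximation 95% interval. The population here is synthetic and the function name `estimate_rate` is hypothetical; nothing in it reflects the actual RunKeeper data or tooling.

```python
import math
import random


def estimate_rate(flags, sample_size, seed=0):
    """Estimate the fraction of True values from a simple random sample.

    Returns (estimate, lower, upper), where the bounds form a
    normal-approximation 95% confidence interval.
    """
    rng = random.Random(seed)
    sample = rng.sample(flags, sample_size)  # sampling without replacement
    p_hat = sum(sample) / sample_size
    half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Synthetic stand-in for a large user table: True means the user did
# some action of interest. The 30% rate is an arbitrary assumption.
_rng = random.Random(42)
population = [_rng.random() < 0.3 for _ in range(200_000)]

p_hat, low, high = estimate_rate(population, sample_size=10_000)
print(f"estimate from 10,000 rows: {p_hat:.3f} (95% CI {low:.3f} to {high:.3f})")
print(f"rate over all rows:        {sum(population) / len(population):.3f}")
```

With 10,000 sampled rows the interval is at most about two percentage points wide regardless of the true rate, which is often precise enough to make a decision without touching the full table.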
  • 19. Step 4: Use the right tools. • In any given scenario, the “right tool” is one of the following: – The tool you already know and are comfortable with – Something you don’t know but suspect would work really well – Something that doesn’t exist yet • It’s up to you to figure out which one it is
  • 20. Step 4: Use the right tools. • (Figure: languages and technologies I used during 10 years at my last job, compared with languages and technologies I’ve used during 1 year at my current job) • Be comfortable using a variety of tools • Make time to learn new ones • Build your own tools for repeatable analysis, once you know it’s worth it • Open source: take advantage of the hard work of others, but make sure you understand what you’re using • Give back
  • 21. Step 4: Use the right tools. • Many of the same principles apply to your “analytical toolkit” • Try to learn when to stick with a well-worn approach and when to try something new • Be skeptical of the conventional wisdom – Just because a metric or analytical approach is common doesn’t mean it’s the right thing to do for your situation – Typical example: A/B testing
  • 22. Hypothesis testing (“A/B testing”) • (Diagram) Users are split 90% / 10% into Group A (“Control”, standard flow) and Group B (“Treatment”, experimental flow) • The number of successes and failures in each group feeds a test statistic, which drives the decision: “reject/accept null hypothesis” • “Null hypothesis”: treatment has no effect • “Alternate hypothesis”: treatment has some effect
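The slide does not say which test statistic is used. One common choice for success/failure counts is a pooled two-proportion z-test, sketched below with only the standard library. The function name and the example counts are made up for illustration, and the 90/10 split simply mirrors the diagram.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Pooled two-proportion z-test for control (A) vs. treatment (B).

    Null hypothesis: both groups share the same success rate.
    Returns (z, two_sided_p_value).
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 9,000 control users (900 successes) and
# 1,000 treatment users (120 successes).
z, p_value = two_proportion_z_test(900, 9_000, 120, 1_000)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

A small p-value is evidence against the null hypothesis of "no effect"; it says nothing on its own about whether the effect is large enough to matter.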
  • 23. Thoughts about A/B testing • A/B testing is hard to do well – Need lots of data and good estimates of baseline rates to have a chance at significance – Need lots of data infrastructure to do it quickly on a large scale – Need to manage variables such as multiple testing, changes in product and environment, and interactions between tests and subjects – Need to make sure tests align with high-level vision and learning goals • An A/B test can help with one very specific decision, but typically will not... – Help you understand how multiple different factors interact – Predict long-term reactions (the “taste test” phenomenon); that needs a longitudinal study – Always give you the answer you want; results may be null or inconclusive – Tell you anything of any value whatsoever if you did it wrong
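To make "need lots of data and good estimates of baseline rates" concrete, here is a sketch of the standard normal-approximation sample-size formula for comparing two proportions. It assumes equal group sizes, a two-sided test, and Python 3.8+ for `statistics.NormalDist`; the function name, the 10% baseline, and the target rates are illustrative assumptions, not figures from the talk.

```python
import math
from statistics import NormalDist


def users_per_group(baseline, target, alpha=0.05, power=0.8):
    """Approximate users needed per group to detect baseline -> target.

    Uses the usual normal-approximation formula for a two-sided
    two-proportion test with equally sized groups.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (baseline + target) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(baseline * (1 - baseline) + target * (1 - target))
    ) ** 2
    return math.ceil(numerator / (target - baseline) ** 2)


# Detecting a lift from 10% to 11% takes far more users than 10% to 12%.
n_small_lift = users_per_group(0.10, 0.11)
n_large_lift = users_per_group(0.10, 0.12)
print(f"10% -> 11%: about {n_small_lift:,} users per group")
print(f"10% -> 12%: about {n_large_lift:,} users per group")
```

Because the required sample grows with the inverse square of the lift, halving the effect you hope to detect roughly quadruples the traffic you need, and a wrong baseline estimate can make the plan badly off.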
  • 24. Thoughts about A/B testing Even when performed “correctly”, an A/B test may not tell you what you think it does
  • 25. Step 5: Have faith and have fun • Don’t try to understand everything all at once; keep looking from multiple angles and trust that more understanding will come in time
  • 26. Step 5: Have faith and have fun • Working with data from millions of engaged users is awesome • Helping your company have a real impact on their lives is even more awesome • All the tools are available to do truly amazing things • Make sure everyone knows how much you love the data, and they will grow to love it too
  • 27. Things we’re still working on • Synthesizing knowledge and communicating results • Data-driven products and features • Analytics and instrumentation • Giving back (open source, blogging, tutorials, talks)
  • 28. Thanks for listening! Questions? nicholas.arcolano@runkeeper.com http://arcolano.com @arcolano http://www.runkeeper.com