A Statistician Walks into a Tech Company
R at a rapidly scaling healthcare technology startup
Sandy Griffith
Twitter: @sgrifter
sgriffith@flatiron.com
www.flatiron.com
© 2016 Flatiron Health, Inc. Proprietary and confidential.
My story
Academic biostatistics → Healthcare tech
Our Mission
Flatiron’s mission is to serve cancer patients and our partners by dramatically improving treatment and accelerating research.
Flatiron Processes EHR Data At Scale
[Figure: data from the electronic health record (structured: demographics, diagnosis, visits, labs, e-prescribing; unstructured: pathology reports, discharge notes, radiology reports, physician notes) and from outside sources (practice, hospital, lab) passes through structured and unstructured data processing; the diagram contrasts standard EHR data with research-grade data.]
Rapidly Scaling
January 2015
Flatiron: ~140
Software Engineers: ~50
Quantitative Sciences team: 1
Now: We are a team of 262
We include
9 medical oncologists and nurses
70 software engineers
10 quantitative scientists
5 medical informaticists
+ more!
All Flatiron data and tools are collaboratively built, implemented, and maintained by a cross-disciplinary team that includes oncology, engineering, and quantitative sciences.
[Figure: We come from]
[Figure: Primary language at time of hire]
[Figure: Proficiency with R at time of hire]
A decision point early on
Cultivate R culture
1. Internal R Package
2. User group
3. Slack channel
4. Trainings
5. Hiring
[Figure: Proficiency with R, time of hire vs. now]
Now we have R users, but when should we use R?
Three scenarios (“!R” = a language other than R):
1. R for prototyping → !R in production
2. R as a long-term solution
3. R and !R in parallel
R for prototyping → !R in production
Example: Linking external mortality data
Prototype
● One-time linkage
● Small cohort (tens of thousands)
● RecordLinkage R package
● Probabilistic linkage method using the EM algorithm
Production
● Repeated daily at scale
● Large cohort (~5 million patients)
● Code maintained by a different team
● Deterministic logic in SQL
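The slide contrasts a probabilistic prototype (the RecordLinkage R package, fit with EM) with deterministic SQL logic in production. The Python sketch below illustrates both ideas under stated assumptions; it is not the RecordLinkage package's implementation or Flatiron's code. The field names (`last_name`, `first_name`, `dob`) are hypothetical, comparisons are exact agreement only, and the probabilistic part is a textbook Fellegi-Sunter model fit with EM, assuming fields agree independently given match status.

```python
# Sketch only: hypothetical fields and a textbook Fellegi-Sunter/EM model,
# not the RecordLinkage package or any production code.
FIELDS = ("last_name", "first_name", "dob")  # hypothetical comparison fields
EPS = 1e-6  # keeps probabilities strictly inside (0, 1) so posteriors stay defined


def comparison_vector(a, b):
    """Exact-agreement pattern: 1 where the two records agree on a field, else 0."""
    return tuple(int(a[f] == b[f]) for f in FIELDS)


def deterministic_match(a, b):
    """Hypothetical production-style rule: every field must agree exactly."""
    return all(comparison_vector(a, b))


def _clamp(x):
    return min(max(x, EPS), 1 - EPS)


def fellegi_sunter_em(gammas, n_iter=50, p=0.1, m0=0.9, u0=0.1):
    """Fit a two-class (match / non-match) latent model to binary comparison vectors.

    p    : share of candidate pairs that are matches
    m[j] : P(field j agrees | match)
    u[j] : P(field j agrees | non-match)
    Returns (p, m, u, posteriors) where posteriors[i] = P(match | gammas[i]).
    """
    k, n = len(gammas[0]), len(gammas)
    m, u = [m0] * k, [u0] * k
    post = []
    for _ in range(n_iter):
        # E-step: posterior match probability for each pair, treating fields
        # as conditionally independent given match status.
        post = []
        for vec in gammas:
            pm, pu = p, 1 - p
            for j, agree in enumerate(vec):
                pm *= m[j] if agree else 1 - m[j]
                pu *= u[j] if agree else 1 - u[j]
            post.append(pm / (pm + pu))
        # M-step: re-estimate p, m and u from the soft assignments.
        w = sum(post)
        p = _clamp(w / n)
        m = [_clamp(sum(g * v[j] for g, v in zip(post, gammas)) / w) for j in range(k)]
        u = [_clamp(sum((1 - g) * v[j] for g, v in zip(post, gammas)) / (n - w))
             for j in range(k)]
    return p, m, u, post
```

Pairs would then be classified by thresholding the posterior (or the log ratio of the m and u agreement probabilities). The deterministic rule amounts to accepting only the all-agree pattern, which is what makes it easy to express as a SQL join and to maintain without R.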
R for prototyping → !R in production
Why this made sense:
● Stable method: rapid iteration was no longer needed
● Tuning parameters
● Similar performance, more transparency
● No R users on the team that would be maintaining the code
R as a long-term solution
Example: Rmarkdown QA report
Early version (Jan 2015)
● bash commands for extracting data, run from an R script using an ETL tool
● R script run via the command line
● Parameters in metafiles updated manually
● Runs a series of Rmd files and renders HTML output
Current version (April 2016)
● Linked to a data pipeline maintained by software engineering
● Metafile generated dynamically
● Plotly survival curves
● Flatly Bootstrap theme
● Plan to continue using R indefinitely
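The early version ran an R script from the command line with manually edited metafiles; the current version generates the metafile dynamically. A minimal Python sketch of that step follows. It assumes a JSON metafile and Rmd files that declare a `metafile` parameter under `params:` in their YAML header; both are assumptions, since the talk does not describe the file format. Rendering uses `rmarkdown::render()` with its `output_dir` and `params` arguments, invoked through `Rscript -e`; the helper names are hypothetical.

```python
import json
import subprocess
from pathlib import Path


def r_string(s):
    """Quote a Python string as an R double-quoted string literal."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def write_metafile(path, params):
    """Write report parameters to a JSON metafile (hypothetical format)."""
    Path(path).write_text(json.dumps(params, indent=2, sort_keys=True))
    return path


def render_command(rmd_file, output_dir, metafile):
    """Build the argv that renders one Rmd file to HTML via Rscript.

    Assumes the Rmd declares `metafile` under `params:` in its YAML header;
    rmarkdown rejects params that the document does not declare.
    """
    expr = (
        f"rmarkdown::render({r_string(rmd_file)}, "
        f"output_dir = {r_string(output_dir)}, "
        f"params = list(metafile = {r_string(metafile)}))"
    )
    return ["Rscript", "-e", expr]


def run_reports(rmd_files, output_dir, params, metafile="qa_meta.json"):
    """Regenerate the metafile, then render each report in order (needs R + rmarkdown)."""
    write_metafile(metafile, params)
    for rmd in rmd_files:
        subprocess.run(render_command(rmd, output_dir, metafile), check=True)
```

Inside each Rmd, `jsonlite::fromJSON(params$metafile)` could read the parameters back, and a theme such as flatly would normally be set in the YAML header (`output: html_document: theme: flatly`) rather than on the command line.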
R as a long-term solution
Why this made sense:
● Mature product and team
● Quantitative science members remain embedded in the team
● Strong support and collaboration with software engineering
● Requirements are dynamic: continued need for rapid prototyping
R and !R in parallel
Example: Some external collaborations
● Specific research questions
● Two people code independently, one in Python/SQL and one in R
● Compare results
● The language is sometimes incidental; the point is two different perspectives
Why this made sense:
● High stakes or low error tolerance
● Complicated concepts
● Custom projects often involve novel problems
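The slide does not say how the two independent analyses are reconciled. One simple approach, sketched below in Python as an assumption rather than the team's actual tooling, is to have each analyst export results as a mapping from a named metric to its value, then diff the two mappings with a tolerance for the small floating-point differences that can arise between languages. The function name and result shape are hypothetical.

```python
import math


def compare_results(a, b, rel_tol=1e-9, abs_tol=0.0):
    """Diff two independently computed result sets keyed by metric name.

    Returns metrics found only in `a`, only in `b`, and shared metrics whose
    values disagree (numbers compared with math.isclose, everything else with ==).
    """
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    mismatched = {}
    for key in sorted(set(a) & set(b)):
        x, y = a[key], b[key]
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            same = math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        else:
            same = x == y
        if not same:
            mismatched[key] = (x, y)
    return {"only_in_a": only_a, "only_in_b": only_b, "mismatched": mismatched}
```

A metric present on only one side, or a count that differs slightly, can point to a differing cohort definition or inclusion criterion, which is the kind of divergence that having two people code independently is meant to surface.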
Thank you
● Melissa Curtis
● Josh Kraut
● Kathi Seidl-Rathkopf
● Cindy Revol
● Rachael Sorg
● Jay Rughani
● Paul You
● Aracelis Torres
● Alphan Kirayoglu
● Ben Birnbaum
● Ann Jaskiw
● James Gippetti
Join our Team!
Drop me a note at sgriffith@flatiron.com, @sgrifter,
or visit flatiron.com/careers

 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 

A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare Startup

  • 1. A Statistician Walks into a Tech Company: R at a rapidly scaling healthcare technology startup. Sandy Griffith ● Twitter: @sgrifter ● sgriffith@flatiron.com ● www.flatiron.com
  • 2. My story: Academic biostatistics. © 2016 Flatiron Health, Inc. Proprietary and confidential.
  • 4. Our Mission: Flatiron’s mission is to serve cancer patients and our partners by dramatically improving treatment and accelerating research.
  • 5. Flatiron Processes EHR Data At Scale. [Diagram: structured data from the electronic health record (demographics, diagnosis, visits, labs, e-prescribing) and unstructured data (pathology reports, discharge notes, radiology reports, physician notes), along with outside practice, hospital, and lab records, pass through structured and unstructured data processing that turns standard EHR data into research-grade data.]
  • 6. Rapidly Scaling. January 2015 ● Flatiron: ~140 ● Software engineers: ~50 ● Quantitative Sciences team: 1
  • 7. Now: We are a team of 262. We include 9 medical oncologists and nurses, 70 software engineers, 10 quantitative scientists, 5 medical informaticists, and more! All Flatiron data and tools are collaboratively built, implemented, and maintained by a cross-disciplinary team that includes oncology, engineering, and quantitative sciences.
  • 8. Primary Language: time of hire
  • 9. Proficiency with R: time of hire
  • 10. A decision point early on
  • 11. A decision point early on
  • 12. Cultivate R culture: 1. Internal R package; 2. User group; 3. Slack channel; 4. Trainings; 5. Hiring
  • 13. Cultivate R culture: 1. Internal R package; 2. User group; 3. Slack channel; 4. Trainings; 5. Hiring
  • 14. Proficiency with R: time of hire vs. now
  • 15. Now we have R users, but when should we use R? Three scenarios (“!R” meaning a language other than R): 1. R for prototyping → !R in production; 2. R as a long-term solution; 3. R and !R in parallel
  • 16. R for prototyping → !R in production. Example: Linking external mortality data. Prototype ● One-time linkage ● Small cohort (tens of thousands) ● RecordLinkage R package ● Probabilistic linkage method using the EM algorithm. Production ● Repeated daily at scale ● Large cohort (~5 million patients) ● Code maintained by a different team ● Deterministic logic in SQL
  • 17. R for prototyping → !R in production. Example: Linking external mortality data. Why this made sense: ● Stable method: rapid iteration was no longer needed ● Tuning parameters ● Similar performance, more transparency ● No R users on the team that would be maintaining the code
  • 18. R as a long-term solution. Example: Rmarkdown QA report. Early version (Jan 2015) ● Bash commands for extracting data, run from an R script using an ETL tool ● R script run via the command line ● Parameters in metafiles updated manually ● Runs a series of Rmd files and renders HTML output. Current version (April 2016) ● Linked to the data pipeline maintained by software engineering ● Metafile generated dynamically ● Plotly survival curves ● Flatly Bootstrap theme ● Plan to continue using R indefinitely
  • 19. R as a long-term solution. Example: Rmarkdown QA report. Why this made sense: ● Mature product and team ● Quantitative science members remain embedded in the team ● Strong support and collaboration with software engineering ● Requirements are dynamic: continued need for rapid prototyping
  • 20. R and !R in parallel. Example: Some external collaborations. ● Specific research questions ● 2 people code independently, one in Python/SQL and one in R ● Compare results ● The language is sometimes incidental; the point is having 2 different perspectives. Why this made sense: ● High stakes or low error tolerance ● Complicated concepts ● Custom projects often involve novel problems
  • 21. Thank you ● Melissa Curtis ● Josh Kraut ● Kathi Seidl-Rathkopf ● Cindy Revol ● Rachael Sorg ● Jay Rughani ● Paul You ● Aracelis Torres ● Alphan Kirayoglu ● Ben Birnbaum ● Ann Jaskiw ● James Gippetti. Join our team! Drop me a note at sgriffith@flatiron.com, @sgrifter, or visit flatiron.com/careers
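Slides 16 and 17 contrast a probabilistic prototype (the RecordLinkage R package, fitted with the EM algorithm) with deterministic SQL logic in production. The Python sketch below illustrates the general idea only. It is not the RecordLinkage package's code or Flatiron's pipeline. It assumes each candidate record pair has already been reduced to a binary agreement vector over a few fields (for example name, date of birth, and sex), and that fields agree independently given match status, as in the classic Fellegi-Sunter setup. All function names are hypothetical, and the deterministic rule (every field must agree) is an illustrative stand-in for whatever SQL logic was actually used.

```python
import math
from typing import List, Sequence, Tuple

EPS = 1e-6  # keeps probabilities away from 0 and 1 so log weights stay finite


def _clamp(x: float) -> float:
    return min(max(x, EPS), 1.0 - EPS)


def em_fellegi_sunter(
    patterns: Sequence[Sequence[int]],
    p: float = 0.1,  # initial guess: share of candidate pairs that are true matches
    m_init: float = 0.9,  # initial P(field agrees | match)
    u_init: float = 0.1,  # initial P(field agrees | non-match)
    n_iter: int = 200,
) -> Tuple[float, List[float], List[float]]:
    """Fit a two-class latent model to binary agreement vectors by EM (hypothetical helper)."""
    n, k = len(patterns), len(patterns[0])
    m, u = [m_init] * k, [u_init] * k
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a true match,
        # assuming fields agree independently given match status.
        g = []
        for gamma in patterns:
            lm, lu = p, 1.0 - p
            for j, agrees in enumerate(gamma):
                lm *= m[j] if agrees else 1.0 - m[j]
                lu *= u[j] if agrees else 1.0 - u[j]
            g.append(lm / (lm + lu))
        # M-step: re-estimate prevalence and per-field agreement rates.
        n_match = max(sum(g), EPS)
        n_nonmatch = max(n - sum(g), EPS)
        p = _clamp(sum(g) / n)
        m = [
            _clamp(sum(gi * gamma[j] for gi, gamma in zip(g, patterns)) / n_match)
            for j in range(k)
        ]
        u = [
            _clamp(sum((1 - gi) * gamma[j] for gi, gamma in zip(g, patterns)) / n_nonmatch)
            for j in range(k)
        ]
    return p, m, u


def match_weight(gamma: Sequence[int], m: Sequence[float], u: Sequence[float]) -> float:
    """Fellegi-Sunter log2 likelihood-ratio weight for one agreement vector."""
    return sum(
        math.log2(mj / uj) if agrees else math.log2((1 - mj) / (1 - uj))
        for agrees, mj, uj in zip(gamma, m, u)
    )


def deterministic_match(gamma: Sequence[int]) -> bool:
    """Hypothetical deterministic rule: link only if every compared field agrees."""
    return all(gamma)
```

In the probabilistic version, pairs are linked when their weight exceeds a threshold chosen by the analyst; the deterministic rule needs no fitted parameters or threshold, which makes it simpler to run daily and easier to audit.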
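Slide 20 describes two people answering the same research question independently, one in Python/SQL and one in R, and then comparing results. A minimal sketch of that comparison step follows, kept in Python on both sides so it runs standalone: implementation A aggregates in plain Python, and implementation B is written separately in SQL against an in-memory SQLite table. The toy cohort, the summary being computed (patient count and mean age per diagnosis), and all function names are hypothetical.

```python
import math
import sqlite3
from collections import defaultdict

# Hypothetical toy cohort: (patient_id, diagnosis, age)
COHORT = [
    (1, "NSCLC", 64), (2, "NSCLC", 71), (3, "CRC", 58),
    (4, "CRC", 66), (5, "Breast", 49), (6, "NSCLC", 77),
]


def summarize_python(rows):
    """Implementation A: plain Python aggregation -> {diagnosis: (count, mean_age)}."""
    acc = defaultdict(list)
    for _, dx, age in rows:
        acc[dx].append(age)
    return {dx: (len(ages), sum(ages) / len(ages)) for dx, ages in acc.items()}


def summarize_sql(rows):
    """Implementation B: the same summary written independently in SQL (SQLite)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cohort (patient_id INTEGER, diagnosis TEXT, age REAL)")
    con.executemany("INSERT INTO cohort VALUES (?, ?, ?)", rows)
    out = con.execute(
        "SELECT diagnosis, COUNT(*), AVG(age) FROM cohort GROUP BY diagnosis"
    ).fetchall()
    con.close()
    return {dx: (n, mean_age) for dx, n, mean_age in out}


def compare(a, b, rel_tol=1e-9):
    """Return a list of human-readable discrepancies between two summaries."""
    problems = []
    for key in sorted(set(a) | set(b)):
        if key not in a or key not in b:
            problems.append(f"{key}: present in only one result")
            continue
        (na, ma), (nb, mb) = a[key], b[key]
        if na != nb:
            problems.append(f"{key}: count {na} vs {nb}")
        if not math.isclose(ma, mb, rel_tol=rel_tol):
            problems.append(f"{key}: mean age {ma} vs {mb}")
    return problems


if __name__ == "__main__":
    issues = compare(summarize_python(COHORT), summarize_sql(COHORT))
    print(issues or "results agree")
```

Comparing with a relative tolerance rather than exact equality avoids flagging harmless floating-point differences between the two implementations, while any difference in counts is always reported.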