data science @ The New York Times
chris.wiggins@columbia.edu
chris.wiggins@nytimes.com
@chrishwiggins
references: bit.ly/icerm
data science: mindset & toolset
drew conway, 2010
modern history:
2009
biology: 1892 vs. 1995
new toolset, new mindset
1851
news: 20th century
church state
church
news: 21st century
church state
engineering: DSE
newspapering: 1851 vs. 1996
example:
millions of views per hour, 2015
"...social activities generate large quantities of potentially
valuable data... The data were not generated for the
purpose of learning; however, the potential for learning
is great" - J. Chambers, Bell Labs, 1993
data science: the web
is your “online presence”
data science: the web
is a microscope
data science: the web
is an experimental tool
data science: the web
is an optimization tool
newspapering: 1851 vs. 1996 vs. 2008
“a startup is a temporary organization in search of a
repeatable and scalable business model” —Steve Blank
every publisher is now a startup
news: 21st century
church state
engineering
learnings
- predictive analytics
- descriptive analytics
- prescriptive analytics
cf. modelingsocialdata.org
predictive analytics, e.g.,
“the funnel”
cf. modelingsocialdata.org
interpretable predictive analytics
supercoolstuff
cf. modelingsocialdata.org
arxiv.org/abs/q-bio/0701021
optimization & learning, e.g.,
“How The New York Times Works”, Popular Mechanics, 2015
optimization & prediction, e.g.,
“How The New York Times Works”, Popular Mechanics, 2015
(some models)
(some moneys)
recommendation as predictive analytics
bit.ly/AlexCTM
descriptive analytics, e.g.,
cf. daeilkim.com ; import bnpy
prescriptive analytics, e.g.,
Reporting
Learning
Test
Optimizing
Explore
descriptive:
predictive:
prescriptive:
common requirements in
data science:
1. people
2. ideas
3. things
cf. USAF
things:
what does DS team deliver?
- build data prototypes
- build APIs
- impact roadmaps
- build data prototypes
cf. daeilkim.com
- in puppet, w/python2.7
- collaboration w/pers. team
- build APIs
- impact roadmaps
flickr/McJex
data science: ideas
data skills
- data engineering
- data science
- data visualization
- data product
- data multiliteracies
- data embeds
cf. “data scientists at work”, ch 1
data science: people
- new mindset > new toolset
summary:
pay attention to:
1. people
2. ideas
3. things
cf. USAF
thanks to the data science team!
data science @ The New York Times
chris.wiggins@columbia.edu
chris.wiggins@nytimes.com
@chrishwiggins

DataEngConf: Data Science at the New York Times by Chris Wiggins