Data Scientist – The New Data Analyst
    Josh Wills, Senior Director of Data Science




1
About Me




2
What Do Data Scientists Do?




3
What I Think I Do




4
What Other People Think I Do




5
What I Actually Do




6
Think Like a Data Scientist




7
Solving Problems vs. Finding Insights




8
Parallelize Everything




9
Abundance vs. Scarcity




10
Building Data Products




11
Create a Data Science Team




12
Choose Good Problems




13
Design the Model




14
Mind the Gap




15
Amortize Costs




16
Measure Everything




17
Rinse and Repeat




18
Work Like a Data Scientist




19
Introduction to Data Science:
     Building Recommender Systems
       December 12-14, New York, NY
       http://university.cloudera.com/




20
Thank you!
Josh Wills, Director of Data Science, Cloudera   @josh_wills

Data Science Day New York: Data Scientist - The New Data Analyst

Editor's Notes

  1. 90% of industrial machine learning is feature extraction. What I really do is ETL.
  2. Drive revenue and new business. Data is too important to be left to business people. The data warehouse is where data goes to die: to be used in operational reporting or diagnosing problems. When we’re talking about data products, we’re talking about creating new revenue streams, optimizing existing ones, and solving problems for customers and for the business.
  3. (1) Means I collect everything: I don’t want to waste time getting data from operational systems every time I need something new. (2) Means I keep all of the phases of data available to me, from the raw stuff to the cleansed stuff to the joined stuff. (3) Has implications for denormalization: we go beyond dimensional modeling to full-on denormalization, usually along the lines of one of our conformed dimensions (product, customer, etc.).
  4. Similar to the EDW team. For a small data warehouse or data mart, the DW architect is the ETL developer, the DBA, the dashboard builder, and the business analyst all rolled into one. When we are talking about data products (classifiers, recommenders, interactive or real-time data tools), we need to bring in the ability to take things to production.
  5. Most important decision: the metrics you’re going to use to measure performance. It is an anti-pattern to solve a problem exactly once. You should either solve a problem 0 times or N times.
  6. Time is money. Your time costs a lot more than data storage does. Data acquisition, data processing, and code reuse are all things you do to save money over the long term.
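
Note 3 mentions going beyond dimensional modeling to full denormalization along a conformed dimension. A minimal sketch of what that could look like, assuming a toy orders table and customer and product dimensions, is shown below. The data, field names, and the `denormalize_by_customer` helper are all hypothetical illustrations and do not come from the talk.

```python
# Hypothetical toy data: names and fields are illustrative, not from the talk.
customers = {
    1: {"customer_name": "Ada", "region": "EU"},
    2: {"customer_name": "Grace", "region": "US"},
}
products = {
    "p1": {"product_name": "Widget", "price": 5.0},
    "p2": {"product_name": "Gadget", "price": 12.5},
}
orders = [
    {"order_id": 100, "customer_id": 1, "product_id": "p1", "quantity": 2},
    {"order_id": 101, "customer_id": 1, "product_id": "p2", "quantity": 1},
    {"order_id": 102, "customer_id": 2, "product_id": "p1", "quantity": 4},
]


def denormalize_by_customer(orders, customers, products):
    """Build one wide record per customer with order details embedded."""
    wide = {}
    for order in orders:
        cid = order["customer_id"]
        # Start each customer's record from a copy of its dimension attributes.
        record = wide.setdefault(
            cid, {"customer_id": cid, **customers[cid], "orders": []}
        )
        product = products[order["product_id"]]
        # Embed the product attributes directly so no later join is needed.
        record["orders"].append({
            "order_id": order["order_id"],
            "product_name": product["product_name"],
            "quantity": order["quantity"],
            "amount": product["price"] * order["quantity"],
        })
    return wide


wide_customers = denormalize_by_customer(orders, customers, products)
```

Each entry in `wide_customers` carries everything known about one customer, so downstream feature extraction can read a single record instead of re-joining tables. The raw `orders`, `customers`, and `products` inputs are left unmodified, in line with the note about keeping every phase of the data available.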