The Three Sexy Skills of Data Scientists (& Data-Driven Startups) Michael Driscoll | Metamarkets IA Ventures Big Data Conference | Oct 2010 For print version: http://www.dataspora.com/blog
I. THE PROMISE OF BIG DATA
What is Big Data? Data that is distributed.
Attack of the Exponentials 1
Attack of the Exponentials 2
Attack of the Exponentials 3
Economics of Data Processing $ extract monetize BIG DATA FEATURES ECONOMIC VALUE
Economic Value > Extraction Cost
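As a rough illustration of the threshold on this slide, the following Python sketch checks whether a dataset clears the bar of economic value exceeding extraction cost. The dataset names and dollar figures are hypothetical and chosen only to show the comparison; none of them come from the talk.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # All fields hold hypothetical illustration values, not figures from the talk.
    name: str
    economic_value: float   # expected value of the extracted features, in dollars
    extraction_cost: float  # cost to store and process the raw data, in dollars


def worth_extracting(d: Dataset) -> bool:
    """Return True when economic value strictly exceeds extraction cost."""
    return d.economic_value > d.extraction_cost


datasets = [
    Dataset("clickstream_logs", economic_value=50_000.0, extraction_cost=12_000.0),
    Dataset("old_server_logs", economic_value=3_000.0, extraction_cost=9_000.0),
]

# Keep only the datasets that clear the value > cost threshold.
worth_keeping = [d.name for d in datasets if worth_extracting(d)]
print(worth_keeping)
```

If extraction cost falls while value stays fixed, a dataset that was previously discarded can cross the threshold, which is the shift the slide points at.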
II. 3 SEXY SKILLS OF DATA SCIENTISTS… … & DATA-DRIVEN  START-UPS
=suffering
+ = if ($foo =~  / {2,3}([A-Z]{5,7}) {2,5}/)
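The Perl fragment on this slide can be sketched in Python, one of the scripting languages the speaker notes name for munging. The pattern is copied from the slide; the function name `extract_code` and the sample record are assumptions invented for illustration.

```python
import re

# Same pattern as on the slide: 2-3 spaces, a captured run of 5-7
# uppercase letters, then 2-5 spaces.
FIELD = re.compile(r" {2,3}([A-Z]{5,7}) {2,5}")


def extract_code(line: str):
    """Return the captured uppercase field, or None when the line does not match."""
    match = FIELD.search(line)
    return match.group(1) if match else None


# Hypothetical fixed-width record, invented for illustration.
sample = "0042  LONDON   2010-10-01"
print(extract_code(sample))
```

`re.search` is used rather than `re.match` because Perl's `=~` looks for the pattern anywhere in the string, not only at the start.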
Examples of Data Munging Start-ups
=statistics
data model 1000 bytes 2 bytes
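The idea of a model as a reduced description can be sketched as follows. This is a minimal illustration, assuming a straight-line model and synthetic data; the slide contrasts bytes, whereas this sketch simply counts numbers (1000 observations versus 2 fitted parameters). The true slope and intercept, the noise level, and the seed are all hypothetical choices.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: 1000 noisy observations of a straight line y = 3x + 5.
xs = [float(i) for i in range(1000)]
ys = [3.0 * x + 5.0 + random.gauss(0.0, 1.0) for x in xs]

# Ordinary least squares for a single predictor, written out by hand.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# The "model" is just these two numbers standing in for 1000 observations.
model = (slope, intercept)
print(len(ys), "observations ->", len(model), "parameters")
```

The fitted pair keeps only the trend and discards the noise, which is the sense in which a model is reductive.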
Examples of Statistical Data Products at Start-ups
=storytelling
Exploratory  Visualization
Narrative Visualization Source: NYT, inspired by Wattenberg & Byron, http://www.leebyron.com/else/streamgraph/
Examples of Data Visualization Start-ups
III. THE BIG DATA ECOSYSTEM
Actions Products (APIs, Dashboards, Tools) Analytics (R, SPSS, SAS, SAP) Insights Data Hadoop, Parallel RDBMS  Data
THANKS! Michael Driscoll mike@metamarketsgroup.com @dataspora
The Three Sexy Skills of Data Scientists (& Data-Driven Startups) Michael Driscoll | Metamarkets IA Ventures Big Data Conference | Oct 2010
EXTRAS
WHAT IS DATA  SCIENCE?
3 Sexy Skills of Data Scientists

Editor's notes

  1. I’ve added an addendum to this talk – these skills aren’t just sexy for individuals. Start-ups with these skills in-house are also sexy investments – we wouldn’t be meeting here today if that weren’t the case. The motivation for this talk was Hal Varian’s quip that statisticians are the sexy profession of the next decade. I thought about how I could mash up data with sexy, and this is what I got.
  2. Let’s set the stage. Joe Hellerstein has said that we’re living in the Industrial Revolution of Data.Big Data means.
  3. An important note: big data is not just about volume, it’s about velocity. Systems must be dramatically re-architected when they shift from monolithic to modular: unicellular to multicellular. Most of the additional complexity goes into interfaces between the pieces. Regardless, I define Big Data as data that is distributed. Transition: how did we get here, to a world chock full of exabytes?
  4. Attack of the exponentials.
  5. This is what’s happened in the last four decades. These four factors also happen to be inputs for data generation processes. So what happen
  6. Kurzweil reference, I call this the data singularity. CPU cost and storage costs have fallen faster than network and disk IO have risen – meaning more data can be stored & processed locally than can be shipped around. This has strategic implications: data is heavy, and hard to move once it lands somewhere. This puts Amazon, for instance, at an enormous competitive advantage over its cloud computing peers. Data is heavy. Strategic implications. Things can explode.
  7. Kurzweil reference, I call this the data singularity. CPU cost and storage costs have fallen faster than network and disk IO have risen – meaning more data can be stored & processed locally than can be shipped around. This has strategic implications: data is heavy, and hard to move once it lands somewhere. This puts Amazon, for instance, at an enormous competitive advantage over its cloud computing peers. Data is heavy. Strategic implications. Things can explode.
  8. Athabasca Sands of Canada. There are parallels; mining value from these tar sands illustrates the point that these efforts were only worthwhile once the value of the oil extracted exceeded the cost of extraction. The same holds true for data. Where are the Athabasca tar sands of data? (Graphic showing value > cost threshold with example data.) The economics of data aggregation and analysis have shifted dramatically, compelling (i) new categories of data to be stored & collected, and (ii) re-examination of already collected but frequently disposed data. In either case, the criterion is the same: economic value > cost of analysis. But the process of capitalizing on these emerging opportunities, of converting data volumes into value, requires a unique skill set. When concentrated in a single individual or within a start-up, these skills are a powerful cocktail – sexy to employers and investors alike. These are the three sexy skills I discuss next. Not all data is worth keeping / aggregating / analyzing. Rehabilitate data that formerly wasn’t meritorious. Amazon stock chart as punchline. So few people had access to these tools. The scientist moniker is almost counter to what we traditionally think of as a scientist. Call out that hacker ethos of the data scientist.
  9. Few individuals have all these skills concentrated in one. That is, after all, the advantage of a start-up – where talents can complement one another.
  10. It is a painful process. Transition: most of us are used to confronting files that look like:
  11. Grab a screendump from the Oracle database scrape from 10 years of advertising data from a London publishing partner of ours.
  12. Data munging is a labor-intensive and painful process; often 80% of the time in an analysis project can be spent on this piece. The tools used are typically high-level scripting languages like Python, Ruby, Perl. If you want to know more about munging, two world-class data mungers are here with us today, Pete Skomoroch & Flip Kromer. Pete built a site that mines Wikipedia’s edit logs for trending news topics, and Flip is the force behind InfoChimps, and has written more parsers than almost the rest of us combined.
  13. Abstraction, symbology, ontology…
  14. Statistics is the grammar of data science. For those who feel that stats is dominated by old white dead men…
  15. That’s because it is. But these old dead white men have some powerful ideas.
  16. Statistics allows us to provide reduced descriptions of the world, in the form of models. In this way, they are reductive: models capture only the essential features of the data.
  17. Statistical or machine-learning based data products are a staple of nearly every data-driven start-up in town. Here are just a few. In the process of developing a data product, data visualization plays an important role.
  18. Our eyes are the highest-bandwidth channel we have. Visualization is a means of surfacing otherwise intangibly large data sets. Two broad classes. Exploratory: audience of 1 or 2, characterized by rapid iterations, local development, not in print. Narrative: a point of view has been established and the viz is supposed to help drive the story forward.
  19. Tukey
  20. Wattenberg stream graphs
  21. Storytelling. Human-size for human decision makers – telling stories with the data, through visualization, to communicate massive scales to people that execute and make decisions.
  22. Good luck. Tableau is desktop.
  23. This is an open source stack, and there is a vibrant big data hacker community actively building these tools. Specifically how it’s manifesting that we’re using in our country; here’s where we’re paying and here’s where we’re not. Here’s the solution interim. The stack is loosely coupled: right tool for the right job. The need for a dedicated analytics RDBMS. You know who sits on the top of that stack? We do. That’s why storytelling is such an important skill. Commoditization moves from the bottom up.
  24. I’ve added an addendum to this talk – these skills aren’t just sexy for individuals. Start-ups with these skills in-house are also sexy investments – we wouldn’t be meeting here today if that weren’t the case. The motivation for this talk was Hal Varian’s quip that statisticians are the sexy profession of the next decade. I thought about how I could mash up data with sexy, and this is what I got.
  25. I’m defining data science as: applying tools to data to answer questions. It is at the intersection of these tools. And it is a growing field, because data is getting bigger, and our tools are getting better. (Suffice it to say, the questions we ask have been around since time immemorial: who…) Another word for questions is hypotheses.
  26. There’s been a lot of talk about Big Data in the past year. Articles and conferences.