SlideShare ist ein Scribd-Unternehmen logo
1 von 29
Downloaden Sie, um offline zu lesen
From Lab to FactoryFrom Lab to Factory
Orhowtoturndataintovalue?Orhowtoturndataintovalue?
PyData track at PyCon Ireland
Late October 2015
peadarcoyle@googlemail.com
All opinions my own
Who am I?Who am I?
Type (A) data scientist - focused on analysis - c.f.
Masters in Mathematics
Industry for nearly 3 years
Specialized in Statistics and Machine Learning
Passionate about turning data into products
Occasional contributor to OSS - Pandas and PyMC3
Speak and teach at PyData, PyCon and EuroSciPy
@springcoil
Chang
Aims of this talkAims of this talk
"We need more success stories" - Ian Ozsvald
Lessons on how to deliver value quickly in a project
Solutions to the last mile problem of delivering value
What IS a Data Scientist?What IS a Data Scientist?
I think a data scientist is someone with enough programming ability
to leverage their mathematical skills and domain speciïŹc knowledge
to turn data into solutions.
The solution should ideally be a product
However even powerpoint can be the perfect delivery mechanism
What do Data Scientists talk about?What do Data Scientists talk about?
Based on my Interview series!Dataconomy
Some NLP on the Interviews!Some NLP on the Interviews!
HT: Sean J. Taylor and Hadley Wickham
How do I bring value as a data geek?How do I bring value as a data geek?
Getting models used is a hard problem (trust me :) )
How do we turn insight into action?
How do we train people to trust models?
Visualise ALL THE THINGS!!Visualise ALL THE THINGS!!
(Relay foods dataset - HT Greg Reda)
Consumer behaviour at a Fast Food Restaurant per year in the USA
What projects work?What projects work?
Explaining existing data (visualization!)
Automate repetitive/ slow processes
Augment data to make new data (Search engines, ML models)
Predict the future (do something more accurately than gut feel )
Simulate using statistics :) (Rugby models, A/B testing)
Data Science projects are risky!Data Science projects are risky!
Many stakeholders think that data science is just an engineering problem,
but research is high risk and high reward
Derisking the project - how? Send me examples :)
https://github.com/ianozsvald/data_science_delivered
(HT: The Yhat people - )www.yhathq.com
What are the blockers?What are the blockers?
Domain knowledge and understanding - can't be
faked :)
DiïŹƒcult to extract information and produce good
visualizations without engineering and business.
Example - it took me months to be able to do
good correlation analysis of Energy markets
"You need data ïŹrst" - Peadar Coyle"You need data ïŹrst" - Peadar Coyle
Copying and pasting PDF/PNG data
Messy csv ïŹles and ERP output
Scale?
Getting data in some areas is hard!!
Months for extraction!!!
Some tools for web data extraction
Messy APIs without documentation :(
Augmenting data and using API'sAugmenting data and using API's
Sentiment analysis
Improving risk models with
data from other sources
like Quandl
Air TraïŹƒc data blend - many many API's.
Simulate: Six Nations with MCMCSimulate: Six Nations with MCMC
(PyMC3)(PyMC3)
Machine Learning
(HT: Ian Ozsvald)(HT: Ian Ozsvald)
Models are a small part of a problemModels are a small part of a problem
Only 1% of your time will be spent modelling
Stakeholder engagement, managing people and projects
Data pipelines and your infrastructure matters -
How is your model used? How do you get adoption?
Eoin Brazil Talk
Lessons learned from Lab to FactoryLessons learned from Lab to Factory
1. The 'magic quickly' problem is a big problem in any data science project
- our understanding of time frames and risk is unrealistic :)
2. Lack of a shared language between software engineers and data
scientists - but investing in the right tooling by using open standards
allows success.
3. To help data scientists and analysts succeed your business needs to be
prepared to invest in tooling
4. Often you're working with other teams who use diïŹ€erent languages -
so micro services can be a good idea
How to deploy a modelHow to deploy a model??
Palladium (Otto Group)
Azure
Flask Microservice
Docker
Invest in toolingInvest in tooling
For your analysts and data scientists to succeed you need to invest in
infrastructure to empower them.
Think carefully how you want your company to spend its innovation
tokens and take advantage of the excellent tools available like
and AWS.
I think there is great scope for entrepreneurs to take advantage of this
arbitrage opportunity and build good tooling to empower data
scientists by building platforms.
Data scientists need better tools :) For all parts of the process :)
ScienceOps
Data Product DevelopmentData Product Development
Software Engineers aren't data scientists and shouldn't be expected to
write models in code.
A high value use of models is having them in production
Getting information from stakeholders is really valuable in improving
models.
(I gave a talk on using Yhat tech)Data Science models in Production
Use small data where possible!!Use small data where possible!!
Small problems with clean data are more important - (Ian Ozsvald)
Amazon machine with many Xeons and 244GB of RAM is less than 3
euros per hour. - (Ian Ozsvald)
Blaze, Xray, Dask, Ibis, etc etc -
"The mean size of a cluster will remain 1" - Matt Rocklin
PyData Bikeshed
Closing remarksClosing remarks
Dirty data stops projects
There are some good projects like Icy, Luigi, etc for transforming data
and improving data extraction
These tools are still not perfect, and they only cover a small amount of
problems
Stakeholder management is a challenge too
Come speak in
It isn't what you know it is who you know...
On I did a series of interviews with Data Scientists
Send me your dirty data and data deployment stories :)
Luxembourg
Dataconomy
My website
What is the Data Science process?What is the Data Science process?
Obtain
Scrub
Explore
Model
Interpret
Communicate (or Deploy)
A famous 'data product' - Recommendation engines
From Lab to Factory: Or how to turn data into value

Weitere Àhnliche Inhalte

Was ist angesagt?

Data science e machine learning
Data science e machine learningData science e machine learning
Data science e machine learningGiuseppe Manco
 
Data science presentation 2nd CI day
Data science presentation 2nd CI dayData science presentation 2nd CI day
Data science presentation 2nd CI dayMohammed Barakat
 
Begin with Data Scientist
Begin with Data ScientistBegin with Data Scientist
Begin with Data ScientistNarong Intiruk
 
How to build a data science team 20115.03.13v6
How to build a data science team 20115.03.13v6How to build a data science team 20115.03.13v6
How to build a data science team 20115.03.13v6Zhihao Lin
 
Data Science Tutorial | Introduction To Data Science | Data Science Training ...
Data Science Tutorial | Introduction To Data Science | Data Science Training ...Data Science Tutorial | Introduction To Data Science | Data Science Training ...
Data Science Tutorial | Introduction To Data Science | Data Science Training ...Edureka!
 
Data+Science : A First Course
Data+Science : A First CourseData+Science : A First Course
Data+Science : A First CourseArnab Majumdar
 
Idiots guide to setting up a data science team
Idiots guide to setting up a data science teamIdiots guide to setting up a data science team
Idiots guide to setting up a data science teamAshish Bansal
 
Data Science 101
Data Science 101Data Science 101
Data Science 101odsc
 
Data science presentation
Data science presentationData science presentation
Data science presentationMSDEVMTL
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceANOOP V S
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceNiko Vuokko
 
What data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientistsWhat data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientistsHugo Bowne-Anderson
 
Data science Big Data
Data science Big DataData science Big Data
Data science Big Datasreekanthricky
 
Agile data science
Agile data scienceAgile data science
Agile data scienceJoel Horwitz
 
Data science as a professional career
Data science as a professional careerData science as a professional career
Data science as a professional careerDavid Rostcheck
 
How to become a data scientist
How to become a data scientist How to become a data scientist
How to become a data scientist Manjunath Sindagi
 
Data Science: Past, Present, and Future
Data Science: Past, Present, and FutureData Science: Past, Present, and Future
Data Science: Past, Present, and FutureGregory Piatetsky-Shapiro
 

Was ist angesagt? (20)

Introduction to Data Science by Datalent Team @Data Science Clinic #9
Introduction to Data Science by Datalent Team @Data Science Clinic #9Introduction to Data Science by Datalent Team @Data Science Clinic #9
Introduction to Data Science by Datalent Team @Data Science Clinic #9
 
Data science e machine learning
Data science e machine learningData science e machine learning
Data science e machine learning
 
Data science presentation 2nd CI day
Data science presentation 2nd CI dayData science presentation 2nd CI day
Data science presentation 2nd CI day
 
Begin with Data Scientist
Begin with Data ScientistBegin with Data Scientist
Begin with Data Scientist
 
How to build a data science team 20115.03.13v6
How to build a data science team 20115.03.13v6How to build a data science team 20115.03.13v6
How to build a data science team 20115.03.13v6
 
Data Science Tutorial | Introduction To Data Science | Data Science Training ...
Data Science Tutorial | Introduction To Data Science | Data Science Training ...Data Science Tutorial | Introduction To Data Science | Data Science Training ...
Data Science Tutorial | Introduction To Data Science | Data Science Training ...
 
Data+Science : A First Course
Data+Science : A First CourseData+Science : A First Course
Data+Science : A First Course
 
Idiots guide to setting up a data science team
Idiots guide to setting up a data science teamIdiots guide to setting up a data science team
Idiots guide to setting up a data science team
 
Data Science 101
Data Science 101Data Science 101
Data Science 101
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 
Data Science using Python
Data Science using PythonData Science using Python
Data Science using Python
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
What data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientistsWhat data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientists
 
Data science Big Data
Data science Big DataData science Big Data
Data science Big Data
 
Agile data science
Agile data scienceAgile data science
Agile data science
 
Data science as a professional career
Data science as a professional careerData science as a professional career
Data science as a professional career
 
How to become a data scientist
How to become a data scientist How to become a data scientist
How to become a data scientist
 
Data Science: Past, Present, and Future
Data Science: Past, Present, and FutureData Science: Past, Present, and Future
Data Science: Past, Present, and Future
 

Andere mochten auch

Big Data and Internet of Things for Managers
Big Data and Internet of Things for ManagersBig Data and Internet of Things for Managers
Big Data and Internet of Things for ManagersPeadar Coyle
 
Introduction to Spark: Or how I learned to love 'big data' after all.
Introduction to Spark: Or how I learned to love 'big data' after all.Introduction to Spark: Or how I learned to love 'big data' after all.
Introduction to Spark: Or how I learned to love 'big data' after all.Peadar Coyle
 
Josh Wills, MLconf 2013
Josh Wills, MLconf 2013Josh Wills, MLconf 2013
Josh Wills, MLconf 2013MLconf
 
Josh Wills, Director of Data Science, Cloudera at MLconf SEA - 5/01/15
Josh Wills, Director of Data Science, Cloudera at MLconf SEA - 5/01/15Josh Wills, Director of Data Science, Cloudera at MLconf SEA - 5/01/15
Josh Wills, Director of Data Science, Cloudera at MLconf SEA - 5/01/15MLconf
 
Consulting Skills for Data Scientists
Consulting Skills for Data ScientistsConsulting Skills for Data Scientists
Consulting Skills for Data ScientistsPeadar Coyle
 
A Map of the PyData Stack
A Map of the PyData StackA Map of the PyData Stack
A Map of the PyData StackPeadar Coyle
 
DataEngConf SF16 - Bridging the gap between data science and data engineering
DataEngConf SF16 - Bridging the gap between data science and data engineeringDataEngConf SF16 - Bridging the gap between data science and data engineering
DataEngConf SF16 - Bridging the gap between data science and data engineeringHakka Labs
 
How can Data Science benefit your business?
How can Data Science benefit your business?How can Data Science benefit your business?
How can Data Science benefit your business?Peadar Coyle
 
Probabilistic Programming in Python
Probabilistic Programming in PythonProbabilistic Programming in Python
Probabilistic Programming in PythonPeadar Coyle
 
From Lab to Factory: Creating value with data
From Lab to Factory: Creating value with dataFrom Lab to Factory: Creating value with data
From Lab to Factory: Creating value with dataPeadar Coyle
 

Andere mochten auch (10)

Big Data and Internet of Things for Managers
Big Data and Internet of Things for ManagersBig Data and Internet of Things for Managers
Big Data and Internet of Things for Managers
 
Introduction to Spark: Or how I learned to love 'big data' after all.
Introduction to Spark: Or how I learned to love 'big data' after all.Introduction to Spark: Or how I learned to love 'big data' after all.
Introduction to Spark: Or how I learned to love 'big data' after all.
 
Josh Wills, MLconf 2013
Josh Wills, MLconf 2013Josh Wills, MLconf 2013
Josh Wills, MLconf 2013
 
Josh Wills, Director of Data Science, Cloudera at MLconf SEA - 5/01/15
Josh Wills, Director of Data Science, Cloudera at MLconf SEA - 5/01/15Josh Wills, Director of Data Science, Cloudera at MLconf SEA - 5/01/15
Josh Wills, Director of Data Science, Cloudera at MLconf SEA - 5/01/15
 
Consulting Skills for Data Scientists
Consulting Skills for Data ScientistsConsulting Skills for Data Scientists
Consulting Skills for Data Scientists
 
A Map of the PyData Stack
A Map of the PyData StackA Map of the PyData Stack
A Map of the PyData Stack
 
DataEngConf SF16 - Bridging the gap between data science and data engineering
DataEngConf SF16 - Bridging the gap between data science and data engineeringDataEngConf SF16 - Bridging the gap between data science and data engineering
DataEngConf SF16 - Bridging the gap between data science and data engineering
 
How can Data Science benefit your business?
How can Data Science benefit your business?How can Data Science benefit your business?
How can Data Science benefit your business?
 
Probabilistic Programming in Python
Probabilistic Programming in PythonProbabilistic Programming in Python
Probabilistic Programming in Python
 
From Lab to Factory: Creating value with data
From Lab to Factory: Creating value with dataFrom Lab to Factory: Creating value with data
From Lab to Factory: Creating value with data
 

Ähnlich wie From Lab to Factory: Or how to turn data into value

The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products Dataiku
 
Building successful data science teams
Building successful data science teamsBuilding successful data science teams
Building successful data science teamsVenkatesh Umaashankar
 
Big Data Oslo v 4 Sci Code: "Current Industry Projects within AI and the Best...
Big Data Oslo v 4 Sci Code: "Current Industry Projects within AI and the Best...Big Data Oslo v 4 Sci Code: "Current Industry Projects within AI and the Best...
Big Data Oslo v 4 Sci Code: "Current Industry Projects within AI and the Best...Dataconomy Media
 
Data science tools of the trade
Data science tools of the tradeData science tools of the trade
Data science tools of the tradeFangda Wang
 
Architecting a Data Platform For Enterprise Use (Strata NY 2018)
Architecting a Data Platform For Enterprise Use (Strata NY 2018)Architecting a Data Platform For Enterprise Use (Strata NY 2018)
Architecting a Data Platform For Enterprise Use (Strata NY 2018)mark madsen
 
How Data Virtualization Puts Machine Learning into Production (APAC)
How Data Virtualization Puts Machine Learning into Production (APAC)How Data Virtualization Puts Machine Learning into Production (APAC)
How Data Virtualization Puts Machine Learning into Production (APAC)Denodo
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactDr. Sunil Kr. Pandey
 
The Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewThe Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewDr. Ananth Krishnamoorthy
 
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...Matt Stubbs
 
Neurodb Engr245 2021 Lessons Learned
Neurodb Engr245 2021 Lessons LearnedNeurodb Engr245 2021 Lessons Learned
Neurodb Engr245 2021 Lessons LearnedStanford University
 
How to Prepare for a Career in Data Science
How to Prepare for a Career in Data ScienceHow to Prepare for a Career in Data Science
How to Prepare for a Career in Data ScienceJuuso Parkkinen
 
Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018mark madsen
 
1 data science with python
1 data science with python1 data science with python
1 data science with pythonVishal Sathawane
 
Tutorial helsinki 20180313 v1
Tutorial helsinki 20180313 v1Tutorial helsinki 20180313 v1
Tutorial helsinki 20180313 v1ISSIP
 
IIPGH Webinar 1: Getting Started With Data Science
IIPGH Webinar 1: Getting Started With Data ScienceIIPGH Webinar 1: Getting Started With Data Science
IIPGH Webinar 1: Getting Started With Data Scienceds4good
 
BDW17 London - Abed Ajraou - First Utility - Putting Data Science in your Bus...
BDW17 London - Abed Ajraou - First Utility - Putting Data Science in your Bus...BDW17 London - Abed Ajraou - First Utility - Putting Data Science in your Bus...
BDW17 London - Abed Ajraou - First Utility - Putting Data Science in your Bus...Big Data Week
 
Career in Data Science
Career in Data ScienceCareer in Data Science
Career in Data ScienceActonRoy
 
The book of elephant tattoo
The book of elephant tattooThe book of elephant tattoo
The book of elephant tattooMohamed Magdy
 

Ähnlich wie From Lab to Factory: Or how to turn data into value (20)

The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products
 
Building successful data science teams
Building successful data science teamsBuilding successful data science teams
Building successful data science teams
 
Big Data Oslo v 4 Sci Code: "Current Industry Projects within AI and the Best...
Big Data Oslo v 4 Sci Code: "Current Industry Projects within AI and the Best...Big Data Oslo v 4 Sci Code: "Current Industry Projects within AI and the Best...
Big Data Oslo v 4 Sci Code: "Current Industry Projects within AI and the Best...
 
Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017 Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017
 
Data science tools of the trade
Data science tools of the tradeData science tools of the trade
Data science tools of the trade
 
Architecting a Data Platform For Enterprise Use (Strata NY 2018)
Architecting a Data Platform For Enterprise Use (Strata NY 2018)Architecting a Data Platform For Enterprise Use (Strata NY 2018)
Architecting a Data Platform For Enterprise Use (Strata NY 2018)
 
How Data Virtualization Puts Machine Learning into Production (APAC)
How Data Virtualization Puts Machine Learning into Production (APAC)How Data Virtualization Puts Machine Learning into Production (APAC)
How Data Virtualization Puts Machine Learning into Production (APAC)
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
 
The Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewThe Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape Overview
 
Data science - An Introduction
Data science - An IntroductionData science - An Introduction
Data science - An Introduction
 
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
 
Neurodb Engr245 2021 Lessons Learned
Neurodb Engr245 2021 Lessons LearnedNeurodb Engr245 2021 Lessons Learned
Neurodb Engr245 2021 Lessons Learned
 
How to Prepare for a Career in Data Science
How to Prepare for a Career in Data ScienceHow to Prepare for a Career in Data Science
How to Prepare for a Career in Data Science
 
Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018
 
1 data science with python
1 data science with python1 data science with python
1 data science with python
 
Tutorial helsinki 20180313 v1
Tutorial helsinki 20180313 v1Tutorial helsinki 20180313 v1
Tutorial helsinki 20180313 v1
 
IIPGH Webinar 1: Getting Started With Data Science
IIPGH Webinar 1: Getting Started With Data ScienceIIPGH Webinar 1: Getting Started With Data Science
IIPGH Webinar 1: Getting Started With Data Science
 
BDW17 London - Abed Ajraou - First Utility - Putting Data Science in your Bus...
BDW17 London - Abed Ajraou - First Utility - Putting Data Science in your Bus...BDW17 London - Abed Ajraou - First Utility - Putting Data Science in your Bus...
BDW17 London - Abed Ajraou - First Utility - Putting Data Science in your Bus...
 
Career in Data Science
Career in Data ScienceCareer in Data Science
Career in Data Science
 
The book of elephant tattoo
The book of elephant tattooThe book of elephant tattoo
The book of elephant tattoo
 

KĂŒrzlich hochgeladen

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel AraĂșjo
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 

KĂŒrzlich hochgeladen (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 

From Lab to Factory: Or how to turn data into value

  • 1. From Lab to FactoryFrom Lab to Factory Orhowtoturndataintovalue?Orhowtoturndataintovalue? PyData track at PyCon Ireland Late October 2015 peadarcoyle@googlemail.com All opinions my own
  • 2. Who am I?Who am I? Type (A) data scientist - focused on analysis - c.f. Masters in Mathematics Industry for nearly 3 years Specialized in Statistics and Machine Learning Passionate about turning data into products Occasional contributor to OSS - Pandas and PyMC3 Speak and teach at PyData, PyCon and EuroSciPy @springcoil Chang
  • 3. Aims of this talkAims of this talk "We need more success stories" - Ian Ozsvald Lessons on how to deliver value quickly in a project Solutions to the last mile problem of delivering value
  • 4. What IS a Data Scientist?What IS a Data Scientist? I think a data scientist is someone with enough programming ability to leverage their mathematical skills and domain speciïŹc knowledge to turn data into solutions. The solution should ideally be a product However even powerpoint can be the perfect delivery mechanism
  • 5. What do Data Scientists talk about?What do Data Scientists talk about? Based on my Interview series!Dataconomy
  • 6. Some NLP on the Interviews!Some NLP on the Interviews!
  • 7. HT: Sean J. Taylor and Hadley Wickham
  • 8. How do I bring value as a data geek?How do I bring value as a data geek? Getting models used is a hard problem (trust me :) ) How do we turn insight into action? How do we train people to trust models?
  • 9.
  • 10. Visualise ALL THE THINGS!!Visualise ALL THE THINGS!! (Relay foods dataset - HT Greg Reda) Consumer behaviour at a Fast Food Restaurant per year in the USA
  • 11. What projects work?What projects work? Explaining existing data (visualization!) Automate repetitive/ slow processes Augment data to make new data (Search engines, ML models) Predict the future (do something more accurately than gut feel ) Simulate using statistics :) (Rugby models, A/B testing)
  • 12. Data Science projects are risky!Data Science projects are risky! Many stakeholders think that data science is just an engineering problem, but research is high risk and high reward Derisking the project - how? Send me examples :) https://github.com/ianozsvald/data_science_delivered
  • 13. (HT: The Yhat people - )www.yhathq.com
  • 14. What are the blockers?What are the blockers? Domain knowledge and understanding - can't be faked :) DiïŹƒcult to extract information and produce good visualizations without engineering and business. Example - it took me months to be able to do good correlation analysis of Energy markets
  • 15. "You need data ïŹrst" - Peadar Coyle"You need data ïŹrst" - Peadar Coyle Copying and pasting PDF/PNG data Messy csv ïŹles and ERP output Scale? Getting data in some areas is hard!! Months for extraction!!! Some tools for web data extraction Messy APIs without documentation :(
  • 16. Augmenting data and using API'sAugmenting data and using API's Sentiment analysis Improving risk models with data from other sources like Quandl Air TraïŹƒc data blend - many many API's.
  • 17. Simulate: Six Nations with MCMCSimulate: Six Nations with MCMC (PyMC3)(PyMC3)
  • 18. Machine Learning (HT: Ian Ozsvald)(HT: Ian Ozsvald)
  • 19. Models are a small part of a problemModels are a small part of a problem Only 1% of your time will be spent modelling Stakeholder engagement, managing people and projects Data pipelines and your infrastructure matters - How is your model used? How do you get adoption? Eoin Brazil Talk
  • 20. Lessons learned from Lab to FactoryLessons learned from Lab to Factory 1. The 'magic quickly' problem is a big problem in any data science project - our understanding of time frames and risk is unrealistic :) 2. Lack of a shared language between software engineers and data scientists - but investing in the right tooling by using open standards allows success. 3. To help data scientists and analysts succeed your business needs to be prepared to invest in tooling 4. Often you're working with other teams who use diïŹ€erent languages - so micro services can be a good idea
  • 21. How to deploy a modelHow to deploy a model?? Palladium (Otto Group) Azure Flask Microservice Docker
  • 22. Invest in toolingInvest in tooling For your analysts and data scientists to succeed you need to invest in infrastructure to empower them. Think carefully how you want your company to spend its innovation tokens and take advantage of the excellent tools available like and AWS. I think there is great scope for entrepreneurs to take advantage of this arbitrage opportunity and build good tooling to empower data scientists by building platforms. Data scientists need better tools :) For all parts of the process :) ScienceOps
  • 23. Data Product DevelopmentData Product Development Software Engineers aren't data scientists and shouldn't be expected to write models in code. A high value use of models is having them in production Getting information from stakeholders is really valuable in improving models. (I gave a talk on using Yhat tech)Data Science models in Production
  • 24. Use small data where possible!!Use small data where possible!! Small problems with clean data are more important - (Ian Ozsvald) Amazon machine with many Xeons and 244GB of RAM is less than 3 euros per hour. - (Ian Ozsvald) Blaze, Xray, Dask, Ibis, etc etc - "The mean size of a cluster will remain 1" - Matt Rocklin PyData Bikeshed
  • 25. Closing remarksClosing remarks Dirty data stops projects There are some good projects like Icy, Luigi, etc for transforming data and improving data extraction These tools are still not perfect, and they only cover a small amount of problems Stakeholder management is a challenge too Come speak in It isn't what you know it is who you know... On I did a series of interviews with Data Scientists Send me your dirty data and data deployment stories :) Luxembourg Dataconomy My website
  • 26.
  • 27. What is the Data Science process?What is the Data Science process? Obtain Scrub Explore Model Interpret Communicate (or Deploy)
  • 28. A famous 'data product' - Recommendation engines