Data Collaboration Stack:
From DataOps to MLOps
Pierre Brunelle, 2022 Rev3 - NYC @pjlbrunelle
|
Cloud-based technologies centered on data to empower users to explore and use data…
1 Ideal Modern Data Stack
Building Blocks & Best-of-Breed Approach
… BI
[Reverse] E(L)TL
Workspace
No-code
Catalog & Governance
Modeling
Warehouse, Lake, & Mesh...
Spreadsheet …
Feature Metrics
|
2 Reality Check: Modern Data Stack
|
4 Most painful issues when interacting with data, in order of priority
Data Quality Issues
Difficulty accessing data and insufficient quantity
Explainability
Lack of ETL Automation / Data Warehousing Issues
Convincing Stakeholders
Reproducibility
Insufficient Hardware
Unsure of best approach or technique to use…?
Need to be able to iterate quickly
|
5 What the Data Collaboration Stack addresses:
Data Quality Issues
Difficulty accessing data and insufficient quantity
Explainability
Lack of ETL Automation / Data Warehousing Issues
Convincing Stakeholders
Reproducibility
Insufficient Hardware
Unsure of best approach or technique to use…?
Need to be able to iterate quickly
|
4 Head Full of Fresh Ops: Smooth out the Data Workflows and Processes
|
4 A Simplified Data Science Workflow
Feature Engineering
Preparation
Selection
Modeling
Data Cleaning and Labeling
Data Collection
Optimization
Ensembling
Validation
Improvement
Monitoring
Deployment
Productionization
Code is merely 5-10% of any machine learning solution.
|
4 Addressing the Skill Gap for Data Science through Collaboration
Analytics & Visualization
Statistics & Mathematics
Computer Science
Domain Expertise
Machine Learning
Analyst Data Scientist Engineer Researcher PM/Business
|
4 A Typical Data Science Project
https://arxiv.org/pdf/2001.06684.pdf
|
4 Collaboration at Amazon Core AI (Amazon Artificial Intelligence Group)
● Price Elasticities
● Economic Impact of Abusive Behavior
● Deep learning to describe products and services
● Debiasing techniques
● Demand across geography to minimize transportation costs
● “Image” scanning with Optical Character Recognition (OCR)
● Multi-armed bandit algorithm to improve predicted revenue
● Methods for inventory management
Data Engineers
Product
Customer
Decision Makers
Partners
Legal
Economists
Data Scientists
Data Workflow
|
4 Data-Centric AI (Garbage I/O) and Bayesian Networks require Collaboration
Provides
● A visualization of the structure of the model, which can motivate the design of new models.
● Insights into the presence and absence of relationships between random variables.
● A way to structure complex probability calculations.
Requires
● What are the random variables in the problem?
● What are the conditional relationships between the variables?
● What are the probability distributions for each variable?
Subject-Matter Experts (SMEs) are integral to the development process.
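A minimal sketch of how the three "Requires" questions become a model, and how the network then structures a probability calculation. The rain / sprinkler / wet-grass variables and every probability below are invented for illustration; in practice an SME would supply them.

```python
from itertools import product

# 1. Random variables (hypothetical): Rain, Sprinkler, WetGrass, all boolean.
# 2. Conditional relationships (hypothetical): Rain -> Sprinkler,
#    and both Rain and Sprinkler -> WetGrass.
# 3. Probability distributions (invented numbers, not from the slides).

P_RAIN = {True: 0.2, False: 0.8}

# P(Sprinkler | Rain), indexed as P_SPRINKLER[rain][sprinkler]
P_SPRINKLER = {
    True: {True: 0.01, False: 0.99},
    False: {True: 0.4, False: 0.6},
}

# P(WetGrass = True | Sprinkler, Rain), indexed by (sprinkler, rain)
P_WET = {
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(rain, sprinkler, wet):
    """Joint probability, factorized along the edges of the network."""
    p_wet_true = P_WET[(sprinkler, rain)]
    p_wet = p_wet_true if wet else 1.0 - p_wet_true
    return P_RAIN[rain] * P_SPRINKLER[rain][sprinkler] * p_wet


def posterior_rain_given_wet():
    """P(Rain = True | WetGrass = True), by enumerating the hidden variable."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / evidence


print(round(posterior_rain_given_wet(), 4))
```

Changing any single table entry changes the posterior, which is why the distributions need review by someone who knows the domain.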
|
4 Collaboration is required at every single step
Data (science) teams are extremely collaborative and work with a variety of stakeholders and tools.
|
5 Examples: How Does Collaboration Take Place?
Data Scientist: Having members of the same team work simultaneously on the same notebook document
Finance Analyst: Having versioned reports that others can reuse
Data Engineer: Having the status of running jobs shareable and communicated to many stakeholders
Data Scientist: Having a notion of ownership around artifacts (data, code, and models)
Data Scientist: Having the ability to rapidly clone and reproduce experiments
ML Engineer: Having the ability to search, browse, and organize code, data, and models
Collaboration as Simple Rules
|
4 From Data To Wisdom
Any Data Workflow…
Gather
Clean
Transform
Explore
Represent
Prescribe
Present
Decide
Data Information Knowledge Insight Wisdom
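A rough sketch of the stage sequence above as an ordered pipeline. Only the stage names come from the slide; the runner, the toy steps, and the order counts are hypothetical, and stages without a step are logged as skipped.

```python
# Stage names follow the slide; everything else is a hypothetical illustration.
STAGES = ["gather", "clean", "transform", "explore",
          "represent", "prescribe", "present", "decide"]


def run_workflow(steps, data):
    """Run the provided steps in slide order and log what happened at each stage."""
    log = []
    for name in STAGES:
        step = steps.get(name)
        if step is None:
            log.append((name, "skipped"))
            continue
        data = step(data)
        log.append((name, "done"))
    return data, log


# Toy steps on a list of daily order counts (invented numbers).
steps = {
    "gather": lambda _: [12, None, 15, 9, None, 18],
    "clean": lambda rows: [r for r in rows if r is not None],
    "transform": lambda rows: [r * 7 for r in rows],  # daily count -> weekly estimate
    "explore": lambda rows: {
        "min": min(rows),
        "max": max(rows),
        "mean": sum(rows) / len(rows),
    },
}

summary, log = run_workflow(steps, None)
print(summary)
print(log)
```

Because the log names every stage, a reader can see at a glance which parts of the workflow nobody has taken on yet.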
|
4 Collaborative Data Workflows
● Data Engineering
● Data Analytics
● Data Science
● Data Visualization
Collaboration in…
Gather
Clean
Transform
Explore
Represent
Prescribe
Present
Decide
Data
Workflow
|
4 Collaborative Data Ecosystem
Team B
Team C
Team A
Team D
Team E
Team F
Maintainers Producers Consumers @kafonek
|
4 Pierre’s Collaborative Modern Data Stack
● Discover Data
● Share Across
● Secure Governance
● Control Workflows
● Personalized Views
Eliminate Data Silos
Infrastructure
Storage, Access, & Transformation
Management, Governance, & Observability
Explore, Analyze, & Publish
|
5 Collaboration as Simple Rules
● Import & Export
● Search & Navigation
● Annotation (e.g. Comment, Tagging…)
● User Segmentation
● Support (at least) asynchronous teamwork
● Content Management & Sharing (e.g. Version Control, Change…)
Key Elements
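One possible way to picture several of these elements (annotation, search, version control, ownership) as a data structure. The Artifact class, its fields, and the example names are all hypothetical; this is not the API of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """Hypothetical shared artifact: a piece of data, code, or a model."""
    name: str
    owner: str
    kind: str  # "data", "code", or "model"
    versions: list = field(default_factory=list)   # content management
    tags: set = field(default_factory=set)         # annotation: tagging
    comments: list = field(default_factory=list)   # annotation: (author, text)

    def commit(self, content):
        """Store a new version and return its 1-based version number."""
        self.versions.append(content)
        return len(self.versions)

    def comment(self, author, text):
        """Leave a note that teammates can read later (asynchronous teamwork)."""
        self.comments.append((author, text))

    def clone(self, new_owner):
        """Copy the version history and tags under a new owner; comments start fresh."""
        return Artifact(self.name, new_owner, self.kind,
                        list(self.versions), set(self.tags), [])


def search(artifacts, tag):
    """Search and navigation: names of the artifacts carrying a given tag."""
    return [a.name for a in artifacts if tag in a.tags]


report = Artifact(name="weekly_sales", owner="ana", kind="code")
report.commit("select 1")
report.tags.add("finance")
report.comment("ben", "Can we add a regional split?")

copy = report.clone(new_owner="ben")
copy.commit("select 2")

print(search([report, copy], "finance"))
print(len(report.versions), len(copy.versions))
```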
|
5 Which Modern Data Stack Is It?
Infrastructure
Storage, Access, & Transformation
Management, Governance, &
Observability
Explore, Analyze, & Publish
|
5 Example: Pierre’s Online E-Commerce Modern Data Stack
|
5 Want to read about Data Collaboration…
“Companies that are in control of their own data generation are those who can get the quickest benefit out
of that data collaboration” - Blake Burch, CEO at Shipyard
“tools empowering data collaboration would come in handy.” - Eti Gwirtz, VP Product at GigaSpaces.
“readiness to experiment and engaging with multiple stakeholders across the organization with specific
roles but ones that need collaboration” - Akhilesh Ayer, EVP and Global Head at WNS Triange.
“It isn’t so much a matter of which industries stand to gain from data collaboration, but that most
businesses can optimize their performance and accuracy by embracing data collaboration” - James
Shalhoub, CEO at Finn
“Organizations can solve these challenges by improving cross-functional collaboration between team leaders
and their data team to make insights accessible to the broader team while also shining a light on the most
important metrics to analyze” - Ryan G. Smith, CEO at LeafLink
“For companies with one data person, the collaboration is happening with non-data people, so more of the
data collaboration would likely be around communications of the insights and actions that need to be taken.
Whereas in a technical organization, data collaboration may mean that team members are sharing a GitHub
account and sharing code, as well as putting the code through a review process. The data professionals in
these two instances have very different challenges to face” - Emad Hasan, CEO at Retina
Q&A

Weitere ähnliche Inhalte

Was ist angesagt?

Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Dr. Arif Wider
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
Kai Wähner
 
An introduction to Business intelligence
An introduction to Business intelligenceAn introduction to Business intelligence
An introduction to Business intelligence
Hadi Fadlallah
 
Data Mesh 101
Data Mesh 101Data Mesh 101
Data Mesh 101
ChrisFord803185
 
Designing An Enterprise Data Fabric
Designing An Enterprise Data FabricDesigning An Enterprise Data Fabric
Designing An Enterprise Data Fabric
Alan McSweeney
 
Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...
Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...
Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...
Cathrine Wilhelmsen
 
Data Architecture PowerPoint Presentation Slides
Data Architecture PowerPoint Presentation SlidesData Architecture PowerPoint Presentation Slides
Data Architecture PowerPoint Presentation Slides
SlideTeam
 
Strategic imperative the enterprise data model
Strategic imperative the enterprise data modelStrategic imperative the enterprise data model
Strategic imperative the enterprise data model
DATAVERSITY
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
SOMASUNDARAM T
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
DATAVERSITY
 
Etl design document
Etl design documentEtl design document
Etl design document
sgyazuddin
 
Data mesh
Data meshData mesh
Data mesh
ManojKumarR41
 
LinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchLinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbench
Sheetal Pratik
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
Jeffrey T. Pollock
 
Business Intelligence (BI) and Data Management Basics
Business Intelligence (BI) and Data Management  Basics Business Intelligence (BI) and Data Management  Basics
Business Intelligence (BI) and Data Management Basics
amorshed
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Tristan Baker
 
Data warehousing
Data warehousingData warehousing
Data warehousing
Juhi Mahajan
 
Data Discoverability at SpotHero
Data Discoverability at SpotHeroData Discoverability at SpotHero
Data Discoverability at SpotHero
Maggie Hays
 

Was ist angesagt? (20)

Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
 
An introduction to Business intelligence
An introduction to Business intelligenceAn introduction to Business intelligence
An introduction to Business intelligence
 
Data Mesh 101
Data Mesh 101Data Mesh 101
Data Mesh 101
 
Designing An Enterprise Data Fabric
Designing An Enterprise Data FabricDesigning An Enterprise Data Fabric
Designing An Enterprise Data Fabric
 
Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...
Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...
Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...
 
Data Architecture PowerPoint Presentation Slides
Data Architecture PowerPoint Presentation SlidesData Architecture PowerPoint Presentation Slides
Data Architecture PowerPoint Presentation Slides
 
Strategic imperative the enterprise data model
Strategic imperative the enterprise data modelStrategic imperative the enterprise data model
Strategic imperative the enterprise data model
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
 
Etl design document
Etl design documentEtl design document
Etl design document
 
Data mesh
Data meshData mesh
Data mesh
 
LinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchLinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbench
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 
Business Intelligence (BI) and Data Management Basics
Business Intelligence (BI) and Data Management  Basics Business Intelligence (BI) and Data Management  Basics
Business Intelligence (BI) and Data Management Basics
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data Discoverability at SpotHero
Data Discoverability at SpotHeroData Discoverability at SpotHero
Data Discoverability at SpotHero
 

Ähnlich wie Data Collaboration Stack

Implementing Data Mesh WP LTIMindtree White Paper
Implementing Data Mesh WP LTIMindtree White PaperImplementing Data Mesh WP LTIMindtree White Paper
Implementing Data Mesh WP LTIMindtree White Paper
shashanksalunkhe12
 
Data-Ed Online: Trends in Data Modeling
Data-Ed Online: Trends in Data ModelingData-Ed Online: Trends in Data Modeling
Data-Ed Online: Trends in Data Modeling
DATAVERSITY
 
Data-Ed: Trends in Data Modeling
Data-Ed: Trends in Data ModelingData-Ed: Trends in Data Modeling
Data-Ed: Trends in Data Modeling
Data Blueprint
 
FAIR data_ Superior data visibility and reuse without warehousing.pdf
FAIR data_ Superior data visibility and reuse without warehousing.pdfFAIR data_ Superior data visibility and reuse without warehousing.pdf
FAIR data_ Superior data visibility and reuse without warehousing.pdf
Alan Morrison
 
Are You Prepared For The Future Of Data Technologies?
Are You Prepared For The Future Of Data Technologies?Are You Prepared For The Future Of Data Technologies?
Are You Prepared For The Future Of Data Technologies?
Dell World
 
Expert Big Data Tips
Expert Big Data TipsExpert Big Data Tips
Expert Big Data Tips
Qubole
 
Data Modeling Best Practices - Business & Technical Approaches
Data Modeling Best Practices - Business & Technical ApproachesData Modeling Best Practices - Business & Technical Approaches
Data Modeling Best Practices - Business & Technical Approaches
DATAVERSITY
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data Quality
Precisely
 
Facilitating Collaborative Life Science Research in Commercial & Enterprise E...
Facilitating Collaborative Life Science Research in Commercial & Enterprise E...Facilitating Collaborative Life Science Research in Commercial & Enterprise E...
Facilitating Collaborative Life Science Research in Commercial & Enterprise E...
Chris Dagdigian
 
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Denodo
 
Big Data & DS Analytics for PAARL
Big Data & DS Analytics for PAARLBig Data & DS Analytics for PAARL
Data science presentation
Data science presentationData science presentation
Data science presentation
MSDEVMTL
 
Explorasi Data untuk Peluang Bisnis dan Pengembangan Karir.pptx
Explorasi Data untuk Peluang Bisnis dan Pengembangan Karir.pptxExplorasi Data untuk Peluang Bisnis dan Pengembangan Karir.pptx
Explorasi Data untuk Peluang Bisnis dan Pengembangan Karir.pptx
windu19
 
The Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value ThereafterThe Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value Thereafter
Inside Analysis
 
Keynote: Graphs in Government_Lance Walter, CMO
Keynote:  Graphs in Government_Lance Walter, CMOKeynote:  Graphs in Government_Lance Walter, CMO
Keynote: Graphs in Government_Lance Walter, CMO
Neo4j
 
Building Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated DiscoveryBuilding Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated Discovery
adamkraut
 
Data-Ed: Data Systems Integration & Business Value PT. 1: Metadata
Data-Ed: Data Systems Integration & Business Value PT. 1: MetadataData-Ed: Data Systems Integration & Business Value PT. 1: Metadata
Data-Ed: Data Systems Integration & Business Value PT. 1: Metadata
Data Blueprint
 
Data Systems Integration & Business Value Pt. 1: Metadata
Data Systems Integration & Business Value Pt. 1: MetadataData Systems Integration & Business Value Pt. 1: Metadata
Data Systems Integration & Business Value Pt. 1: Metadata
DATAVERSITY
 
Data sci sd-11.6.17
Data sci sd-11.6.17Data sci sd-11.6.17
Data sci sd-11.6.17
Thinkful
 
Course 8 : How to start your big data project by Eric Rodriguez
Course 8 : How to start your big data project by Eric Rodriguez Course 8 : How to start your big data project by Eric Rodriguez
Course 8 : How to start your big data project by Eric Rodriguez
Betacowork
 

Ähnlich wie Data Collaboration Stack (20)

Implementing Data Mesh WP LTIMindtree White Paper
Implementing Data Mesh WP LTIMindtree White PaperImplementing Data Mesh WP LTIMindtree White Paper
Implementing Data Mesh WP LTIMindtree White Paper
 
Data-Ed Online: Trends in Data Modeling
Data-Ed Online: Trends in Data ModelingData-Ed Online: Trends in Data Modeling
Data-Ed Online: Trends in Data Modeling
 
Data-Ed: Trends in Data Modeling
Data-Ed: Trends in Data ModelingData-Ed: Trends in Data Modeling
Data-Ed: Trends in Data Modeling
 
FAIR data_ Superior data visibility and reuse without warehousing.pdf
FAIR data_ Superior data visibility and reuse without warehousing.pdfFAIR data_ Superior data visibility and reuse without warehousing.pdf
FAIR data_ Superior data visibility and reuse without warehousing.pdf
 
Are You Prepared For The Future Of Data Technologies?
Are You Prepared For The Future Of Data Technologies?Are You Prepared For The Future Of Data Technologies?
Are You Prepared For The Future Of Data Technologies?
 
Expert Big Data Tips
Expert Big Data TipsExpert Big Data Tips
Expert Big Data Tips
 
Data Modeling Best Practices - Business & Technical Approaches
Data Modeling Best Practices - Business & Technical ApproachesData Modeling Best Practices - Business & Technical Approaches
Data Modeling Best Practices - Business & Technical Approaches
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data Quality
 
Facilitating Collaborative Life Science Research in Commercial & Enterprise E...
Facilitating Collaborative Life Science Research in Commercial & Enterprise E...Facilitating Collaborative Life Science Research in Commercial & Enterprise E...
Facilitating Collaborative Life Science Research in Commercial & Enterprise E...
 
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
 
Big Data & DS Analytics for PAARL
Big Data & DS Analytics for PAARLBig Data & DS Analytics for PAARL
Big Data & DS Analytics for PAARL
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 
Explorasi Data untuk Peluang Bisnis dan Pengembangan Karir.pptx
Explorasi Data untuk Peluang Bisnis dan Pengembangan Karir.pptxExplorasi Data untuk Peluang Bisnis dan Pengembangan Karir.pptx
Explorasi Data untuk Peluang Bisnis dan Pengembangan Karir.pptx
 
The Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value ThereafterThe Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value Thereafter
 
Keynote: Graphs in Government_Lance Walter, CMO
Keynote:  Graphs in Government_Lance Walter, CMOKeynote:  Graphs in Government_Lance Walter, CMO
Keynote: Graphs in Government_Lance Walter, CMO
 
Building Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated DiscoveryBuilding Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated Discovery
 
Data-Ed: Data Systems Integration & Business Value PT. 1: Metadata
Data-Ed: Data Systems Integration & Business Value PT. 1: MetadataData-Ed: Data Systems Integration & Business Value PT. 1: Metadata
Data-Ed: Data Systems Integration & Business Value PT. 1: Metadata
 
Data Systems Integration & Business Value Pt. 1: Metadata
Data Systems Integration & Business Value Pt. 1: MetadataData Systems Integration & Business Value Pt. 1: Metadata
Data Systems Integration & Business Value Pt. 1: Metadata
 
Data sci sd-11.6.17
Data sci sd-11.6.17Data sci sd-11.6.17
Data sci sd-11.6.17
 
Course 8 : How to start your big data project by Eric Rodriguez
Course 8 : How to start your big data project by Eric Rodriguez Course 8 : How to start your big data project by Eric Rodriguez
Course 8 : How to start your big data project by Eric Rodriguez
 

Kürzlich hochgeladen

一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 

Kürzlich hochgeladen (20)

一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 

Data Collaboration Stack

  • 1. Data Collaboration Stack: From DataOps to MLOps Pierre Brunelle, 2022 Rev3 - NYC @pjlbrunelle
  • 2. 1 Ideal Modern Data Stack: Building Blocks & Best-of-Breed Approach. Cloud-based technologies centered on data to empower users to explore and use data… Diagram labels: … BI; [Reverse] E(L)TL; Workspace; No-code; Catalog & Governance; Modeling; Warehouse, Lake, & Mesh...; Spreadsheet …; Feature Metrics
  • 3. 2 Reality Check: Modern Data Stack
  • 4. Most painful issues when interacting with data, by order of priority: Data Quality Issues; Difficulty accessing data and insufficient quantity; Explainability; Lack of ETL Automation / Data Warehousing Issues; Convincing Stakeholders; Reproducibility; Insufficient Hardware; Unsure of best approach or technique to use…?; Need to be able to iterate quickly
  • 5. What the Data Collaboration Stack addresses: Data Quality Issues; Difficulty accessing data and insufficient quantity; Explainability; Lack of ETL Automation / Data Warehousing Issues; Convincing Stakeholders; Reproducibility; Insufficient Hardware; Unsure of best approach or technique to use…?; Need to be able to iterate quickly
  • 6. Head Full of Fresh Ops: Smooth out the Data Workflows and Processes
  • 7. A Simplified Data Science Workflow: Feature Engineering; Preparation; Selection; Modeling; Data Cleaning and Labeling; Data Collection; Optimization; Ensembling; Validation; Improvement; Monitoring; Deployment; Productionization. Code is merely 5-10% of any machine learning solution.
  • 8. Addressing the Skill Gap for Data Science through Collaboration. Skills: Analytics & Visualization; Statistics & Mathematics; Computer Science; Domain Expertise; Machine Learning. Roles: Analyst; Data Scientist; Engineer; Researcher; PM/Business
  • 9. A Typical Data Science Project: https://arxiv.org/pdf/2001.06684.pdf
  • 10. Collaboration at Amazon Core AI (Amazon Artificial Intelligence Group): ● Price Elasticities ● Economic Impact of Abusing Behavior ● Deep learning to describe products and services ● Debiasing techniques ● Demand across geography to minimize transportation costs ● “Image” scanning with Optical Character Recognition (OCR) ● Multi-arm bandit algorithm to improve predicted revenue ● Methods for inventory management. Diagram labels: Data Engineers Product Customer Decision Makers Partners Legal Economists Data Scientists Data Workflow
  • 11. Data-Centric AI (Garbage I/O) and Bayesian Networks require Collaboration. Provides: ● A visualization of the structure of the model, which can motivate the design of new models. ● Insights into the presence and absence of the relationships between random variables. ● A way to structure complex probability calculations. Requires: ● What are the random variables in the problem? ● What are the conditional relationships between the variables? ● What are the probability distributions for each variable? Subject-Matter Experts (SMEs) are integral to the development process.
  • 12. Collaboration is required at every single step: data (science) teams are extremely collaborative and work with a variety of stakeholders and tools.
  • 13. Examples: How Does Collaboration Take Place? Data Scientist: having members of the same team work simultaneously on the same notebook document. Finance Analyst: having versioned reports that others can reuse. Data Engineer: having the status of running jobs shareable and communicated to many stakeholders. Data Scientist: having a notion of ownership around artifacts (data, code, and models). Data Scientist: having the ability to rapidly clone and reproduce experiments. ML Engineer: having the ability to search, browse, and organize code, data, and models.
  • 14. Collaboration as Simple Rules
  • 15. From Data To Wisdom. Any Data Workflow…: Gather, Clean, Transform, Explore, Represent, Prescribe, Present, Decide. Data, Information, Knowledge, Insight, Wisdom
  • 16. Collaborative Data Workflows. Collaboration in… ● Data Engineering ● Data Analytics ● Data Science ● Data Visualization. Data Workflow: Gather, Clean, Transform, Explore, Represent, Prescribe, Present, Decide
  • 17. Collaborative Data Ecosystem: Team A, Team B, Team C, Team D, Team E, Team F; Maintainers, Producers, Consumers. @kafonek
  • 18. Pierre’s Collaborative Modern Data Stack. Eliminate Data Silos: ● Discover Data ● Share Across ● Secure Governance ● Control Workflows ● Personalized Views. Layers: Infrastructure; Storage, Access, & Transformation; Management, Governance, & Observability; Explore, Analyze, & Publish
  • 19. Collaboration as Simple Rules. Key Elements: ● Import & Export ● Search & Navigation ● Annotation (e.g. Comment, Tagging…) ● User Segmentation ● Support (at least) asynchronous teamwork ● Content Management & Sharing (e.g. Version Control, Change…)
  • 20. What Modern Data Stack is it? Layers: Infrastructure; Storage, Access, & Transformation; Management, Governance, & Observability; Explore, Analyze, & Publish
  • 21. Example: Pierre’s Online E-Commerce Modern Data Stack
  • 22. Want to read about Data Collaboration… “Companies that are in control of their own data generation are those who can get the quickest benefit out of that data collaboration” - Blake Burch, CEO at Shipyard. “tools empowering data collaboration would come in handy.” - Eti Gwirtz, VP Product at GigaSpaces. “readiness to experiment and engaging with multiple stakeholders across the organization with specific roles but ones that need collaboration” - Akhilesh Ayer, EVP and Global Head at WNS Triange. “It isn’t so much a matter of which industries stand to gain from data collaboration, but that most businesses can optimize their performance and accuracy by embracing data collaboration” - James Shalhoub, CEO at Finn. “Organizations can solve these challenges by improving cross-functional collaboration between team leaders and their data team to make insights accessible to the broader team while also shining a light on the most important metrics to analyze” - Ryan G. Smith, CEO at LeafLink. “For companies with one data person, the collaboration is happening with non-data people, so more of the data collaboration would likely be around communications of the insights and actions that need to be taken. Whereas in a technical organization, data collaboration may mean that team members are sharing a GitHub account and sharing code, as well as putting the code through a review process. The data professionals in these two instances have very different challenges to face” - Emad Hasan, CEO at Retina.
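
Slide 11 lists what a Bayesian network requires (random variables, conditional relationships, probability distributions) and what it provides (a way to structure probability calculations). A minimal sketch of that idea, assuming a hypothetical two-variable network Rain → WetGrass with made-up probabilities that an SME would normally supply:

```python
# Hypothetical two-node Bayesian network: Rain -> WetGrass.
# All variable names and probabilities are illustrative assumptions,
# not values from the slides.

# Prior distribution of the root variable.
p_rain = {True: 0.2, False: 0.8}

# Conditional distribution P(WetGrass | Rain), indexed as [rain][wet].
p_wet_given_rain = {
    True: {True: 0.9, False: 0.1},
    False: {True: 0.1, False: 0.9},
}


def joint(rain: bool, wet: bool) -> float:
    """P(Rain=rain, WetGrass=wet), factored along the network structure."""
    return p_rain[rain] * p_wet_given_rain[rain][wet]


def marginal_wet(wet: bool) -> float:
    """P(WetGrass=wet), obtained by summing the joint over Rain."""
    return sum(joint(rain, wet) for rain in (True, False))


def posterior_rain(wet: bool) -> float:
    """P(Rain=True | WetGrass=wet), by Bayes' rule."""
    return joint(True, wet) / marginal_wet(wet)


if __name__ == "__main__":
    print(f"P(WetGrass) = {marginal_wet(True):.2f}")
    print(f"P(Rain | WetGrass) = {posterior_rain(True):.3f}")
```

The three dictionaries and functions map onto the three "Requires" questions: which variables exist, how they depend on each other, and which distributions they follow. This is why domain experts need to be part of the process.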

Editor's notes

  1. DataOps (Data + DevOps): a set of practices to improve the quality and reduce the cycle time of data analytics. The main tasks in DataOps include data tagging, data testing, data pipeline orchestration, data versioning, and data monitoring. MLOps (ML + DevOps): a set of practices to design, build, and manage reproducible, testable, and sustainable ML-powered software. AIOps: AI + DevOps.
  2. Keeping SMEs, who actually understand how to label and curate your data, in the loop allows data scientists to inject domain expertise directly into the model. Once done, this expert knowledge can be codified and deployed for programmatic supervision.
  3. Efficiency Increase; Time Savings; Reproducibility; Community Building; Onboarding
  4. 70% of respondents to a recent Harvard Business Review survey acknowledged they were not very effective at data sharing [1]. Organizations that share data externally with their partners generate three times more measurable economic benefits than their counterparts that do not [2].
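
Note 1 names data testing as one of the main DataOps tasks. A minimal sketch of what such a test could look like, assuming a hypothetical `orders` table held as a list of dictionaries. The function name, column names, and value range are illustrative and are not taken from the slides:

```python
# Minimal data-testing sketch: completeness and range checks on tabular rows.
# `run_data_tests`, the column names, and the thresholds are hypothetical.

def run_data_tests(rows, required, ranges):
    """Return a list of (row_index, column, problem) tuples.

    required: columns that must be present and not None.
    ranges:   mapping of column -> (low, high) inclusive bounds.
    """
    issues = []
    for index, row in enumerate(rows):
        # Completeness check: every required column must have a value.
        for column in required:
            if row.get(column) is None:
                issues.append((index, column, "missing"))
        # Validity check: numeric values must fall inside their bounds.
        for column, (low, high) in ranges.items():
            value = row.get(column)
            if value is not None and not (low <= value <= high):
                issues.append((index, column, "out_of_range"))
    return issues


# Illustrative sample data with two deliberate defects.
orders = [
    {"order_id": 1, "price": 19.99},
    {"order_id": 2, "price": -5.0},     # negative price
    {"order_id": None, "price": 12.5},  # missing identifier
]

issues = run_data_tests(
    orders,
    required=["order_id", "price"],
    ranges={"price": (0.0, 10000.0)},
)

if __name__ == "__main__":
    for issue in issues:
        print(issue)
```

In a pipeline, a non-empty `issues` list would typically fail the run or raise an alert before the data reaches analysts or models. How strict to be is a choice each team makes.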