1
Overcoming DataOps hurdles for ML in Production
August 2020
SANDEEP UTTAMCHANDANI
CHIEF DATA OFFICER and VP OF ENGINEERING
sandeep@unraveldata.com
2
Behind the scenes of an ML Model in Production
3
DATA -> ML Model in Production
DataOps: Discover, Prep, Build, Operationalize
4
Top 10 DataOps Battlescars
1. “I thought the attribute means something else”
Battlescar:
Incorrect assumptions about what an attribute means, whether the dataset
is the source of truth, who owns and commonly uses it, how it is versioned,
and whether it is trustworthy.
Metric: Time to Interpret
Building a Self-Service Metadata Catalog
Levels of Automation:
Gather technical metadata
Gather operational metadata
Aggregate tribal knowledge
Intuit
7
2. “Where is the dataset I need for my model?”
Battlescar:
Building a customer support forecasting model. Data was siloed across
business units. It took 4+ months of working with data stewards to locate
the data attributes required for building the model.
Metric: Time to Find
Building a Self-Service Search Service
Levels of Automation:
Indexing of datasets & artifacts
Search relevance ranking
Access control of search results
9
3. “1000 rows in source database -- why only 50 rows in
data lake?”
Battlescar:
Issues with correctness, completeness, and timeliness when moving data
daily or hourly from transactional datastores to a centralized data lake.
Metric: Time to Move
Building a Self-Service Data Movement Service
Levels of Automation:
Data Ingestion Configuration
Data Transformation
Change Management
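A minimal sketch of a completeness check for data movement, assuming row counts for the source table and its data lake copy are already available. The `CountCheck` class, the table names, and the 1% tolerance are hypothetical and not from the deck.

```python
from dataclasses import dataclass


@dataclass
class CountCheck:
    table: str
    source_rows: int
    lake_rows: int

    @property
    def missing_fraction(self) -> float:
        # Fraction of source rows that never arrived in the lake.
        if self.source_rows == 0:
            return 0.0
        return max(self.source_rows - self.lake_rows, 0) / self.source_rows


def failed_checks(checks, tolerance=0.01):
    """Return the checks whose missing fraction exceeds the tolerance."""
    return [c for c in checks if c.missing_fraction > tolerance]


# Hypothetical counts mirroring the "1000 rows vs 50 rows" complaint.
checks = [
    CountCheck("orders", source_rows=1000, lake_rows=50),
    CountCheck("customers", source_rows=200, lake_rows=200),
]
for c in failed_checks(checks):
    print(f"{c.table}: {c.missing_fraction:.0%} of source rows missing in lake")
```

Running a check like this after every ingestion run would surface the gap at move time rather than when a model or report consumes the data.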
10
4. “Job completed but dashboard graphs have data missing?”
Battlescar:
Jobs are orchestrated using schedulers (such as Airflow, Oozie). Job
dependencies are often incorrect, causing reporting or model training
jobs to be triggered prematurely.
Metric: Time to Orchestrate
Building a Self-Service Orchestration Service
Levels of Automation:
Defining Job Dependencies
Robust Job Execution
Production Monitoring
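A minimal sketch of explicit job dependencies, using Python's standard `graphlib` module rather than a real scheduler such as Airflow or Oozie. The job names and the `ready_to_run` helper are hypothetical, chosen only to illustrate the premature-trigger problem.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each job maps to the set of jobs it depends on.
dependencies = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "build_features": {"ingest_orders", "ingest_customers"},
    "train_model": {"build_features"},
    "refresh_dashboard": {"build_features"},
}


def ready_to_run(job, completed):
    """A job may start only when every upstream job has completed."""
    return dependencies[job] <= completed


def execution_order(deps):
    """Return one valid run order; graphlib raises CycleError on circular deps."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
# Training must wait: its upstream feature job has not completed yet.
print(ready_to_run("train_model", completed={"ingest_orders"}))
```

Declaring the dependencies as data makes a missing edge visible in review, which is harder when ordering is implied by schedule times alone.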
11
5. “Data processing was supposed to complete at 8 am. It's 4 pm
and my model retraining job is still waiting?”
Battlescar:
Writing efficient Big Data processing applications is non-trivial. With a
plethora of technologies, gaining broad expertise is difficult even for
expert data engineers.
Metric: Time to Optimize
Building a Self-Service Query Optimization Service
Levels of Automation:
Aggregating query, cluster, and resource stats
Analyzing & correlating stats
Tuning jobs
12
6. “Customer changed preference to no marketing emails. Why are
we still including them in email campaigns?”
Battlescar:
Without a consistent primary key to identify the customer across data
silos, issues like this recur. Emerging data rights regulations such as
GDPR and CCPA require complying with customer preferences on what
data is collected, how it is used, and its deletion on request.
Metric: Time to Comply
Building a Self-Service Data Rights Governance Service
Levels of Automation:
Tracking customer data lifecycle and preferences
Executing customer’s data rights requests
Use-case based access control
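A minimal sketch of honoring a marketing opt-out across two silos. It assumes a normalized email address can serve as the shared customer key, which the deck does not state. The function names and sample data are hypothetical.

```python
def normalize_key(email: str) -> str:
    # Assumption: a lowercased, trimmed email serves as the shared customer key.
    return email.strip().lower()


def campaign_recipients(crm_emails, preferences):
    """Keep only customers who have explicitly allowed marketing email.

    `preferences` maps a raw email to True when marketing email is allowed.
    Customers with no recorded preference are excluded (a conservative default).
    """
    allowed = {normalize_key(e) for e, ok in preferences.items() if ok}
    return [e for e in crm_emails if normalize_key(e) in allowed]


# The same customers appear with different casing and spacing in each silo.
preferences = {"Ana@Example.com": False, "bo@example.com": True}
crm_emails = ["ana@example.com ", "BO@example.com", "cy@example.com"]
print(campaign_recipients(crm_emails, preferences))
```

Without the normalization step, "ana@example.com " and "Ana@Example.com" would look like different customers and the opt-out would be missed.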
13
7. “Job pipeline ran for 15 hours and now we detect a data
quality issue upon completion -- could we be proactive?”
Battlescar:
Data issues in a long-running, business-critical job lead to missing
insights. Only when the results don't look correct do we realize there
is an issue.
Metric: Time to Insights Quality
Building a Self-Service Data Observability Service
Levels of Automation:
Verify accuracy of data
Detect anomalies
Avoid data quality issues
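A minimal sketch of being proactive: validate each batch before the expensive processing step, so a bad batch stops the run early instead of after 15 hours. The null-fraction rule, its 5% threshold, and the field names are hypothetical.

```python
def validate_batch(rows, required_fields, max_null_fraction=0.05):
    """Return a list of problems found in a batch; an empty list means it passed."""
    if not rows:
        return ["batch is empty"]
    problems = []
    for field in required_fields:
        nulls = sum(1 for r in rows if r.get(field) is None)
        fraction = nulls / len(rows)
        if fraction > max_null_fraction:
            problems.append(f"{field}: {fraction:.0%} null values")
    return problems


def run_pipeline(batches, required_fields):
    """Process batches in order, stopping at the first one that fails validation."""
    processed = 0
    for i, batch in enumerate(batches):
        problems = validate_batch(batch, required_fields)
        if problems:
            raise ValueError(f"batch {i} failed validation: {problems}")
        processed += len(batch)  # stand-in for the real, expensive processing
    return processed


batches = [
    [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 12.5}],
    [{"id": 3, "amount": None}],  # bad batch: amount is missing
    [{"id": 4, "amount": 7.0}],
]
try:
    run_pipeline(batches, required_fields=["id", "amount"])
except ValueError as err:
    print(err)
```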
14
8. “Using the best polyglot datastores -- how do I now write
queries effectively across this data?”
Battlescar:
Significant time spent in planning, design, and writing queries that
process data across datastores.
Metric: Time to Virtualize Datastores
Building a Self-Service Data Virtualization Service
Levels of Automation:
Automatic query routing
Managing datastore-specific queries
Joining across transactional sources
15
9. “I ran an A/B experiment -- need to build time-consuming
data pipelines to now analyze the data”
Battlescar:
Analyzing experimental results in a consistent fashion is a nightmare. No
consistent definitions between metrics used for experimental analysis
and business reporting.
Metric: Time to A/B Test
Building a Self-Service A/B Testing Service
Levels of
Automation
16
10. “Data processing jobs last week cost us 30% more. Why?”
Battlescar:
Especially in the cloud, dollar cost scales linearly with usage. Tracking
budgets and spend well enough to optimize them requires non-trivial effort.
Metric: Time to Cost Governance
Building a Self-Service Cost Governance Service
Levels of Automation:
Expenditure Observability
Matching Supply-Demand
Continuous Cost Optimization
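A minimal sketch of expenditure observability: compare per-job spend week over week to see which job drove the increase. The job names and dollar figures are hypothetical, picked so the total rises by the 30% in the question above.

```python
def cost_increase_by_job(last_week, this_week):
    """Return per-job cost deltas, largest increase first."""
    jobs = set(last_week) | set(this_week)
    deltas = {j: this_week.get(j, 0.0) - last_week.get(j, 0.0) for j in jobs}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical weekly spend per job, in dollars.
last_week = {"etl_orders": 400.0, "train_model": 500.0, "reports": 100.0}
this_week = {"etl_orders": 410.0, "train_model": 790.0, "reports": 100.0}

total_change = sum(this_week.values()) / sum(last_week.values()) - 1
print(f"total change: {total_change:+.0%}")
for job, delta in cost_increase_by_job(last_week, this_week):
    print(f"{job}: {delta:+.2f}")
```

In this made-up data nearly all of the increase comes from one job, which is the kind of attribution a cost governance service would need to automate.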
17
Wrap up: Advice on Managing your DataOps
18
DataOps hurdles vary and depend on:
People, Process, Technology
19
Self-Service has levels (not binary)
20
Measuring Current DataOps: Time-to-Insight Metric
DATA -> Discover -> Prep -> Build -> Operationalize (TIME-TO-INSIGHT)
21
Discover Prep Build Operationalize
Time-to-Insight Scorecard
22
Discover Prep Build Operationalize
Creating Your Time-to-Insight Scorecard
Legend: Weeks, Days, Hours
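A minimal sketch of filling in a scorecard: bucket each measured metric into the Hours, Days, or Weeks legend. The bucket thresholds and the sample measurements are assumptions. Only the metric names come from the deck.

```python
from datetime import timedelta


def bucket(duration: timedelta) -> str:
    """Map a measured duration onto the scorecard legend: Hours, Days, or Weeks."""
    if duration < timedelta(days=1):
        return "Hours"
    if duration < timedelta(weeks=1):
        return "Days"
    return "Weeks"


# Hypothetical measurements for three of the metrics named in the deck.
measurements = {
    "Time to Find": timedelta(weeks=16),
    "Time to Move": timedelta(days=3),
    "Time to Orchestrate": timedelta(hours=5),
}
scorecard = {metric: bucket(d) for metric, d in measurements.items()}
for metric, level in scorecard.items():
    print(f"{metric}: {level}")
```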
23
Call for Action: Making DataOps Self-Service
1. Measure: Create your Time-to-Insight Scorecard
2. Learn: Shortlist 1-2 scorecard metrics whose level of automation you
want to improve
3. Build: Implement well-known design patterns in your data platform to
make the metrics self-service
24
Upcoming Book: The Self-Service Data Roadmap
Available Sept’20
Early Release Available on O’Reilly:
https://www.oreilly.com/library/view/the-self-service-data/9781492075240/
25
CONTACT US TO SCHEDULE A DATA OPERATIONS HEALTH CHECK TODAY
hello@unraveldata.com

 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 

Overcoming DataOps hurdles for ML in Production

  • 1. Overcoming DataOps hurdles for ML in Production. August 2020. Sandeep Uttamchandani, Chief Data Officer and VP of Engineering. sandeep@unraveldata.com
  • 2. Behind the scenes of an ML Model in Production
  • 3. DataOps, from Data to an ML Model in Production: Discover, Prep, Build, Operationalize
  • 4. Top 10 DataOps Battlescars
  • 5. 1. “I thought the attribute means something else” Battlescar: Incorrect assumptions about the meaning of attributes, whether the dataset is the source of truth, who its owner and common users are, how it is versioned, and whether it is trustworthy. Metric: Time to Interpret. Building a Self-Service Metadata Catalog. Levels of Automation: Gather technical metadata; Gather operational metadata; Aggregate tribal knowledge.
  • 6. 1. “I thought the attribute means something else?” Battlescar: Incorrect assumptions about the meaning of attributes, whether the dataset is the source of truth, who its owner and common users are, how it is versioned, and whether it is trustworthy. Metric: Time to Interpret. Building a Self-Service Metadata Catalog (Intuit).
  • 7. 2. “Where is the dataset I need for my model?” Battlescar: Building a customer support forecasting model. Data was siloed across business units. It took 4+ months of connecting with data stewards to locate the data attributes required for building the model. Metric: Time to Find. Building a Self-Service Search Service. Levels of Automation: Indexing of datasets & artifacts; Search relevance ranking; Access control of search results.
  • 8. 2. “Where is the dataset I need for my model?” Battlescar: Building a customer support forecasting model. Data was siloed across business units. It took 4+ months of connecting with data stewards to locate the data attributes required for building the model. Metric: Time to Find. Building a Self-Service Search Service.
  • 9. 3. “1000 rows in source database -- why only 50 rows in data lake?” Battlescar: Issues in correctness, completeness, and timeliness when moving data daily/hourly from transactional datastores to a centralized data lake. Metric: Time to Move. Building a Self-Service Data Movement Service. Levels of Automation: Data Ingestion Configuration; Data Transformation; Change Mgt.
  • 10. 4. “Job completed but dashboard graphs have data missing?” Battlescar: Jobs are orchestrated using schedulers (such as Airflow, Oozie). Often the job dependencies are incorrect, causing reporting or model training jobs to be triggered prematurely. Metric: Time to Orchestrate. Building a Self-Service Orchestration Service. Levels of Automation: Defining Job Dependencies; Robust Job Execution; Production Monitoring.
  • 11. 5. “Data processing was supposed to complete at 8 am. It's 4 pm and my model retraining job is still waiting?” Battlescar: Writing efficient Big Data processing applications is non-trivial. With a plethora of technologies, gaining broad expertise is difficult even for expert data engineers. Metric: Time to Optimize. Building a Self-Service Query Optimization Service. Levels of Automation: Aggregating query, cluster, and resource stats; Analyzing & correlating stats; Tuning jobs.
  • 12. 6. “Customer changed preference to no marketing emails. Why are we still including them in email campaigns?” Battlescar: Without a consistent primary key to identify the customer across data silos, recurring issues arise. Emerging data rights regulations such as GDPR and CCPA require complying with customer preferences on what data is collected, how it is used, and its deletion on request. Metric: Time to Comply. Building a Self-Service Data Rights Governance Service. Levels of Automation: Tracking customer data lifecycle and preferences; Executing customer’s data rights requests; Use-case based access control.
  • 13. 7. “Job pipeline ran for 15 hours and now we detect data quality issue upon completion -- could we be proactive?” Battlescar: Data issues in a long-running, business-critical job lead to missing insights. Only when results don't look correct do we realize there is an issue. Metric: Time to Insights Quality. Building a Self-Service Data Observability Service. Levels of Automation: Verify accuracy of data; Detect anomalies; Avoid data quality issues.
  • 14. 8. “Using the best polyglot datastores -- how do I now write queries effectively across this data?” Battlescar: Significant time is spent planning, designing, and writing queries that process data across datastores. Metric: Time to Virtualize Datastores. Building a Self-Service Data Virtualization Service. Levels of Automation: Automatic query routing; Managing datastore-specific queries; Joining across transactional sources.
  • 15. 9. “I ran an A/B experiment -- need to build time-consuming data pipelines to now analyze the data” Battlescar: Analyzing experimental results in a consistent fashion is a nightmare. There are no consistent definitions between the metrics used for experimental analysis and those used for business reporting. Metric: Time to A/B Test. Building a Self-Service A/B Testing Service. Levels of Automation
  • 16. 10. “Data processing jobs last week cost us 30% more. Why?” Battlescar: Especially in the cloud, dollar cost is linear in usage. Tracking budgets and spend in order to optimize effectively requires non-trivial effort. Metric: Time to Cost Governance. Building a Self-Service Cost Governance Service. Levels of Automation: Expenditure Observability; Matching Supply-Demand; Continuous Cost Optimization.
  • 17. Wrap up: Advice on Managing your DataOps
  • 20. Measuring Current DataOps: Time-to-Insight Metric. Time-to-insight spans Data through Discover, Prep, Build, Operationalize.
  • 21. Time-to-Insight Scorecard: Discover, Prep, Build, Operationalize
  • 22. Creating Your Time-to-Insight Scorecard: Discover, Prep, Build, Operationalize. Legend: Weeks, Days, Hours.
  • 23. Call for Action: Making DataOps Self-Service. 1. Measure: Create your Time-to-Insight Scorecard. 2. Learn: Shortlist 1-2 scorecard metrics to improve level of automation. 3. Build: Implement well-known design patterns in your data platform to make the metrics self-service.
  • 24. Upcoming Book: The Self-Service Data Roadmap. Available Sept’20. Early Release available on O’Reilly: https://www.oreilly.com/library/view/the-self-service-data/9781492075240/
  • 25. Contact us to schedule a data operations health check today: hello@unraveldata.com
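
The Time-to-Insight Scorecard from slides 20-23 could be sketched roughly as follows. This is a minimal illustration, not the method from the talk: the metric names come from the slides, but the elapsed hours are made-up sample values, the `Metric` class and the helper functions are hypothetical, the Hours/Days/Weeks thresholds are an assumption about how the legend maps to durations, and summing the metrics into a single time-to-insight figure assumes the steps run one after another.

```python
from dataclasses import dataclass

HOURS_PER_DAY = 24
HOURS_PER_WEEK = 7 * HOURS_PER_DAY


@dataclass(frozen=True)
class Metric:
    """One scorecard row; `hours` is a hypothetical measured elapsed time."""
    name: str
    hours: float


def bucket(hours: float) -> str:
    """Map elapsed hours to the slide legend (thresholds are an assumption)."""
    if hours >= HOURS_PER_WEEK:
        return "Weeks"
    if hours >= HOURS_PER_DAY:
        return "Days"
    return "Hours"


def scorecard(metrics: list[Metric]) -> dict[str, str]:
    """Step 1, Measure: label every metric with its legend bucket."""
    return {m.name: bucket(m.hours) for m in metrics}


def shortlist(metrics: list[Metric], k: int = 2) -> list[str]:
    """Step 2, Learn: pick the k slowest metrics as automation candidates."""
    slowest = sorted(metrics, key=lambda m: m.hours, reverse=True)[:k]
    return [m.name for m in slowest]


def time_to_insight(metrics: list[Metric]) -> float:
    """Total hours, assuming the steps are strictly sequential."""
    return sum(m.hours for m in metrics)


# Sample values are invented for illustration only.
metrics = [
    Metric("Time to Interpret", 6),
    Metric("Time to Find", 4 * HOURS_PER_WEEK),
    Metric("Time to Move", 3 * HOURS_PER_DAY),
    Metric("Time to Orchestrate", 10),
    Metric("Time to Optimize", 2 * HOURS_PER_WEEK),
]

if __name__ == "__main__":
    for name, label in scorecard(metrics).items():
        print(f"{name}: {label}")
    print("Shortlist:", shortlist(metrics))
    print("Time-to-insight (hours):", time_to_insight(metrics))
```

Step 3 (Build) is deliberately left out: which design pattern to implement depends on the metrics that end up on the shortlist.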