DataOps, the secret weapon for delivering AI, data science, and business intelligence value at speed
Harvinder Atwal
// Harvinder Atwal // Web
{"about" : "me"}
{"Current" : "Interim Chief Data Officer"} MoneySuperMarket
{"previous" : "Insight Director, Tesco Clubcard"} dunnhumby
{"previous" : "Senior Manager, Customer Strategy and Insight"} LLOYDS BANKING GROUP
{"previous" : "Senior Operational Research Analyst"} BRITISH AIRWAYS
@harvindersatwal
@gmail.com
MoneySuperMarket
1993: We started life as mortgages 2000
80% of UK Online Adults visit one of our websites each year
13.1 million Active users 2019
$2 billion Market cap 2020
989 Product Providers
£2bn SAVINGS: 2019 estimate total of household savings
3 major ways Data Science can help the organisation
Product Creation
Customer Experience
Business efficiency
Applications aren’t in short supply
Demand Forecasting
Capacity Forecasting
Marketing automation
Supply chain management, automatic ordering
Automatic scaling of infrastructure
Document Classification
Image Annotation
Customer Service
Machine Translation
Anomaly Detection
Product Recommendation
Fraud Detection
Image Selection
Text Generation
Predictive Maintenance
Automated Pricing
Automated routing
Medical diagnosis
What does the data about data say?
There’s a big problem
Just 7.3% of organisations say the state of their data and analytics is excellent*
*New Vantage Partners Big Data and AI Executive Survey 2020
Only 22% of companies are currently seeing a significant return from data science expenditures*
*Obligatory conference presentation quote from GartnerForresterMcKinsey Consulting. Sorry.
Kaggle’s The State of Data & Machine Learning Survey
What do people talk about?
London Strata
Technology is less important than you think, because the data says so
"Principal Challenge to Becoming Data-Driven is Technology"*
2018: 19.1%
2020: 9.1%
*New Vantage Partners Big Data and AI Executive Survey 2020
Thinking real-life Data Science is a Kaggle competition
Algorithms are becoming less important as AutoML improves
model.fit(X_train, y_train) is actually the easiest part
Data Governance
Data Quality
Data Security
Test Data Management
Version Control
Access Control
Team Organisation
Stakeholder Buy-in
Outcome Measurement
No one wants to talk about lack of value, dirty data, people, processes and culture
How businesses think they become data-driven
[Diagram of four numbered steps, 1 to 4; the step labels shown are HIRE DATA SCIENTISTS, HOARD DATA and MONEY FLOWS]
Phrases I would like to ban #92: "Actionable Insight"
Gartner has predicted that, “through 2022, only 20% of analytic insights will deliver business outcomes.”
Step #1 Focus on Organisational Objectives and Outcomes, not Data Outputs
The Program Logic Model: Resources → Activities → Outputs → Outcomes → Impact
Success does not start with data, data scientists, models, insight, or technology; it literally ends with them.
Success starts with the impacts and outcomes you want and works back from there to make them happen.
We need a completely different approach to delivering outcomes from data
Historic (Telco) usage of data
Rigorous system architecture, development methods, requirements gathering, testing and code reusability.
Computing was expensive
Some basic reporting capability – BI was born
Data has little/no value once process complete
Data sharing limited to applications requiring it for specific business processes
Big Design Up Front (BDUF) Data Warehouses to meet specific requirements
Storage and Compute are cheap
Multiple sources of data: Network, Mobile data, Messages, Clickstream, Social Media, Billing, Call Details, CRM, Viewing
Multiple data formats: Semi Structured, Unstructured, Structured
Multiple data silos: Inventory, Human Resources, Data Warehouses, Cubes and Marts, Operational Data Stores, Transactional Sources, File Systems, Big Data
We can no longer apply steam-age thinking to data
Data Sources → Analytics Tools → Example Analytics Outputs
Customer Lifetime Value Modelling
Churn modelling
Financial Forecasting
Fraud detection
Regulatory Feeds
Offer prioritisation
Next Best Action
Sentiment Analysis
Segmentation
Product Affinity
Cross-sell modelling
Financial Modelling
Cohort Analysis
Product Forecasting
Strategy Planning
Marketing Effectiveness
AB Testing
Reporting
Cubes
Data needs to be shared and combined across many systems to support multiple and sometimes complex analysis
Analytics Outputs → Example Business Use-cases
Campaign Management
Personalisation
Pricing
Marketing Mix
Product Propositions
Promotions
Website/App optimisation
Dashboards/BI
Network Planning
Manpower Planning
Financial Forecasting …..
Data is a critical asset for analytical decision-making long after operational processes complete
Using data is a competitive requirement. Using data better than rivals is a source of competitive advantage.
Data is no longer an application by-product, it is a Product
DATA PRODUCTS
“A PRODUCT THAT FACILITATES AN END GOAL THROUGH THE USE OF DATA”
- DJ PATIL, FORMER US CHIEF DATA SCIENTIST
#2 Think Product not Project
Data Analytics is complex manufacturing
Stages: Data Ingestion → Data Transformation → Data Analytics → Data Products → Use Cases
Data integration and Data processing pipelines: ETL/ELT tools, Stream processing, MDM, Data unification, and Data preparation
Data Analytics: Data exploration, Data visualization, Data analysis, Data science, Machine learning, Deep learning
Data Products: Output files, BI Tools, Interactive dashboards, Web Apps, APIs
Use Cases: Product creation, Customer experience and Business efficiency
Data storage and Databases: Cloud file storage, NoSQL DB, Distributed file system, RDBMS, Analytical DB
Compute infrastructure and Query execution engines: VMs, Container services, and Distributed compute frameworks, Distributed SQL execution engines
Development tools, workspaces and software libraries
Reproducibility, Deployment, Orchestration and Monitoring
Data management
DataOps applies proven methodologies to improve the quality and speed of data analytics
Agile, DevOps, Lean Thinking
Product Development System
Production System
Data Analytics
Eliminate waste, improve quality
#3 Apply Lean thinking
The Optimist: THE GLASS IS HALF FULL
The Pessimist: THE GLASS IS HALF EMPTY
The Lean Thinker: WHY IS THE GLASS TWICE AS BIG AS IT SHOULD BE?
Measure data cycles to eliminate bottlenecks and poor quality
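One way to make "measure data cycles" concrete is to timestamp each hand-off in the delivery of a data product change and look at where the elapsed time goes. The sketch below is illustrative only: the stage names and timestamps are hypothetical, not figures from the talk.

```python
from datetime import datetime

# Hypothetical hand-off timestamps for one data product change (illustrative values).
stage_events = [
    ("request_raised", datetime(2020, 3, 2, 9, 0)),
    ("data_access_granted", datetime(2020, 3, 9, 9, 0)),
    ("development_done", datetime(2020, 3, 11, 9, 0)),
    ("tests_passed", datetime(2020, 3, 12, 9, 0)),
    ("deployed", datetime(2020, 3, 16, 9, 0)),
]


def stage_durations(events):
    """Return (hand-off label, elapsed hours) for each pair of consecutive events."""
    durations = []
    for (prev_name, prev_time), (name, time) in zip(events, events[1:]):
        hours = (time - prev_time).total_seconds() / 3600
        durations.append((f"{prev_name} -> {name}", hours))
    return durations


durations = stage_durations(stage_events)
total_hours = sum(hours for _, hours in durations)
# The longest hand-off is the first candidate bottleneck to investigate.
bottleneck = max(durations, key=lambda item: item[1])

for label, hours in durations:
    print(f"{label}: {hours:.0f}h ({hours / total_hours:.0%} of cycle time)")
print("Bottleneck:", bottleneck[0])
```

In this made-up example the wait for data access dominates the cycle, which is the kind of waste a Lean review would target before touching the modelling work.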
Development and orchestration of production pipelines is hard
Data Sources and Formats
Multiple Data Products
Correctly coded logic will work correctly
Data pipelines will break the second you put them into production
Often there is more complexity in data than the code
Monitoring and testing are needed to trust pipelines and keep them healthy
Integrity checks
Data Completeness Check
Data Versioning
Data Classification
Data Lineage Tracking
Data Cleansing
Watermarking
Quality Checks
File validation
Data Correctness Check
Data Accuracy Check
Data Consistency Check
Data Uniformity Check
ETL Performance Testing
End User Testing
Regression Testing
Metadata Testing
Transformation Testing
Integration Testing
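Two of the checks listed above, completeness and consistency, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production framework: the field names, the sample rows and the 0.95 threshold are all hypothetical.

```python
def completeness(rows, required_fields):
    """Fraction of rows in which every required field is present and non-empty."""
    if not rows:
        return 0.0
    complete = sum(
        all(row.get(field) not in (None, "") for field in required_fields)
        for row in rows
    )
    return complete / len(rows)


def duplicate_keys(rows, key):
    """Return key values that appear more than once (a simple consistency check)."""
    seen, duplicates = set(), set()
    for row in rows:
        value = row.get(key)
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return duplicates


def run_checks(rows, required_fields, key, min_completeness=0.95):
    """Collect every failure message rather than stopping at the first one."""
    failures = []
    score = completeness(rows, required_fields)
    if score < min_completeness:
        failures.append(f"completeness {score:.2f} below {min_completeness:.2f}")
    dupes = duplicate_keys(rows, key)
    if dupes:
        failures.append(f"duplicate {key} values: {sorted(dupes)}")
    return failures


# Hypothetical batch: one empty postcode and one repeated customer_id.
sample = [
    {"customer_id": 1, "postcode": "AB1 2CD"},
    {"customer_id": 2, "postcode": ""},
    {"customer_id": 2, "postcode": "EF3 4GH"},
]
failures = run_checks(sample, ["customer_id", "postcode"], "customer_id")
for message in failures:
    print("FAILED:", message)
```

Running checks like these on every batch, and alerting on failures, is one way to act on the point that the data is often more complex than the code.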
Accept the delivery pipeline is governed by rules and constraints
Trust people with data
Identity and Access Management
Custom role permissions
Audit trail logs
Data Loss Prevention
Encryption of Data at Rest
Encryption of Data in Motion
Resource Monitoring
Firewall rules
Resource and Object Isolation
Penetration Testing
Code Encryption and Backup
Segregation of Duties
Authorisation protocols
Data Access and Privacy Policy
Metadata Management
Data Cataloging
Data Stewards and Owners
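As a rough illustration of how two of these controls fit together, the sketch below pairs custom role permissions with an audit trail. The role names, permission strings and in-memory log are hypothetical; a real platform would rely on its identity and access management service rather than a dictionary.

```python
from datetime import datetime, timezone

# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS = {
    "data_analyst": {"read:curated"},
    "data_engineer": {"read:raw", "read:curated", "write:curated"},
}

audit_log = []


def is_allowed(role, action):
    """Check a role's permission and record the decision in the audit trail."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "action": action,
        "allowed": allowed,
    })
    return allowed


print(is_allowed("data_engineer", "write:curated"))
print(is_allowed("data_analyst", "read:raw"))
print(len(audit_log), "decisions recorded")
```

Logging denied requests as well as granted ones is a deliberate choice here: the audit trail is what lets people be trusted with data while the constraints stay visible.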
#4 Delivering outcomes from data products requires adaptability to change
Agile requires the ability to make frequent small changes, which reduces risk, increases feedback and results in greater value
#5 Embrace Development best practice in Data Analytics
Version Control, Configuration management, Continuous Integration, Continuous Deployment
Automated reproducibility is a must
Configuration Management
For consistently reproducible computational environments
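A small sketch of what checking environment reproducibility can look like: record the interpreter and installed package versions, then hash the record so two environments can be compared at a glance. It uses only the Python standard library; the function names are illustrative, and real configuration management would normally go further (lock files, container images, infrastructure as code).

```python
import hashlib
import json
import platform
from importlib import metadata


def environment_snapshot():
    """Record the interpreter version and the version of each installed distribution."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that have no recorded name
            packages[name.lower()] = dist.version
    return {"python": platform.python_version(), "packages": packages}


def fingerprint(snapshot):
    """Stable hash of a snapshot: identical environments give identical digests."""
    canonical = json.dumps(snapshot, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


snapshot = environment_snapshot()
digest = fingerprint(snapshot)
print("Python", snapshot["python"], "with", len(snapshot["packages"]), "packages")
print("Environment fingerprint:", digest)
```

Storing the fingerprint alongside each model or report makes it possible to tell later whether a result was produced in the same environment as the one being debugged.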
Continuous Integration: Commit Code Regularly
Machine Learning Pipeline
Data Cleaning: Master, Dev Branch
Feature Extraction: Master, Dev Branch
Model Train: Master, Dev Branch
Product Development (e.g. App, Website, Marketing system, Operational System, Dashboard, etc.)
Test data management is super important!
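One common approach to test data management is to build a small, repeatable and masked sample of production data for development and CI. The sketch below assumes that approach; the field names, salt and 10% sample rate are hypothetical. A truncated salted hash is pseudonymisation rather than anonymisation, so it does not by itself satisfy privacy requirements.

```python
import hashlib


def pseudonymise(value, salt):
    """Replace an identifier with a salted SHA-256 digest, truncated to 16 hex characters."""
    return hashlib.sha256(f"{salt}:{value}".encode("utf-8")).hexdigest()[:16]


def in_sample(key, percent):
    """Deterministically keep roughly `percent`% of keys, stable across runs."""
    bucket = int(hashlib.sha256(str(key).encode("utf-8")).hexdigest(), 16) % 100
    return bucket < percent


def build_test_set(rows, id_field, salt, percent=10):
    """Sample rows by id, then mask the id so tests never see the raw value."""
    return [
        {**row, id_field: pseudonymise(row[id_field], salt)}
        for row in rows
        if in_sample(row[id_field], percent)
    ]


# Stand-in for production data; the salt is a placeholder, not a real secret.
production_rows = [{"customer_id": i, "spend": i * 1.5} for i in range(1000)]
test_rows = build_test_set(production_rows, "customer_id", salt="hypothetical-salt")
print(len(test_rows), "rows selected for the test set")
```

Because selection depends only on a hash of the id, the same customers land in the test set on every run, which keeps test results comparable between commits.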
#6 Organise for success – Conway's Law isn't academic
Microsoft's research found organisational structure predicted code quality better than other measurable factors such as Code Churn, Code Complexity, Dependencies, Code Coverage or Pre-Release Bugs
Nearly 60 percent of breakaway organizations use cross-functional teams, versus less than a third of the remaining respondents.
Core Personas: Data Engineer, Data Analyst, Data Scientist, Team Lead, Data Platform Administration, ML Engineer
Supporting Personas: Solutions Architect, DBA, Security Expert, Specialist Tester, Technical Lead, Designer
There’s a problem with cross-functional teams
Silos within teams
Cross-skill team members
No I (or -) in teams
Dash-Shaped (Generalist): Capable in a lot of things but not expert in any
I-Shaped (Specialist): Expert at one thing
T-Shaped (Generalising Specialist): Capable in a lot of things and expert in one.
Pi-Shaped (Multi-skilled)
M-Shaped (Poly-skilled)
Axes: Breadth of knowledge, Depth of knowledge
Scale: Poor, Better, Good, Best
Domain-orientated Teams (optimised for speed)
Source data system owners
Data Management and Platform teams (Databases, data storage, compute infrastructure, analytical tools, data governance and security, master data management, operations, etc.)
Analytics Specialists and Centre of Excellence
Three cross-functional domain teams (Data engineers, Data scientists, Data analysts, Stakeholder, etc.), each with its own Domain use cases
Self-service access
Productionise
#7 Measure and act on feedback
Concern: Data Product
  Viewpoint Team: Is our product healthy? (Monitoring)
  Viewpoint Customer: Is the product meeting our objectives? (Benefit Measurement)
Concern: Service
  Viewpoint Team: Is our team and its processes healthy? (Retrospectives)
  Viewpoint Customer: Is our internal service delivery fit for purpose? (Service Delivery Review)
Source: Matt Philips
Just as DevOps is more than Chef, Puppet and Ansible, DataOps is more than tools
DataOps can't be delivered by a monolithic solution; it requires multiple technologies
// Harvinder Atwal // Web
// {"about" : "me"}
var current = {
  companyName: "MoneySuperMarket",
  position: "Interim Chief Data Officer"
};
var previous1 = {
  companyName: "Dunnhumby",
  // Trailing space added so the concatenated title reads correctly.
  position: "Insight Director, " + "Tesco Clubcard"
};
var previous2 = {
  companyName: "Lloyds Banking Group",
  position: "Senior Manager"
};
var previous3 = {
  companyName: "British Airways",
  position: "Senior Operational Research Analyst"
};
var username = "harvindersatwal";
var linkedIn = "/in/" + username;
var twitter = "@" + username;
var email = username + "@gmail.com";

DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal

  • 1. DataOps, the secret weapon for delivering AI, data science, and business intelligence value at speed Harvinder Atwal
  • 2. // Harvinder Atwal MoneySuperMarket // Web dunnhumby {"previous" : "Insight Director, Tesco Clubcard"} LLOYDS BANKING GROUP {"previous" : "Senior Manager, Customer Strategy and Insight"} {"Current" : "Interim Chief Data Officer"} @harvindersatwal BRITISH AIRWAYS {"previous" : "Senior Operational Research Analyst"} {"about" : "me"} @gmail.com
  • 3. £2bn SAVINGS 2019 estimate total of household savings 1993 80% 13.1 million $2 billion 989 We started life as mortgages 2000 of UK Online Adults visit one of our websites each year MoneySuperMarket Active users 2019 Market cap 2020 Product Providers
  • 4. 3 major ways Data Science can help the organisation Product Creation Customer Experience Business efficiency
  • 5. Applications aren’t in short supply Demand Forecasting Capacity Forecasting Marketing automation Supply chain management, automatic ordering Automatic scaling of infrastructure Document Classification Image Annotation Customer Service Machine Translation Anomaly Detection Product Recommendation Fraud Detection Image Selection Text Generation Predictive Maintenance Automated Pricing Automated routing Medical diagnosis
  • 6. What does the data about data say?
  • 8. Just 7.3% of organisations say the state of their data and analytics is excellent* *New Vantage Partners Big Data and AI Executive Survey 2020
  • 9. Only 22% of companies are currently seeing a significant return from data science expenditures* *Obligatory conference presentation quote from GartnerForresterMcKinsey Consulting. Sorry.
  • 10. Kaggle’s The State of Data & Machine Learning Survey
  • 11. What do people talk about? London Strata
  • 12. Technology is less important than you think, because the data says so 19.1% 9.1% 2018 2020 *New Vantage Partners Big Data and AI Executive Survey 2020 "Principal Challenge to Becoming Data-Driven is Technology"*
  • 13. Thinking real-life Data Science is a Kaggle competition
  • 14. Algorithms are becoming less important as AutoML improves
  • 15. model.fit(X_train, y_train) is actually the easiest part Data Governance Data Quality Data Security Test Data Management Version Control Access Control Team Organisation Stakeholder Buy-in Outcome Measurement
  • 16. No one wants to talk about lack of value, dirty data, people, processes and culture
  • 17. HIRE DATA SCIENTISTS How businesses think they become data- driven 1 2 3 MONEY FLOWS HOARD DATA 4
  • 18. Phrases I would like to ban #92 "Actionable Insight"
  • 19. Gartner has predicted that, “through 2022, only 20% of analytic insights will deliver business outcomes.”
  • 20. Step #1 Focus on Organisational Objectives and Outcomes, not Data Outputs Resources Activities Outputs Outcomes Impact The Program Logic Model Success does not start with data, data scientists, models, insight, or technology, it literally ends with them. Success starts with the impacts and outcomes you want and works back from there to make them happen.
  • 21. We need a completely different approach to delivering outcomes from data
  • 23. Rigorous system architecture, development methods, requirements gathering, testing and code reusability. Computing was expensive
  • 24. Some basic reporting capability – BI was born
  • 25. Data has little/no value once process complete Data sharing limited to applications requiring it for specific business processes
  • 26. Big Design Up Front (BDUF) Data Warehouses to meet specific requirements
  • 27. Network Mobile data Messages Clickstream Social Media Billing Call Details Multiple sources of data Storage and Compute are cheap CRM Viewing Multiple data formats Semi Structured Unstructured Structured Multiple data silos Inventory Human Resources Data Warehouses Cubes and Marts Operational Data Stores Transactional Sources File Systems Big Data We can no longer apply steam-age thinking to data
  • 28. Data Sources Analytics Tools Customer Lifetime Value Modelling Churn modelling Financial Forecasting Fraud detection Regulatory Feeds Offer prioritisation Next Best Action Sentiment Analysis Segmentation Product Affinity Cross-sell modelling Financial Modelling Cohort Analysis Product Forecasting Strategy Planning Marketing Effectiveness AB Testing Reporting Cubes Data needs to be shared and combined across many systems to support multiple and sometimes complex analysis Example Analytics Outputs
  • 29. Analytics Outputs Campaign Management Personalisation Pricing Marketing Mix Product Propositions Promotions Website/App optimisation Dashboards/ BI Network Planning Manpower Planning Data is a critical asset for analytical decision-making long after operational processes complete Example Business Use-cases Financial Forecasting ….. Using data is a competitive requirement. Using data better than rivals is a source of competitive advantage.
  • 30. Data is no longer an application by-product, it is a Product
  • 31. DATA PRODUCTS “A PRODUCT THAT FACILITATES AN END GOAL THROUGH THE USE OF DATA” - DJ PATIL, FORMER US CHIEF DATA SCIENTIST #2 Think Product not Project
  • 32. Data Analytics is complex manufacturing Data storage and Databases Cloud file storage, NoSQL DB, Distributed file system, RDBMS, Analytical DB Compute infrastructure and Query execution engines VMs, Container services, and Distributed compute frameworks Distributed SQL execution engines Development tools, workspaces and software libraries Data Analytics Data exploration, Data visualization, Data analysis, Data science, Machine learning, Deep learning Reproducibility, Deployment, Orchestration and Monitoring Output files BI Tools Interactive dashboards Web Apps APIs Product creation, Customer experience and Business efficiency Data Ingestion Data Transformation Data Analytics Data Products Use Cases Data integration and Data processing pipelines ETL/ELT tools, Stream processing MDM, Data unification, and Data preparation Data management Product Development System Production System
  • 33. Agile DevOps Lean Thinking DataOps applies proven methodologies to improve the quality and speed of data analytics Data Analytics
  • 34. Eliminate waste, improve quality #3 Apply Lean thinking The Optimist The Pessimist The Lean Thinker THE GLASS IS HALF FULL THE GLASS IS HALF EMPTY WHY IS THE GLASS TWICE AS BIG AS IT SHOULD BE?
  • 35. Measure data cycles to eliminate bottlenecks and poor quality
  • 36. Development and orchestration of production pipelines is hard Data Sources and Formats Multiple Data Products
  • 37. Correctly coded logic will work correctly
  • 38. Data pipelines will break the second you put them into production Often there is more complexity in data than the code
  • 39. Monitoring and testing are needed to trust pipelines and keep them healthy Integrity checks Data Completeness Check Data Versioning Data Classification Data Lineage Tracking Data Cleansing Watermarking Quality Checks File validation Data Correctness Check Data Accuracy Check Data Consistency Check Data Uniformity Check ETL Performance Testing End User Testing Regression Testing Metadata Testing Transformation Testing Integration Testing
  • 40. Accept the delivery pipeline is governed by rules and constraints
  • 41. Trust people with data Identity and Access Management Custom role permissions Audit trail logs Data Loss Prevention Encryption of Data at Rest Encryption of Data in Motion Resource Monitoring Firewall rules Resource and Object Isolation Penetration Testing Code Encryption and Backup Segregation of Duties Authorisation protocols Data Access and Privacy Policy Metadata Management Data Cataloging Data Stewards and Owners
  • 42. #4 Delivering outcomes from data products requires adaptability to change
  • 43. Agile requires the ability to make frequent small changes, which reduces risk, increases feedback and results in greater value
  • 44. #5 Embrace Development best practice in Data Analytics Version Control, Configuration management, Continuous Integration, Continuous Deployment
  • 46. Configuration Management For consistently reproducible computational environments
  • 47. Continuous Integration: Commit Code Regularly Data Cleaning Master Data Cleaning Dev Branch Feature Extraction Dev Feature Extraction Master Model Train Master Model Train Dev Branch Machine Learning Pipeline Product Development (e.g. App, Website, Marketing system, Operational System, Dashboard, etc.)
  • 48. Test data management is super important!
  • 49. #6 Organise for success – Conway's Law isn't academic Microsoft's research found organisational structure predicted code quality better than other measurable factors such as Code Churn, Code Complexity, Dependencies, Code Coverage or Pre-Release Bugs
  • 50. Nearly 60 percent of breakaway organizations use cross-functional teams, versus less than a third of the remaining respondents that do so.
  • 51. Core Personas Data Engineer Data Analyst Data Scientist Team Lead Data Platform Administration ML Engineer Supporting Personas Solutions Architect DBA Security Expert Specialist Tester Technical Lead Designer
  • 52. There’s a problem with cross-functional teams
  • 55. Dash-Shaped (Generalist) Capable in a lot of things but not expert in any No I (or -) in teams I-Shaped (Specialist) Expert at one thing Poor Better Good Best Breadth of knowledge Depth of knowledge T-Shaped (Generalising Specialist) Capable in a lot of things and expert in one. Pi-Shaped (Multi-skilled) M-Shaped (Poly-skilled)
  • 56. Analytics Specialists and Centre of Excellence Source data system owners Data Management and Platform teams (Databases, data storage, compute infrastructure, analytical tools, data governance and security, master data management, operations, etc.) Domain use cases Cross-functional domain team (Data engineers, Data scientists, Data analysts, Stakeholder, etc.) Cross-functional domain team (Data engineers, Data scientists, Data analysts, Stakeholder, etc.) Cross-functional domain team (Data engineers, Data scientists, Data analysts, Stakeholder, etc.) Domain use cases Domain use cases Self-service access Productionise Domain-orientated Teams (optimised for speed)
  • 57. Team Customer Data Product Service Is our product healthy (Monitoring) Is the product meeting our objectives? (Benefit Measurement) Is our team and its processes healthy? (Retrospectives) Is our internal service delivery fit for purpose? (Service Delivery Review) Viewpoint Concern #7 Measure and act on feedback Source: Matt Philips
  • 58. Just as DevOps is more than Chef, Puppet and Ansible, DataOps is more than tools
  • 59. DataOps can't be delivered by a monolithic solution, it requires multiple technologies
  • 63. // Harvinder Atwal // Web var current: { companyName : "MoneySuperMarket", position : "Interim Chief Data Officer" }; var previous1: { companyName : "Dunnhumby", position : "Insight Director," + "Tesco Clubcard" }; var previous2: { companyName : "Lloyds Banking Group", position : "Senior Manager" }; var previous3: { companyName : "British Airways", position : "Senior Operational Research Analyst" }; {"about" : "me"} var username = "harvindersatwal"; var linkedIn = "/in/" + username; var twitter = "@" + username; var email = username + "@gmail.com";
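
Slide 39 lists completeness, consistency and correctness checks as the way to keep pipelines trustworthy. A minimal sketch of what three such checks might look like on a batch of records is given here. The field names `customer_id` and `premium`, the range limits and the `CheckResult` structure are all hypothetical and only for illustration; the deck does not prescribe any particular implementation or tool.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def completeness_check(rows, required_fields):
    """Fail if any required field is absent or None in any row."""
    missing = [
        (index, field)
        for index, row in enumerate(rows)
        for field in required_fields
        if row.get(field) is None
    ]
    return CheckResult("completeness", not missing, f"{len(missing)} missing value(s)")


def uniqueness_check(rows, key):
    """Fail if the key field contains duplicate values."""
    keys = [row.get(key) for row in rows]
    duplicates = len(keys) - len(set(keys))
    return CheckResult("uniqueness", duplicates == 0, f"{duplicates} duplicate key(s)")


def range_check(rows, field, low, high):
    """Fail if a numeric field falls outside [low, high]; None values are skipped."""
    bad = [
        row[field]
        for row in rows
        if row.get(field) is not None and not low <= row[field] <= high
    ]
    return CheckResult("range", not bad, f"{len(bad)} value(s) outside [{low}, {high}]")


def run_checks(rows):
    """Run every check on a batch; field names and limits are illustrative assumptions."""
    return [
        completeness_check(rows, ["customer_id", "premium"]),
        uniqueness_check(rows, "customer_id"),
        range_check(rows, "premium", 0, 10_000),
    ]


if __name__ == "__main__":
    batch = [
        {"customer_id": 1, "premium": 250.0},
        {"customer_id": 2, "premium": None},
        {"customer_id": 2, "premium": -5.0},
    ]
    for result in run_checks(batch):
        status = "PASS" if result.passed else "FAIL"
        print(f"{status} {result.name}: {result.detail}")
```

In a production pipeline, checks like these would typically run on every batch before data is published, with a failed check stopping the run or raising an alert instead of letting bad data flow downstream.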

Editor's notes

  1. Technology has actually more than kept up with data challenges, but that's still where the money and attention are going.
  2. The problem in Data Science is an overemphasis on machine learning and, especially among junior data scientists, the belief that accuracy scores on a test dataset are the definition of success. This seems a very strange definition of success to me. A perfect model that never goes into production is no better than a model that doesn't exist. Don't get me wrong, there are domains where model accuracy is extremely important, like healthcare, fraud detection and AdTech, but these are a minority compared to applications where doing anything is a big step up from doing nothing.
  3. https://hbr.org/2020/02/use-data-to-answer-your-key-business-questions
  4. https://hbr.org/2020/02/use-data-to-answer-your-key-business-questions
  5. Edison’s “jumbo dynamo” at the world’s first power station in Lower Manhattan.
  6. Methodologies that were invented for Software Development and Product Development apply to Data too.
  7. This is sometimes really hard for Data Scientists who experiment with data on laptops to accept. If we want data analytics to go faster, we need to accept that data analytics pipelines need brakes, in the form of rules and constraints, to build trust.
  8. This may seem obvious if you're a developer, but most data scientists and analysts are not trained in development and DevOps best practices. However, adopting these approaches will lead to a step change in productivity.
  9. Reproducibility is a critical requirement for DataOps, and version control is the foundation upon which a lot of the delivery is built. Version control makes it possible to maintain an archived version of the code used to produce a particular result. Examples include Git and Subversion, with Git most commonly used through services like GitHub, GitLab or Bitbucket. At a minimum, reviewers of a publication and future researchers should be able to: 1) Download all data and software used to generate the results. 2) Run tests and review source code to verify correctness. 3) Run a build process to execute the computation. Automated build systems document the high-level structure of a computation: which programs process which data, what outputs they produce, etc. Examples include Make and Ant.
  10. It's not just the code that needs to be reproducible but also the analytical environment, including packages and libraries, versions of languages and system-level software. Configuration management tools document the details of the computational environment where the result was produced. Popular solutions include package managers like Conda that document a set of packages, containers like Docker that also document system software, and virtual machines that contain an entire operating system plus the specific environment needed to run a computation.
  11. Data pipelines can be broken into multiple steps. Here’s an example of a machine learning pipeline. This means you can work faster, because different people can work on different parts of a pipeline in parallel. But this can cause problems during integration, so the solution is to integrate code into the master branches as often as you can. In an enterprise setting where multiple data scientists could be working on a single project, the first step to doing data science work that scales is implementing version control, whether that’s GitHub, GitLab, Bitbucket, or another solution. Once your team has the ability to track code changes, the next step is to create a process in which they regularly commit their code to the master branch of your repository.
  12. “organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations.” — M. Conway
  13. Personas are not people or job titles. Not all roles fit into cross-functional teams, e.g. architect.
  14. Work comes to the teams
  15. Matt Philips
  16. Which brings me on to tools. Just as chemistry is not about the tubes but the process of experimentation, DataOps is not tied to a particular technology, architecture, tool, language or framework. However, some tools are better at supporting DataOps collaboration, orchestration, agility, quality, security, access and ease of use.
  17. Long gone are the days when monolithic solutions worked. The previous stack had one vendor for data formats, data storage, query interface, language, and functions, with low interoperability, e.g. SAS. Now there are many technologies in an ecosystem. Modern data science looks much more like software development than the data warehousing or BI of old.
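
Notes 9 and 10 argue that a result is only reproducible if the computational environment is documented alongside the code. As a rough illustration of that idea, the sketch below records the Python version, platform and installed package versions, then derives a fingerprint that could be stored next to a result. It uses only the Python standard library; the function names `environment_snapshot` and `fingerprint` are invented for this example, and this is not a substitute for the Conda, Docker or virtual machine approaches mentioned in the notes.

```python
import hashlib
import json
import platform
from importlib import metadata


def environment_snapshot():
    """Collect interpreter, platform and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with unreadable metadata
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


def fingerprint(snapshot):
    """Hash the snapshot so two runs can be compared with a single value."""
    encoded = json.dumps(snapshot, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


if __name__ == "__main__":
    snapshot = environment_snapshot()
    print(f"Python {snapshot['python']} with {len(snapshot['packages'])} packages")
    print(f"Environment fingerprint: {fingerprint(snapshot)}")
```

If the fingerprint stored with a result differs from the one computed when the analysis is rerun, the environment has changed and any difference in output may be due to that change, not to the code or the data.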