Title
DataOps, the secret weapon for delivering AI, data science, and business intelligence value at speed.
Synopsis
● According to recent research, just 7.3% of organisations say the state of their data and analytics is excellent, and only 22% of companies are currently seeing a significant return from data science expenditure.
● Poor returns on data & analytics investment are often the result of applying 20th-century thinking to 21st-century challenges and opportunities.
● Modern data science and analytics require secure, efficient processes to turn raw data from multiple sources and in numerous formats into useful inputs to a data product.
● Developing, orchestrating and iterating modern data pipelines is an extremely complex process requiring multiple technologies and skills.
● Other domains have successfully overcome the challenge of delivering high-quality products at speed in complex environments. DataOps applies proven agile principles, lean thinking and DevOps practices to the development of data products.
● A DataOps approach aligns data producers, analytical data consumers, processes and technology with the rest of the organisation and its goals.
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
1. DataOps, the secret weapon for
delivering AI, data science, and business
intelligence value at speed
Harvinder Atwal
2. // Harvinder Atwal
{"about" : "me"}
{"current" : "Interim Chief Data Officer"} // MoneySuperMarket
{"previous" : "Insight Director, Tesco Clubcard"} // dunnhumby
{"previous" : "Senior Manager, Customer Strategy and Insight"} // Lloyds Banking Group
{"previous" : "Senior Operational Research Analyst"} // British Airways
// Web: @harvindersatwal, @gmail.com
3. MoneySuperMarket
1993 — We started life as Mortgage 2000
£2bn — 2019 estimated total of household savings
80% of UK online adults visit one of our websites each year
13.1 million — Active users (2019)
$2 billion — Market cap (2020)
989 — Product providers
4. 3 major ways Data Science can help the organisation:
Product Creation
Customer Experience
Business Efficiency
5. Applications aren’t in short supply
Demand Forecasting
Capacity Forecasting
Marketing automation
Supply chain management, automatic ordering
Automatic scaling of infrastructure
Document Classification
Image Annotation
Customer Service
Machine Translation
Anomaly Detection
Product Recommendation
Fraud Detection
Image Selection
Text Generation
Predictive Maintenance
Automated Pricing
Automated routing
Medical diagnosis
8. Just 7.3% of organisations say the state of
their data and analytics is excellent*
*New Vantage Partners Big Data and AI Executive Survey 2020
9. Only 22% of companies are currently
seeing a significant return from data
science expenditures*
*Obligatory conference presentation quote from Gartner/Forrester/McKinsey Consulting. Sorry.
12. Technology is less important than you
think, because the data says so
Respondents citing "Principal Challenge to Becoming Data-Driven is Technology" fell from 19.1% in 2018 to 9.1% in 2020*
*New Vantage Partners Big Data and AI Executive Survey 2020
15. model.fit(X_train, y_train)
is actually the easiest part
Data Governance
Data Quality
Data Security
Test Data Management
Version Control
Access Control
Team Organisation
Stakeholder Buy-in
Outcome Measurement
16. No one wants to talk about lack of value,
dirty data, people, processes and culture
19. Gartner has predicted that, “through 2022, only 20%
of analytic insights will deliver business outcomes.”
20. Step #1 Focus on Organisational Objectives
and Outcomes, not Data Outputs
Resources Activities Outputs Outcomes Impact
The Program Logic Model
Success does not start with data, data scientists, models, insight, or
technology; it literally ends with them.
Success starts with the impacts and outcomes you want and works back
from there to make them happen.
21. We need a completely different approach
to delivering outcomes from data
25. Data has little/no value once the process is complete
Data sharing limited to applications
requiring it for specific business processes
26. Big Design Up Front (BDUF) Data
Warehouses to meet specific
requirements
27. We can no longer apply steam-age thinking to data
Storage and Compute are cheap
Multiple sources of data: Network, Mobile data, Messages, Clickstream, Social Media, Billing, Call Details, CRM, Viewing, Inventory, Human Resources
Multiple data formats: Structured, Semi-Structured, Unstructured
Multiple data silos: Data Warehouses, Cubes and Marts, Operational Data Stores, Transactional Sources, File Systems, Big Data
28. Data needs to be shared and combined across many systems to
support multiple and sometimes complex analyses
Data Sources → Analytics Tools → Example Analytics Outputs
Example Analytics Outputs: Customer Lifetime Value Modelling, Churn Modelling, Financial Forecasting, Fraud Detection, Regulatory Feeds, Offer Prioritisation, Next Best Action, Sentiment Analysis, Segmentation, Product Affinity, Cross-sell Modelling, Financial Modelling, Cohort Analysis, Product Forecasting, Strategy Planning, Marketing Effectiveness, AB Testing, Reporting, Cubes
30. Data is no longer an application by-product,
it is a Product
31. DATA PRODUCTS
“A PRODUCT THAT FACILITATES AN END GOAL
THROUGH THE USE OF DATA”
- DJ PATIL, FORMER US CHIEF DATA SCIENTIST
#2 Think Product not Project
32. Data Analytics is complex manufacturing
Pipeline stages: Data Ingestion → Data Transformation → Data Analytics → Data Products → Use Cases
Data storage and Databases: Cloud file storage, NoSQL DB, Distributed file system, RDBMS, Analytical DB
Compute infrastructure and Query execution engines: VMs, Container services, Distributed compute frameworks, Distributed SQL execution engines
Data integration and Data processing pipelines: ETL/ELT tools, Stream processing
Data management: MDM, Data unification, Data preparation
Development tools, workspaces and software libraries
Data Analytics: Data exploration, Data visualization, Data analysis, Data science, Machine learning, Deep learning
Reproducibility, Deployment, Orchestration and Monitoring
Data Products: Output files, BI Tools, Interactive dashboards, Web Apps, APIs
Use Cases: Product creation, Customer experience and Business efficiency
All of this spans both the Product Development System and the Production System
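The manufacturing analogy can be sketched as a pipeline of composable stages mirroring the slide (ingestion → transformation → analytics → product). This is a minimal illustration only; all function names, fields and data are hypothetical:

```python
# Minimal sketch of a data pipeline as composable stages.
# Stage names mirror the slide; the data and fields are illustrative only.

def ingest():
    # In practice: pull raw records from files, streams, or databases.
    return [{"user": "a", "spend": "10.5"}, {"user": "b", "spend": "7.0"}]

def transform(rows):
    # Clean and type-cast raw records into an analysis-ready form.
    return [{"user": r["user"], "spend": float(r["spend"])} for r in rows]

def analyse(rows):
    # A trivial "analytics" step: total spend across all users.
    return sum(r["spend"] for r in rows)

def publish(result):
    # The data product: here, just a formatted report line.
    return f"total_spend={result:.2f}"

report = publish(analyse(transform(ingest())))
print(report)
```

Keeping each stage a pure function of its input is what makes the later slides on testing, version control and continuous integration tractable: every stage can be tested and replaced in isolation.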
34. #3 Apply Lean thinking
Eliminate waste, improve quality
The Optimist: the glass is half full.
The Pessimist: the glass is half empty.
The Lean Thinker: why is the glass twice as big as it should be?
38. Data pipelines will break the second you
put them into production
Often there is more complexity in the data
than in the code
39. Monitoring and testing is needed to trust
pipelines and keep them healthy
Integrity checks
Data Completeness Check
Data Versioning
Data Classification
Data Lineage Tracking
Data Cleansing
Watermarking
Quality Checks
File validation
Data Correctness Check
Data Accuracy Check
Data Consistency Check
Data Uniformity Check
ETL Performance Testing
End User Testing
Regression Testing
Metadata Testing
Transformation Testing
Integration Testing
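A completeness or correctness check from the list above can be as simple as a guard that runs before data enters the next pipeline stage. A minimal sketch, assuming a hypothetical record schema and domain rule:

```python
# Minimal data-quality gate: completeness and correctness checks
# that run before data enters the next pipeline stage.
# The schema (user, spend) and the domain rule are hypothetical.

def check_completeness(rows, required_fields):
    # Every record must carry every required field, non-null.
    return all(r.get(f) is not None for r in rows for f in required_fields)

def check_correctness(rows):
    # Domain rule: spend must be a non-negative number.
    return all(isinstance(r["spend"], (int, float)) and r["spend"] >= 0
               for r in rows)

rows = [{"user": "a", "spend": 10.5}, {"user": "b", "spend": 0.0}]

assert check_completeness(rows, ["user", "spend"]), "incomplete data"
assert check_correctness(rows), "invalid spend values"
print("quality gate passed")
```

In production these gates would typically also emit metrics so the monitoring on slide 57 can alert when pass rates degrade.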
41. Trust people with data
Identity and Access Management, Custom role permissions, Audit trail logs, Data Loss Prevention, Encryption of Data at Rest, Encryption of Data in Motion, Resource Monitoring, Firewall rules, Resource and Object Isolation, Penetration Testing, Code Encryption and Backup, Segregation of Duties, Authorisation protocols, Data Access and Privacy Policy, Metadata Management, Data Cataloging, Data Stewards and Owners
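Controls such as custom role permissions and segregation of duties are ultimately enforced in code. A minimal role-based access sketch; the roles and permission strings here are hypothetical, not from any particular platform:

```python
# Minimal role-based access control sketch.
# Roles and their permission strings are illustrative only.

ROLE_PERMISSIONS = {
    "data_scientist": {"read:curated"},
    "data_engineer": {"read:raw", "write:curated"},
    "data_steward": {"read:raw", "read:curated", "grant:access"},
}

def is_allowed(role, action):
    # Deny by default: unknown roles or actions get no access.
    return action in ROLE_PERMISSIONS.get(role, set())

print(is_allowed("data_scientist", "read:curated"))  # True
print(is_allowed("data_scientist", "read:raw"))      # False
```

Deny-by-default plus an audit log of every `is_allowed` decision is what turns "trust people with data" from a slogan into something reviewable.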
47. Continuous Integration: Commit Code Regularly
Machine Learning Pipeline: Data Cleaning (Master / Dev Branch) → Feature Extraction (Master / Dev Branch) → Model Train (Master / Dev Branch)
Feeds into Product Development (e.g. App, Website, Marketing system, Operational System, Dashboard, etc.)
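Committing to master frequently only works if every commit is verified automatically. A minimal sketch of the kind of unit test a CI job could run against a pipeline step; `extract_features` and its schema are hypothetical stand-ins for the feature-extraction stage above:

```python
# A tiny pipeline step plus the test a CI job would run on every commit.
# extract_features, its thresholds, and its schema are hypothetical.

def extract_features(record):
    # Feature extraction step: derive model inputs from a raw record.
    return {
        "spend_bucket": "high" if record["spend"] > 100 else "low",
        "is_new_user": record["visits"] <= 1,
    }

def test_extract_features():
    features = extract_features({"spend": 150.0, "visits": 1})
    assert features["spend_bucket"] == "high"
    assert features["is_new_user"] is True

test_extract_features()
print("feature extraction tests passed")
```

In practice a CI server (e.g. via a test runner like pytest) would execute such tests on both the dev branch and master, so a broken feature-extraction change never reaches the model-training stage.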
49. #6 Organise for success –
Conway's Law isn't academic
Microsoft's research found organisational structure predicted code quality better than
other measurable factors such as Code Churn, Code Complexity, Dependencies, Code
Coverage or Pre-Release Bugs
50. Nearly 60 percent of breakaway organizations use
cross-functional teams, versus less than a third
of the remaining respondents.
51. Core Personas: Team Lead, Data Engineer, Data Scientist, Data Analyst, ML Engineer, Data Platform Administration
Supporting Personas: Solutions Architect, DBA, Security Expert, Specialist Tester, Technical Lead, Designer
55. Breadth vs depth of knowledge (Poor → Better → Good → Best):
I-Shaped (Specialist): expert at one thing
Dash-Shaped (Generalist): capable in a lot of things but not expert in any
T-Shaped (Generalising Specialist): capable in a lot of things and expert in one
Pi-Shaped (Multi-skilled) and M-Shaped (Poly-skilled)
No I (or -) in teams
56. Domain-orientated Teams (optimised for speed)
Cross-functional domain teams (Data engineers, Data scientists, Data analysts, Stakeholders, etc.), each serving its own domain use cases
Supported by: Analytics Specialists and Centre of Excellence; Source data system owners; Data Management and Platform teams (Databases, data storage, compute infrastructure, analytical tools, data governance and security, master data management, operations, etc.)
Self-service access; support teams help productionise
57. #7 Measure and act on feedback
Viewpoints and concerns:
Data Product — Is our product healthy? (Monitoring)
Customer — Is the product meeting our objectives? (Benefit Measurement)
Team — Is our team and its processes healthy? (Retrospectives)
Service — Is our internal service delivery fit for purpose? (Service Delivery Review)
Source: Matt Philips
58. Just as DevOps is more than Chef, Puppet
and Ansible
DataOps is more than tools
59. DataOps can't be delivered by a monolithic
solution, it requires multiple technologies
63. // Harvinder Atwal // Web
var current = {
  companyName : "MoneySuperMarket",
  position : "Interim Chief Data Officer"
};
var previous1 = {
  companyName : "Dunnhumby",
  position : "Insight Director, Tesco Clubcard"
};
var previous2 = {
  companyName : "Lloyds Banking Group",
  position : "Senior Manager"
};
var previous3 = {
  companyName : "British Airways",
  position : "Senior Operational Research Analyst"
};
{"about" : "me"}
var username = "harvindersatwal";
var linkedIn = "/in/" + username;
var twitter = "@" + username;
var email = username + "@gmail.com";
Editor's notes
Technology has actually more than kept up with data challenges.
But that's where the money and attention is going.
The problem in Data Science is an overemphasis on machine learning and, especially among junior data scientists, the belief that accuracy scores on a test dataset are the definition of success. This seems a very strange definition of success to me.
A perfect model that never goes into production is no better than a model that doesn't exist.
Don't get me wrong there are domains where model accuracy is extremely important like healthcare, fraud detection and adTech but these are minority compared to applications where doing anything is a big step up from doing nothing.
Edison’s “jumbo dynamo” at the world’s first power station in Lower Manhattan.
Methodologies that were invented for Software Development and Product Development apply to Data too.
This is sometimes really hard for Data Scientists who experiment with data on laptops to accept.
If you want data analytics to go faster, we need to accept that data analytics pipelines need brakes, in the form of rules and constraints, to build trust.
This may seem obvious if you're a developer but most data scientists and analysts are not trained in development and devops best practices.
However, adopting these approaches will lead to a step change in productivity.
Reproducibility is a critical requirement for DataOps and version control is the foundation upon which a lot of the delivery is built.
Version control makes it possible to maintain an archived version of the code used to produce a particular result. The most common software is Git, used via services like GitHub, GitLab or Bitbucket.
At a minimum, reviewers of a publication and future researchers should be able to:
1) Download all data and software used to generate the results.
2) Run tests and review source code to verify correctness.
3) Run a build process to execute the computation.
Version control makes it possible to maintain an archived version of the code used to produce a particular result. Examples include Git and Subversion.
Automated build systems document the high-level structure of a computation: which programs process which data, what outputs they produce, etc. Examples include Make and Ant.
It's not just code that needs to be reproducible but also the analytical environment, including packages and libraries, versions of languages, and system-level software. Popular solutions include package and environment managers like Conda, container solutions like Docker, or virtual machines containing an entire operating system plus a specific environment.
Configuration management tools document the details of the computational environment where the result was produced, including the programming languages, libraries, and system-level software the results depend on.
Examples include package managers like Conda that document a set of packages,
containers like Docker that also document system software,
and virtual machines that actually contain the entire environment needed to run a computation.
Data pipelines can be broken into multiple steps.
Here’s an example of machine learning pipeline.
This means you can work faster because different people can work on different parts of a pipeline in parallel.
But this can cause problems during integration, so the solution is to continuously integrate code, as often as you can, into the master branch.
In an enterprise setting where multiple data scientists could be working on a single project, the first step to doing data science work that scales is implementing version control, whether that’s GitHub, GitLab, Bitbucket, or another solution. Once your team has the ability to track code changes, the next step is to create a process in which they regularly commit their code to the master branch of your repository.
“organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations.”
— M. Conway
Personas are not people or job titles
Not all roles fit into cross functional teams, e.g. architect.
Work comes to the teams
Matt Philips
Which brings me on to tools
Just as chemistry is not about the tubes but the process of experimentation. DataOps is not tied to a particular technology, architecture, tool, language or framework.
However, some tools are better at supporting DataOps collaboration, orchestration, agility, quality, security, access and ease of use.
Long gone are the days when monolithic solutions worked
Previous stack, one vendor for data formats, data storage, query interface, language, and functions. Low interoperability. E.g. SAS
Now many technologies in an ecosystem.
Modern data science looks much more like software development than the data warehousing or BI of old.