Data and Analytics at Holland & Barrett
Building a "3-Michelin-star" data platform on AWS to power insights at the speed of thought
Dobo Radichkov
Chief Data Officer
7 June 2023
About Holland & Barrett
Founded in 1870, we exist to make health and wellness a way of life for everyone.
3
The Holland & Barrett Data & Analytics vision
To become the beating heart of the organisation and unlock success for our colleagues, customers and partners.
▪ For our colleagues
▪ For our customers
▪ For our partners
4
The Holland & Barrett Data & Analytics vision
⭐⭐⭐ Mature ⭐⭐ Scaling ⭐ Early days
▪ Data platform ⭐⭐⭐
▪ Single source of truth ⭐⭐⭐
▪ Analytics & BI ⭐⭐⭐
▪ Personalisation ⭐⭐
▪ Data Science & ML ⭐⭐
▪ Health analytics ⭐
▪ Analytics in the field (stores & suppliers) ⭐
▪ Data monetisation ⭐
5
We are now in ‘Phase II’ of this journey
I. BUILD NEW FOUNDATION (2022)
▪ Data strategy & vision
▪ Set up data teams
▪ AWS-centric data lake
▪ Redshift data warehouse
▪ Metabase BI platform
II. SCALE OPERATING MODEL (2023)
▪ Complete core reporting
▪ Self-service BI
▪ Functional analytics
▪ Analytics in the field
▪ Data science & ML
III. DRIVE VALUE & INNOVATION (2024+)
▪ Data as driver of value:
– Increase revenue
– Reduce costs
– Improve UX
– Optimise processes
▪ Data as driver of innovation
CRAWL METAMORPHOSE WALK FLY TRANSCEND
6
The H&B data organisation
1. DATA ENGINEERING
§ Data lake & governance
§ Source system integration
§ Data services
2. DATA WAREHOUSE
§ Data modelling & transformations
§ Single source of truth for reporting & analytics
3. BUSINESS INTELLIGENCE
§ Management reporting
§ Operational reporting
§ Data visualisation
4. DATA SCIENCE
§ Data science and applied machine learning
§ Forecasting & optimisation
§ Personalisation
5. WEB & APP ANALYTICS
§ Product squad analytics
§ Product experimentation
6. DIGITAL ANALYTICS
§ Digital trade analytics
§ Performance marketing analytics
§ CRM analytics
7
“3-Michelin-star” data platform 😋
Production systems & services
▪ AS400 (until demise)
▪ Oracle (until demise)
▪ GA4
▪ …
▪ Till system
▪ Order mgmt. system
▪ Single view of stock
DATA LAKE: “Raw ingredients & food storage”
▪ Raw systems data (security, data governance, access control)
DATA WAREHOUSE: “The kitchen & cooking process”
▪ Operational master data (customers, products, orders, stock, etc.)
“The finished meals & service”
▪ BI & Core Reporting
▪ Data Science / Applied ML
▪ Product & Digital Analytics
▪ Supply Chain, Retail Ops, Commercial, Customer, Finance
8
Data lake architecture
INGEST
▪ On-premise DBs (legacy estate): AS400, Oracle
▪ Cloud DBs: Amazon Aurora, Amazon RDS, …
▪ Kafka Connect (Amazon MSK)
▪ APIs & SaaS
▪ DynamoDB tables
▪ Scraper (in-house crawler)
GOVERNANCE
▪ Katalog UI
▪ Katalog DB (Aurora PgSQL)
▪ Right to erase / access
▪ Eraser / Accessor
▪ Success
Data lake (Amazon S3)
▪ 5,000 datasets
▪ 98k fields
▪ 10.4M files
Data lake S3 buckets
▪ JSON*
▪ Parquet
▪ AVRO
▪ CSV
Data lake index (DynamoDB)
Airflow
9
Data warehouse architecture
Data warehouse (Amazon Redshift), 4 x ra3.16xlarge
▪ 2,670 tables
▪ 2m queries / month
▪ Layered data architecture
▪ Raw data stored in SUPER columns
▪ Hourly ELT with idempotent pipelines
Data lake (Amazon S3)
▪ COPY (data ingest)
▪ External tables (Amazon Redshift Spectrum)
Cache (Amazon Aurora)
▪ Foreign data wrapper (pg_cron for scheduling)
▪ External schema (live federated queries)
▪ Used as fast storage layer for data apps
▪ Serves raw data for ELT data pipelines
APIs & SaaS
ELT orchestration
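A minimal Python sketch of what "idempotent" means for an hourly ELT load, under stated assumptions: the row shape (order_id, updated_at, amount), the sample data and the helper name `merge_batch` are hypothetical and are not taken from the H&B pipelines. The idea is that a batch is upserted by key, so re-running the same hour or replaying an older one leaves the target unchanged.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical row shape: (order_id, updated_at as ISO-8601 text, amount)
Row = Tuple[str, str, float]


def merge_batch(target: Dict[str, Row], batch: Iterable[Row]) -> Dict[str, Row]:
    """Upsert a batch into the target, keyed by order_id.

    A row replaces the stored one only if it is at least as recent, so
    replaying the same batch (or an older one) leaves the target unchanged.
    """
    result = dict(target)
    for row in batch:
        order_id, updated_at, _amount = row
        current = result.get(order_id)
        # ISO-8601 timestamps in the same format compare correctly as strings.
        if current is None or updated_at >= current[1]:
            result[order_id] = row
    return result


hour_1 = [("o1", "2023-06-07T09:05:00", 10.0), ("o2", "2023-06-07T09:40:00", 25.0)]
hour_2 = [("o2", "2023-06-07T10:10:00", 20.0), ("o3", "2023-06-07T10:30:00", 5.0)]

state = merge_batch({}, hour_1)
state = merge_batch(state, hour_2)

# Simulate a retry: replay both hourly batches against the loaded state.
replayed = merge_batch(merge_batch(state, hour_1), hour_2)
print(replayed == state)
```

The same keyed-upsert pattern is what a warehouse-side MERGE or delete-then-insert step would express in SQL; the dictionary here only stands in for the target table.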
10
New Redshift features we are excited about
1. ROLLUP / CUBE
▪ Long-awaited improvement that helps us efficiently generate large pre-aggregated multi-dimensional cubes
▪ Great in combination with HLL functions for fast unique counts
2. DATA MASKING
▪ Create “masked” versions of tables to improve data privacy and governance
▪ Eliminates overhead of maintaining multiple versions / slices of the data
3. OTHER
▪ MERGE to simplify our incremental data pipelines
▪ S3 auto-copy to simplify data lake ingest pipelines
▪ Aurora zero-ETL integration to simplify CDC pipelines
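To make the ROLLUP / CUBE point concrete, here is a small Python sketch of the grouping sets each clause expands to in standard SQL. The dimension names are hypothetical; the functions only enumerate the combinations and do not run any query.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def rollup_sets(dims: Sequence[str]) -> List[Tuple[str, ...]]:
    """Grouping sets for ROLLUP: every prefix of dims, down to the grand total ()."""
    return [tuple(dims[:n]) for n in range(len(dims), -1, -1)]


def cube_sets(dims: Sequence[str]) -> List[Tuple[str, ...]]:
    """Grouping sets for CUBE: every subset of dims, largest first."""
    return [combo for n in range(len(dims), -1, -1) for combo in combinations(dims, n)]


dims = ["region", "store", "product"]  # hypothetical dimension names
print(rollup_sets(dims))  # n + 1 grouping sets
print(cube_sets(dims))    # 2 ** n grouping sets
```

For n dimensions, ROLLUP yields n + 1 grouping sets and CUBE yields 2^n, which is why a single CUBE clause can replace many hand-written GROUP BY queries joined with UNION ALL.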
11
BI & Analytics architecture
Data warehouse (Amazon Redshift)
1. Raw data layer: raw unmodified data from source systems, loaded by ELT from the data lake
2. Operational data layer: clean, transformed, disaggregated entity relationship model and the starting point for all reporting & analytics. Holds customer, orders, product, stores, warehouse and stock master data
3. BI data layer: semi-aggregated datasets to enable fast reporting & analytics. Includes pre-computed HLL sketches for efficient unique counts
4. Cubes: multi-dimensional ROLAP cubes delivering pre-aggregated metrics along pre-defined dimensions. Best practice: CUBE/ROLLUP on top of pre-computed HLL sketches
5. Consumers: Data IDEs (JDBC), data sharing, Athena, one-stop shop analytics, APIs
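The "CUBE/ROLLUP on top of pre-computed HLL sketches" practice works because unique counts cannot be summed across groups, while sketches can be merged. The Python sketch below illustrates this under loud assumptions: exact Python sets stand in for HLL sketches (both merge by union, but HLL is approximate and fixed-size), and the order rows and dimension names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical order rows: (channel, category, customer_id)
ORDERS: List[Tuple[str, str, str]] = [
    ("web", "vitamins", "c1"),
    ("web", "food", "c1"),
    ("store", "vitamins", "c2"),
    ("store", "vitamins", "c1"),
    ("web", "vitamins", "c3"),
]
DIMS = ("channel", "category")

# Step 1 (BI data layer): one "sketch" per finest-grained group.
# An exact set stands in for an HLL sketch; both can be merged by union.
base: Dict[Tuple[str, str], set] = defaultdict(set)
for channel, category, customer in ORDERS:
    base[(channel, category)].add(customer)

# Step 2 (cube): for every subset of dimensions, merge the base sketches
# that share the kept dimension values, then read off the cardinality.
cube: Dict[Tuple[Tuple[str, str], ...], int] = {}
for n in range(len(DIMS) + 1):
    for kept in combinations(range(len(DIMS)), n):
        merged: Dict[Tuple[Tuple[str, str], ...], set] = defaultdict(set)
        for key, sketch in base.items():
            group = tuple((DIMS[i], key[i]) for i in kept)
            merged[group] |= sketch
        for group, sketch in merged.items():
            cube[group] = len(sketch)

naive_total = sum(len(sketch) for sketch in base.values())
print(cube[()], naive_total)  # merged grand total vs. summed per-group counts
```

Summing the per-group counts double-counts customers who appear in several groups, whereas merging the sketches first gives the correct total at every level of the cube without rescanning the raw rows.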
12
Redshift enables all reporting & analytics use cases
▪ Official reporting built by central BI team
▪ Self-service analytics done autonomously within teams
▪ Field analytics embedded in customer-facing apps
Chart: Registered users (self-service analytics)
13
Data Science & ML architecture
1. Develop: R / Python notebooks, feature engineering, model development (Amazon Athena, Amazon Redshift)
2. Train: model training (Amazon EC2 instances, AWS Batch)
3. Serve: ML data layer (Aurora / RDS, DynamoDB), serverless (API Gateway, AWS Lambda)
Feature extraction pipelines: Amazon Redshift, Amazon Athena
14
H&B data drives core business value & innovation
Commercial
✓ Unit economics
✓ Store network planning
✓ Competitor intelligence
✓ Promo effectiveness
✓ Econometrics / MMM
✓ Space & range analytics
Finance
✓ Daily / weekly / monthly management reporting
✓ Operational trade reporting
✓ Intraday / peak reporting
✓ Exception reporting
Supply chain
✓ Single view of stock
✓ Forecasting & replenishment
✓ Fulfilment analytics
✓ Stock availability
✓ Clearance / overstock analytics
✓ Supplier analytics
Wellness
✓ Diagnostics
✓ Health analytics
✓ Personalised wellness
✓ Behavioural engine
Customer
✓ Single customer view
✓ Customer lifecycle management
✓ eCRM enablement
✓ Customer lifetime value
✓ Digi marketing measurement
Digital
✓ Personalisation & search
✓ OKRs
✓ UX / funnel analytics
✓ Experimentation platform
✓ Web / app event tracking
✓ SEO analytics

Data and Analytics at Holland & Barrett: Building a '3-Michelin-star' Data Platform on AWS

  • 1. Data and Analytics at Holland & Barrett Building a "3-Michelin-star" data platform on AWS to power insights at the speed of thought Dobo Radichkov Chief Data Officer 7 June 2023
  • 2. About Holland & Barrett Founded in 1870, we exist to make health and wellness a way of life for everyone.
  • 3. The Holland & Barrett Data & Analytics vision: To become the beating heart of the organisation and unlock success for our colleagues, customers and partners (for our colleagues; for our partners; for our customers).
  • 4. The Holland & Barrett Data & Analytics vision: To become the beating heart of the organisation and unlock success for our colleagues, customers and partners (for our colleagues; for our partners; for our customers). Capabilities: Data platform; Single source of truth; Analytics & BI; Personalisation; Data Science & ML; Health analytics; Analytics in the field (stores & suppliers); Data monetisation. Maturity ratings shown: ⭐⭐⭐ ⭐⭐⭐ ⭐⭐⭐ ⭐⭐ ⭐⭐ ⭐ ⭐ ⭐. Legend: ⭐⭐⭐ Mature; ⭐⭐ Scaling; ⭐ Early days.
  • 5. We are now in ‘Phase II’ of this journey. Phase I (2022), Build new foundation: Data strategy & vision; Set up data teams; AWS-centric data lake; Redshift data warehouse; Metabase BI platform. Phase II (2023), Scale operating model: Complete core reporting; Self-service BI; Functional analytics; Analytics in the field; Data science & ML. Phase III (2024+), Drive value & innovation: Data as driver of value (increase revenue, reduce costs, improve UX, optimise processes); Data as driver of innovation. Stage labels: Crawl, Metamorphose, Walk, Fly, Transcend.
  • 6. The H&B data organisation. (1) Data Engineering: Data lake & governance; Source system integration; Data services. (2) Data Warehouse: Data modelling & transformations; Single source of truth for reporting & analytics. (3) Business Intelligence: Management reporting; Operational reporting; Data visualisation. (4) Data Science: Data science and applied machine learning; Forecasting & optimisation; Personalisation. (5) Web & App Analytics: Product squad analytics; Product experimentation. (6) Digital Analytics: Digital trade analytics; Performance marketing analytics; CRM analytics.
  • 7. “3-Michelin-star” data platform 😋. Kitchen metaphor: “Raw ingredients & food storage”; “The kitchen & cooking process”; “The finished meals & service”. Production systems & services: AS400 (until demise); Oracle (until demise); GA4; …; Till system; Order mgmt. system; Single view of stock. Data lake: Raw systems data (security, data governance, access control). Data warehouse: Operational master data (customers, products, orders, stock, etc.). Outputs: BI & Core Reporting; Data Science / Applied ML; Product & Digital Analytics. Business domains: Supply Chain; Retail Ops; Commercial; Customer; Finance.
  • 8. Data lake architecture. Sources: On-premise DBs (legacy estate): AS400, Oracle; Cloud DBs: Amazon Aurora, Amazon RDS; …; APIs & SaaS; DynamoDB tables. Ingest: Kafka Connect (Amazon MSK); Scraper (in-house crawler); Airflow. Governance: Katalog UI; Katalog DB (Aurora PgSQL); Right to erase / access; Eraser / Accessor; Success; Airflow. Data lake (Amazon S3): 5,000 datasets; 98k fields; 10.4M files. Data lake S3 buckets: JSON*; Parquet; AVRO; CSV. Data lake index (DynamoDB). Diagram callouts: 1 2 3 4 5.
  • 9. Data warehouse architecture. Data warehouse (Amazon Redshift), 4 x ra3.16xlarge: 2,670 tables; 2m queries / month; Layered data architecture; Raw data stored in SUPER columns; Hourly ELT with idempotent pipelines. Inputs: Data lake (Amazon S3) via COPY (data ingest) and External tables (Amazon Redshift Spectrum); APIs & SaaS; ELT orchestration. Cache (Amazon Aurora): Foreign data wrapper (pg_cron for scheduling); External schema (live federated queries); Used as fast storage layer for data apps; Serves raw data for ELT data pipelines. Diagram callouts: 1 2 3.
  • 10. New Redshift features we are excited about. (1) ROLLUP / CUBE: Long-awaited improvement that helps us efficiently generate large pre-aggregated multi-dimensional cubes; Great in combination with HLL functions for fast unique counts. (2) Data masking: Create "masked" versions of tables to improve data privacy and governance; Eliminates overhead of maintaining multiple versions / slices of the data. (3) Other: MERGE to simplify our incremental data pipelines; S3 auto-copy to simplify data lake ingest pipelines; Aurora zero-ETL integration to simplify CDC pipelines.
  • 11. BI & Analytics architecture, all within the data warehouse (Amazon Redshift). (1) Raw data layer: Raw unmodified data from source systems – ELT from data lake. (2) Operational data layer: Clean, transformed, disaggregated entity relationship model – starting point for all reporting & analytics; Customer, orders, product, stores, warehouse, stock master data. (3) BI data layer: Semi-aggregated datasets to enable fast reporting & analytics; includes pre-computed HLL sketches for efficient unique counts. (4) Cubes: Multi-dimensional ROLAP cubes delivering pre-aggregated metrics along pre-defined dimensions; best practice: CUBE/ROLLUP on top of pre-computed HLL sketches. (5) Consumers: Data IDEs (JDBC); Data sharing; Athena; One-stop shop analytics; APIs.
  • 12. Redshift enables all reporting & analytics use cases: Official reporting built by central BI team; Self-service analytics done autonomously within teams; Field analytics embedded in customer-facing apps. Chart: Registered users (self-service analytics).
  • 13. Data Science & ML architecture. Stages: (1) Develop; (2) Train; (3) Serve. Services shown: Amazon Athena; Amazon Redshift; Amazon EC2; AWS Batch; Aurora / RDS; DynamoDB; API Gateway; AWS Lambda. Activities: R / Python Notebooks; Feature engineering; Model development; Model training; Feature extraction pipelines (Amazon Redshift, Amazon Athena). Labels: EC2 instances; ML data layer; Serverless.
  • 14. H&B data drives core business value & innovation. Commercial: Unit economics; Store network planning; Competitor intelligence; Promo effectiveness; Econometrics / MMM; Space & range analytics. Finance: Daily / weekly / monthly management reporting; Operational trade reporting; Intraday / peak reporting; Exception reporting. Supply chain: Single view of stock; Forecasting & replenishment; Fulfilment analytics; Stock availability; Clearance / overstock analytics; Supplier analytics. Wellness: Diagnostics; Health analytics; Personalised wellness; Behavioural engine. Customer and Digital: Single customer view; Customer lifecycle management; eCRM enablement; Customer lifetime value; Digi marketing measurement; Personalisation & search; OKRs; UX / funnel analytics; Experimentation platform; Web / app event tracking; SEO analytics.
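Slides 10 and 11 recommend running CUBE/ROLLUP on top of pre-computed HLL sketches for unique counts. The reason this works is that distinct counts cannot be summed across groups, while HyperLogLog sketches can be merged. The Python below is only a minimal illustration of that property, not Redshift's implementation and not the pipeline used at H&B; the register count, the hash choice, the `visits` data and every function name are assumptions made for the example.

```python
import hashlib
import math

P = 12            # assumed precision: 2**12 registers
M = 1 << P


def _hash64(value: str) -> int:
    """64-bit hash of a string (SHA-1 truncated; an arbitrary choice)."""
    return int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")


def sketch_of(values):
    """Build a HyperLogLog register array from an iterable of strings."""
    regs = [0] * M
    for v in values:
        h = _hash64(v)
        idx = h >> (64 - P)                      # first P bits pick a register
        rest = h & ((1 << (64 - P)) - 1)         # remaining 64 - P bits
        rank = (64 - P) - rest.bit_length() + 1  # position of the leftmost 1 bit
        regs[idx] = max(regs[idx], rank)
    return regs


def merge(a, b):
    """Sketch of the union: element-wise maximum of the registers."""
    return [max(x, y) for x, y in zip(a, b)]


def estimate(regs):
    """Cardinality estimate with the standard small-range correction."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in regs)
    zeros = regs.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)           # linear counting
    return raw


# Hypothetical fine-grained cells: customer ids seen per (store, day).
visits = {
    ("north", "mon"): range(0, 1500),
    ("north", "tue"): range(1000, 2000),
    ("south", "mon"): range(1000, 2500),
    ("south", "tue"): range(2000, 3000),
}
cells = {key: sketch_of(f"cust-{i}" for i in ids) for key, ids in visits.items()}

# "ROLLUP" by merging sketches: (store, day) -> store -> grand total.
per_store = {}
for (store, _day), regs in cells.items():
    per_store[store] = merge(per_store[store], regs) if store in per_store else regs
total = merge(per_store["north"], per_store["south"])

for store, regs in per_store.items():
    print(store, round(estimate(regs)))
print("all stores", round(estimate(total)))
```

Because merging is an element-wise maximum, the rolled-up sketch is identical to one built directly from the underlying rows, so coarser aggregation levels can be derived from the finest stored level without rescanning raw data. Adding the per-store estimates together would instead double-count customers who appear in both stores.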