4. Line Of Business
HR, Finance, Sales, Customers, Competitors, Markets, Products, Supply, Traffic, Acquisition, Communication, Security, Prospects
* If you are reading this, work in the data field and are interested in joining us, please go to: https://www.ovh.com/fr/careers/
5. Use Line Of Business
[Diagram: each line of business — LOB 1 (Customer), LOB 2 (Support), LOB 3, … — has its own BI Team and Data Science Team.]
6. Data Office
[Diagram: the Data Office centralizes data into the datalake, handles cleansing and data integration, and feeds the CRM. It hosts a BI Team and a Data Science Team. Around them, the business teams and their activities: Data Analyst (extracts), Customer Animation (events, actions), BUS (product analysis, global analysis, country analysis), SUBS (PAC, ad-hoc analysis), Digital (onsite, partner), BIZDEV (campaigns, text mining), Traffic Acquisition (segmentation, normalisation, targeting, channel).]
In case you missed it on the previous slide: if you work in the data field, we are interested in your profile!
7. Data Maturity
Level 1: POC
Data are manually created or extracted once
Data are modified by a single data scientist
Data are assessed by a data analyst and manually sent to a business analyst after control
8. Data Maturity
Level 2: Manual
Data are manually created on a regular basis
Data are added to the enterprise model through an automated process
Data can be used by all data scientists, data analysts and business analysts
9. Data Maturity
Level 3: Automatic
Data are created through a controlled business process
Data are automatically added to the enterprise model
Data can be used by all data scientists, data analysts and business analysts
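The jump from Level 2 to Level 3 is essentially the removal of the manual step: data enter the enterprise model through a controlled, automated process rather than by hand. A minimal sketch of such an ingestion step, with all class and field names hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical record type for the enterprise model.
@dataclass
class Record:
    subject: str       # e.g. "customer"
    payload: dict
    loaded_at: str

class EnterpriseModel:
    """Level 3 sketch: data are added through a controlled process, not by hand."""
    def __init__(self):
        self.records = []

    def ingest(self, subject, payload):
        # The "controlled business process": validate before loading.
        if not subject or not isinstance(payload, dict):
            raise ValueError("rejected by ingestion control")
        rec = Record(subject, payload,
                     datetime.now(timezone.utc).isoformat())
        self.records.append(rec)
        return rec

model = EnterpriseModel()
model.ingest("customer", {"id": 42, "churn_score": 0.1})
```

Once data arrive this way, the Level 3 promise follows: any data scientist, data analyst or business analyst can consume them without a hand-off.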
10. Data Maturity Matrix
Maturity levels applied per subject (Customers, Competitors, Products, …):
Advanced 5: Potential, Strategy
4: Attrition, New Product
3: Churn, Rank
2: Adds, Event
Basic 1: NIC, Pricing, …
11. Exploration: Code First / Industrialisation: Model First
[Diagram: Data Scientists, Data Analysts and Business Analysts iterate through an Analyse → Test → Validation cycle, supported by the Data Management Team (Architect + Data Integrator), the Business Intelligence Team and the Data Lake Team.]
12. Data Lake Team
[Diagram: same exploration (code first) / industrialisation (model first) split, with the tool/infrastructure layer added: the Analyse → Test → Validation cycle now runs against a technical model, supported by the Data Management Team (Architect + Data Integrator) and the Business Intelligence Team.]
13. Tool / Infrastructure
[Diagram: same layout, with the effort split made explicit: data preparation takes about 80% of the work, machine learning about 20%.]
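The 80/20 split above is easy to feel in practice: most of the code handles cleaning and shaping, and only a few lines do the actual learning. A stdlib-only sketch with made-up data, using a plain mean as a stand-in for a model fit:

```python
import statistics

# Raw extract: messy, as usual — most of the effort goes here.
raw = ["12.5", " 13.0 ", "n/a", "", "14,5", "13.5"]

def prepare(values):
    """Data preparation (~80% of the work): trim, normalise, drop junk."""
    cleaned = []
    for v in values:
        v = v.strip().replace(",", ".")   # normalise decimal separator
        if not v or v.lower() in {"n/a", "null"}:
            continue                      # drop unusable rows
        cleaned.append(float(v))
    return cleaned

# "Machine learning" (~20% of the work): a single call,
# standing in for the actual model fit.
data = prepare(raw)
model = statistics.mean(data)
```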
14. Tool / Infrastructure
[Diagram: the exploration side is detailed. Working in POC mode, Data Scientists handle data analysis/creation and model building, Data Analysts handle data analysis and DataViz, and the resulting POC is exposed.]
15. Tool / Infrastructure
[Diagram: the industrialisation side is added. A Data Committee decides what to industrialise; the Data Management Team (Architect + Data Integrator) builds the enterprise model; the Business Intelligence Team builds the datamarts and reports (DTM). Data preparation is industrialised: POCs move from POC mode to Level 2 & 3 mode, with data landing in the Datastore 360 and being exposed through the enterprise model.]
16. Tool / Infrastructure
[Diagram: the complete picture — exploration (code first, POC mode) on one side and industrialisation (model first, Level 2 & 3 mode) on the other, linked by the Data Committee, the enterprise model, the Datastore 360, and the datamarts and dashboards built by the Business Intelligence Team, all on top of the Data Lake Team's infrastructure.]
17. Data Committee
Objectives
– Define data that need to be added to the enterprise data
– Define priority and owners by subject
– Industrialise new data production: from Excel to a full business process
– Validate the enterprise model
– Common vocabulary
– Business and/or functional model
– Be informed of evolutions
Participants
– Data Scientist
– Data Analyst
– Business Analyst
– Data Management Team
Periodicity
– Every month
18. Datastore 360
EDS 360
Get all data from:
– Front office applications
– Back office applications
– External data
Store data in a business-oriented model.
Responsible for historising data when this makes sense for the business:
– What data do we want to keep? What will we need in 20 years?
Expose data to every application that requires it:
– Business Intelligence: reporting or datamarts
– Front office applications
[Diagram: history and current views of Client / Product / Activity data, consumed by Data Scientists, Data Analysts, Business Analysts, DataViz and user apps (CRM, Support) through APIs or direct read.]
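Historising "when this makes sense for the business" means keeping a current view for every key, plus a dated history only for the subjects where history has value. A minimal sketch of that idea — the class, subject names and historisation choice are all hypothetical:

```python
from datetime import date

class Datastore360:
    """Keeps the current value for every key, and a dated history
    only for subjects where history makes business sense."""
    HISTORISED = {"client", "product"}   # hypothetical business choice

    def __init__(self):
        self.current = {}                # (subject, key) -> latest value
        self.history = {}                # (subject, key) -> [(day, value), ...]

    def store(self, subject, key, value, day=None):
        self.current[(subject, key)] = value
        if subject in self.HISTORISED:
            day = day or date.today().isoformat()
            self.history.setdefault((subject, key), []).append((day, value))

ds = Datastore360()
ds.store("client", "c42", {"plan": "basic"}, day="2016-01-01")
ds.store("client", "c42", {"plan": "pro"}, day="2016-06-01")
ds.store("activity", "c42", {"last_login": "2016-06-02"})
```

Consumers that only need the "current" view read one dictionary; analyses that need the 20-year question answered read the history.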
A secured cluster accessible through a gateway.
The computing layer is based on Public Cloud instances in order to scale quickly; cold storage, on the other hand, is based on dedicated servers for higher performance.
vRack technology for the dedicated network; Public Cloud for scalability -> datanodes.
Hadoop ecosystem with HDFS for data storage; HBase plus Phoenix for SQL support on columnar storage -> relational data storage layer.
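Phoenix's point is that the HBase store stays queryable with plain SQL. Purely to illustrate the shape of such a query, the snippet below runs it against in-memory SQLite; the table and column names are made up, and in production the same statement would go through the Phoenix JDBC driver instead:

```python
import sqlite3

# Stand-in store; with Phoenix this table would live in HBase.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, country TEXT, churn REAL)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(1, "FR", 0.1), (2, "DE", 0.7), (3, "FR", 0.4)])

# Standard SQL — the kind of statement Phoenix accepts over columnar storage.
rows = conn.execute(
    "SELECT country, COUNT(*) FROM customer "
    "WHERE churn > 0.2 GROUP BY country ORDER BY country"
).fetchall()
```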
CouchBase for document data storage. Key/value data can be stored either in HDFS or in CouchBase, depending on its access rate.
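The key/value placement rule described above reduces to a tiering decision: frequently read keys go to CouchBase, rarely read ones to HDFS. The threshold below is purely illustrative:

```python
# Illustrative threshold: reads per day above which a key counts as "hot".
HOT_ACCESS_RATE = 100

def choose_store(reads_per_day):
    """Route a key/value pair by access rate:
    frequently read -> CouchBase, rarely read -> HDFS."""
    return "couchbase" if reads_per_day >= HOT_ACCESS_RATE else "hdfs"
```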
Processing is done by Spark / Flink / Pig. Each of these solutions has its strong points, but Spark and Flink may be abstracted behind an Apache Beam layer in upcoming versions.
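The appeal of a Beam-style layer is writing the transformation once and letting a runner map it to Spark or Flink. The idea, reduced to plain Python — the pipeline representation and the trivial runner here are illustrative, not Beam's actual API:

```python
# A pipeline is an ordered list of transforms, defined once.
pipeline = [
    ("map", lambda x: x.lower()),
    ("filter", lambda x: x.startswith("data")),
]

def run(pipeline, data):
    """A trivial 'runner': in Beam terms, Spark and Flink runners
    would execute this same pipeline definition, distributed."""
    for kind, fn in pipeline:
        if kind == "map":
            data = [fn(x) for x in data]
        elif kind == "filter":
            data = [x for x in data if fn(x)]
    return data

result = run(pipeline, ["DataLake", "CRM", "datamart"])
```

Swapping the execution engine then means swapping `run`, not rewriting the pipeline.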