Since the 1980s, the volume of data produced, and the risk attached to that data, have grown explosively. 90% of the data in existence today was created in the last two years, and 80% of it is unstructured. With more users and the need for constant availability, the risks are much higher.
Which database parameters should a decision-maker take into account when deploying innovative applications?
3. Digital Platforms Have Changed
The platforms your end users and customers use to engage with your applications and services have
fundamentally changed at an unprecedented speed over the past 5 years.
Business: Upfront → Subscribe
Applications: Years / Months → Weeks / Days
Customers: PC → Mobile / BYOD
Engagement: Ads → Social
Infrastructure: Servers → Cloud
4. Goals of Digital Transformation
1. Unlocking operational intelligence
2. Enhancing business agility
3. Improving customer-centricity
Sources:
https://451research.com/report-short?entityId=90066
http://www.slideshare.net/JakeHird/101-digital-transformation-statistics-2016
Chart (0 to 90% scale): Boosting bottom line in 5 years; Competing in new segment in 3 years; Disadvantaged by lack of transformation; Actively digitizing business
5. Challenges of Digital Transformation
• Existing Systems Overwhelmed
• Growth in Siloed Data
• Lack of Real-Time Insight
6. Data Warehouse Challenges
“Of Gartner's "3Vs" of big data (volume, velocity, variety), the variety of data sources is seen by our clients as both the greatest challenge and the greatest opportunity.”*
• Data Variety: diverse, streaming, or new data types
• Data Volume: greater than 100 TB
• Other Data: less than 100 TB
* From a Big Data Executive Summary of 50+ executives from Fortune 100 companies and government organizations, 2014
7. The New Enterprise Stack
Layer       Traditional                 Modernized
Apps        On-Premise, Monoliths       SaaS, Microservices
Database    Relational (Oracle)         Non-Relational (MongoDB)
EDW         Teradata, Oracle, etc.      Hadoop
Compute     Scale-Up Server             Containers / Commodity Server / Cloud
Storage     SAN                         Local Storage & Data Lakes
Network     Routers and Switches        Software-Defined Networks
8. Data as a Cross-Enterprise Asset
1. Re-use data to power multiple apps
2. Enrich, analyze & monetize the data
3. Enforce privacy and governance
Data Pipeline: Ingest & Store → Query & Transform → Aggregate & Share → Analyze
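The four pipeline stages can be sketched as plain functions chained together. This is a minimal, hypothetical illustration: the event fields (`customer_id`, `app`, `amount`), the function names, and the in-memory list standing in for a shared data store are assumptions, not part of the original slides.

```python
from statistics import mean

# Hypothetical raw events from two apps; field names are illustrative only.
RAW_EVENTS = [
    {"customer_id": "c1", "app": "web", "amount": 120.0},
    {"customer_id": "c2", "app": "mobile", "amount": 35.5},
    {"customer_id": "c1", "app": "mobile", "amount": 80.0},
    {"customer_id": "c3", "app": "web", "amount": None},  # incomplete record
]


def ingest_and_store(events, store):
    """Ingest & Store: append raw events to a shared store (a list here)."""
    store.extend(events)
    return store


def query_and_transform(store):
    """Query & Transform: drop incomplete records and normalise the app name."""
    return [
        {**event, "app": event["app"].upper()}
        for event in store
        if event["amount"] is not None
    ]


def aggregate_and_share(records):
    """Aggregate & Share: total spend per customer, readable by any app."""
    totals = {}
    for record in records:
        cid = record["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + record["amount"]
    return totals


def analyze(totals):
    """Analyze: a deliberately simple metric, average spend per customer."""
    return mean(list(totals.values()))


if __name__ == "__main__":
    store = ingest_and_store(RAW_EVENTS, [])
    totals = aggregate_and_share(query_and_transform(store))
    print(totals)           # {'c1': 200.0, 'c2': 35.5}
    print(analyze(totals))  # 117.75
```

Because each stage only consumes the previous stage's output, the same stored data can feed several downstream apps without being copied per app.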
10. 3 Patterns to Turn Data into a Cross-Enterprise Asset
• Single View
• Data-as-a-Service
• Operationalized Data Lake
11. Single View
• Efficiently retrieve the status of any business entity in real time
• Foundation for analytics, e.g. cross-sell, upsell, churn risk
• REQUIREMENTS:
– Flexible schema + data governance
– Rich query, aggregation, search & reporting
– Highly scalable & continuously available
13. Solution: Aggregate with a Dynamic Schema
Sources: …Mobile App, Web, Call Centre, CRM, Social Feed → Single View
COMMON FIELDS: CustomerID | Activity ID | Type…
DYNAMIC FIELDS: can vary from record to record
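A minimal sketch of aggregating with a dynamic schema, assuming source records arrive as Python dicts: the common fields are checked, and every other key is kept as-is. The field name `ActivityID`, the `activities` array, and the function name are hypothetical; in a document database each resulting view would typically be stored as one document per customer.

```python
from collections import defaultdict

# Fields every source record must carry (from the slide); exact names assumed.
COMMON_FIELDS = ("CustomerID", "ActivityID", "Type")


def build_single_view(records):
    """Group source records into one document per CustomerID.

    Common fields are required; every other key is kept as a dynamic field,
    so activities from different sources may have different shapes.
    """
    views = defaultdict(lambda: {"activities": []})
    for record in records:
        missing = [f for f in COMMON_FIELDS if f not in record]
        if missing:
            raise ValueError(f"record {record!r} lacks common fields {missing}")
        view = views[record["CustomerID"]]
        view["CustomerID"] = record["CustomerID"]
        # Keep ActivityID, Type and any dynamic fields for this activity.
        view["activities"].append(
            {k: v for k, v in record.items() if k != "CustomerID"}
        )
    return dict(views)


if __name__ == "__main__":
    records = [
        {"CustomerID": 1, "ActivityID": "a1", "Type": "call_centre", "durationSec": 300},
        {"CustomerID": 1, "ActivityID": "a2", "Type": "social", "sentiment": "negative"},
        {"CustomerID": 2, "ActivityID": "a3", "Type": "web", "pagesViewed": 7},
    ]
    for customer_id, view in build_single_view(records).items():
        print(customer_id, [a["Type"] for a in view["activities"]])
```

Validating only the common fields is one way to combine a flexible schema with data governance: each source can add fields without a schema migration, while every record stays addressable by customer and activity.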
14. High Level Data Flow
Sources: Web App, CRM App, Mainframe System
Loaded in batch or real time as documents/objects (Update, Queue, …)
Processing: Group, Filter, Sort, Count, Average, Deviations, Validation
Real-Time Access for: Customer Service App, Churn Analytics, Risk Model
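The processing steps listed in the data flow can be sketched in plain Python. This is a hypothetical illustration, not the deck's implementation: the document fields (`customer`, `source`, `balance`) and the validation rule are assumptions. Inside MongoDB, the same work would roughly correspond to an aggregation pipeline with `$match`, `$group` (using accumulators such as `$sum`, `$avg` and `$stdDevPop`) and `$sort` stages.

```python
from statistics import mean, pstdev

# Hypothetical documents loaded from the Web, CRM and mainframe sources.
DOCS = [
    {"customer": "c1", "source": "web", "balance": 100.0},
    {"customer": "c1", "source": "crm", "balance": 300.0},
    {"customer": "c2", "source": "mainframe", "balance": 50.0},
    {"customer": "c2", "source": "web", "balance": -10.0},  # fails validation
    {"customer": "c3", "source": "crm", "balance": 250.0},
]


def is_valid(doc):
    """Validation: require a customer key and a non-negative balance."""
    return bool(doc.get("customer")) and doc.get("balance", -1) >= 0


def summarise(docs, min_count=1):
    """Validate and filter, group by customer, count, average, compute the
    population standard deviation, then sort by average (descending)."""
    groups = {}
    for doc in filter(is_valid, docs):                                   # Validation / Filter
        groups.setdefault(doc["customer"], []).append(doc["balance"])   # Group
    rows = [
        {
            "customer": customer,
            "count": len(values),       # Count
            "avg": mean(values),        # Average
            "stddev": pstdev(values),   # Deviations
        }
        for customer, values in groups.items()
        if len(values) >= min_count     # Filter on group size
    ]
    return sorted(rows, key=lambda row: row["avg"], reverse=True)       # Sort


if __name__ == "__main__":
    for row in summarise(DOCS):
        print(row)
```

Validating before grouping keeps bad source records out of every downstream statistic, which matters when the same view feeds both a customer service app and a risk model.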
15. Single View of Customer
Insurance leader generates coveted single view of customers in 90 days: “The Wall”
Problem:
• No single view of customer, leading to poor customer experience and churn
• 145 years of policy data, 70+ systems, 24 toll-free (800) numbers, 15+ front-end apps that are not integrated
• Spent 2 years and $25M trying to build a single view with Oracle, which failed
Solution / Why MongoDB:
• Built “The Wall,” pulling in disparate data and serving a single view to customer service reps in real time
• Flexible data model to aggregate disparate data into a single data store
• Expressive query language and secondary indexes to serve any field in real time
Results:
• Prototyped in 2 weeks
• Deployed to production in 90 days
• Decreased churn and improved ability to upsell/cross-sell
17. DaaS Architecture
Diagram: App1, App2, App3 → API Access Layer → Operational Data (Customers, Products, Accounts, Transactions) → Infrastructure
• Shared, multi-tenant database accessible via a common API
• Exposes CRUD, search, geospatial, graph, analytics
• Each data domain isolated into its own collection
• Access privileges and views defined for each collection
• Self-service provisioning, scaling on demand
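The DaaS properties above (one collection per data domain, a common API, per-collection privileges and views) can be illustrated with a toy in-memory service. Everything here is hypothetical: the class, method names and app names are invented for illustration. In a real deployment these controls would typically map onto the database's own role-based access control and read-only views rather than application code.

```python
from dataclasses import dataclass, field


@dataclass
class DataService:
    """Toy multi-tenant data service behind one common API.

    Each data domain lives in its own collection; every call is checked
    against per-app privileges, and reads are projected through per-app views.
    """
    collections: dict = field(default_factory=lambda: {
        "customers": {}, "products": {}, "accounts": {}, "transactions": {},
    })
    privileges: dict = field(default_factory=dict)  # app -> collection -> {actions}
    views: dict = field(default_factory=dict)       # app -> collection -> {fields}

    def grant(self, app, collection, *actions):
        self.privileges.setdefault(app, {}).setdefault(collection, set()).update(actions)

    def define_view(self, app, collection, fields):
        self.views.setdefault(app, {})[collection] = set(fields)

    def _check(self, app, collection, action):
        if action not in self.privileges.get(app, {}).get(collection, set()):
            raise PermissionError(f"{app} may not {action} {collection}")

    def create(self, app, collection, key, doc):
        self._check(app, collection, "create")
        self.collections[collection][key] = dict(doc)

    def read(self, app, collection, key):
        self._check(app, collection, "read")
        doc = self.collections[collection].get(key)
        if doc is None:
            return None
        visible = self.views.get(app, {}).get(collection)
        if visible is None:
            return dict(doc)  # no view defined for this app: full document
        return {k: v for k, v in doc.items() if k in visible}

    def update(self, app, collection, key, changes):
        self._check(app, collection, "update")
        self.collections[collection][key].update(changes)

    def delete(self, app, collection, key):
        self._check(app, collection, "delete")
        self.collections[collection].pop(key, None)


if __name__ == "__main__":
    svc = DataService()
    svc.grant("App1", "customers", "create", "read", "update")
    svc.grant("App2", "customers", "read")
    svc.define_view("App2", "customers", ["name"])
    svc.create("App1", "customers", "c1", {"name": "Ada", "email": "ada@example.com"})
    print(svc.read("App2", "customers", "c1"))  # only the fields in App2's view
```

Keeping the privilege check inside the shared service, rather than in each app, is what lets several tenants share one database safely.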
18. Square Enix: DaaS
• Multi-tenant OnLine Suite
• DaaS to studios & developers, exposed as an API
• On-Prem Private Cloud: manages data shared by all titles
– Player profiles
– Credits
– Leaderboards
– Competitions
– Catalog
– Cross-platform messaging
Stack: API Access Layer → MongoDB Shared Data Service → On-Prem Infrastructure (Private Cloud)
• In-app functionality provisioned to private clusters on AWS
– Game state
– Player metrics
– Game-specific content & features
– Elastically scalable
19. Data Lake
• Centralized repository for analytics against data collected from operational systems
• Extension of EDW: often based on Hadoop
• 50% of organizations invested in data lakes*
* Gartner
21. Design Pattern: Operationalized Data Lake
Data sources: Sensors, User Data, Clickstreams, Logs → Message Queue (raw data and processed events)
Real-Time Access, serving Customer Data Mgmt, Mobile App, IoT App, Live Dashboards: millisecond latency; expressive querying and flexible indexing against subsets of data; updates in place; in-database aggregations and transformations.
Batch Processing, Batch Views (Distributed Processing Frameworks): multi-minute latency with scans across TB/PB of data; no indexes; data stored in 128 MB blocks; write-once-read-many, append-only storage model.
Analytics outputs: Churn Analysis, Enriched Customer Profiles, Risk Modeling, Predictive Analytics
22. Design Pattern: Operationalized Data Lake
Configure where to land incoming data.
23. Design Pattern: Operationalized Data Lake
Raw data processed to generate analytics models.
24. Design Pattern: Operationalized Data Lake
MongoDB exposes analytics models to operational apps and handles real-time updates.
25. Design Pattern: Operationalized Data Lake
Compute new models against MongoDB & HDFS.
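The operationalized data lake pattern can be modelled as a toy class: a queue consumer lands each event in an append-only lake (the batch tier) and in a keyed operational store that is updated in place (the real-time tier); a batch job then scans the lake and writes model output back for apps to read. This is a sketch under stated assumptions: the list stands in for HDFS, the dict stands in for MongoDB, and the `churn_score` (share of complaint events) is a hypothetical model.

```python
from collections import defaultdict


class OperationalizedDataLake:
    """Toy model: one queue feeds a batch tier and a real-time tier."""

    def __init__(self):
        self.lake = []          # append-only, write-once-read-many (stands in for HDFS)
        self.operational = {}   # keyed by customer, updated in place (stands in for MongoDB)

    def publish(self, event):
        """Queue consumer: land each incoming event in both tiers."""
        self.lake.append(dict(event))  # raw data, never modified afterwards
        profile = self.operational.setdefault(event["customer"], {"events": 0})
        profile["events"] += 1         # update in place
        profile["last_type"] = event["type"]

    def run_batch_model(self):
        """Batch job: full scan of the lake (no index) to compute a
        hypothetical churn score, then write it back to the operational tier."""
        complaints = defaultdict(int)
        totals = defaultdict(int)
        for event in self.lake:
            totals[event["customer"]] += 1
            if event["type"] == "complaint":
                complaints[event["customer"]] += 1
        for customer, count in totals.items():
            self.operational[customer]["churn_score"] = complaints[customer] / count

    def lookup(self, customer):
        """Real-time access path used by operational apps."""
        return self.operational.get(customer)


if __name__ == "__main__":
    pipeline = OperationalizedDataLake()
    for e in [
        {"customer": "c1", "type": "click"},
        {"customer": "c1", "type": "complaint"},
        {"customer": "c2", "type": "click"},
    ]:
        pipeline.publish(e)
    pipeline.run_batch_model()
    print(pipeline.lookup("c1"))
```

The split mirrors the two latency profiles on the slide: lookups hit the keyed store directly, while the model is recomputed from a full scan of immutable raw data.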
26. Operational Database Requirements
1. “Smart” integration with the data lake
2. Powerful real-time analytics
3. Flexible, governed data model
4. Scale with the data lake
5. Sophisticated management & security
27. UK’s Leading Price Comparison Site
Out-pacing Internet search giants with a continuous delivery pipeline powered by microservices & Docker running MongoDB, Kafka and Hadoop in the cloud
Problem:
• Existing EDW with nightly batch loads
• No real-time analytics to personalize user experience
• Application changes broke ETL pipeline
• Unable to scale as services expanded
Solution:
• Microservices architecture running on AWS
• All application events written to a Kafka queue, routed to MongoDB and Hadoop
• Events that personalize the real-time experience (e.g. triggering an email send, additional questions, offers) written to MongoDB
• All event data aggregated with other data sources and analyzed in Hadoop; updated customer profiles written back to MongoDB
Results:
• 2x faster delivery of new services after migrating to the new architecture
• Enabled continuous delivery: pushing new features every day
• Personalized user experience, plus higher uptime and scalability
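One reading of the event routing in this case study is: every application event goes to Hadoop for batch aggregation, and events that personalize the real-time experience also go to MongoDB. The sketch below encodes that reading as a routing function. The event type names and sink labels are hypothetical, and no Kafka client is used; the strings only stand in for the downstream systems.

```python
# Event types that drive the real-time experience; names are hypothetical.
PERSONALIZATION_TYPES = frozenset({"email_trigger", "additional_question", "offer"})


def route(event):
    """Return the sinks an application event should be written to.

    Every event goes to the batch tier ("hadoop"); only personalization
    events are also written to the operational tier ("mongodb").
    """
    sinks = ["hadoop"]
    if event.get("type") in PERSONALIZATION_TYPES:
        sinks.append("mongodb")
    return sinks


def dispatch(events):
    """Group a batch of consumed events by destination sink."""
    by_sink = {"hadoop": [], "mongodb": []}
    for event in events:
        for sink in route(event):
            by_sink[sink].append(event)
    return by_sink


if __name__ == "__main__":
    batch = [{"type": "page_view"}, {"type": "offer"}, {"type": "email_trigger"}]
    for sink, routed in dispatch(batch).items():
        print(sink, len(routed))
```

Keeping the routing rule in one function means a new personalization event type changes a single set, not every consumer.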
28. Patterns for Modern Data Architectures
Challenges:
• Existing Systems Overwhelmed
• Growth in Siloed Data
• Lack of Real-Time Insight
Patterns:
• Single View
• Data-as-a-Service
• Operationalized Data Lake