2. DataOps
Definitions
VP Technology Strategy, MapR
DataOps is an agile methodology for developing and deploying data-intensive
applications, including data science and machine learning. A DataOps workflow supports
cross-functional collaboration and fast time to value.
http://www.gartner.com/it-glossary/data-ops/
A hub for collecting and distributing data, with a mandate to provide controlled access to systems
of record for customer and marketing performance data, while protecting privacy, usage
restrictions, and data integrity.
Tamr CEO Andy Palmer
DataOps is an enterprise collaboration framework that aligns data-management
objectives with data-consumption ideals to maximize data-derived value.
Nexla CEO
DataOps is the function within an organization that controls the data journey from
source to value.
3. DataOps
Gartner
Data & Analytics Summit 2018
DataOps, private cloud database platform as a service (dbPaaS), and
machine-learning-enabled data management.
DataOps is a new practice with no standards or frameworks
Nick Heudecker, research vice president at Gartner
5. DataOps
Brings Flexibility & Focus
Expands DevOps to include data-heavy roles
Organized around data-related goals
Better collaboration and communication between roles
6. DataOps
An Agile Methodology for Data-Driven Organizations
DataOps Goals:
• Continuous model deployment
• Promote repeatability
• Promote productivity -- focus on core competencies
• Promote agility
• Promote self-service
AXIOMS:
Data is central to disruptive enterprise applications
• Lightweight, stateless functions do not represent the majority of workloads
Data science and machine learning are an important paradigm
• Scientists become active users -- no longer just application developers
• Iterative workflow with different data usage patterns
Data volumes continue to grow
Moving data is a performance bottleneck
7. DataOps
[Figure: three-stage flow: Connect and Integrate, Store and Process, Analyze and Visualize]
Inputs: structured data, unstructured data
Storage: sandboxes, data lakes, data marts / warehouses (varying data types)
• Data available from structured and unstructured sources
• Focus on algorithms, not infrastructure
• Quick and actionable business insights
Layers: Data Platform, Data Stream, Data Analytics
8. Data Science Platforms
[Figure: landscape of data science platforms]
• Cloud providers
• ETL & data engineering
• Vertical applications
• BI & visualization tools
• Security
• Infrastructure
• Libraries
• Tools
• Data platforms
• Data science platforms
9. DataOps
Approach Advantages
Data Self-Service
• Data Scientists need to develop Use Cases
quickly using the enterprise’s data without
any restrictions from IT.
Improved efficiency and better use of Team’s time
• Deploy Analytic platform in one click
Faster Time-to-Value
Improve productivity
• Implement use cases in parallel using the
same data, but with a dedicated platform for
each analytics team.
[Figure: Data Science Platforms built from Storage, Compute, Libraries, and Tools]
10. DataOps
Continuous Model
Deployment
Key Building Blocks for Agility:
• Unified data platform
• Data governance
• Self-service data and compute access
• Multitenancy and resource management
Lifecycle stages:
• Data Engineering
• Model Development
• Model Management
• Model Deployment
• Model Monitoring & Rescoring
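The lifecycle stages above can be read as an ordered loop in which monitoring feeds back into retraining. The following is a minimal sketch only: the function names, the in-memory registry, and the toy "model" (the mean of the data) are assumptions made for illustration and do not come from the slides.

```python
from statistics import mean

# Hypothetical stage functions; the stage names follow the slide,
# the bodies are toy stand-ins for illustration only.
def data_engineering(raw):
    # Drop missing values so later stages see clean numbers.
    return [x for x in raw if x is not None]

def model_development(data):
    # Toy "model": predict the mean of the training data.
    return {"prediction": mean(data)}

def model_management(model, registry):
    # Register the model under the next version number.
    version = len(registry) + 1
    registry[version] = model
    return version

def model_deployment(registry, version):
    # "Deploy" by returning a callable that serves the stored prediction.
    model = registry[version]
    return lambda: model["prediction"]

def model_monitoring(serve, new_data, tolerance):
    # Flag the model for rescoring when new data drifts past the tolerance.
    return abs(serve() - mean(new_data)) > tolerance

registry = {}
clean = data_engineering([1.0, None, 2.0, 3.0])
version = model_management(model_development(clean), registry)
serve = model_deployment(registry, version)
needs_rescoring = model_monitoring(serve, [8.0, 9.0, 10.0], tolerance=1.0)
print(version, serve(), needs_rescoring)
```

When `needs_rescoring` is true, the loop would start again from data engineering with the newer data, which is one way to interpret "continuous" in continuous model deployment.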
13. DataOps
Data-Driven Architecture
Traditional and Modern
Legacy, Custom, Mainframe, SaaS, Microservices, …
Source: Oracle Insight
Data Platform
Analytics
• Advanced Analytics
• Self-service
• Predictive
Data Science
• Machine Learning
• Deep Learning
Modern Data Platform
Security & Compliance
X Data
Applications
Real-time Analytics
• Real-time Marketing
• Fraud detection
• Exec Dashboarding
Real-time
Real-time Services
{OOP}
SparklineData
• Accessing multiple sources of data
(Technologies, Silos/Locations,
Clouds) …
• … with high performance …
• … for broader cross multi-model
queries/algorithms on real-time
data as well as historical data
Applications
BigData SQL
14. DataOps
Cloud Native & Open Source
Community
Artificial Intelligence
Blockchain
Internet of Things
Container Native Microservices
Open Serverless Computing DevOps
Prometheus
Open Source
Cloud Native
Innovation
Open Source
Cloud Native
Development
Istio
Cloud-Native and Community Driven Innovation
Open Source Managed and Autonomous Cloud Native
15. DataOps
Data Stream
Data Preparation
Data Replication
Data ETL
Logs
Oracle Cloud Infrastructure
Analytics
Consumers
Data Platform
BI
NL / AI
Data Integration
CDC / ETL
Discovering, Structuring, Cleaning, Enriching, Validating, Deploying
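The six data preparation steps listed above can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the sample records, field names, and validation rule are hypothetical and are not taken from the slides or from any Oracle product.

```python
# Toy records standing in for a raw log feed (hypothetical data).
raw_lines = ["alice,30", "bob,", "carol,41", "not a record"]

def discover(lines):
    # Discovering: keep only lines that look like "name,age" records.
    return [ln for ln in lines if ln.count(",") == 1]

def structure(lines):
    # Structuring: split each line into named fields.
    return [dict(zip(("name", "age"), ln.split(","))) for ln in lines]

def clean(rows):
    # Cleaning: drop rows with a missing age and convert age to int.
    return [{**r, "age": int(r["age"])} for r in rows if r["age"].strip()]

def enrich(rows):
    # Enriching: add a derived field.
    return [{**r, "is_senior": r["age"] >= 40} for r in rows]

def validate(rows):
    # Validating: fail fast if any row violates a basic rule.
    for r in rows:
        if not (0 <= r["age"] <= 130):
            raise ValueError(f"age out of range: {r}")
    return rows

def deploy(rows, target):
    # Deploying: publish the prepared rows to a target (here, a list).
    target.extend(rows)
    return target

published = []
data = raw_lines
for stage in (discover, structure, clean, enrich, validate):
    data = stage(data)
deploy(data, published)
print(published)
```

Keeping each step as its own function makes it possible to rerun or replace a single step without touching the others.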
17. Oracle Data Science
Data Science Requires a Comprehensive Platform to Simplify Operations
and Deliver Value at Scale
• Accelerate use of proper tools, frameworks and infrastructure
• Overcome restricted skillsets with a simple, collaborative platform
• Quickly leverage predictive analytics to drive positive business outcomes
Collaborate securely
Power business
Work in standardized environments
A Robust, Easy-to-Use Data Science Platform Removes Barriers to
Deploying Valuable Machine Learning Models in Production
Manage data and tools
18. Oracle Data Science
Project Lifecycle
Reproducibility
• Data Versioning
• Code Versioning
• Model Versioning
• Environment Management
Model Deployment
• Operationalize Models as Scalable APIs
Model Management
• Monitor and Optimize Model Performance
Data Exploration
• Collaborative Data Analysis / Feature Engineering
Model Build and Train
• with Open Source Frameworks
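One way to picture the reproducibility items (data, code, model, and environment versioning) is a small run manifest that records a content-derived identifier for each artifact. This is a hedged sketch using only the Python standard library; the `run_manifest` helper, its field names, and the 12-character digest length are assumptions for illustration, not part of Oracle Data Science.

```python
import hashlib
import json
import sys

def fingerprint(payload: bytes) -> str:
    # Short SHA-256 digest used as a content-derived version identifier.
    return hashlib.sha256(payload).hexdigest()[:12]

def run_manifest(data: bytes, code: str, model_params: dict) -> dict:
    # Record one version per artifact so a run can be reproduced later.
    return {
        "data_version": fingerprint(data),
        "code_version": fingerprint(code.encode("utf-8")),
        "model_version": fingerprint(
            json.dumps(model_params, sort_keys=True).encode("utf-8")
        ),
        "environment": f"python-{sys.version_info.major}.{sys.version_info.minor}",
    }

# Same data and code, different hyperparameters (hypothetical values).
manifest_a = run_manifest(b"1,2,3\n", "def f(x): return x + 1", {"lr": 0.1})
manifest_b = run_manifest(b"1,2,3\n", "def f(x): return x + 1", {"lr": 0.2})
print(json.dumps(manifest_a, indent=2))
```

Two runs that share data and code but differ in parameters end up with the same data and code versions and a different model version, which makes it easy to see what changed between runs.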
Collaborators
∙ Data Scientists
∙ Business Stakeholders
∙ App Developers
∙ IT Admins
Business Analyst/Leader: Defines the business problem and the objective of the analyses.
Data Engineer: Prepares data, builds pipelines, and provides data access for analytical or operational uses.
IT Admin: Oversees the underlying processes, architecture, operations, and resource constraints.
Data Scientist: Analyzes data using statistical methods and coding languages such as Python, R, and Scala.
Application Developer: Deploys data science models into applications and builds data products.
19. Oracle Data Science
Modules
Collaborative: Project-driven UI enables teams to easily work together on end-to-end modeling workflows with self-service access to data and resources
Integrated: Support for the latest open source tools, version control, and tight integration with OCI and Oracle Big Data Platform
Enterprise-Grade: A fully managed platform built to meet the needs of the modern enterprise
[Figure: Oracle Data Science Cloud on top of Oracle PaaS & IaaS]
• Projects
• Notebooks
• Open Source Languages & Libraries
• Version Control
• Use Case Templates
• Model Build & Train
• Self-Service Scalable Compute (OCI)
• Object Store
• Catalog
• Data Lake
• Streaming
• Autonomous Data Warehouse
• Model Deployment
• Model Monitoring
• Access Controls & Security
21. Oracle Data Science
Configure, Train & Deploy
Oracle PaaS
[Figure: three steps: 1 Configure (Autonomous Setup), 2 Train, 3 Deploy]
Benefits: Easy setup, Easy Data Access, Easy Development, Easy Deployment
Personas: IT Persona, DevOps, Data Scientist
Workflow elements: Data Select, Data Definition, Code Notebook, Model Train, Model Test, Publish API
Capabilities:
• Frameworks
• AI libraries
• Samples
• GPU clusters
• Connect to data
• Auto scale, updates
• HS network, storage
Data sources:
• Object Stores
• Database CS
• Spark
Model services: Model Sharing, Model Library, APIs, Model Analytics
Model domains: Language, Image, Video, HR, Emotion
24. DataOps
Conclusions
Multi-Model Data Access
Interoperability
Data preparation and pipeline automation
Elasticity
Multidimensional agility
Automated governance
Oracle provides, all in one:
• Next Generation Platform for All Data
• Complete, Integrated, Open
• AI and Machine Learning