Have you run into one or more of the following barriers or limitations with your existing data warehousing architecture:
> Increasingly high data storage and/or processing costs?
> Silos of data sources?
> Complexity of management and security?
> Lack of analytics agility?
4. From BI to Advanced Analytics
[Figure: analytics questions plotted against Time and Data Size, ranging from Facts to Interpretations]
• What happened? When? And where?
• How and why did it happen?
• What will happen?
• How can we do better?
5. Advanced Analytics that Saves Us Money
• Customer churn analysis model
• Integrated customer support and services
• Fraud detection
6. Advanced Analytics that Makes Us Money
• Product recommendation engines
• Location-based real-time offers
• Target-based pricing strategy
7. Traditional Advanced Analytics Process
[Figure: process timeline; horizontal axis: Time-to-Insight]
• Project Definition: Problem ID
• Data Preparation: Data Access Request & Discovery, Data Transformation, Data Sampling
• Model Development: Model Creation, Model Evaluation
• Model Deployment: Deploy Model
9. Accessing the Right Data is Difficult
[Figure: data sources and the data warehouse]
• Multi-structured or External Data
• Structured Internal Data
• Data Warehouse
10. “Are we there yet?”
1. Find the data
2. Get access to data
3. Learn about the data
4. Move data to ADW and process data
5. Data Modeling
6. Model Deployment
[Figure label: Data Discovery]
11. Silo’d Platforms Challenge Collaboration & Mgmt
[Figure: architecture diagram with Data Sources, Enterprise Apps, Departmental Warehouses, Reporting, and Silo’d Analytics]
• Non-agile models
• Opaque schemas accumulate over time
12. Impact of Status Quo
Executives: “We don’t have the information we need to answer key business questions.”
Data Scientists: “I’m sick of waiting for my data, I’m going to make my own copy.”
DBA/DW Admins: “I need to make sure the DW is secure & compliant for the mission critical reports.”
14. Use All Your Data
• Use more data, and more types of data, with existing tools
• Reduce the need to limit or move large datasets
• Centralize information security, metadata, management, and governance
15. Shorten Analytics Lifecycle
• Facilitate data discovery
• Track data life-cycle in place
• Define, test, deploy, and update models all within a single platform
16. Do More with Data
• Deliver multi-genre analytics in a single platform
• Apply diverse concurrent analytics to full datasets in place
• Protect existing technology and skillset investments
[Figure: EDH with Search, Machine Learning, BI, SQL Query, and In-memory analytics]
17. Cloudera EDH for Analytics
[Figure: Cloudera EDH architecture diagram]
• Batch Processing, Analytic SQL, Search Engine, Machine Learning, Stream Processing, 3rd Party Apps
• Workload Management
• Storage for Any Type of Data: Filesystem, Online NoSQL
• Data Management and System Management
• Unified, Elastic, Resilient, Secure
18. Cloudera EDH for Analytics
Use all data with centralized mgmt & security
[Figure: Cloudera EDH architecture diagram with Hadoop highlighted under storage, MapReduce under batch processing, and Cloudera Manager under system management]
19. Cloudera EDH for Analytics
Faster data discovery
[Figure: Cloudera EDH architecture diagram with Search highlighted under search engine and Navigator under data management]
20. Cloudera EDH for Analytics
Multiple tools on one platform
[Figure: Cloudera EDH architecture diagram with Impala highlighted under analytic SQL and Spark / Oryx / Mahout under machine learning]
21. Cloudera EDH for Analytics
Operationalize Models
[Figure: Cloudera EDH architecture diagram with Spark Streaming / Flume highlighted under stream processing]
27. Analytics Process with EDH
[Figure: process timeline; horizontal axis: Time-to-Insight]
• Project Definition: Problem ID
• Data Preparation: Data Access Request & Discovery, Data Transformation, Data Sampling
• Model Development: Model Creation, Model Evaluation
• Model Deployment: Deploy Model
28. Analytics Process with EDH
[Figure: process timeline; horizontal axis: Time-to-Insight]
• Project Definition: Problem ID
• Data Preparation: Data Access Request & Discovery, Data Transformation, Data Sampling
• Model Development: Model Creation, Model Evaluation
• Model Deployment: Deploy Model
29. Analytics Process with EDH
Deliver Insights Sooner
[Figure: process timeline]
• Project Definition: Problem ID
• Data Preparation: Data Access Request & Discovery, Data Transformation, Data Sampling
• Model Development: Model Creation, Model Evaluation
• Model Deployment: Deploy Model
30. Business Value Delivered
For Data Scientists, Executives, and DBA/DW Admins:
• Acquire data necessary for projects
• Acquire necessary information sooner to make critical business decisions
• Support both reporting and analytics needs
• Develop analysis/models with better lift faster
• Share data sets to empower others
• Save resources with shared security and management
32. Ask Bigger Questions:
How can we prevent re-admittance?
Kaiser Permanente helps providers recommend at-home action based on real-time data to prevent hospital visits.
33. Kaiser Makes Medical Data Actionable
The Challenge:
• Re-admittance is expensive, reflects sub-par provider-to-patient communications
• IT infrastructures can’t accommodate 24x7 data streams from devices
• Diverse medical ontologies present data challenge
Kaiser Permanente helps providers recommend at-home action based on real-time data to prevent hospital visits.
The Solution:
• Cloudera EDH provides a scalable, flexible platform for collection, ingestion & dissemination of healthcare information
• Ingests real-time data streams of multi-structured data
34. Ask Bigger Questions:
How do we feed the world?
Monsanto can automate data-driven R&D decisions to reduce time to market from years to months.
35. Monsanto feeds our growing, global population
The Challenge:
• 1,000+ research scientists developing products in silos
• Data processing bottleneck slows development
• Time to market for new product is 5-10 years
Monsanto can automate data-driven R&D decisions to reduce time to market to months from years.
The Solution:
• Cloudera Enterprise + Search + Impala: PB-scale platform for single view of all R&D data
• Integration: Exadata, spatial awareness & visualization
• Scientists directly access CDH; Navigator offers auditing & access control
36. ARE YOU READY TO START?
Answer questions using ALL YOUR DATA
37. QUESTIONS?
• Type in the “Chat” panel to ask a question
• Recording will be available on-demand at cloudera.com
• Try Cloudera today: cloudera.com/downloads
• Learn more: http://tinyurl.com/membtaw
• Tweet @cloudera
• Follow Josh @josh_wills
• Follow Sandy @sandyliiwozniak
• Register now for Data Analysts Training: university.cloudera.com
• Use discount code Analytics10 to save 10% on new enrollments in classes delivered by Cloudera until May 2014*
• Use discount code 15off2 to save 15% on enrollments in two or more classes delivered by Cloudera until May 2014*
* Excludes classes sold or delivered by Cloudera Partners
Challenge and Problems
• Data discovery is 90% of the project
• Long data discovery => cannot iterate fast, cannot capture business value quickly
• Data scientists are expensive! Shortening the analytics lifecycle means you can get more projects done in the same timeframe