31 Oct 2017
© Cloudera, Inc. All rights reserved.
Stefan Lipp/Jochen Faltermeier
CWIN 2017 - Frankfurt
Large enterprises fueling growth
Last 4 years Global
by data and new
Best of breed solutions
of solution &
Open source innovation
Global team doing
Big data innovators
Yahoo and Oracle
The data-driven enterprise
Explosion of data and devices (IoT)
Transformation of IT infrastructure
1 IDC Worldwide Big Data and Business Analytics Market Through 2020
data can make what is impossible
today, possible tomorrow
people to transform complex data
into clear and actionable insights
PRODUCTS & SERVICES (IoT)
the modern platform for machine learning and analytics
optimized for the cloud
DRIVE CUSTOMER INSIGHTS CONNECT PRODUCTS & SERVICES (IoT) PROTECT BUSINESS
Delivering greater value through
improved customer understanding
Powering predictive analytics to increase
performance and reduce fleet downtime
Creating new revenue streams with an advanced
Cloudera powering data-driven customers
Navistar is a leading manufacturer of commercial trucks, buses, defense vehicles and
engines. Since 1831, our history has been interwoven with some of the most defining
moments in world history. Whether it was America's westward expansion or WWII, we
were there, pushing the limits of what's possible and driving history forward. But that
doesn't mean we're stuck in the past. We're determined to keep delivering smart, sustainable
technologies - because we believe that innovation defines America's future, too.
The Data Challenge & Pre-Hadoop Challenge
In late 2013, Navistar launched OnCommand™ Connection, part of the OnCommand™
family of fleet management services from Navistar.
OnCommand™ Connection takes data feeds from telematics service providers and
combines them with meteorological, geographical, engineering, vehicle usage, traffic,
historical warranty, service, and parts inventory data to provide:
• Real-time vehicle performance data streamlined within a single portal
• Service advisories and scheduling before problems occur
• Optimized service plans and part delivery to the nearest dealer when problems do occur
We now actively monitor more than 300,000 vehicles and are adding to that total daily.
Using Predictive Maintenance to Improve
Performance and Reduce Fleet Downtime
• OnCommand Connection is collecting
telematics and geolocation data across
• Reduced maintenance costs to $0.03 per
mile from $0.12-$0.15 per mile
• Centralizing data from 13 systems with
varying frequency and semantic
• Real-time visibility of approx. 300,000 trucks
in order to improve uptime and vehicle
» SERVICE IMPROVEMENT
» PREDICTIVE ANALYTICS
» PROCESS IMPROVEMENT
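The per-mile figures quoted on this slide imply a maintenance cost reduction of roughly 75 to 80 percent. A minimal sketch of that arithmetic follows; the annual mileage of 100,000 miles per truck is a hypothetical value used only for illustration and does not come from the slide.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction when a cost drops from `before` to `after`."""
    return (before - after) / before * 100.0


# Figures taken from the slide (USD per mile).
COST_AFTER = 0.03
COST_BEFORE_LOW = 0.12
COST_BEFORE_HIGH = 0.15

# Hypothetical annual mileage per truck; an assumption, not a slide value.
ASSUMED_MILES_PER_YEAR = 100_000

pct_low = reduction_pct(COST_BEFORE_LOW, COST_AFTER)
pct_high = reduction_pct(COST_BEFORE_HIGH, COST_AFTER)

# Savings per truck and year under the assumed mileage.
savings_low = (COST_BEFORE_LOW - COST_AFTER) * ASSUMED_MILES_PER_YEAR
savings_high = (COST_BEFORE_HIGH - COST_AFTER) * ASSUMED_MILES_PER_YEAR

print(f"Reduction: {pct_low:.0f}% to {pct_high:.0f}%")
print(f"Savings per truck and year: ${savings_low:,.0f} to ${savings_high:,.0f}")
```

Under the assumed mileage, the savings scale linearly with miles driven, so the absolute numbers change with the assumption while the percentage range does not.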
Benefits & Impact
Quantifying Hadoop’s impact:
By having literally all of our data in one place, we can perform analytics on an ad-hoc
basis. Historically, simple questions required months to answer as we built out subject
areas and transformed data.
Our “Publish” cluster brings certified data to the consumer.
We have reduced not only hard-dollar spending on proprietary hardware and expensive disk
solutions, but also soft-dollar costs through faster delivery of answers.
We can evaluate “what if” scenarios without the risk of impacting production processes.
We can evaluate billions of rows of data and deliver answers in hours, not weeks.
Data/Software > Analytics > Automation > AI is eating the world
"The innovation food chain" (Marc Andreessen)
Navistar IR Deck – H1 2017
− Connected services to reduce
maintenance cost and improve
− Advanced driver assistance
systems and platooning to
improve fuel efficiency
− Automated record-keeping to
enhance driver productivity
#1 Telematics provider with 130 billion miles
of driving data collected from black boxes in
• Drive analytics on 12 million miles of
driving data collected every hour
• Telematics solution based on Cloudera to
process data from black boxes
• Analytics around driving behavior, risks,
location, braking patterns, contextual
elements and crash information
• Provide Usage Based Insurance services
» CONNECTED VEHICLES
» INSURANCE TELEMATICS
» PREDICTIVE ANALYTICS
Connected Car Telematics for Insurance
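One of the analytics named above, braking patterns, can be illustrated with a small sketch that flags harsh braking from timestamped speed samples. This is not the provider's actual method: the `SpeedSample` type, the sample trip, and the 3.0 m/s² threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpeedSample:
    t_s: float        # timestamp in seconds
    speed_mps: float  # vehicle speed in metres per second


# Assumed deceleration threshold for a "harsh" braking event (m/s^2).
HARSH_BRAKE_MPS2 = 3.0


def harsh_braking_events(
    samples: List[SpeedSample], threshold: float = HARSH_BRAKE_MPS2
) -> List[float]:
    """Return timestamps where deceleration since the previous sample exceeds `threshold`."""
    events = []
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.t_s - prev.t_s
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        decel = (prev.speed_mps - cur.speed_mps) / dt
        if decel > threshold:
            events.append(cur.t_s)
    return events


# Hypothetical trip: the speed drops by 4.5 m/s between t=1 and t=2.
trip = [
    SpeedSample(0, 20.0),
    SpeedSample(1, 19.5),
    SpeedSample(2, 15.0),
    SpeedSample(3, 14.8),
]
events = harsh_braking_events(trip)
print(events)
```

In a usage-based insurance setting, counts of such events per distance driven would be one possible input to a driver risk score, alongside location and contextual data.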
The IoT Ecosystem & Architecture
IoT Data Storage, Processing & Analytics
Centralized IoT Analytics
• Time Series Data, Trends
• Machine Learning
• Context Enrichment
• Deeper business insights
Processing & Analytics
• Cloud & On-Premise
Connected Things
• Analytics at the edge
• For immediate response
Enterprise Data Sources
Combining sensor data with contextual data is the
key to value creation from IoT
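The idea of combining sensor data with contextual data can be sketched as a simple enrichment join. The vehicle identifiers, field names, and values below are hypothetical; in practice the contextual side would come from enterprise data sources such as service or warranty systems.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical contextual data keyed by vehicle id (for example from a service system).
VEHICLE_CONTEXT: Dict[str, dict] = {
    "truck-001": {"model": "A", "last_service_km": 120_000},
    "truck-002": {"model": "B", "last_service_km": 80_000},
}


def enrich(readings: Iterable[dict], context: Dict[str, dict]) -> Iterator[dict]:
    """Attach contextual attributes to each sensor reading and derive a new field."""
    for reading in readings:
        ctx = context.get(reading["vehicle_id"], {})
        record = {**reading, **ctx}
        # The derived value only exists once both sides are combined.
        if "last_service_km" in ctx:
            record["km_since_service"] = reading["odometer_km"] - ctx["last_service_km"]
        yield record


# Hypothetical sensor readings; "truck-999" has no contextual record.
sensor_readings = [
    {"vehicle_id": "truck-001", "odometer_km": 131_500, "engine_temp_c": 92},
    {"vehicle_id": "truck-999", "odometer_km": 10_000, "engine_temp_c": 88},
]

enriched = list(enrich(sensor_readings, VEHICLE_CONTEXT))
for row in enriched:
    print(row)
```

The `km_since_service` field is an example of a value that neither the sensor feed nor the contextual source provides on its own.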
The Cloudera Platform for IoT – Data Mgmt. Value Chain
Data Sources Data Ingest Data Storage & Processing
Serving, Analytics &
ENTERPRISE DATA HUB
Stream or batch ingestion of IoT data
Ingestion of data from relational sources
Storage (HDFS) & deep batch processing
Storage & serving for fast changing data
NoSQL data store for real time
MPP SQL for fast analytics
Real-time search
Connected Things / Data
Structured Data Sources Security, Scalability & Easy Management
Stream & iterative processing, ML
Cloudera for IoT – Key Innovations / Differentiators
Kudu: Real-Time Analytics
Ideal for real-time analytics on IoT
and time series data. Simplifies
Lambda architectures for running
real-time analytics on streaming data
Shared Data Experience (SDX)
Preserve business flexibility and data
portability and minimize cloud lock-in
by running in any one of the three
major public cloud providers or in
Data Science Workbench
Collaborative hub for enterprise
data science and an integrated
development environment for
running Python, R, & Scala with
support for Spark
Kudu – Fast Analytics on Fast Data
[Figure: Pace of Analysis vs. Pace of Data. Partial label: "Fast Scans, Analytics and Processing of ... (on fast-changing or ..."]
S3 | ADLS | HDFS | KUDU
The modern platform for machine learning and analytics optimized for the cloud
• Unified security – protects sensitive data with consistent controls,
even for transient and recurring workloads
• Consistent governance – enables secure self-service access to all
relevant data and increases compliance
• Easy workload management – increases user productivity and boosts
• Flexible ingest and replication – aggregates a single copy of all data,
provides disaster recovery, and eases migration
• Shared catalog – defines and preserves structure and business
context of data for new applications and partner solutions
Open platform services
Built for multi-function analytics | Optimized for cloud
Shared: Data, Operations, Governance, Security, Metadata
Data Engineering Data Science Deployment
& Testing Batch Scoring
Dev: Collaboration, Version Control Ops: Deployment, Scheduling, Orchestration
Support the complete data science workflow
From data to exploration to action
Accelerates data science from
development to production with:
● Secure self-service data access
● On-demand compute
● Support for Python, R, and Scala
● Project dependency isolation for
multiple library versions
● Workflow automation, version
control, collaboration and sharing
Cloudera Data Science Workbench
Self-service data science for the enterprise
A modern data science architecture
gateway nodes CDH nodes
● Built on Docker and Kubernetes
● Runs on dedicated gateway nodes
● User sessions run in isolated “engine”
○ Host Kerberos-authenticated
○ Interact with Spark via YARN
client mode (Driver runs in
container, workers on CDH)
● Single-cluster only (for now)
Hive, HDFS, ...
“Our data scientists want GPUs, but we
can’t find a way to deliver multi-tenancy.
If they go to the cloud on their own, it’s
expensive and we lose governance.”
● Extend existing CDSW benefits to GPU-optimized deep learning tools
● Schedule & share GPU resources
● Train on GPUs, deploy on CPUs
● Works on-premises or cloud
Accelerated deep learning on-demand with GPUs
Data Science Workbench
Multi-tenant GPU support on-premises or cloud
Open Ecosystem Black Box
An open ecosystem for agility and innovation
Run anywhere. Deploy any way.
Simple Unified Enterprise
Proven at scale
Hybrid or multi cloud
Works with your tools
Real-Time Analytics or Operational Analytics?
"apply logic and mathematics real-time on data to improve operations"
Model Analyze Repeat
# Aggregate relational, NoSQL, structured & unstructured data
# Accelerate data science from exploration to production using R, Python, Spark and more
# Deploy pipelines and models on-premise or in the cloud.
Seeking Abnormal Behavior
# Serve real-time data at scale for real-time decision making
# Stream processing & analytics on changing operational data
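As an illustration of seeking abnormal behavior on changing operational data, the sketch below flags values that deviate strongly from a rolling window of recent readings. It uses a plain z-score test, which is a swapped-in textbook technique and not something the slides prescribe; the window size, threshold, and sample stream are assumptions.

```python
from collections import deque
from statistics import mean, pstdev
from typing import List


class RollingAnomalyDetector:
    """Flags a value whose z-score against the recent window exceeds a threshold."""

    def __init__(self, window: int = 5, z_threshold: float = 3.0) -> None:
        self.values = deque(maxlen=window)  # most recent readings only
        self.z_threshold = z_threshold

    def update(self, x: float) -> bool:
        """Score `x` against the current window, then add it to the window."""
        is_anomaly = False
        if len(self.values) >= 2:
            mu = mean(self.values)
            sigma = pstdev(self.values)
            if sigma > 0 and abs(x - mu) / sigma > self.z_threshold:
                is_anomaly = True
        # Note: flagged values still enter the window, which widens it temporarily.
        self.values.append(x)
        return is_anomaly


# Hypothetical sensor stream with one obvious outlier.
stream: List[float] = [10.0, 10.2, 9.9, 10.1, 10.0, 25.0, 10.1]
detector = RollingAnomalyDetector(window=5, z_threshold=3.0)
flags = [detector.update(x) for x in stream]
anomalies = [x for x, flagged in zip(stream, flags) if flagged]
print(anomalies)
```

Because each reading is scored as it arrives, the same logic could sit inside a stream processing job; a production model would likely be trained offline and only applied here.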
Is it even worth it?
HW > Data/Software > Analytics > Automation > AI/ML: technology food chain from "Digital or Dead"