© Cloudera, Inc. All rights reserved.
Transforming into a data-driven enterprise:
Paths to success
Philip Carnelley | Research Director, IDC
Michael Wrisley | Analytic Sales Enablement Director, Intel
Wim Stoop | Senior Product Marketing Manager, Cloudera
Transforming into a data-driven enterprise:
paths to success
Philip Carnelley, Research Director, IDC Europe
Digital Transformation: A Board Level Agenda Item
© IDC Visit us at IDC.com and follow us on Twitter: @IDC
80% of large European companies have “DX” at the heart of their corporate strategy
Source: IDC, European DX Survey 2017
Many Organizations Are at a DX Deadlock
Digital Resister 21% | Digital Explorer 26% | Digital Player 29% | Digital Transformer 18% | Digital Disrupter 6%
55%
Source: IDC, European Digital Transformation Maturity Model Benchmark, 2017; n=403, May 2017
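The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes the five percentages pair with the five maturity stages in the order they are listed (an assumption about the chart layout, not something the slide text states). It then looks for neighbouring stages whose combined share equals the 55% callout. Under that assumption, only Digital Explorer plus Digital Player adds up to 55%, which would fit the "deadlock" framing of the slide title.

```python
# Stage shares from the slide, paired with stage labels in listed order.
# The pairing is an assumption about how the chart was laid out.
stage_share = {
    "Digital Resister": 21,
    "Digital Explorer": 26,
    "Digital Player": 29,
    "Digital Transformer": 18,
    "Digital Disrupter": 6,
}

# The five shares should account for all surveyed organizations.
total = sum(stage_share.values())

# Find neighbouring stages whose combined share matches the 55% callout.
stages = list(stage_share)
matching_pairs = [
    (first, second)
    for first, second in zip(stages, stages[1:])
    if stage_share[first] + stage_share[second] == 55
]

print(total)
print(matching_pairs)
```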
Getting the Pulse to Test Our Ideas
750 Business and IT Leaders
Across Western Europe
All Major Industries
[Bar chart: respondents by industry, axis 0 to 200] Finance and Insurance; Telco and Media; Public Sector / Government; Retail; Energy and Utilities; Manufacturing and Automotive
Source: IDC survey for Cloudera and Intel, 2017
Recognizing the Significance of Big Data Analytics to Digital Transformation
Now: 43% “Important/Very Important”
In 2 years: 70% “Important/Very Important”
Source: IDC survey for Cloudera and Intel, 2017
The New Digital Platform
[Diagram: an INTELLIGENT CORE between EXTERNAL PROCESSES and INTERNAL PROCESSES; Connected: Processes, Assets, People; channels: Mobile, IoT, AR/VR, BOT, API]
Source: IDC
The New Digital Platform
[Diagram detail] INTELLIGENT CORE: Databases, Data Streams, Big Data, AI/ML, Analytics, Decision Support
Source: IDC
But …
Chart values: 12%, 44%, 33%, 11%
Chart categories: Still exploring; Enterprise-wide platform being established; Platform available to customers and partners; Used in isolated pockets
Callouts: 55% Uncoordinated; 44% Skills issues; 37% Infrastructure is unsuitable
Source: IDC survey for Cloudera and Intel, 2017
Paths to Success
• Adopt a flexible hybrid deployment model
• Seek to exploit advanced analytics / AI
• Choose a suitable platform for advanced analytics
What Do We Mean By AI?
AI can be viewed in three layers:
• Artificial intelligence: the broadest term, applying to any technique that enables computers to mimic human intelligence. More precisely, AI is the study and development of software and hardware that attempts to emulate a human being in learning and reasoning.
• Machine learning: a subset of AI. The process of creating, from various types of data, a statistical model that performs various functions without having to be explicitly programmed by a human. Machine learning models are "trained" on various types of data (often, lots of data). This category includes deep learning.
• Deep learning: the subset of machine learning composed of algorithms that permit software to train itself to perform tasks, like speech and image recognition, without specifying outcomes or goals. These generally rely on the input of large amounts of data.
Cognitive computing / AI software systems are self-learning, reasoning systems that can augment
or replace human decision-making in situations that involve complexity, very high information volumes,
and/or uncertainty. They are adaptive, iterative and contextual, and make a new class of problems
computable.
AI Systems learn as they operate. They replace logic with data as the primary behavior driver.
They are therefore critically dependent on (big) data.
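The idea of replacing logic with data as the primary behavior driver can be illustrated with a toy example. The sketch below is not from the presentation: the function names, the fixed threshold and the sample transactions are all hypothetical. It contrasts a rule whose cut-off a person wrote down with one whose cut-off is fitted to labelled examples.

```python
# Hand-written logic: a person chooses both the rule and the threshold.
def rule_based_flag(amount):
    return amount > 1000  # hypothetical threshold picked by a human


# Data-driven alternative: the threshold is fitted to labelled examples.
def learn_threshold(examples):
    """Return the cut-off that misclassifies the fewest (amount, label) pairs."""
    candidates = sorted(amount for amount, _ in examples)

    def errors(threshold):
        # Count examples where "amount > threshold" disagrees with the label.
        return sum((amount > threshold) != label for amount, label in examples)

    return min(candidates, key=errors)


# Hypothetical toy data: (transaction amount, was it actually a problem?)
training_data = [
    (120, False), (340, False), (560, False),
    (610, True), (900, True), (1500, True),
]

threshold = learn_threshold(training_data)


def learned_flag(amount):
    return amount > threshold


print(threshold)
print(rule_based_flag(700), learned_flag(700))
```

Changing the behavior of `learned_flag` only requires new training data, whereas changing `rule_based_flag` requires editing the code. This is why such systems depend so heavily on the data they are given.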
[Diagram: nested layers, AI containing ML containing DL]
Establish a Flexible, Hybrid Deployment Model
46% Using open source data science frameworks and languages
Source: IDC survey for Cloudera and Intel, 2017
Seek to Exploit Advanced Analytics, AI and Machine Learning
Analytics type | Using now | Planning to use
Descriptive | 94% | 5%
Predictive | 74% | 20%
Prescriptive | 17% | 31%
Cognitive | 9% | 17%
Source: IDC survey for Cloudera and Intel, 2017
Establish a Suitable Platform for Big Data, Advanced Analytics and AI
25%: A quarter of organisations believe data science to be very or extremely important to their Big Data Analytics environment. This will grow.
31%: Almost one third of organisations plan to use self-learning and AI techniques, e.g. deep learning and neural nets.
Standard hardware platforms have a key role to play
Recap: Paths to Success
• Adopt a flexible hybrid deployment model
• Seek to exploit advanced analytics / AI
• Choose a suitable platform for advanced analytics
Digital Winners are Leaders in Information
Source: IDC Custom Research 2016
The more mature an organization is in its information strategy, the more impactful its digital transformation efforts are.
Philip Carnelley
Research Director, Enterprise Software
IDC Europe
pcarnelley@idc.com
@PCarnelley
Data Center Group
Michael Wrisley
Industry Technical Specialist
Data Center Group
Intel® Xeon® Scalable Processors
Scalable performance for widest variety of AI & other datacenter workloads – including deep learning
• Begin your AI journey today using existing, familiar infrastructure
• DL training in hours rather than days, with up to 113X [2] performance vs. prior gen (2.2x excluding optimized SW [1])
• Robust support for full range of AI deployments
[1], [2] Configuration details on slide: 4, 5, 6
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are
measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult
other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For
more complete information visit: http://www.intel.com/performance Source: Intel measured as of November 2016
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These
optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on
microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to
Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice Revision #20110804
The AI you need, on the chip you know
Built-in ROI | Potent Performance | Production Ready
Intel’s Role in Accelerating Analytics & AI
Holistic Strategy from Edge-Cloud to the Enterprise
Co-Optimizing Applications
Optimized Libraries: Intel® MKL¥, Intel® MKL-DNN¥, Intel® DAAL¥, Intel® Distribution for Python*, Intel® Nervana™ Graph, Movidius MvTensor Library, MLlib*, BigDL
Open Source Enabling
HARDWARE/SOFTWARE: Networking, Lake Crest, Compute, Memory & Storage, Artificial Intelligence Solutions
¥Note: Intel® Data Analytics Acceleration Library, Intel® Math Kernel Library, Intel® Math Kernel Library for Deep Neural Networks, BigDL: Distributed Deep Learning on Apache Spark*, MLlib: Apache Spark’s Scalable Machine Learning Library
*Other names and brands may be claimed as the property of others.
BigDL – DL On Your Existing Infrastructure, Now
Make deep learning more accessible to big data and data science communities
• Continue the use of familiar SW tools and HW infrastructure to build deep learning applications
• Analyze “big data” using deep learning on the same Apache Hadoop*/Spark* cluster where the data are stored
• Add deep learning functionalities to the Big Data (Spark) programs and/or workflow
• Leverage existing Hadoop/Spark clusters to run deep learning applications
• Dynamically share with other workloads (e.g., ETL, data warehouse, feature engineering, statistical machine learning, graph analytics, etc.)
*Other names and brands may be claimed as the property of others.
BigDL Industry Support – Start Today!
Technology | Cloud Service Providers | End Users
More Resources
www.intel.com/bigdata
www.intel.com/ai
www.intel.com/software
Thank You!
Cloudera Enterprise
The modern platform for machine learning and analytics optimized for the cloud
[Diagram]
EXTENSIBLE SERVICES
CORE SERVICES: DATA ENGINEERING, OPERATIONAL DATABASE, ANALYTIC DATABASE, DATA SCIENCE
DATA CATALOG | INGEST & REPLICATION | SECURITY | GOVERNANCE | WORKLOAD MANAGEMENT
STORAGE SERVICES: Amazon S3, Microsoft ADLS, HDFS, KUDU
Open platform services
Built for multi-function analytics | Optimized for cloud
• Unified security – protects sensitive data with consistent controls, even for transient and recurring workloads
• Consistent governance – enables secure self-service access to all relevant data and increases compliance
• Easy workload management – increases user productivity and boosts job predictability
• Flexible ingest and replication – aggregates a single copy of all data, provides disaster recovery, and eases migration
• Shared catalog – defines and preserves structure and business context of data for new applications and partner solutions
5 keys to success
1) Build a data-driven culture
2) Develop the right team and skills
3) Be agile/lean in development
4) Leverage DevOps for production
5) Right-size data governance
World-class training, services, and support
• Cloudera University: 3 top big data certifications
• Professional Services: Fastest route from zero to production
• Cloudera Support: SCP-certified support anywhere in the world
Published research subscription service
Delivers cutting edge advances in applied ML / AI
Accelerates adoption in large enterprises
Drives demand for our platform
Applied research for machine learning and data science
Continued machine learning innovation
Thank you
Notices and Disclaimers
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system
configuration. Check with your system manufacturer or retailer or learn more at intel.com.
No computer system can be absolutely secure.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other
sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit
http://www.intel.com/performance.
Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide
cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are
accurate.
© 2017 Intel Corporation.
3D XPoint, Arria, the Arria logo, Intel, the Intel logo, Intel Nervana, Intel Optane, Intel RealSense, Intel Xeon Phi, Stratix and Xeon are trademarks of Intel Corporation in the U.S. and/or
other countries.
*Other names and brands may be claimed as property of others.
Slide 23 under Potent Performance current footnote #1 (2.2x performance)
2.2X higher deep learning training and inference performance than the prior generation.
Platform: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux* release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC). Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact', OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance.
Compared with Platform: 2S Intel® Xeon® CPU E5-2699 v4 @ 2.20GHz (22 cores), HT enabled, turbo disabled, scaling governor set to “performance” via acpi-cpufreq driver, 256GB DDR4-2133 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3500 Series (480GB, 2.5in SATA 6Gb/s, 20nm, MLC). Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact,1,0', OMP_NUM_THREADS=44, CPU Freq set with cpupower frequency-set -d 2.2G -u 2.2G -g performance.
Neon: ZP/MKL_CHWN branch commit id:52bd02acb947a2adabb8a227166a7da5d9123b6d. Dummy data was used. The main.py script was used for benchmarking, in mkl mode. ICC version used: 17.0.3 20170404, Intel® MKL small libraries version 2018.0.20170425; Inference and training throughput uses FP32 instructions.
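The OMP_NUM_THREADS values in this footnote (56 and 44) both equal sockets times cores per socket: one OpenMP thread per physical core, including on the platform with HT enabled. A minimal sketch of how such an environment might be assembled before launching a benchmark follows. The helper name `openmp_env` is hypothetical, and the affinity strings are the footnote's values with the stray space after the comma removed, which is an assumption about the intended setting.

```python
import os


def openmp_env(sockets, cores_per_socket, affinity):
    """Build an environment with one OpenMP thread per physical core."""
    env = dict(os.environ)  # copy, so the current process is left untouched
    env["OMP_NUM_THREADS"] = str(sockets * cores_per_socket)
    env["KMP_AFFINITY"] = affinity
    return env


# 2 sockets x 28 cores (Xeon Platinum 8180) and 2 sockets x 22 cores (E5-2699 v4).
skx_env = openmp_env(2, 28, "granularity=fine,compact")
bdx_env = openmp_env(2, 22, "granularity=fine,compact,1,0")

print(skx_env["OMP_NUM_THREADS"], bdx_env["OMP_NUM_THREADS"])
```

An environment built this way could be passed to `subprocess.run(..., env=skx_env)` when starting a benchmark script. The CPU frequency settings in the footnote are applied separately with `cpupower` and are not covered here.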
Slide 23 under Potent Performance current footnote #2 (113x)
https://www.intel.com/content/www/us/en/benchmarks/server/xeon-scalable/xeon-scalable-artificial-intelligence.html
Platform | 2S Intel® Xeon® Platinum 8180 processor CPU @ 2.50GHz (28 cores) | 2S Intel® Xeon® CPU E5-2699 v4 @ 2.20GHz (22 cores)
Hyper Threading | HT disabled | HT enabled
Turbo | Turbo disabled | Turbo disabled
Driver | Scaling governor set to “performance” via intel_pstate driver | Scaling governor set to “performance” via acpi-cpufreq driver
Memory | 384GB DDR4-2666 ECC RAM | 256GB DDR4-2133 ECC RAM
OS | CentOS* Linux release 7.3.1611 (Core) | CentOS* Linux release 7.3.1611 (Core)
Kernel | Linux kernel 3.10.0-514.10.2.el7.x86_64 | Linux kernel 3.10.0-514.10.2.el7.x86_64
SSD | Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC) | Intel® SSD DC S3500 Series (480GB, 2.5in SATA 6Gb/s, 20nm, MLC)
Performance Measurement Command Variables | Environment variables: KMP_AFFINITY='granularity=fine, compact', OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance | Environment variables: KMP_AFFINITY='granularity=fine, compact,1,0', OMP_NUM_THREADS=44, CPU Freq set with cpupower frequency-set -d 2.2G -u 2.2G -g performance
Caffe Revision | Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. | Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c.
Other Arguments | Training measured with “caffe time” command. Caffe run with “numactl -l”. | Training measured with “caffe time” command.
Dataset | For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. | For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training.
Topologies | Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (GoogLeNet v1), | Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (GoogLeNet v1),
Compiler | Intel C++ compiler ver. 17.0.2 20170213 | GCC 4.8.5
Library | Intel® MKL small libraries version 2018.0.20170425 | Intel® MKL small libraries version 2017.0.2.20170110
Decision Support Workload Performance Comparison
Hardware Configuration
Processors | Platinum 8160 | E5-2699 v4
Number of Nodes in Cluster | 4 (1 master + 3 workers) | 4 (1 master + 3 workers)
Number of Sockets per Node | 2 | 2
Number of Cores per Node | 48 Cores / 96 Threads | 44 Cores / 88 Threads
Clock | 2.1 GHz (3.70 GHz Max) | 2.2 GHz (3.60 GHz Max)
Cache | 33 MB L3 Cache | 55MB Smart Cache
Memory | 384GB DDR4 (12 x 32GB, 2666 MT/s) | 384GB DDR4 (24 x 16GB, 2133 MT/s)
Storage | 8x800GB SATA SSD | 8x800GB SATA SSD
Network | 10 Gigabit | 10 Gigabit
BIOS Knob | SKX | BDX
BIOS version | SE5C620.86B.01.00.0470.040720170855 | SE5C610.86B.01.01.0018.072020161249
Hyper-Threading | Enabled | Enabled
Other Options | Default | Default
Decision Support Workload Performance Comparison
* Software Stack A – Old software stack with old software component versions
** Software Stack B – New software stack with upgraded software component versions (more software optimizations included, such as Hive Parquet Vectorization)

Software Configuration (Software Stack A) | SKX | BDX
OS | CentOS 7.3 | CentOS 7.3
Kernel | 3.10.0-514.el7.x86_64 | 3.10.0-514.el7.x86_64
Java | Oracle JDK 1.8.0_121 | Oracle JDK 1.8.0_121
Hadoop | 2.7.3 | 2.7.3
File System | HDFS | HDFS
Hive | 2.0.0 | 2.0.0
Spark | 1.6.3 | 1.6.3

Software Configuration (Software Stack B) | SKX | BDX
OS | CentOS 7.3 | CentOS 7.3
Kernel | 3.10.0-514.el7.x86_64 | 3.10.0-514.el7.x86_64
Java | Oracle JDK 1.8.0_121 | Oracle JDK 1.8.0_121
Hadoop | 2.7.3 | 2.7.3
File System | HDFS | HDFS
Hive | 3.0.0-SNAPSHOT (commit id: 3330403) | 3.0.0-SNAPSHOT (commit id: 3330403)
Spark | 2.0.2 | 2.0.2
TPCx-BB and Hibench System Configuration
Hardware
Hardware Configuration (each data node) | BDX | SKX
Processors | E5-2697v4 (BDX) | Xeon Platinum 8168 (SKX)
Nodes | 8 | 8
Number of Sockets | 2 | 2
Number of Cores / Socket | 18 Cores / 36 Threads | 24 Cores / 48 Threads
Clock | 2.3 GHz | 2.7 GHz
L3 Cache | 45 MB | 33 MB
Memory | 768 GB (24 * 32GB Samsung DIMMs @ 2133/2400MHz) | 768 GB (12 * 64GB Micron DIMMs @ 2400MHz)
Data Storage (SATA3 SSDs) | 2 * 2 TB + 2 * 1 TB | 2 * 2 TB + 2 * 1 TB
Network | 1 * 10 Gbps Ethernet | 1 * 10 Gbps Ethernet
BigBench and Hibench System Configuration
Software
Software Configuration
OS | CentOS release 7.3
Kernel | 3.10.0-514.el7.x86_64
Java | 1.8.0_131
Python | 2.7.5
Hadoop | 2.7.3
File System | HDFS
Spark | 2.2.0
Data Scientists: Libraries, Frameworks & Tools

Intel® Math Kernel Library: Intel® MKL
High Level Overview: High performance math primitives granting low level of control
Primary Audience: Consumed by developers of higher level libraries and applications
Example Usage: Framework developers call matrix multiplication, convolution functions

Intel® Math Kernel Library: MKL-DNN
High Level Overview: Free open source DNN functions for high-velocity integration with deep learning frameworks
Primary Audience: Consumed by developers of the next generation of deep learning frameworks
Example Usage: New framework with functions developers call for max CPU performance

Intel® MLSL
High Level Overview: Primitive communication building blocks to scale deep learning framework performance over a cluster
Primary Audience: Deep learning framework developers and optimizers
Example Usage: Framework developer calls functions to distribute Caffe training compute across an Intel® Xeon Phi™ cluster

Intel® Data Analytics Acceleration Library (DAAL)
High Level Overview: Broad data analytics acceleration object oriented library supporting distributed ML at the algorithm level
Primary Audience: Wider Data Analytics and ML audience, algorithm level development for all stages of data analytics
Example Usage: Call distributed alternating least squares algorithm for a recommendation system

Intel® Distribution
High Level Overview: Most popular and fastest growing language for machine learning
Primary Audience: Application Developers and Data Scientists
Example Usage: Call scikit-learn k-means function for credit card fraud detection

Open Source Frameworks
High Level Overview: Toolkits driven by academia and industry for training machine learning algorithms
Primary Audience: Machine Learning App Developers, Researchers and Data Scientists
Example Usage: Script and train a convolutional neural network for image recognition

Intel Deep Learning SDK
High Level Overview: Accelerate deep learning model design, training and deployment
Primary Audience: Application Developers and Data Scientists
Example Usage: Deep Learning training and model creation, with optimization for deployment on constrained end device

Intel® Computer Vision SDK
High Level Overview: Toolkit to develop & deploy vision-oriented solutions that harness the full performance of Intel CPUs and SOC accelerators
Primary Audience: Developers who create vision-oriented solutions
Example Usage: Use deep learning to do pedestrian detection

Find out more at software.intel.com/ai

Weitere ähnliche Inhalte

Mehr von Cloudera, Inc.

Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 
Introducing Workload XM 8.7.18
Introducing Workload XM 8.7.18Introducing Workload XM 8.7.18
Introducing Workload XM 8.7.18Cloudera, Inc.
 
Get started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionGet started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionCloudera, Inc.
 
Spark and Deep Learning Frameworks at Scale 7.19.18
Spark and Deep Learning Frameworks at Scale 7.19.18Spark and Deep Learning Frameworks at Scale 7.19.18
Spark and Deep Learning Frameworks at Scale 7.19.18Cloudera, Inc.
 
Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloudera, Inc.
 
How Cloudera SDX can aid GDPR compliance
How Cloudera SDX can aid GDPR complianceHow Cloudera SDX can aid GDPR compliance
How Cloudera SDX can aid GDPR complianceCloudera, Inc.
 
When SAP alone is not enough
When SAP alone is not enoughWhen SAP alone is not enough
When SAP alone is not enoughCloudera, Inc.
 
Multi task learning stepping away from narrow expert models 7.11.18
Multi task learning stepping away from narrow expert models 7.11.18Multi task learning stepping away from narrow expert models 7.11.18
Multi task learning stepping away from narrow expert models 7.11.18Cloudera, Inc.
 

Mehr von Cloudera, Inc. (20)

Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 
Cloudera SDX
Cloudera SDXCloudera SDX
Cloudera SDX
 
Introducing Workload XM 8.7.18
Introducing Workload XM 8.7.18Introducing Workload XM 8.7.18
Introducing Workload XM 8.7.18
 
Get started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionGet started with Cloudera's cyber solution
Get started with Cloudera's cyber solution
 
Spark and Deep Learning Frameworks at Scale 7.19.18
Spark and Deep Learning Frameworks at Scale 7.19.18Spark and Deep Learning Frameworks at Scale 7.19.18
Spark and Deep Learning Frameworks at Scale 7.19.18
 
Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18
 
How Cloudera SDX can aid GDPR compliance
How Cloudera SDX can aid GDPR complianceHow Cloudera SDX can aid GDPR compliance
How Cloudera SDX can aid GDPR compliance
 
When SAP alone is not enough
When SAP alone is not enoughWhen SAP alone is not enough
When SAP alone is not enough
 
Multi task learning stepping away from narrow expert models 7.11.18
Multi task learning stepping away from narrow expert models 7.11.18Multi task learning stepping away from narrow expert models 7.11.18
Multi task learning stepping away from narrow expert models 7.11.18
 


Transforming into a data-driven enterprise: paths to success

  • 1. © Cloudera, Inc. All rights reserved. Transforming into a data-driven enterprise: Paths to success. Philip Carnelley | Research Director, IDC. Michael Wrisley | Analytic Sales Enablement Director, Intel. Wim Stoop | Senior Product Marketing Manager, Cloudera
  • 2. Transforming into a data-driven enterprise: paths to success Philip Carnelley, Research Director, IDC Europe
  • 3. Digital Transformation: A Board Level Agenda Item. 80% of large European companies have “DX” at the heart of their corporate strategy. Source: IDC, European DX Survey 2017. © IDC. Visit us at IDC.com and follow us on Twitter: @IDC
  • 4. Many Organizations Are at a DX Deadlock. Digital Resister 21%, Digital Explorer 26%, Digital Player 29%, Digital Transformer 18%, Digital Disrupter 6%. Callout: 55%. Source: IDC, European Digital Transformation Maturity Model Benchmark, 2017; n=403, May 2017
  • 5. Getting the Pulse to Test Our Ideas. 750 business and IT leaders across Western Europe, all major industries. [Bar chart: respondents by industry, axis 0 to 200: Finance and Insurance; Telco and Media; Public Sector / Government; Retail; Energy and Utilities; Manufacturing and Automotive.] Source: IDC survey for Cloudera and Intel, 2017
  • 6. Recognizing the Significance of Big Data Analytics to Digital Transformation. Rated “Important/Very Important”: 43% now, 70% in 2 years. Source: IDC survey for Cloudera and Intel, 2017
  • 7. The New Digital Platform. [Diagram labels: EXTERNAL PROCESSES; Connected Processes, Assets, People; INTERNAL PROCESSES; INTELLIGENT CORE; Mobile, IoT, AR/VR, BOT, API.] Source: IDC
  • 8. The New Digital Platform. [Diagram labels: EXTERNAL PROCESSES; Connected Processes, Assets, People; INTERNAL PROCESSES; INTELLIGENT CORE; Mobile, IoT, AR/VR, BOT, API. INTELLIGENT CORE detail: Databases, Data Streams, Big Data, AI/ML, Analytics, Decision Support.] Source: IDC
  • 9. But … [Chart values: 12%, 44%, 33%, 11%; categories: Still exploring; Used in isolated pockets; Enterprise-wide platform being established; Platform available to customers and partners.] Obstacles: 37% Infrastructure is unsuitable; 44% Skills issues; 55% Uncoordinated. Source: IDC survey for Cloudera and Intel, 2017
  • 10. Paths to Success: Adopt a flexible hybrid deployment model; Seek to exploit advanced analytics / AI; Choose a suitable platform for advanced analytics
  • 11. What Do We Mean By AI? AI can be viewed in three layers: • Artificial intelligence: the broadest term, applying to any technique that enables computers to mimic human intelligence. More precisely, AI is the study and development of software and hardware that attempts to emulate a human being in learning and reasoning. • Machine learning: a subset of AI; the process of creating a statistical model from various types of data that performs various functions without having to be programmed by a human. Machine learning models are "trained" by various types of data (often, lots of data). This category includes deep learning. • Deep learning: the subset of machine learning composed of algorithms that permit software to train itself to perform tasks, like speech and image recognition, without specifying outcomes or goals. These generally rely on the input of large amounts of data. Cognitive computing / AI software systems are self-learning, reasoning systems that can augment or replace human decision-making in situations that involve complexity, very high information volumes, and/or uncertainty. They are adaptive, iterative and contextual, and make a new class of problems computable. AI systems learn as they operate. They replace logic with data as the primary behavior driver. They are therefore critically dependent on (big) data. [Diagram: nested layers AI, ML, DL.]
  • 12. Establish a Flexible, Hybrid Deployment Model. Source: IDC survey for Cloudera and Intel, 2017
  • 13. Seek to Exploit Advanced Analytics, AI and Machine Learning. 46% using open source data science frameworks and languages. [Chart: categories Descriptive, Predictive, Prescriptive, Cognitive analytics; series Using now and Planning to use; values as extracted: 94%, 74%, 17%, 9%, 5%, 20%, 31%, 17%.] Source: IDC survey for Cloudera and Intel, 2017
  • 14. Establish a Suitable Platform for Big Data, Advanced Analytics and AI. 25%: A quarter of organisations believe data science to be very or extremely important to their Big Data Analytics environment. This will grow. 31%: Almost one third of organisations plan to use self-learning and AI techniques, e.g. deep learning and neural nets. Standard hardware platforms have a key role to play.
  • 15. Recap: Paths to Success: Adopt a flexible hybrid deployment model; Seek to exploit advanced analytics / AI; Choose a suitable platform for advanced analytics
  • 16. Digital Winners are Leaders in Information. The more mature an organization is in its information strategy, the more impactful its digital transformation efforts are. Source: IDC Custom Research 2016
  • 17. Philip Carnelley, Research Director, Enterprise Software, IDC Europe. pcarnelley@idc.com, @PCarnelley
  • 18. Data Center Group Michael Wrisley Industry Technical Specialist
  • 19. Data Center Group. The AI you need, on the chip you know. Intel® Xeon® Scalable Processors: scalable performance for the widest variety of AI and other datacenter workloads, including deep learning. Built-in ROI: begin your AI journey today using existing, familiar infrastructure. Potent Performance: DL training in HOURS rather than days, with up to 113X performance vs. prior gen (footnote 2; 2.2x excluding optimized SW, footnote 1). Production Ready: robust support for full range of AI deployments. Footnotes 1, 2: configuration details on slides 4, 5, 6. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: http://www.intel.com/performance Source: Intel measured as of November 2016. Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804
  • 20. Data Center Group Intel’s Role in Accelerating Analytics & AI: Holistic Strategy from Edge-Cloud to the Enterprise. [Diagram labels: Co-Optimizing Applications; Optimized Libraries; Intel® MKL¥, Intel® MKL-DNN¥, Intel® DAAL¥, Intel® Distribution for Python*, Intel® Nervana™ Graph, Movidius MvTensor Library, MLlib*, BigDL; Open Source Enabling; HARDWARE/SOFTWARE; Networking, Lake Crest, Compute, Memory & Storage; Artificial Intelligence Solutions.] ¥Note: Intel® Data Analytics Acceleration Library, Intel® Math Kernel Library, Intel® Math Kernel Library for Deep Neural Networks, BigDL: Distributed Deep Learning on Apache Spark*, MLlib: Apache Spark’s Scalable Machine Learning Library. *Other names and brands may be claimed as the property of others.
  • 21. Data Center Group BigDL – DL On Your Existing Infrastructure, Now. Make deep learning more accessible to big data and data science communities. Continue the use of familiar SW tools and HW infrastructure to build deep learning applications. Analyze “big data” using deep learning on the same Apache Hadoop*/Spark* cluster where the data are stored. Add deep learning functionalities to the Big Data (Spark) programs and/or workflow. Leverage existing Hadoop/Spark clusters to run deep learning applications. Dynamically share with other workloads (e.g., ETL, data warehouse, feature engineering, statistical machine learning, graph analytics, etc.). *Other names and brands may be claimed as the property of others.
  • 22. Data Center Group BigDL Industry Support – Start Today! Technology Cloud Service Providers End Users
  • 23. Data Center Group More Resources….. www.intel.com/bigdata www.intel.com/ai www.intel.com/software Thank You!
  • 24. Cloudera Enterprise: the modern platform for machine learning and analytics optimized for the cloud. [Diagram labels: EXTENSIBLE SERVICES; CORE SERVICES; DATA ENGINEERING; OPERATIONAL DATABASE; ANALYTIC DATABASE; DATA CATALOG; INGEST & REPLICATION; SECURITY; GOVERNANCE; WORKLOAD MANAGEMENT; DATA SCIENCE; STORAGE SERVICES: Amazon S3, Microsoft ADLS, HDFS, KUDU.]
  • 25. Open platform services. Built for multi-function analytics | Optimized for cloud. • Unified security – protects sensitive data with consistent controls, even for transient and recurring workloads • Consistent governance – enables secure self-service access to all relevant data and increases compliance • Easy workload management – increases user productivity and boosts job predictability • Flexible ingest and replication – aggregates a single copy of all data, provides disaster recovery, and eases migration • Shared catalog – defines and preserves structure and business context of data for new applications and partner solutions
  • 26. 5 keys to success: 1) Build a data-driven culture 2) Develop the right team and skills 3) Be agile/lean in development 4) Leverage DevOps for production 5) Right-size data governance
  • 27. World-class training, services, and support. Cloudera University: 3 top big data certifications. Professional Services: fastest route from zero to production. Cloudera Support: SCP-certified support anywhere in the world.
  • 28. Continued machine learning innovation: applied research for machine learning and data science. Published research subscription service. Delivers cutting-edge advances in applied ML / AI. Accelerates adoption in large enterprises. Drives demand for our platform.
  • 29. Thank you
  • 31. Data Center Group Notices and Disclaimers Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. Check with your system manufacturer or retailer or learn more at intel.com. No computer system can be absolutely secure. Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance. Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. 
Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction. Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate. © 2017 Intel Corporation. 3D XPoint, Arria, the Arria logo, Intel, the Intel logo, Intel Nervana, Intel Optane, Intel RealSense, Intel Xeon Phi, Stratix and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as property of others.
  • 32. Data Center Group Notices and Disclaimers. Slide 23 under Potent Performance, current footnote #1 (2.2x performance): 2.2X higher deep learning training and inference performance than the prior generation. Platform: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux* release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC). Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact', OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Compared with Platform: 2S Intel® Xeon® CPU E5-2699 v4 @ 2.20GHz (22 cores), HT enabled, turbo disabled, scaling governor set to “performance” via acpi-cpufreq driver, 256GB DDR4-2133 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3500 Series (480GB, 2.5in SATA 6Gb/s, 20nm, MLC). Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact,1,0', OMP_NUM_THREADS=44, CPU Freq set with cpupower frequency-set -d 2.2G -u 2.2G -g performance. Neon: ZP/MKL_CHWN branch commit id:52bd02acb947a2adabb8a227166a7da5d9123b6d. Dummy data was used. The main.py script was used for benchmarking, in mkl mode. ICC version used: 17.0.3 20170404, Intel® MKL small libraries version 2018.0.20170425; Inference and training throughput uses FP32 instructions.
  • 33. Data Center Group Notices and Disclaimers. Slide 23 under Potent Performance, current footnote #2 (113x): https://www.intel.com/content/www/us/en/benchmarks/server/xeon-scalable/xeon-scalable-artificial-intelligence.html
      Platform: 2S Intel® Xeon® Platinum 8180 processor CPU @ 2.50GHz (28 cores) | 2S Intel® Xeon® CPU E5-2699 v4 @ 2.20GHz (22 cores)
      Hyper Threading: HT disabled | HT enabled
      Turbo: Turbo disabled | Turbo disabled
      Driver: Scaling governor set to “performance” via intel_pstate driver | Scaling governor set to “performance” via acpi-cpufreq driver
      Memory: 384GB DDR4-2666 ECC RAM | 256GB DDR4-2133 ECC RAM
      OS: CentOS* Linux release 7.3.1611 (Core) | CentOS* Linux release 7.3.1611 (Core)
      Kernel: Linux kernel 3.10.0-514.10.2.el7.x86_64 | Linux kernel 3.10.0-514.10.2.el7.x86_64
      SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC) | Intel® SSD DC S3500 Series (480GB, 2.5in SATA 6Gb/s, 20nm, MLC)
      Performance Measurement Command Variables: Environment variables: KMP_AFFINITY='granularity=fine, compact', OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance | Environment variables: KMP_AFFINITY='granularity=fine, compact,1,0', OMP_NUM_THREADS=44, CPU Freq set with cpupower frequency-set -d 2.2G -u 2.2G -g performance
      Caffe Revision: Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. | Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c.
      Other Arguments: Training measured with “caffe time” command. Caffe run with “numactl -l”. Training measured with “caffe time” command.
      Dataset: For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. | For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training.
      Topologies: Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (GoogLeNet v1), | Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (GoogLeNet v1),
      Compiler: Intel C++ compiler ver. 17.0.2 20170213 | GCC 4.8.5
      Library: Intel® MKL small libraries version 2018.0.20170425 | Intel® MKL small libraries version 2017.0.2.20170110
  • 34. Data Center Group Notices and Disclaimers. Decision Support Workload Performance Comparison: Hardware Configuration
      Processors: Platinum 8160 | E5-2699 v4
      Number of Nodes in Cluster: 4 (1 master + 3 workers) | 4 (1 master + 3 workers)
      Number of Sockets per Node: 2 | 2
      Number of Cores per Node: 48 Cores / 96 Threads | 44 Cores / 88 Threads
      Clock: 2.1 GHz (3.70 GHz Max) | 2.2 GHz (3.60 GHz Max)
      Cache: 33 MB L3 Cache | 55 MB Smart Cache
      Memory: 384GB DDR4 (12 x 32GB, 2666 MT/s) | 384GB DDR4 (24 x 16GB, 2133 MT/s)
      Storage: 8x800GB SATA SSD | 8x800GB SATA SSD
      Network: 10 Gigabit | 10 Gigabit
  • 35. Data Center Group Notices and Disclaimers. Decision Support Workload Performance Comparison: BIOS
      BIOS Knob: SKX | BDX
      BIOS version: SE5C620.86B.01.00.0470.040720170855 | SE5C610.86B.01.01.0018.072020161249
      Hyper-Threading: Enabled | Enabled
      Other Options: Default | Default
  • 36. Data Center Group Notices and Disclaimers. Decision Support Workload Performance Comparison. * Software Stack A – Old software stack with old software component versions. ** Software Stack B – New software stack with upgraded software component versions (more software optimizations included, such as Hive Parquet Vectorization).
      Software Configuration (first table): SKX | BDX
      OS: CentOS 7.3 | CentOS 7.3
      Kernel: 3.10.0-514.el7.x86_64 | 3.10.0-514.el7.x86_64
      Java: Oracle JDK 1.8.0_121 | Oracle JDK 1.8.0_121
      Hadoop: 2.7.3 | 2.7.3
      File System: HDFS | HDFS
      Hive: 2.0.0 | 2.0.0
      Spark: 1.6.3 | 1.6.3
      Software Configuration (second table): SKX | BDX
      OS: CentOS 7.3 | CentOS 7.3
      Kernel: 3.10.0-514.el7.x86_64 | 3.10.0-514.el7.x86_64
      Java: Oracle JDK 1.8.0_121 | Oracle JDK 1.8.0_121
      Hadoop: 2.7.3 | 2.7.3
      File System: HDFS | HDFS
      Hive: 3.0.0-SNAPSHOT (commit id: 3330403) | 3.0.0-SNAPSHOT (commit id: 3330403)
      Spark: 2.0.2 | 2.0.2
  • 37. Data Center Group Notices and Disclaimers. TPCx-BB and Hibench System Configuration: Hardware Configuration (each data node)
      Processors: E5-2697v4 (BDX) | Xeon Platinum 8168 (SKX)
      Nodes: 8
      Number of Sockets: 2
      Number of Cores / Socket: 18 Cores / 36 Threads | 24 Cores / 48 Threads
      Clock: 2.3 GHz | 2.7 GHz
      L3 Cache: 45 MB | 33 MB
      Memory: 768 GB (24 * 32GB Samsung DIMMs @ 2133/2400MHz) | 768 GB (12 * 64GB Micron DIMMs @ 2400MHz)
      Data Storage (SATA3 SSDs): 2 * 2 TB + 2 * 1 TB
      Network: 1 * 10 Gbps Ethernet
  • 38. Data Center Group Notices and Disclaimers. BigBench and Hibench System Configuration: Software Configuration
      OS: CentOS release 7.3
      Kernel: 3.10.0-514.el7.x86_64
      Java: 1.8.0_131
      Python: 2.7.5
      Hadoop: 2.7.3
      File System: HDFS
      Spark: 2.2.0
  • 39. Data Center Group Data Scientists: Libraries, Frameworks & Tools. Find out more at software.intel.com/ai
      Intel® Math Kernel Library (Intel® MKL). High Level Overview: High performance math primitives granting low level of control. Primary Audience: Consumed by developers of higher level libraries and applications. Example Usage: Framework developers call matrix multiplication, convolution functions.
      MKL-DNN. High Level Overview: Free open source DNN functions for high-velocity integration with deep learning frameworks. Primary Audience: Consumed by developers of the next generation of deep learning frameworks. Example Usage: New framework with functions developers call for max CPU performance.
      Intel® MLSL. High Level Overview: Primitive communication building blocks to scale deep learning framework performance over a cluster. Primary Audience: Deep learning framework developers and optimizers. Example Usage: Framework developer calls functions to distribute Caffe training compute across an Intel® Xeon Phi™ cluster.
      Intel® Data Analytics Acceleration Library (DAAL). High Level Overview: Broad data analytics acceleration object oriented library supporting distributed ML at the algorithm level. Primary Audience: Wider data analytics and ML audience; algorithm level development for all stages of data analytics. Example Usage: Call distributed alternating least squares algorithm for a recommendation system.
      Intel® Distribution. High Level Overview: Most popular and fastest growing language for machine learning. Primary Audience: Application developers and data scientists. Example Usage: Call scikit-learn k-means function for credit card fraud detection.
      Open Source Frameworks. High Level Overview: Toolkits driven by academia and industry for training machine learning algorithms. Primary Audience: Machine learning app developers, researchers and data scientists. Example Usage: Script and train a convolution neural network for image recognition.
      Intel Deep Learning SDK. High Level Overview: Accelerate deep learning model design, training and deployment. Primary Audience: Application developers and data scientists. Example Usage: Deep learning training and model creation, with optimization for deployment on constrained end device.
      Intel® Computer Vision SDK. High Level Overview: Toolkit to develop and deploy vision-oriented solutions that harness the full performance of Intel CPUs and SOC accelerators. Primary Audience: Developers who create vision-oriented solutions. Example Usage: Use deep learning to do pedestrian detection …