Organisations of all kinds recognise that they must rapidly digitise their businesses to remain competitive in the face of massive technological change. They must develop new business models and new routes to customer and partner engagement using the power of digital. That's why over 80% of the CEOs of large European companies have digital transformation (DX) at the centre of their corporate strategy. As part of this shift, forward-thinking companies are investing heavily in becoming data-driven organisations, building an evidence-based culture that expands their capacity to collect, analyse and monetise data in areas such as enhancing customer experiences, empowering the workforce and rethinking business models.
19. Data Center Group
Intel® Xeon® Scalable Processors
- Begin your AI journey today using existing, familiar infrastructure
- DL training in hours, not days, with up to 113x² performance vs. the prior generation (2.2x excluding optimized SW¹)
- Robust support for the full range of AI deployments
- Scalable performance for the widest variety of AI and other data-center workloads, including deep learning
¹,² Configuration details on slides 4, 5, 6.
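The two headline numbers can be related to each other. If, as the footnote framing suggests, the 113x total gain and the 2.2x hardware-only gain compose multiplicatively, the contribution of software optimization can be backed out. This is an illustrative model, not an Intel-published figure:

```python
# Back out the software-optimization factor from the slide's two claims,
# assuming the gains compose multiplicatively (an illustrative assumption).
total_speedup = 113.0  # optimized software, Xeon Scalable vs. prior gen
hw_only = 2.2          # same comparison excluding software optimizations
sw_factor = total_speedup / hw_only
print(round(sw_factor, 1))  # → 51.4
```

In other words, under this reading roughly 51x of the claimed gain comes from the optimized software stack rather than the silicon generation alone.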
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are
measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult
other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For
more complete information visit: http://www.intel.com/performance Source: Intel measured as of November 2016
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These
optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on
microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to
Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice Revision #20110804
The AI you need, on the chip you know:
- Built-in ROI
- Potent Performance
- Production Ready
20. Data Center Group
Intel’s Role in Accelerating Analytics & AI
Holistic Strategy from Edge-Cloud to the Enterprise
¥Note: Intel® Data Analytics Acceleration Library, Intel® Math Kernel Library, Intel® Math Kernel Library for Deep Neural Networks, BigDL: Distributed Deep Learning on Apache Spark*, MLlib: Apache Spark's scalable machine learning library
*Other names and brands may be claimed as the property of others.
Co-Optimizing Applications
Optimized Libraries: Intel® MKL¥, Intel® MKL-DNN¥, Intel® DAAL¥, Intel® Distribution for Python*; open source: Intel® Nervana™ Graph, Movidius MvTensor Library, MLlib*, BigDL
Enabling Hardware/Software: Compute, Memory & Storage, Networking, Lake Crest
Artificial Intelligence Solutions
21. Data Center Group
BigDL – DL On Your Existing Infrastructure, Now
Make deep learning more accessible to the big data and data science communities
- Continue using familiar SW tools and HW infrastructure to build deep learning applications
- Analyze "big data" using deep learning on the same Apache Hadoop*/Spark* cluster where the data are stored
- Add deep learning functionality to existing Big Data (Spark) programs and/or workflows
- Leverage existing Hadoop/Spark clusters to run deep learning applications
- Dynamically share the cluster with other workloads (e.g., ETL, data warehouse, feature engineering, statistical machine learning, graph analytics)
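To make "run deep learning on your existing cluster" concrete, a BigDL job is typically launched as an ordinary Spark application. The following is a hedged sketch only: the BIGDL_HOME path, the jar/zip file names and version strings, and the training-script name are all illustrative assumptions that depend on your BigDL release and cluster layout.

```shell
# Hypothetical BigDL Python job submission on an existing Hadoop/Spark cluster.
# All paths and version strings below are illustrative assumptions.
BIGDL_HOME=/opt/bigdl
spark-submit \
  --master yarn --deploy-mode client \
  --num-executors 4 --executor-cores 8 \
  --properties-file "${BIGDL_HOME}/conf/spark-bigdl.conf" \
  --jars "${BIGDL_HOME}/lib/bigdl-SPARK_2.2-0.3.0-jar-with-dependencies.jar" \
  --py-files "${BIGDL_HOME}/lib/bigdl-0.3.0-python-api.zip" \
  train_model.py   # your BigDL training script (hypothetical name)
```

Because the job goes through spark-submit like any other Spark workload, the cluster scheduler (YARN here) can share resources between the DL job and the ETL or analytics jobs mentioned above.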
22. Data Center Group
BigDL Industry Support – Start Today!
Technology | Cloud Service Providers | End Users
23. Data Center Group
More Resources…
www.intel.com/bigdata
www.intel.com/ai
www.intel.com/software
Thank You!
32. Data Center Group
Notices and Disclaimers
Slide 23, "Potent Performance", footnote #1 (2.2x performance):
2.2x higher deep learning training and inference performance than the prior generation.
New platform: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to "performance" via the intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux* release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC). Performance measured with environment variables KMP_AFFINITY='granularity=fine, compact', OMP_NUM_THREADS=56; CPU frequency set with cpupower frequency-set -d 2.5G -u 3.8G -g performance.
Prior-generation platform: 2S Intel® Xeon® CPU E5-2699 v4 @ 2.20GHz (22 cores), HT enabled, turbo disabled, scaling governor set to "performance" via the acpi-cpufreq driver, 256GB DDR4-2133 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3500 Series (480GB, 2.5in SATA 6Gb/s, 20nm, MLC). Performance measured with environment variables KMP_AFFINITY='granularity=fine, compact,1,0', OMP_NUM_THREADS=44; CPU frequency set with cpupower frequency-set -d 2.2G -u 2.2G -g performance.
Neon: ZP/MKL_CHWN branch, commit id 52bd02acb947a2adabb8a227166a7da5d9123b6d. Dummy data was used. The main.py script was used for benchmarking, in mkl mode. ICC version 17.0.3 20170404; Intel® MKL small libraries version 2018.0.20170425. Inference and training throughput use FP32 instructions.
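For convenience, the Skylake-side run environment from the footnote above can be expressed as shell commands. The values are copied verbatim from the configuration text; the frequency step requires root and the cpupower tool, so it is shown commented out.

```shell
# Skylake (Xeon Platinum 8180) run environment from the footnote above.
# Values copied from the configuration text.
export KMP_AFFINITY='granularity=fine, compact'
export OMP_NUM_THREADS=56
# sudo cpupower frequency-set -d 2.5G -u 3.8G -g performance
```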
33. Data Center Group
Notices and Disclaimers
Slide 23, "Potent Performance", footnote #2 (113x):
https://www.intel.com/content/www/us/en/benchmarks/server/xeon-scalable/xeon-scalable-artificial-intelligence.html
Parameter | 2S Intel® Xeon® Platinum 8180 | 2S Intel® Xeon® E5-2699 v4
Platform | CPU @ 2.50GHz (28 cores) | CPU @ 2.20GHz (22 cores)
Hyper-Threading | Disabled | Enabled
Turbo | Disabled | Disabled
Driver | Scaling governor set to "performance" via intel_pstate | Scaling governor set to "performance" via acpi-cpufreq
Memory | 384GB DDR4-2666 ECC RAM | 256GB DDR4-2133 ECC RAM
OS | CentOS* Linux release 7.3.1611 (Core) | CentOS* Linux release 7.3.1611 (Core)
Kernel | Linux 3.10.0-514.10.2.el7.x86_64 | Linux 3.10.0-514.10.2.el7.x86_64
SSD | Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC) | Intel® SSD DC S3500 Series (480GB, 2.5in SATA 6Gb/s, 20nm, MLC)
Measurement variables | KMP_AFFINITY='granularity=fine, compact', OMP_NUM_THREADS=56; CPU freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance | KMP_AFFINITY='granularity=fine, compact,1,0', OMP_NUM_THREADS=44; CPU freq set with cpupower frequency-set -d 2.2G -u 2.2G -g performance
Caffe revision | http://github.com/intel/caffe/, revision f96b759f71b2281835f690af267158b82b150b5c | (same)
Other arguments | Training measured with the "caffe time" command; Caffe run with "numactl -l" | Training measured with the "caffe time" command
Dataset | Dummy dataset for "ConvNet" topologies; for other topologies, data stored on local storage and cached in memory before training | (same)
Topologies | Specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (GoogLeNet v1) | (same)
Compiler | Intel C++ compiler 17.0.2 20170213 | GCC 4.8.5
Library | Intel® MKL small libraries 2018.0.20170425 | Intel® MKL small libraries 2017.0.2.20170110
34. Data Center Group
Decision Support Workload Performance Comparison: Hardware Configuration
Processors | Intel® Xeon® Platinum 8160 | Intel® Xeon® E5-2699 v4
Nodes in cluster | 4 (1 master + 3 workers) | 4 (1 master + 3 workers)
Sockets per node | 2 | 2
Cores per node | 48 cores / 96 threads | 44 cores / 88 threads
Clock | 2.1 GHz (3.70 GHz max) | 2.2 GHz (3.60 GHz max)
Cache | 33 MB L3 cache | 55 MB Smart Cache
Memory | 384GB DDR4 (12 x 32GB, 2666 MT/s) | 384GB DDR4 (24 x 16GB, 2133 MT/s)
Storage | 8 x 800GB SATA SSD | 8 x 800GB SATA SSD
Network | 10 Gigabit | 10 Gigabit
Notices and Disclaimers
35. Data Center Group
BIOS Knob | SKX | BDX
BIOS version | SE5C620.86B.01.00.0470.040720170855 | SE5C610.86B.01.01.0018.072020161249
Hyper-Threading | Enabled | Enabled
Other options | Default | Default
Decision Support Workload Performance Comparison
Notices and Disclaimers
36. Data Center Group
Decision Support Workload Performance Comparison
* Software Stack A – old software stack, with older software component versions
** Software Stack B – new software stack, with upgraded software component versions (more software optimizations included, such as Hive Parquet vectorization)

Software Stack A* (identical on SKX and BDX):
OS | CentOS 7.3
Kernel | 3.10.0-514.el7.x86_64
Java | Oracle JDK 1.8.0_121
Hadoop | 2.7.3
File system | HDFS
Hive | 2.0.0
Spark | 1.6.3

Software Stack B** (identical on SKX and BDX):
OS | CentOS 7.3
Kernel | 3.10.0-514.el7.x86_64
Java | Oracle JDK 1.8.0_121
Hadoop | 2.7.3
File system | HDFS
Hive | 3.0.0-SNAPSHOT (commit id: 3330403)
Spark | 2.0.2
Notices and Disclaimers
37. Data Center Group
Hardware Configuration (each data node)
Processors | E5-2697 v4 (BDX) | Intel® Xeon® Platinum 8168 (SKX)
Nodes | 8 | 8
Sockets | 2 | 2
Cores per socket | 18 cores / 36 threads | 24 cores / 48 threads
Clock | 2.3 GHz | 2.7 GHz
L3 cache | 45 MB | 33 MB
Memory | 768 GB (24 x 32GB Samsung DIMMs @ 2133/2400 MT/s) | 768 GB (12 x 64GB Micron DIMMs @ 2400 MT/s)
Data storage (SATA3 SSDs) | 2 x 2TB + 2 x 1TB | 2 x 2TB + 2 x 1TB
Network | 1 x 10 Gbps Ethernet | 1 x 10 Gbps Ethernet
TPCx-BB and HiBench System Configuration: Hardware
Notices and Disclaimers
38. Data Center Group
BigBench and HiBench System Configuration: Software
Software Configuration
OS CentOS release 7.3
Kernel 3.10.0-514.el7.x86_64
Java 1.8.0_131
Python 2.7.5
Hadoop 2.7.3
File System HDFS
Spark 2.2.0
Notices and Disclaimers
39. Data Center Group
Data Scientists: Libraries, Frameworks & Tools

Intel® MKL
- Overview: High-performance math primitives granting a low level of control
- Primary audience: Developers of higher-level libraries and applications
- Example usage: Framework developers call matrix multiplication and convolution functions

Intel® MKL-DNN
- Overview: Free, open-source DNN functions for high-velocity integration with deep learning frameworks
- Primary audience: Developers of the next generation of deep learning frameworks
- Example usage: A new framework with functions developers call for maximum CPU performance

Intel® MLSL
- Overview: Primitive communication building blocks to scale deep learning framework performance over a cluster
- Primary audience: Deep learning framework developers and optimizers
- Example usage: A framework developer calls functions to distribute Caffe training compute across an Intel® Xeon Phi™ cluster

Intel® Data Analytics Acceleration Library (DAAL)
- Overview: Broad, object-oriented data analytics acceleration library supporting distributed ML at the algorithm level
- Primary audience: Wider data analytics and ML audience; algorithm-level development for all stages of data analytics
- Example usage: Call the distributed alternating least squares algorithm for a recommendation system

Intel® Distribution for Python
- Overview: The most popular and fastest-growing language for machine learning
- Primary audience: Application developers and data scientists
- Example usage: Call scikit-learn's k-means function for credit card fraud detection

Open Source Frameworks
- Overview: Toolkits driven by academia and industry for training machine learning algorithms
- Primary audience: Machine learning app developers, researchers and data scientists
- Example usage: Script and train a convolutional neural network for image recognition

Intel® Deep Learning SDK
- Overview: Accelerate deep learning model design, training and deployment
- Primary audience: Application developers and data scientists
- Example usage: Deep learning training and model creation, with optimization for deployment on constrained end devices

Intel® Computer Vision SDK
- Overview: Toolkit to develop and deploy vision-oriented solutions that harness the full performance of Intel CPUs and SoC accelerators
- Primary audience: Developers who create vision-oriented solutions
- Example usage: Use deep learning to do pedestrian detection
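To give the "scikit-learn k-means for fraud detection" usage some substance, here is a minimal pure-stdlib sketch of the k-means idea on hypothetical 2-D transaction features. The data, the naive first-k initialization, and the fraud framing are all illustrative assumptions; a real pipeline would use scikit-learn's KMeans on engineered features.

```python
# Minimal k-means sketch (stdlib only) on hypothetical 2-D points:
# a "normal" blob near (0, 0) and an anomalous blob near (10, 10).
def kmeans(points, k, iters=10):
    # Naive init: take the first k points as starting centroids.
    centroids = list(points[:k])
    clusters = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: (p[0] - centroids[i][0]) ** 2 +
                                        (p[1] - centroids[i][1]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean.
        centroids = [(sum(p[0] for p in c) / len(c),
                      sum(p[1] for p in c) / len(c)) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Interleaved so each blob seeds one initial centroid.
pts = [(0.1, 0.2), (10.1, 9.9), (0.2, 0.1), (9.8, 10.2), (0.0, 0.0), (10.0, 10.0)]
centroids, clusters = kmeans(pts, k=2)
print([len(c) for c in clusters])  # → [3, 3]
```

In a fraud setting, points landing in a small, distant cluster (or far from every centroid) are the ones flagged for review.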
Find out more at software.intel.com/ai