SlideShare ist ein Scribd-Unternehmen logo
1 von 56
Downloaden Sie, um offline zu lesen
ACCELERATINGMACHINELEARNING
SOFTWAREONIA
Ananth Sankaranarayanan
Director of Engineering, Analytics & AI Solutions Group, Intel
November 2016
2
This session reviews Intel’s hardware and software strategy for machine
learning. Intel is offering a range of tools and partnering with the open source
community to help developers deliver optimized machine learning applications
for Intel Architecture systems. This session spans a range of software
development frameworks, libraries and other tools for machine learning with a
focus on performance. Topics include IA-optimized frameworks such as Apache
Spark*, Berkeley Caffe*, Google Tensorflow, and Nervana Neon and related
performance benchmarks on the latest Intel Xeon and Intel Xeon Phi™
processors. Machine learning libraries Intel® Math Kernel Library and Intel® Data
Analytics Acceleration Library will also be highlighted along with the new Intel®
Distribution for Python and Intel® Deep Learning SDK.
Abstract
3
Ananth Sankaranarayanan is currently the Director of Engineering with Intel in
the Analytics and Artificial Intelligence Solutions group. His team is responsible
for performance, partner and solution engineering functions, driving new
platform initiatives jointly with leading Cloud Service Providers, Hardware
Manufacturers and Software Vendors worldwide delivering engineered
solutions to simplify implementations. Ananth previously led the HPC program
for Intel silicon design/manufacturing and delivered 5 successful generations of
supercomputers that directly contributed to reducing Intel Silicon Time to
Market, and he has been with Intel since 2001. Ananth received his bachelors in
computer science and engineering from Bharathidasan University in India and
his Masters in Business Administration from City University of Seattle, USA.
Speaker Bio
Bigger Data Better Hardware Smarter Algorithms
4
Drivers for fast emergence of AI
Image: 1 MB / picture
Audio: 5 MB / song
Video: 5 GB / movie
Significant compute performance
increases year over year
Parallel processing norm now
Advances in algorithm
innovation, including neural
networks, leading to better
accuracy in training models
5
Data + Analytics Creates Unique Opportunities
Competitive Advantage
Impactful Insights
Measurable Business Value
2X 5X 3X 2X
Companies that use analytics best are…
…more likely to
Make
data-driven
decisions
Make
decisions
faster than
others
Execute on
decisions
faster
Have top-
quartile
financial
results
Source: Bain
6
The Evolution of Artificial Intelligence
What
happened?
Where is the
problem?
What will
happen in the
future?
Why is this
happening?
How can we
achieve the
best outcome?
Businessvalue
Complexity/ Opportunity
Cognitive
AnalyticsPrescriptive
Analytics
Predictive Analytics
Advanced Operational Analytics
Integrated OLTP and Analytics
Early BI
Transactional Reporting
Spread sheets
Late Mainstream Early
Analytics using Artificial Intelligence
Taxonomy
7
Training: Build a mathematical model based on a data set
Inference: Use trained model to make predictions about new data
Deep Learning (DL)
Algorithms where abstract ideas are
represented by multiple (deep) layers of graphs
RBM …RNNCNN
Classical Machine Learning
Algorithms based on statistical or other techniques
for estimating functions from examples
GA
Linear
Regression
Support
Vector
Machines
Naïve
Bayes
Artificial Intelligence (AI)
Machines that can sense, reason, act without explicit programming 
Machine Learning (ML), a key tool for AI, is the development, and application of algorithms that
improve their performance at some task based on experience (previous iterations)
8
Step 1: Training
(In Data Center – Over Hours/Days/Weeks)
AI: Deep Learning Example:
Person
90% person
8% traffic light
Massive data
sets: labeled or
tagged input
data
Output
Classification
Create “Deep
neural net”
math model
Step 2: Inference
(At the Edge or in the Data Center - Instantaneous)
New input from
camera and
sensors
Output
Classification
Trained neural
network model
97%
person
Trained
Model
9
AI: Example Use Cases
Cloud Service
Providers
Financial
Services
Healthcare
Automotive
•  Image classification and
detection for accurate
diagnosis
•  Image recognition/
tagging for defect
identification
•  Natural language
recognition (digital
assistants)
•  Big data pattern detection
•  Targeted ads to increase
revenue
•  Fraud prevention/ face
detection
•  Gaming, check processing
•  Computer server monitoring
•  Safe navigation for
autonomous vehicles
•  Financial forecasting and
prediction to avoid risk
•  Network intrusion detection
10
AI Market Opportunity
$8.2B
2013
$70B
2020
Hardware &
Software
7%
93%	
  
1Source: DCG Market Intelligence team
AI servers
Other servers
Server Market (2015)1
7%93%
Intel
Intel+GPU
Other
Architecture MSS
94.3% 2.4%
30MB/DAY
Smartphone
PRESENT FUTURE
Data is the next disrupter
By 2020, Machine to Machine
connections will be 47% of
total devices & connections
90MB/DAY
PC
4TB/DAY
Connected Car
1PB/DAY
Connected Factory
40TB/DAY
Connected Plane
2Source: IDC, IOT market related to analytics
AI-based
analytics
market2
3.3%
INTELAISTRATEGY
12
Intel AI strategy
Intel® Math Kernel
Library (Intel® MKL &
MKL-DNN)
Intel® Data Analytics
Acceleration Library
(Intel® DAAL)
Trusted Analytics Platform
+Network
+Memory
+StorageDatacenter Endpoint
Solution blueprints
for reference across industries
Tools/Platforms
to accelerate deployment of IA solution stack
Optimized Open Frameworks
that scale to multi-node and deliver best performance
Free Libraries/Languages
featuring optimized ML/DL building blocks to enable developers
Best in class hardware
Cross compatible portfolio spanning from data center to edge
delivering high perf, perf/TCO, perf/w
Making AI more pervasive by enabling deployment ready AI solutions through a large, open ecosystem
Intel Deep Learning
Training & Deployment
Tools (SDK)
Intel
Distribution
for Python
AI:HARDWARE
Training Inference
14
Intel AI Products for the Datacenter
Intel® Xeon Phi™ Processors
•  Optimized for performance
•  Scales with cluster size for shorter time to model
•  x86 architecture, consistent programming model
for training and inference
Intel® Xeon® Processors
•  Optimized for performance/TCO
•  Most widely deployed inference
solution
Intel® Xeon® Processors + FPGA
(discrete)
•  Optimized for performance/watt
•  Reconfigurable – can be used to
accelerate many DC workloads
•  Programmable with industry
standard OpenCL
15
Intel® Xeon Phi™ Processor Family
•  Up to ~6 SGEMM TFLOPs
3
per socket
•  Great scaling efficiency resulting in
lower time to train for multi-node
•  Eliminates add-in card PCIe* offload
bottleneck and utilization constraints
•  Integrated Intel® Omni-Path fabric
(dual-port; 50 GB/s) increases price-
performance and reduces communication
latency for deep learning networks
•  Binary-compatible with Intel® Xeon®
processors
•  Open standards, libraries and
frameworks
Breakthrough Highly-
Parallel Performance
Removes Barriers
through Integration
Better Programmability
All specifications refer to the future Intel® Xeon Phi™ processor (Knights Landing) unless otherwise noted
All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.
Enables shorter time to train
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components,
software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the
performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance *Other brands and names are the property of others.
Configurations: 1-8 see page 16.
16
Mainstream Training: Intel® Xeon Phi™ Processor 7250
Competitive Deep Learning Image Classification SINGLE-NODE Training With Mainstream cuDNN
1
1.04
1.29
1.03
1.10
0
0.2
0.4
0.6
0.8
1
1.2
1.4
2S Intel® Xeon® processor E5-2620 v3 +
Nvidia® Tesla M40
Intel® Xeon Phi™ processor 7250
NormalizedImages/Second
Higherisbetter
cuDNN v4 BVLC/Caffe cuDNN v5 NVDA/Caffe cuDNN v5 Intel Caffe - MKL 2017 Beta Update 1 Intel Caffe - MKL 2017 Gold**
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.  Performance tests, such as SYSmark and MobileMark, are measured using specific computer
systems, components, software, operations and functions.  Any change to any of those factors may cause the results to vary.  You should consult other information and performance tests to assist you in fully evaluating your
contemplated purchases, including the performance of that product when combined with other products.  For more information go to http://www.intel.com/performance . *Other names and brands may be property of others
Configurations:
1.  Nvidia Tesla M40* (core@923MHz, 12GB, mem@3004MHz, 250W), DIGITS Deep Learning Machine* hosted on 2S Intel® Xeon® processor E5-2620 v3, 64GB memory, Ubuntu* 14.04, Nvidia* Driver v352.41, cuDNN v4,
BVLC/Caffe cuDNN v5 or NVIDIA/Caffe cuDNN v5
2.  Intel® Xeon Phi™ processor 7250 (68 Cores, 1.4 GHz, 16GB), 128GB memory, Red Hat* Enterprise Linux 6.7, Intel® Caffe: https://github.com/intelcaffe
AlexNet: https://papers.nips.cc/paper/4824-Large image database-classification-with-deep-convolutional-neural-networks.pdf, Batch Size:256; ** Q4’16 is estimated based on MKL engineering version
Topology: Caffe*/AlexNet1 Dataset: Large image
database
NDA Only
Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or
configuration may affect your actual performance.
Intel Confidential – NDA USE ONLY
17
Why does Scaling Matter?
Train Up to 50x faster with Intel® Xeon Phi™ Processor
Deep Learning Image Classification Training Performance - MULTI-NODE Scaling
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.  Performance tests, such as SYSmark and MobileMark, are measured using specific computer
systems, components, software, operations and functions.  Any change to any of those factors may cause the results to vary.  You should consult other information and performance tests to assist you in fully evaluating your
contemplated purchases, including the performance of that product when combined with other products.  For more information go to http://www.intel.com/performance/datacenter. Configurations: Up to 50x faster training
on 128-node as compared to single-node based on AlexNet* topology workload (batch size = 1024) training time using a large image database running one node Intel Xeon Phi processor 7250 (16 GB MCDRAM, 1.4 GHz, 68
Cores) in Intel® Server System LADMP2312KXXX41, 96GB DDR4-2400 MHz, quad cluster mode, MCDRAM flat memory mode, Red Hat Enterprise Linux* 6.7 (Santiago), 1.0 TB SATA drive WD1003FZEX-00MK2A0 System Disk,
running Intel® Optimized DNN Framework, training in 39.17 hours compared to 128-node identically configured with Intel® Omni-Path Host Fabric Interface Adapter 100 Series 1 Port PCIe x16 connectors training in 0.75
hours. Contact your Intel representative for more information on how to obtain the binary. For information on workload, see https://papers.nips.cc/paper/4824-Large image database-classification-with-deep-convolutional-
neural-networks.pdf.
Topology: AlexNet* Dataset: Large image database
1.0x 1.9x 3.7x
6.6x
12.8x
23.5x
33.7x
52.2x
1 node 2 nodes 4 nodes 8 nodes 16 nodes 32 nodes 64 nodes 128 nodes
NormalizedTrainingTime–
Higherisbetter
18
Great Scaling Efficiency: Intel® Xeon Phi™ Processor
Deep Learning Image Classification Training Performance - MULTI-NODE Scaling
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.  Performance tests, such as SYSmark and MobileMark, are measured using specific computer
systems, components, software, operations and functions.  Any change to any of those factors may cause the results to vary.  You should consult other information and performance tests to assist you in fully evaluating your
contemplated purchases, including the performance of that product when combined with other products.  For more information go to http://www.intel.com/performance . *Other names and brands may be property of others
Configurations: Intel® Xeon Phi™ Processor 7250 (68 Cores, 1.4 GHz, 16GB MCDRAM), 128 GB memory, Red Hat* Enterprise Linux 6.7, Intel® Optimized Frameworks
Dataset: Large image database
0	
  
10	
  
20	
  
30	
  
40	
  
50	
  
60	
  
70	
  
80	
  
90	
  
100	
  
1	
   2	
   4	
   8	
   16	
   32	
   64	
   128	
  
SCALING	
  EFFICIENCY	
  %	
  
#	
  OF	
  INTEL®	
  XEON	
  PHI™	
  PROCESSOR	
  7250	
  (68-­‐CORES,	
  1.4	
  GHZ,	
  16	
  GB)	
  NODES	
  
OverFeat	
  	
   AlexNet	
   VGG-­‐A	
   GoogLeNet	
  
62
87
19
Knights Mill: Next Gen Intel® Xeon Phi™ processor
Enables shorter time to train
KNC	
  
(2013)	
  
KNL	
  
(2016)	
  
KNM	
  
(2017)	
  
	
  	
  	
  KNH	
  
(10nm	
  TBA)	
  
Trains	
  Machines	
  Faster	
  
•  >2X*	
  Single	
  Precision	
  &	
  >4X*	
  16-­‐bit	
  Mixed	
  Precision	
  faster	
  
deep	
  learning	
  training	
  performance	
  
•  Highly	
  distributed	
  processing	
  with	
  efficient	
  scaling	
  over	
  mule-­‐
node	
  offers	
  flexible	
  infrastructure	
  for	
  ML/DL	
  workloads	
  
Consistent	
  Programming	
  Model	
  
•  Common	
  Xeon	
  &	
  Xeon	
  Phi	
  programming	
  for	
  developers	
  	
  
•  Opemized	
  for	
  industry	
  standard	
  Open	
  Source	
  frameworks	
  
•  Bootable	
  Host-­‐CPU	
  avoids	
  offloading	
  latency	
  &	
  bomleneck	
  
Memory	
  Flexibility	
  
•  High	
  memory	
  bandwidth	
  with	
  integrated	
  DRAM	
  increases	
  
performance	
  for	
  complex	
  neural	
  datasets	
  by	
  reducing	
  latency	
  
•  Large	
  DDR4	
  memory	
  capacity	
  for	
  massive	
  AI	
  use	
  cases	
  
Single-­‐Precision	
  Teraflops	
  (Peak)	
  	
  
Common	
  Groveport	
  Pla6orm	
  
Bootable	
  Host	
  CPU	
   *NOTE: Performance theoretical wrt KNL7250 SKU based on KNM architectural changes.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You
should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete
information visit www.intel.com/benchmarks. Performance estimate wrt KNL 7250 SKU SGEMM. Performance Calculation= AVX freq X Cores X Flops per Core X Efficiency
Intel Confidential – NDA USE ONLY
20
Intel® Xeon® Processor E5 Family
•  Classify 1115 images/second •  Industry standard server
features: high reliability,
hardware enhanced security
•  Standard server infrastructure
•  Open standards, libraries &
frameworks
•  Optimized to run wide variety of
data center workloads
Server Class Reliability
Lowest TCO With Good
Infrastructure Flexibility
High throughput inference on existing server class infrastructure
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer
systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your
contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance
Cost reduction scenarios described are intended as examples of how a given Intel- based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary.
Intel does not guarantee any costs or cost reduction. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any
differences in your system hardware, software or configuration may affect your actual performance. *Other names and brands may be claimed as property of others.
Leadership Throughput
Configuration: 2S Intel® Xeon® Processor E5-2699 v4, 22C, 2.3GHz, 128GB, Red Hat Enterprise Linux* 6.7, Intel® Caffe : https://github.com/intelcaffe
AlexNet: https://papers.nips.cc/paper/4824-Large image database-classification-with-deep-convolutional-neural-networks.pdf, Batch Size:256; ** Q3’16 is estimated based on MKL engineering version
21
Scoring on Intel® Xeon® Processor E5-2699 v4Deep Learning Image Classification SINGLE-NODE SCORING Performance
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.  Performance tests, such as SYSmark and MobileMark, are measured using specific computer
systems, components, software, operations and functions.  Any change to any of those factors may cause the results to vary.  You should consult other information and performance tests to assist you in fully evaluating your
contemplated purchases, including the performance of that product when combined with other products.  For more information go to http://www.intel.com/performance . *Other names and brands may be property of others
Configurations (more details see slide 10):
1.  Nvidia Tesla M4* (core@923MHz, 4GB, mem@3004MHz, 75W), DIGITS* Deep Learning Machine , 1S Intel® Xeon® Processor E5-2620 v3, 2.4GHz, 64GB, Ubuntu 14.04, Nvidia* Driver version 352.68, cuDNN v4, BVLC/
Caffe cuDNN v5 or NVIDIA/Caffe cuDNN v5
2.  2S Intel® Xeon® Processor E5-2699 v4, 22C, 2.3GHz, 128GB, Red Hat Enterprise Linux* 6.7, Intel® Caffe : https://github.com/intelcaffe
AlexNet: https://papers.nips.cc/paper/4824-Large image database-classification-with-deep-convolutional-neural-networks.pdf, Batch Size:256; ** Q4’16 based on MKL engineering version
Topology: Caffe*/AlexNet1 Dataset: Large image
database
1
1.14
1.53 1.54 1.59
0
0.5
1
1.5
2
Nvidia* Tesla M4 2S Intel® Xeon® processor
E5-2699 v4
NormalizedImages/Second
Higherisbetter
cuDNN v4 BVLC/Caffe cuDNN v5 NVDA/Caffe cuDNN v5 Intel Caffe - MKL 2017 Beta update Intel Caffe - MKL 2017 Gold**
Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or
configuration may affect your actual performance.
Intel Confidential – NDA USE ONLY
TOOLANDLIBRARYDETAILS
INTEL®DAAL
24
Intel® Data Analytics Acceleration Library (Intel® DAAL)
§  Python, Java & C++ APIs
§  Can be used with many platforms (Hadoop*,
Spark*, R*, …) but not tied to any of them
§  Flexible interface to connect to different
data sources (CSV, SQL, HDFS, …)
§  Windows*, Linux*,
and OS X*
§  Developed by same team as the industry-
leading Intel® Math Kernel Library
§  Open source, Free community-supported
and commercial premium-supported
options
§  Also included in
Parallel Studio XE suites
An Intel-optimized library that provides building blocks for all data analytics
stages, from data preparation to data mining & machine learning
Intel DAAL Overview
Industry leading performance, C++/Java/Python library for machine
learning and deep learning optimized for Intel® Architectures.
(De-)Compression
PCA
Statistical moments
Variance matrix
QR, SVD, Cholesky
Apriori
Linear regression
Naïve Bayes
SVM
Classifier boosting
Kmeans
EM GMM
Collaborative filtering
Neural Networks
Pre-processing Transformation Analysis Modeling Decision Making
Scientific/Engineering
Web/Social
Business
Validation
INTEL CONFIDENTIAL
26
Big data frameworks: Hadoop, Spark, Cassandra, etc.
SQLstores
NoSQL
stores
In-memory
stores
Connectors
Data
mining
Recommendation
engines
Customer
behavior
modeling
BI
analytics
Real time
analytics
Big data analytics
Current common practice
•  Limited
performance
•  Many layers of
dependencies
•  Low ROI on HW
investment
•  Run on state-of-art
hardware
•  Built with a patchwork
of math libs
•  Under-exploiting
hardware performance
features
Data
sources
Finance
Social
media
Marketing
IoT
Mfg
…
Problem Statement
INTEL CONFIDENTIAL
27
Big data frameworks: Hadoop, Spark, Cassandra, etc.
SQLstores
NoSQL
stores
In-memory
stores
Connectors
Data
mining
Recommendation
engines
Customer
behavior
modeling
BI
analytics
Real time
analytics
Big data analytics
Current common practice
•  Limited
performance
•  Many layers of
dependencies
•  Low ROI on HW
investment
•  Run on state-of-art
hardware
•  Built with a patchwork
of math libs
•  Under-exploiting
hardware performance
features
Data
sources
Finance
Social
media
Marketing
IoT
Mfg
…
Problem Statement
Spark*
MLLib
Breeze
Netlib-
Java
JVM
JNI
Netlib
BLAS
INTEL CONFIDENTIAL
28
Big data frameworks: Hadoop, Spark, Cassandra, etc.
SQLstores
NoSQL
stores
In-memory
stores
Connectors
Data
mining
Recommendation
engines
Customer
behavior
modeling
BI
analytics
Real time
analytics
Big data analytics
Desired practice •  Optimized
performance
•  Simpler
development &
deployment
•  High ROI on HW
investment
•  Run on state-of-art
hardware
•  Single library to cover
all stages of data
analytics
•  Fully optimized for
underlying hardware
Data
sources
Finance
Social
media
Marketing
IoT
Mfg
…
Desired Solution
INTEL CONFIDENTIAL
Intel DAAL Components
Data Processing Algorithms
Optimized analytics building blocks for
all data analysis stages, from data
acquisition to data mining and
machine learning
Data Modeling Algorithms
Data structures for model
representation, and operations to
derive model-based predictions and
conclusions
Data Management
Interfaces for data representation and
access. Connectors to a variety of data
sources and data formats, such HDFS,
SQL, CSV, ARFF, and user-defined data
source/format
Data Sources Numeric Tables
Compression /
Decompression
Serialization /
Deserialization
Data Transformation and Analysis in Intel® DAAL
Basic
statistics for
datasets
Low
order
moments
Variance-
Covariance
matrix
Correlation
and
dependence
Cosine
distance
Correlation
distance
Matrix
factorizations
SVD
QR
Cholesky
Dimensionality
reduction
PCA
Outlier
detection
Association
rule mining
(Apriori)
Univariate
Multivariate
Algorithms supporting streaming and distributed processing in initial release
Quantiles
Machine Learning in Intel® DAAL
Supervised
learning
Regression
Linear
Regression
Classification
Weak
learner
Boosting
(Ada, Brown, Logit)
Naïve
Bayes
Support Vector Machine
Unsupervised
learning
K-Means
Clustering
EM for
GMM
Collaborative
filtering
Alternating
Least Squares
Algorithms supporting streaming and distributed processing
Neural
Networks
33
What’s New: Intel DAAL 2017
•  Neural Networks
•  Python API (a.k.a. PyDAAL)
–  Easy installation through Anaconda or pip
•  Open source project on GitHub
Fork me on GitHub:
https://github.com/01org/daal
INTEL®MKLANDMKL-DNN
35
Highly optimized threaded math routines
§  Performance, Performance, Performance!
Industry’s leading math library
§  Widely used in science, engineering, data processing
Tuned for Intel® processors – current and next generation
Intel® Math Kernel Library (Intel® MKL) Introduction
More math library users depend on MKL
than any other library
EDC North America
Development Survey
2016, Volume I
Be multiprocessor aware
•  Cross-Platform Support
•  Be vectorised , threaded, and
distributed multiprocessor aware
Intel Engineering Algorithm Experts
Optimized code path dispatch auto	
Intel®
Compatible
Processors
Past Intel®
Processors
Current
Intel®
Processors
Future
Intel®
Processors
36
Components of Intel MKL 2017
Linear Algebra
•  BLAS
•  LAPACK
•  ScaLAPACK
•  Sparse BLAS
•  Sparse Solvers
•  Iterative
•  PARDISO*
•  Cluster Sparse
Solver
Fast Fourier
Transforms
•  Multidimensional
•  FFTW interfaces
•  Cluster FFT
Vector Math
•  Trigonometric
•  Hyperbolic
•  Exponential
•  Log
•  Power
•  Root
•  Vector RNGs
Summary
Statistics
•  Kurtosis
•  Variation
coefficient
•  Order statistics
•  Min/max
•  Variance-
covariance
And More…
•  Splines
•  Interpolation
•  Trust Region
•  Fast Poisson
Solver
Deep Neural
Networks
•  Convolution
•  Pooling
•  Normalization
•  ReLU
•  Inner Product
New
Xeon Xeon Phi FPGA
Deep Learning Frameworks
Caffe
Intel®
MKL-DNN
Intel® Math Kernel
Library
(Intel® MKL)
* GEMM matrix multiply building blocks are binary
Intel® Math Kernel Library and Intel® MKL-DNN for Deep
Learning Framework Optimization
Intel® MKL Intel® MKL-DNN
DNN primitives + wide
variety of other math
functions
DNN primitives
C DNN APIs C/C++ DNN APIs
Binary distribution Open source DNN code*
Free community license.
Premium support
available as part of
Parallel Studio XE
Apache 2.0 license
Broad usage DNN
primitives; not specific to
individual frameworks
Multiple variants of DNN
primitives as required for
framework integrations
Quarterly update releases
Rapid development
ahead of Intel MKL
releases
INTEL CONFIDENTIAL
INTEL CONFIDENTIAL
Case Study I: Deep Learning
LeCloud* Illegal Video Detection
•  LeCloud : leading video cloud provider in China who
provides illegal video detection service
•  Originally: Adopted open source BVLC Caffe w/OpenBlas
as CNN framework
•  Now: Using Intel Optimized Caffe plus Intel® Math Kernel
Library, achieved 30x performance improvement for
training in production
* The test data is based on Intel® Xeon® E5 2680 V3 processor	
…
CNN Network based on CaffePicturesVideo
Video
Classification
Porn
Sexual
Normal
Picture Scoring
Extract
key frame Input Classification
0
10
20
30
BVLC Caffe +
OpenBlas (Baseline)
Intel Optimized
Caffe + MKL
Optimized Performance of LeCloud*
Illegal Video Detection - higher is better
40
41
Intel® DAAL+ Intel® MKL = Complementary Big Data
Libraries Solution
Intel MKL Intel DAAL
C and Fortran API
Primitive level
Python, Java & C++ API
High-level
Processing of homogeneous
data in single or double
precision
Processing heterogeneous data (mix of
integers and floating point), internal
conversions are hidden in the library
Type of intermediate
computations is defined by
type of input data (in some
library domains higher
precision can be used)
Type of intermediate computations can be
configured independently of the type of
input data
Most of MKL supports batch
computation mode only
3 computation modes: Batch, streaming
and distributed
Cluster functionality uses MPI
internally
Developer chooses communication
method for distributed computation (e.g.
Spark, MPI, etc.) Code samples provided.
“Initially, the Spark/Shark-based
solution required 40 hours to
compete a computation. Youku
improved performance
significantly by implementing
Intel® Math Kernel Library (Intel®
MKL) into its solution…After
implementation of Intel MKL,
Youku reduced the computation
time to less than three hours.”
Source: Youku Tudou Video
Sharing Recommendation Case
Study
INTEL®DEEPLEARNINGSDK
Increased Productivity
Faster Time-to-market for
training and inference,
Improve model accuracy,
Reduce total cost of
ownership
Maximum Performance
Optimized performance for
training and inference on
Intel® Architecture
Intel® Deep Learning SDK
Accelerate Your Deep Learning Solution
A free set of tools for data scientists and software developers to develop,
train, and deploy deep learning solutions
“Plug & Train/Deploy”
Simplify installation &
preparation of deep learning
models using popular deep
learning frameworks on Intel
hardware
Deep Learning Training Tool
Intel® Deep Learning SDK
•  Simplify installation of Intel optimized Deep
Learning Frameworks
•  Easy and Visual way to Set-up, Tune and Run
Deep Learning Algorithms:
ü  Create training dataset
ü  Design model with automatically optimized
hyper-parameters
ü  Launch and monitor training of multiple
candidate models
ü  Visualize training performance and accuracy
DL Framework
Datacenter
MKL-DNN
DL Training Tool
Dataset
Install
Configure
Run
Accuracy
Utilization
Model
.prototxt
.caffemodel
Trained
Model
Data Scientist
Label
Deep Learning Deployment Tool
Intel® Deep Learning SDK
Unleash fast scoring performance on Intel
products while abstracting the HW from
developers
•  Imports trained models from all popular DL
framework regardless of training HW
•  Compresses model for improved execution,
storage & transmission (pruning, quantization)
•  Generates scoring HW-specific code (C/C++,
OpenVX graphs, OpenCL, etc.)
•  Enables seamless integration with full
system / application software stack
45
.prototxt
.caffemodel
Trained
Model
Model Optimizer
FP Quantize
Model Compress
End-Point
SW Developer
Import
Inference Run-Time
OpenVX
Application Logic
Forward Result
Real-time Data
Validation Data
Model Analysis
MKL-DNN
Deploy-ready
model
46
Deep Learning Tools for End-to-End Workflow
Intel® Deep Learning SDK
MKL-DNN Optimized Machine Learning Frameworks
Intel DL Training Tool Intel DL Deployment Tool
•  IMPORT Trained Model (trained on Intel or 3rd Party HW)
•  COMPRESS Model for Inference on Target Intel HW
•  GENERATE Inference HW-Specific Code (OpenVX, C/C++)
•  INTEGRATE with System SW / Application Stack & TUNE
•  EVALUATE Results and ITERATE
•  INSTALL / SELECT
IA-Optimized Frameworks
•  PREPARE / CREATE Dataset with Ground-truth
•  DESIGN / TRAIN Model(s) with IA-Opt. Hyper-Parameters
•  MONITOR Training Progress across Candidate Models
•  EVALUATE Results and ITERATE
configure_nn(fpga/,…)
allocate_buffer(…)
fpga_conv(input,output);
fpga_conv(…);
mkl_SoftMax(…);
mkl_SoftMax(…);
…
Xeon (local or cloud)
Optimized libraries & run-times (MKL-DNN, OpenVX, OpenCL)
Data acquisition (sensors) and acceleration HW (FPGA, etc)
Target Inference Hardware Platform (physical or simulated)
Intel® Math Kernel Library
Xeon Xeon Phi FPGA
Intel® MKL-DNN
Host &
Offload
47
Intel Deep Learning Software Stack
SW building block to extract max Intel HW performance and provide common
interface to all Intel accelerators.
Popular Deep Learning frameworks
Open source Intel x86 optimized DNN APIs, combined with Intel® MKL and
build tools designed for scalable, high-velocity integration with ML/DL
frameworks.
Includes:
•  New algorithms ahead of MKL releases
•  IA optimizations contributed by community
Deep Learning Frameworks
Caffe
Tools to accelerate design, training and deployment of deep learning solutionsIntel® Deep Learning SDK
Targeted release: early Q4’2016
*Other names and brands may be claimed as property of others. Software.intel.com/machine-learning
Intel libraries as path to bring optimized ML/DL frameworks to Intel hardware
INTEL®DISTRIBUTIONFORPYTHON
OURAPPROACH
1. Enable hooks to Intel® MKL, Intel® DAAL, Intel® IPP functions in the most popular numerical packages
§  NumPy, SciPy, Scikit-Learn, PyTables, Scikit-Image, …
2. Available through Intel® Distribution for Python* and as Conda packages
§  Most optimizations eventually upstreamed to home open source projects
3. Provide Python interfaces for Intel® DAAL (a.k.a PyDAAL)
More cores à More Threads à Wider vectors
Numpy & Scipy optimizations with Intel® MKL
Linear Algebra
•  BLAS
•  LAPACK
•  ScaLAPACK
•  Sparse BLAS
•  Sparse Solvers
•  Iterative
•  PARDISO SMP & Cluster
Fast Fourier Transforms
•  Multidimensional
•  FFTW interfaces
•  Cluster FFT
Vector Math
•  Trigonometric
•  Hyperbolic
•  Exponential
•  Log
•  Power
•  Root
Vector RNGs
•  Multiple BRNG
•  Support methods for
independent streams
creation
•  Support all key probability
distributions
Summary Statistics
•  Kurtosis
•  Variation coefficient
•  Order statistics
•  Min/max
•  Variance-covariance
And More
•  Splines
•  Interpolation
•  Trust Region
•  Fast Poisson Solver
50
Up to
100x
faster
Up to
10x
faster!
Up to
10x
faster!
Up to
60x
faster!
Configuration info: - Versions: Intel® Distribution for Python
2017 Beta, icc 15.0; Hardware: Intel® Xeon® CPU E5-2698 v3
@ 2.30GHz (2 sockets, 16 cores each, HT=OFF), 64 GB of RAM,
8 DIMMS of 8GB@2133MHz; Operating System: Ubuntu 14.04
LTS.
Near native performance on Intel® Xeon™ and Intel® Xeon Phi™
51
•  Runs out-of-the-box
with any Python
•  Intel Distribution for
Python delivers much
greater efficiency
than “system” Python
•  Potential for future
multi-threaded
performance tunings
in numpy and scipy
Roadmap &
Reviews
Available as free standalone download
Commercial support through Intel®
Parallel Studio 2017
Download at https://software.intel.com/en-us/python-distribution
“I expected Intel’s numpy to be
fast but it is significant that plain
old python code is much faster
with the Intel version too.“
Dr. Donald Kinghorn,
Puget Systems Review
HPC Podcast Looks at
Intel’s Pending
Distribution of
Python
Yes, Intel is doing their own Python build! It is still
in beta but I think it’s a great idea. ……….Yeah, it’s
important!
Intel's Python
distribution provides a
major math boost
The still-in-beta Python distribution uses Math Kernel
Library to speed up processing on Intel hardware
The distribution's main touted advantage is speed -- but
not a PyPy-style general speedup via a JIT. Instead, the
MKL speeds up certain math operations so that they run
faster on one thread and multiple threads.
CALL-TO-ACTION
54
Summary
I.  Deep Learning framework optimizations on Xeon, Xeon Phi – Session #
II.  Intel DAAL
III.  Intel MKL, MKL-DNN
IV.  Intel Python optimizations – Session #
V.  Intel Deep Learning SDK
Learn more at www.intel.com/machinelearning
Cost reduction scenarios described are intended as examples of how a given Intel- based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not
guarantee any costs or cost reduction. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware,
software or configuration may affect your actual performance.
Legal Disclaimer & Optimization Notice
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO
ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND
INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR
WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT,
COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software,
operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information
and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product
when combined with other products.
Copyright © 2016, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are
trademarks of Intel Corporation in the U.S. and other countries.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel
microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the
availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent
optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are
reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the
specific instruction sets covered by this notice.
Notice revision #20110804
5555
Accelerate Machine Learning Software on Intel Architecture

Weitere ähnliche Inhalte

Was ist angesagt?

A Platform for Accelerating Machine Learning Applications
 A Platform for Accelerating Machine Learning Applications A Platform for Accelerating Machine Learning Applications
A Platform for Accelerating Machine Learning ApplicationsNVIDIA Taiwan
 
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre..."Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...Edge AI and Vision Alliance
 
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splines
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splinesOptimize Single Particle Orbital (SPO) Evaluations Based on B-splines
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splinesIntel® Software
 
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA Taiwan
 
Deep Learning Accelerator Design Techniques
Deep Learning Accelerator Design TechniquesDeep Learning Accelerator Design Techniques
Deep Learning Accelerator Design TechniquesMindos Cheng
 
Recent developments in Deep Learning
Recent developments in Deep LearningRecent developments in Deep Learning
Recent developments in Deep LearningBrahim HAMADICHAREF
 
A Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing ClustersA Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing ClustersIntel® Software
 
Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionNVIDIA Taiwan
 
Nervana and the Future of Computing
Nervana and the Future of ComputingNervana and the Future of Computing
Nervana and the Future of ComputingIntel Nervana
 
Squeezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesSqueezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesAnirudh Koul
 
Spine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localizationSpine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localizationDevansh16
 
Rethinking computation: A processor architecture for machine intelligence
Rethinking computation: A processor architecture for machine intelligenceRethinking computation: A processor architecture for machine intelligence
Rethinking computation: A processor architecture for machine intelligenceIntel Nervana
 
Improving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsImproving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsChester Chen
 
Machine Learning with New Hardware Challegens
Machine Learning with New Hardware ChallegensMachine Learning with New Hardware Challegens
Machine Learning with New Hardware ChallegensOscar Law
 
Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...Codemotion
 
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 Separating Hype from Reality in Deep Learning with Sameer Farooqui Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer FarooquiDatabricks
 
Convolutional Neural Networks at scale in Spark MLlib
Convolutional Neural Networks at scale in Spark MLlibConvolutional Neural Networks at scale in Spark MLlib
Convolutional Neural Networks at scale in Spark MLlibDataWorks Summit
 

Was ist angesagt? (20)

A Platform for Accelerating Machine Learning Applications
 A Platform for Accelerating Machine Learning Applications A Platform for Accelerating Machine Learning Applications
A Platform for Accelerating Machine Learning Applications
 
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre..."Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
 
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splines
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splinesOptimize Single Particle Orbital (SPO) Evaluations Based on B-splines
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splines
 
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
 
Manycores for the Masses
Manycores for the MassesManycores for the Masses
Manycores for the Masses
 
2. Cnnecst-Why the use of FPGA?
2. Cnnecst-Why the use of FPGA? 2. Cnnecst-Why the use of FPGA?
2. Cnnecst-Why the use of FPGA?
 
Deep Learning Accelerator Design Techniques
Deep Learning Accelerator Design TechniquesDeep Learning Accelerator Design Techniques
Deep Learning Accelerator Design Techniques
 
Deep learning with FPGA
Deep learning with FPGADeep learning with FPGA
Deep learning with FPGA
 
Recent developments in Deep Learning
Recent developments in Deep LearningRecent developments in Deep Learning
Recent developments in Deep Learning
 
A Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing ClustersA Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing Clusters
 
Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server Solution
 
Nervana and the Future of Computing
Nervana and the Future of ComputingNervana and the Future of Computing
Nervana and the Future of Computing
 
Squeezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesSqueezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile Phones
 
Spine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localizationSpine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localization
 
Rethinking computation: A processor architecture for machine intelligence
Rethinking computation: A processor architecture for machine intelligenceRethinking computation: A processor architecture for machine intelligence
Rethinking computation: A processor architecture for machine intelligence
 
Improving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsImproving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN Applications
 
Machine Learning with New Hardware Challegens
Machine Learning with New Hardware ChallegensMachine Learning with New Hardware Challegens
Machine Learning with New Hardware Challegens
 
Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...
 
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 Separating Hype from Reality in Deep Learning with Sameer Farooqui Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 
Convolutional Neural Networks at scale in Spark MLlib
Convolutional Neural Networks at scale in Spark MLlibConvolutional Neural Networks at scale in Spark MLlib
Convolutional Neural Networks at scale in Spark MLlib
 

Ähnlich wie Accelerate Machine Learning Software on Intel Architecture

Intel Powered AI Applications for Telco
Intel Powered AI Applications for TelcoIntel Powered AI Applications for Telco
Intel Powered AI Applications for TelcoMichelle Holley
 
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSciStreamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSciIntel® Software
 
Accelerating AI Adoption with Partners
Accelerating AI Adoption with PartnersAccelerating AI Adoption with Partners
Accelerating AI Adoption with PartnersSri Ambati
 
Python* Scalability in Production Environments
Python* Scalability in Production EnvironmentsPython* Scalability in Production Environments
Python* Scalability in Production EnvironmentsIntel® Software
 
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...tdc-globalcode
 
Accelerating AI from the Cloud to the Edge
Accelerating AI from the Cloud to the EdgeAccelerating AI from the Cloud to the Edge
Accelerating AI from the Cloud to the EdgeIntel® Software
 
Accelerate Your AI Today
Accelerate Your AI TodayAccelerate Your AI Today
Accelerate Your AI TodayDESMOND YUEN
 
High Performance Computing: The Essential tool for a Knowledge Economy
High Performance Computing: The Essential tool for a Knowledge EconomyHigh Performance Computing: The Essential tool for a Knowledge Economy
High Performance Computing: The Essential tool for a Knowledge EconomyIntel IT Center
 
AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)
AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)
AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)byteLAKE
 
Intel xeon-scalable-processors-overview
Intel xeon-scalable-processors-overviewIntel xeon-scalable-processors-overview
Intel xeon-scalable-processors-overviewDESMOND YUEN
 
Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014
Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014
Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014StampedeCon
 
Unleashing Data Intelligence with Intel and Apache Spark with Michael Greene
Unleashing Data Intelligence with Intel and Apache Spark with Michael GreeneUnleashing Data Intelligence with Intel and Apache Spark with Michael Greene
Unleashing Data Intelligence with Intel and Apache Spark with Michael GreeneDatabricks
 
Inside story on Intel Data Center @ IDF 2013
Inside story on Intel Data Center @ IDF 2013Inside story on Intel Data Center @ IDF 2013
Inside story on Intel Data Center @ IDF 2013Intel IT Center
 
AIDC Summit LA- Hands-on Training
AIDC Summit LA- Hands-on Training AIDC Summit LA- Hands-on Training
AIDC Summit LA- Hands-on Training Intel® Software
 
AI Scalability for the Next Decade
AI Scalability for the Next DecadeAI Scalability for the Next Decade
AI Scalability for the Next DecadePaula Koziol
 
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...HPC DAY
 
Spring Hill (NNP-I 1000): Intel's Data Center Inference Chip
Spring Hill (NNP-I 1000): Intel's Data Center Inference ChipSpring Hill (NNP-I 1000): Intel's Data Center Inference Chip
Spring Hill (NNP-I 1000): Intel's Data Center Inference Chipinside-BigData.com
 
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0Intel IT Center
 

Ähnlich wie Accelerate Machine Learning Software on Intel Architecture (20)

Intel Powered AI Applications for Telco
Intel Powered AI Applications for TelcoIntel Powered AI Applications for Telco
Intel Powered AI Applications for Telco
 
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSciStreamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
 
Accelerating AI Adoption with Partners
Accelerating AI Adoption with PartnersAccelerating AI Adoption with Partners
Accelerating AI Adoption with Partners
 
Python* Scalability in Production Environments
Python* Scalability in Production EnvironmentsPython* Scalability in Production Environments
Python* Scalability in Production Environments
 
Intel python 2017
Intel python 2017Intel python 2017
Intel python 2017
 
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
 
Accelerating AI from the Cloud to the Edge
Accelerating AI from the Cloud to the EdgeAccelerating AI from the Cloud to the Edge
Accelerating AI from the Cloud to the Edge
 
Accelerate Your AI Today
Accelerate Your AI TodayAccelerate Your AI Today
Accelerate Your AI Today
 
High Performance Computing: The Essential tool for a Knowledge Economy
High Performance Computing: The Essential tool for a Knowledge EconomyHigh Performance Computing: The Essential tool for a Knowledge Economy
High Performance Computing: The Essential tool for a Knowledge Economy
 
AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)
AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)
AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)
 
Intel xeon-scalable-processors-overview
Intel xeon-scalable-processors-overviewIntel xeon-scalable-processors-overview
Intel xeon-scalable-processors-overview
 
Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014
Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014
Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014
 
Unleashing Data Intelligence with Intel and Apache Spark with Michael Greene
Unleashing Data Intelligence with Intel and Apache Spark with Michael GreeneUnleashing Data Intelligence with Intel and Apache Spark with Michael Greene
Unleashing Data Intelligence with Intel and Apache Spark with Michael Greene
 
Inside story on Intel Data Center @ IDF 2013
Inside story on Intel Data Center @ IDF 2013Inside story on Intel Data Center @ IDF 2013
Inside story on Intel Data Center @ IDF 2013
 
AIDC Summit LA- Hands-on Training
AIDC Summit LA- Hands-on Training AIDC Summit LA- Hands-on Training
AIDC Summit LA- Hands-on Training
 
AI Scalability for the Next Decade
AI Scalability for the Next DecadeAI Scalability for the Next Decade
AI Scalability for the Next Decade
 
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...
 
Spring Hill (NNP-I 1000): Intel's Data Center Inference Chip
Spring Hill (NNP-I 1000): Intel's Data Center Inference ChipSpring Hill (NNP-I 1000): Intel's Data Center Inference Chip
Spring Hill (NNP-I 1000): Intel's Data Center Inference Chip
 
The Intel Xeon Scalable Processor and IoT
The Intel Xeon Scalable Processor and IoTThe Intel Xeon Scalable Processor and IoT
The Intel Xeon Scalable Processor and IoT
 
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
 

Mehr von Intel® Software

AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology Intel® Software
 
Python Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and AnacondaPython Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and AnacondaIntel® Software
 
AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.Intel® Software
 
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Intel® Software
 
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...Intel® Software
 
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Intel® Software
 
AWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI ResearchAWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI ResearchIntel® Software
 
Intel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview SlidesIntel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview SlidesIntel® Software
 
AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019Intel® Software
 
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019Intel® Software
 
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...Intel® Software
 
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Intel® Software
 
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Intel® Software
 
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...Intel® Software
 
AIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino SlidesAIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino SlidesIntel® Software
 
AIDC India - AI Vision Slides
AIDC India - AI Vision SlidesAIDC India - AI Vision Slides
AIDC India - AI Vision SlidesIntel® Software
 
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...Intel® Software
 
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...Intel® Software
 

Mehr von Intel® Software (20)

AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology
 
Python Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and AnacondaPython Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and Anaconda
 
AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.
 
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
 
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
 
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
 
AWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI ResearchAWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI Research
 
Intel Developer Program
Intel Developer ProgramIntel Developer Program
Intel Developer Program
 
Intel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview SlidesIntel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview Slides
 
AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019
 
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
 
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
 
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
 
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
 
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
 
AIDC India - AI on IA
AIDC India  - AI on IAAIDC India  - AI on IA
AIDC India - AI on IA
 
AIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino SlidesAIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino Slides
 
AIDC India - AI Vision Slides
AIDC India - AI Vision SlidesAIDC India - AI Vision Slides
AIDC India - AI Vision Slides
 
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
 
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...
 

Kürzlich hochgeladen

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 

Kürzlich hochgeladen (20)

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 

Accelerate Machine Learning Software on Intel Architecture

  • 1. ACCELERATINGMACHINELEARNING SOFTWAREONIA Ananth Sankaranarayanan Director of Engineering, Analytics & AI Solutions Group, Intel November 2016
  • 2. 2 This session reviews Intel’s hardware and software strategy for machine learning. Intel is offering a range of tools and partnering with the open source community to help developers deliver optimized machine learning applications for Intel Architecture systems. This session spans a range of software development frameworks, libraries and other tools for machine learning with a focus on performance. Topics include IA-optimized frameworks such as Apache Spark*, Berkeley Caffe*, Google Tensorflow, and Nervana Neon and related performance benchmarks on the latest Intel Xeon and Intel Xeon Phi™ processors. Machine learning libraries Intel® Math Kernel Library and Intel® Data Analytics Acceleration Library will also be highlighted along with the new Intel® Distribution for Python and Intel® Deep Learning SDK. Abstract
  • 3. 3 Ananth Sankaranarayanan is currently the Director of Engineering with Intel in the Analytics and Artificial Intelligence Solutions group. His team is responsible for performance, partner and solution engineering functions, driving new platform initiatives jointly with leading Cloud Service Providers, Hardware Manufacturers and Software Vendors worldwide delivering engineered solutions to simplify implementations. Ananth previously led the HPC program for Intel silicon design/manufacturing and delivered 5 successful generations of supercomputers that directly contributed to reducing Intel Silicon Time to Market, and he has been with Intel since 2001. Ananth received his bachelors in computer science and engineering from Bharathidasan University in India and his Masters in Business Administration from City University of Seattle, USA. Speaker Bio
  • 4. Bigger Data Better Hardware Smarter Algorithms 4 Drivers for fast emergence of AI Image: 1 MB / picture Audio: 5 MB / song Video: 5 GB / movie Significant compute performance increases year over year Parallel processing norm now Advances in algorithm innovation, including neural networks, leading to better accuracy in training models
  • 5. 5 Data + Analytics Creates Unique Opportunities Competitive Advantage Impactful Insights Measurable Business Value 2X 5X 3X 2X Companies that use analytics best are… …more likely to Make data-driven decisions Make decisions faster than others Execute on decisions faster Have top- quartile financial results Source: Bain
  • 6. 6 The Evolution of Artificial Intelligence What happened? Where is the problem? What will happen in the future? Why is this happening? How can we achieve the best outcome? Businessvalue Complexity/ Opportunity Cognitive AnalyticsPrescriptive Analytics Predictive Analytics Advanced Operational Analytics Integrated OLTP and Analytics Early BI Transactional Reporting Spread sheets Late Mainstream Early Analytics using Artificial Intelligence
  • 7. Taxonomy 7 Training: Build a mathematical model based on a data set Inference: Use trained model to make predictions about new data Deep Learning (DL) Algorithms where abstract ideas are represented by multiple (deep) layers of graphs RBM …RNNCNN Classical Machine Learning Algorithms based on statistical or other techniques for estimating functions from examples GA Linear Regression Support Vector Machines Naïve Bayes Artificial Intelligence (AI) Machines that can sense, reason, act without explicit programming  Machine Learning (ML), a key tool for AI, is the development, and application of algorithms that improve their performance at some task based on experience (previous iterations)
  • 8. 8 Step 1: Training (In Data Center – Over Hours/Days/Weeks) AI: Deep Learning Example: Person 90% person 8% traffic light Massive data sets: labeled or tagged input data Output Classification Create “Deep neural net” math model Step 2: Inference (At the Edge or in the Data Center - Instantaneous) New input from camera and sensors Output Classification Trained neural network model 97% person Trained Model
  • 9. 9 AI: Example Use Cases Cloud Service Providers Financial Services Healthcare Automotive •  Image classification and detection for accurate diagnosis •  Image recognition/ tagging for defect identification •  Natural language recognition (digital assistants) •  Big data pattern detection •  Targeted ads to increase revenue •  Fraud prevention/ face detection •  Gaming, check processing •  Computer server monitoring •  Safe navigation for autonomous vehicles •  Financial forecasting and prediction to avoid risk •  Network intrusion detection
  • 10. 10 AI Market Opportunity $8.2B 2013 $70B 2020 Hardware & Software 7% 93%   1Source: DCG Market Intelligence team AI servers Other servers Server Market (2015)1 7%93% Intel Intel+GPU Other Architecture MSS 94.3% 2.4% 30MB/DAY Smartphone PRESENT FUTURE Data is the next disrupter By 2020, Machine to Machine connections will be 47% of total devices & connections 90MB/DAY PC 4TB/DAY Connected Car 1PB/DAY Connected Factory 40TB/DAY Connected Plane 2Source: IDC, IOT market related to analytics AI-based analytics market2 3.3%
  • 12. 12 Intel AI strategy Intel® Math Kernel Library (Intel® MKL & MKL-DNN) Intel® Data Analytics Acceleration Library (Intel® DAAL) Trusted Analytics Platform +Network +Memory +StorageDatacenter Endpoint Solution blueprints for reference across industries Tools/Platforms to accelerate deployment of IA solution stack Optimized Open Frameworks that scale to multi-node and deliver best performance Free Libraries/Languages featuring optimized ML/DL building blocks to enable developers Best in class hardware Cross compatible portfolio spanning from data center to edge delivering high perf, perf/TCO, perf/w Making AI more pervasive by enabling deployment ready AI solutions through a large, open ecosystem Intel Deep Learning Training & Deployment Tools (SDK) Intel Distribution for Python
  • 14. Training Inference 14 Intel AI Products for the Datacenter Intel® Xeon Phi™ Processors •  Optimized for performance •  Scales with cluster size for shorter time to model •  x86 architecture, consistent programming model for training and inference Intel® Xeon® Processors •  Optimized for performance/TCO •  Most widely deployed inference solution Intel® Xeon® Processors + FPGA (discrete) •  Optimized for performance/watt •  Reconfigurable – can be used to accelerate many DC workloads •  Programmable with industry standard OpenCL
  • 15. 15 Intel® Xeon Phi™ Processor Family •  Up to ~6 SGEMM TFLOPs 3 per socket •  Great scaling efficiency resulting in lower time to train for multi-node •  Eliminates add-in card PCIe* offload bottleneck and utilization constraints •  Integrated Intel® Omni-Path fabric (dual-port; 50 GB/s) increases price- performance and reduces communication latency for deep learning networks •  Binary-compatible with Intel® Xeon® processors •  Open standards, libraries and frameworks Breakthrough Highly- Parallel Performance Removes Barriers through Integration Better Programmability All specifications refer to the future Intel® Xeon Phi™ processor (Knights Landing) unless otherwise noted All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice. Enables shorter time to train Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance *Other brands and names are the property of others. Configurations: 1-8 see page 16.
  • 16. 16 Mainstream Training: Intel® Xeon Phi™ Processor 7250 Competitive Deep Learning Image Classification SINGLE-NODE Training With Mainstream cuDNN 1 1.04 1.29 1.03 1.10 0 0.2 0.4 0.6 0.8 1 1.2 1.4 2S Intel® Xeon® processor E5-2620 v3 + Nvidia® Tesla M40 Intel® Xeon Phi™ processor 7250 NormalizedImages/Second Higherisbetter cuDNN v4 BVLC/Caffe cuDNN v5 NVDA/Caffe cuDNN v5 Intel Caffe - MKL 2017 Beta Update 1 Intel Caffe - MKL 2017 Gold** Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.  Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions.  Any change to any of those factors may cause the results to vary.  You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.  For more information go to http://www.intel.com/performance . *Other names and brands may be property of others Configurations: 1.  Nvidia Tesla M40* (core@923MHz, 12GB, mem@3004MHz, 250W), DIGITS Deep Learning Machine* hosted on 2S Intel® Xeon® processor E5-2620 v3, 64GB memory, Ubuntu* 14.04, Nvidia* Driver v352.41, cuDNN v4, BVLC/Caffe cuDNN v5 or NVIDIA/Caffe cuDNN v5 2.  Intel® Xeon Phi™ processor 7250 (68 Cores, 1.4 GHz, 16GB), 128GB memory, Red Hat* Enterprise Linux 6.7, Intel® Caffe: https://github.com/intelcaffe AlexNet: https://papers.nips.cc/paper/4824-Large image database-classification-with-deep-convolutional-neural-networks.pdf, Batch Size:256; ** Q4’16 is estimated based on MKL engineering version Topology: Caffe*/AlexNet1 Dataset: Large image database NDA Only Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance. Intel Confidential – NDA USE ONLY
  • 17. 17 Why does Scaling Matter? Train Up to 50x faster with Intel® Xeon Phi™ Processor Deep Learning Image Classification Training Performance - MULTI-NODE Scaling Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.  Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions.  Any change to any of those factors may cause the results to vary.  You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.  For more information go to http://www.intel.com/performance/datacenter. Configurations: Up to 50x faster training on 128-node as compared to single-node based on AlexNet* topology workload (batch size = 1024) training time using a large image database running one node Intel Xeon Phi processor 7250 (16 GB MCDRAM, 1.4 GHz, 68 Cores) in Intel® Server System LADMP2312KXXX41, 96GB DDR4-2400 MHz, quad cluster mode, MCDRAM flat memory mode, Red Hat Enterprise Linux* 6.7 (Santiago), 1.0 TB SATA drive WD1003FZEX-00MK2A0 System Disk, running Intel® Optimized DNN Framework, training in 39.17 hours compared to 128-node identically configured with Intel® Omni-Path Host Fabric Interface Adapter 100 Series 1 Port PCIe x16 connectors training in 0.75 hours. Contact your Intel representative for more information on how to obtain the binary. For information on workload, see https://papers.nips.cc/paper/4824-Large image database-classification-with-deep-convolutional- neural-networks.pdf. Topology: AlexNet* Dataset: Large image database 1.0x 1.9x 3.7x 6.6x 12.8x 23.5x 33.7x 52.2x 1 node 2 nodes 4 nodes 8 nodes 16 nodes 32 nodes 64 nodes 128 nodes NormalizedTrainingTime– Higherisbetter
  • 18. 18 Great Scaling Efficiency: Intel® Xeon Phi™ Processor Deep Learning Image Classification Training Performance - MULTI-NODE Scaling Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.  Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions.  Any change to any of those factors may cause the results to vary.  You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.  For more information go to http://www.intel.com/performance . *Other names and brands may be property of others Configurations: Intel® Xeon Phi™ Processor 7250 (68 Cores, 1.4 GHz, 16GB MCDRAM), 128 GB memory, Red Hat* Enterprise Linux 6.7, Intel® Optimized Frameworks Dataset: Large image database 0   10   20   30   40   50   60   70   80   90   100   1   2   4   8   16   32   64   128   SCALING  EFFICIENCY  %   #  OF  INTEL®  XEON  PHI™  PROCESSOR  7250  (68-­‐CORES,  1.4  GHZ,  16  GB)  NODES   OverFeat     AlexNet   VGG-­‐A   GoogLeNet   62 87
  • 19. 19 Knights Mill: Next Gen Intel® Xeon Phi™ processor Enables shorter time to train KNC   (2013)   KNL   (2016)   KNM   (2017)        KNH   (10nm  TBA)   Trains  Machines  Faster   •  >2X*  Single  Precision  &  >4X*  16-­‐bit  Mixed  Precision  faster   deep  learning  training  performance   •  Highly  distributed  processing  with  efficient  scaling  over  mule-­‐ node  offers  flexible  infrastructure  for  ML/DL  workloads   Consistent  Programming  Model   •  Common  Xeon  &  Xeon  Phi  programming  for  developers     •  Opemized  for  industry  standard  Open  Source  frameworks   •  Bootable  Host-­‐CPU  avoids  offloading  latency  &  bomleneck   Memory  Flexibility   •  High  memory  bandwidth  with  integrated  DRAM  increases   performance  for  complex  neural  datasets  by  reducing  latency   •  Large  DDR4  memory  capacity  for  massive  AI  use  cases   Single-­‐Precision  Teraflops  (Peak)     Common  Groveport  Pla6orm   Bootable  Host  CPU   *NOTE: Performance theoretical wrt KNL7250 SKU based on KNM architectural changes. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks. Performance estimate wrt KNL 7250 SKU SGEMM. Performance Calculation= AVX freq X Cores X Flops per Core X Efficiency Intel Confidential – NDA USE ONLY
  • 20. 20 Intel® Xeon® Processor E5 Family •  Classify 1115 images/second •  Industry standard server features: high reliability, hardware enhanced security •  Standard server infrastructure •  Open standards, libraries & frameworks •  Optimized to run wide variety of data center workloads Server Class Reliability Lowest TCO With Good Infrastructure Flexibility High throughput inference on existing server class infrastructure Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance Cost reduction scenarios described are intended as examples of how a given Intel- based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance. *Other names and brands may be claimed as property of others. Leadership Throughput Configuration: 2S Intel® Xeon® Processor E5-2699 v4, 22C, 2.3GHz, 128GB, Red Hat Enterprise Linux* 6.7, Intel® Caffe : https://github.com/intelcaffe AlexNet: https://papers.nips.cc/paper/4824-Large image database-classification-with-deep-convolutional-neural-networks.pdf, Batch Size:256; ** Q3’16 is estimated based on MKL engineering version
  • 21. 21 Scoring on Intel® Xeon® Processor E5-2699 v4Deep Learning Image Classification SINGLE-NODE SCORING Performance Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.  Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions.  Any change to any of those factors may cause the results to vary.  You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.  For more information go to http://www.intel.com/performance . *Other names and brands may be property of others Configurations (more details see slide 10): 1.  Nvidia Tesla M4* (core@923MHz, 4GB, mem@3004MHz, 75W), DIGITS* Deep Learning Machine , 1S Intel® Xeon® Processor E5-2620 v3, 2.4GHz, 64GB, Ubuntu 14.04, Nvidia* Driver version 352.68, cuDNN v4, BVLC/ Caffe cuDNN v5 or NVIDIA/Caffe cuDNN v5 2.  2S Intel® Xeon® Processor E5-2699 v4, 22C, 2.3GHz, 128GB, Red Hat Enterprise Linux* 6.7, Intel® Caffe : https://github.com/intelcaffe AlexNet: https://papers.nips.cc/paper/4824-Large image database-classification-with-deep-convolutional-neural-networks.pdf, Batch Size:256; ** Q4’16 based on MKL engineering version Topology: Caffe*/AlexNet1 Dataset: Large image database 1 1.14 1.53 1.54 1.59 0 0.5 1 1.5 2 Nvidia* Tesla M4 2S Intel® Xeon® processor E5-2699 v4 NormalizedImages/Second Higherisbetter cuDNN v4 BVLC/Caffe cuDNN v5 NVDA/Caffe cuDNN v5 Intel Caffe - MKL 2017 Beta update Intel Caffe - MKL 2017 Gold** Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance. Intel Confidential – NDA USE ONLY
  • 24. 24 Intel® Data Analytics Acceleration Library (Intel® DAAL) §  Python, Java & C++ APIs §  Can be used with many platforms (Hadoop*, Spark*, R*, …) but not tied to any of them §  Flexible interface to connect to different data sources (CSV, SQL, HDFS, …) §  Windows*, Linux*, and OS X* §  Developed by same team as the industry- leading Intel® Math Kernel Library §  Open source, Free community-supported and commercial premium-supported options §  Also included in Parallel Studio XE suites An Intel-optimized library that provides building blocks for all data analytics stages, from data preparation to data mining & machine learning
  • 25. Intel DAAL Overview Industry leading performance, C++/Java/Python library for machine learning and deep learning optimized for Intel® Architectures. (De-)Compression PCA Statistical moments Variance matrix QR, SVD, Cholesky Apriori Linear regression Naïve Bayes SVM Classifier boosting Kmeans EM GMM Collaborative filtering Neural Networks Pre-processing Transformation Analysis Modeling Decision Making Scientific/Engineering Web/Social Business Validation
  • 26. INTEL CONFIDENTIAL 26 Big data frameworks: Hadoop, Spark, Cassandra, etc. SQLstores NoSQL stores In-memory stores Connectors Data mining Recommendation engines Customer behavior modeling BI analytics Real time analytics Big data analytics Current common practice •  Limited performance •  Many layers of dependencies •  Low ROI on HW investment •  Run on state-of-art hardware •  Built with a patchwork of math libs •  Under-exploiting hardware performance features Data sources Finance Social media Marketing IoT Mfg … Problem Statement
  • 27. INTEL CONFIDENTIAL 27 Big data frameworks: Hadoop, Spark, Cassandra, etc. SQLstores NoSQL stores In-memory stores Connectors Data mining Recommendation engines Customer behavior modeling BI analytics Real time analytics Big data analytics Current common practice •  Limited performance •  Many layers of dependencies •  Low ROI on HW investment •  Run on state-of-art hardware •  Built with a patchwork of math libs •  Under-exploiting hardware performance features Data sources Finance Social media Marketing IoT Mfg … Problem Statement Spark* MLLib Breeze Netlib- Java JVM JNI Netlib BLAS
  • 28. INTEL CONFIDENTIAL 28 Big data frameworks: Hadoop, Spark, Cassandra, etc. SQLstores NoSQL stores In-memory stores Connectors Data mining Recommendation engines Customer behavior modeling BI analytics Real time analytics Big data analytics Desired practice •  Optimized performance •  Simpler development & deployment •  High ROI on HW investment •  Run on state-of-art hardware •  Single library to cover all stages of data analytics •  Fully optimized for underlying hardware Data sources Finance Social media Marketing IoT Mfg … Desired Solution
  • 30. Intel DAAL Components Data Processing Algorithms Optimized analytics building blocks for all data analysis stages, from data acquisition to data mining and machine learning Data Modeling Algorithms Data structures for model representation, and operations to derive model-based predictions and conclusions Data Management Interfaces for data representation and access. Connectors to a variety of data sources and data formats, such HDFS, SQL, CSV, ARFF, and user-defined data source/format Data Sources Numeric Tables Compression / Decompression Serialization / Deserialization
  • 31. Data Transformation and Analysis in Intel® DAAL Basic statistics for datasets Low order moments Variance- Covariance matrix Correlation and dependence Cosine distance Correlation distance Matrix factorizations SVD QR Cholesky Dimensionality reduction PCA Outlier detection Association rule mining (Apriori) Univariate Multivariate Algorithms supporting streaming and distributed processing in initial release Quantiles
  • 32. Machine Learning in Intel® DAAL Supervised learning Regression Linear Regression Classification Weak learner Boosting (Ada, Brown, Logit) Naïve Bayes Support Vector Machine Unsupervised learning K-Means Clustering EM for GMM Collaborative filtering Alternating Least Squares Algorithms supporting streaming and distributed processing Neural Networks
  • 33. 33 What’s New: Intel DAAL 2017 •  Neural Networks •  Python API (a.k.a. PyDAAL) –  Easy installation through Anaconda or pip •  Open source project on GitHub Fork me on GitHub: https://github.com/01org/daal
  • 35. 35 Highly optimized threaded math routines §  Performance, Performance, Performance! Industry’s leading math library §  Widely used in science, engineering, data processing Tuned for Intel® processors – current and next generation Intel® Math Kernel Library (Intel® MKL) Introduction More math library users depend on MKL than any other library EDC North America Development Survey 2016, Volume I Be multiprocessor aware •  Cross-Platform Support •  Be vectorised , threaded, and distributed multiprocessor aware Intel Engineering Algorithm Experts Optimized code path dispatch auto Intel® Compatible Processors Past Intel® Processors Current Intel® Processors Future Intel® Processors
  • 36. 36 Components of Intel MKL 2017 Linear Algebra •  BLAS •  LAPACK •  ScaLAPACK •  Sparse BLAS •  Sparse Solvers •  Iterative •  PARDISO* •  Cluster Sparse Solver Fast Fourier Transforms •  Multidimensional •  FFTW interfaces •  Cluster FFT Vector Math •  Trigonometric •  Hyperbolic •  Exponential •  Log •  Power •  Root •  Vector RNGs Summary Statistics •  Kurtosis •  Variation coefficient •  Order statistics •  Min/max •  Variance- covariance And More… •  Splines •  Interpolation •  Trust Region •  Fast Poisson Solver Deep Neural Networks •  Convolution •  Pooling •  Normalization •  ReLU •  Inner Product New
  • 37. Xeon Xeon Phi FPGA Deep Learning Frameworks Caffe Intel® MKL-DNN Intel® Math Kernel Library (Intel® MKL) * GEMM matrix multiply building blocks are binary Intel® Math Kernel Library and Intel® MKL-DNN for Deep Learning Framework Optimization Intel® MKL Intel® MKL-DNN DNN primitives + wide variety of other math functions DNN primitives C DNN APIs C/C++ DNN APIs Binary distribution Open source DNN code* Free community license. Premium support available as part of Parallel Studio XE Apache 2.0 license Broad usage DNN primitives; not specific to individual frameworks Multiple variants of DNN primitives as required for framework integrations Quarterly update releases Rapid development ahead of Intel MKL releases
  • 40. Case Study I: Deep Learning LeCloud* Illegal Video Detection •  LeCloud : leading video cloud provider in China who provides illegal video detection service •  Originally: Adopted open source BVLC Caffe w/OpenBlas as CNN framework •  Now: Using Intel Optimized Caffe plus Intel® Math Kernel Library, achieved 30x performance improvement for training in production * The test data is based on Intel® Xeon® E5 2680 V3 processor … CNN Network based on CaffePicturesVideo Video Classification Porn Sexual Normal Picture Scoring Extract key frame Input Classification 0 10 20 30 BVLC Caffe + OpenBlas (Baseline) Intel Optimized Caffe + MKL Optimized Performance of LeCloud* Illegal Video Detection - higher is better 40
  • 41. 41 Intel® DAAL+ Intel® MKL = Complementary Big Data Libraries Solution Intel MKL Intel DAAL C and Fortran API Primitive level Python, Java & C++ API High-level Processing of homogeneous data in single or double precision Processing heterogeneous data (mix of integers and floating point), internal conversions are hidden in the library Type of intermediate computations is defined by type of input data (in some library domains higher precision can be used) Type of intermediate computations can be configured independently of the type of input data Most of MKL supports batch computation mode only 3 computation modes: Batch, streaming and distributed Cluster functionality uses MPI internally Developer chooses communication method for distributed computation (e.g. Spark, MPI, etc.) Code samples provided. “Initially, the Spark/Shark-based solution required 40 hours to compete a computation. Youku improved performance significantly by implementing Intel® Math Kernel Library (Intel® MKL) into its solution…After implementation of Intel MKL, Youku reduced the computation time to less than three hours.” Source: Youku Tudou Video Sharing Recommendation Case Study
  • 43. Increased Productivity Faster Time-to-market for training and inference, Improve model accuracy, Reduce total cost of ownership Maximum Performance Optimized performance for training and inference on Intel® Architecture Intel® Deep Learning SDK Accelerate Your Deep Learning Solution A free set of tools for data scientists and software developers to develop, train, and deploy deep learning solutions “Plug & Train/Deploy” Simplify installation & preparation of deep learning models using popular deep learning frameworks on Intel hardware
  • 44. Deep Learning Training Tool Intel® Deep Learning SDK •  Simplify installation of Intel optimized Deep Learning Frameworks •  Easy and Visual way to Set-up, Tune and Run Deep Learning Algorithms: ü  Create training dataset ü  Design model with automatically optimized hyper-parameters ü  Launch and monitor training of multiple candidate models ü  Visualize training performance and accuracy DL Framework Datacenter MKL-DNN DL Training Tool Dataset Install Configure Run Accuracy Utilization Model .prototxt .caffemodel Trained Model Data Scientist Label
  • 45. Deep Learning Deployment Tool Intel® Deep Learning SDK Unleash fast scoring performance on Intel products while abstracting the HW from developers •  Imports trained models from all popular DL framework regardless of training HW •  Compresses model for improved execution, storage & transmission (pruning, quantization) •  Generates scoring HW-specific code (C/C++, OpenVX graphs, OpenCL, etc.) •  Enables seamless integration with full system / application software stack 45 .prototxt .caffemodel Trained Model Model Optimizer FP Quantize Model Compress End-Point SW Developer Import Inference Run-Time OpenVX Application Logic Forward Result Real-time Data Validation Data Model Analysis MKL-DNN Deploy-ready model
  • 46. 46 Deep Learning Tools for End-to-End Workflow Intel® Deep Learning SDK MKL-DNN Optimized Machine Learning Frameworks Intel DL Training Tool Intel DL Deployment Tool •  IMPORT Trained Model (trained on Intel or 3rd Party HW) •  COMPRESS Model for Inference on Target Intel HW •  GENERATE Inference HW-Specific Code (OpenVX, C/C++) •  INTEGRATE with System SW / Application Stack & TUNE •  EVALUATE Results and ITERATE •  INSTALL / SELECT IA-Optimized Frameworks •  PREPARE / CREATE Dataset with Ground-truth •  DESIGN / TRAIN Model(s) with IA-Opt. Hyper-Parameters •  MONITOR Training Progress across Candidate Models •  EVALUATE Results and ITERATE configure_nn(fpga/,…) allocate_buffer(…) fpga_conv(input,output); fpga_conv(…); mkl_SoftMax(…); mkl_SoftMax(…); … Xeon (local or cloud) Optimized libraries & run-times (MKL-DNN, OpenVX, OpenCL) Data acquisition (sensors) and acceleration HW (FPGA, etc) Target Inference Hardware Platform (physical or simulated)
  • 47. Intel® Math Kernel Library Xeon Xeon Phi FPGA Intel® MKL-DNN Host & Offload 47 Intel Deep Learning Software Stack SW building block to extract max Intel HW performance and provide common interface to all Intel accelerators. Popular Deep Learning frameworks Open source Intel x86 optimized DNN APIs, combined with Intel® MKL and build tools designed for scalable, high-velocity integration with ML/DL frameworks. Includes: •  New algorithms ahead of MKL releases •  IA optimizations contributed by community Deep Learning Frameworks Caffe Tools to accelerate design, training and deployment of deep learning solutionsIntel® Deep Learning SDK Targeted release: early Q4’2016 *Other names and brands may be claimed as property of others. Software.intel.com/machine-learning Intel libraries as path to bring optimized ML/DL frameworks to Intel hardware
  • 49. OURAPPROACH 1. Enable hooks to Intel® MKL, Intel® DAAL, Intel® IPP functions in the most popular numerical packages §  NumPy, SciPy, Scikit-Learn, PyTables, Scikit-Image, … 2. Available through Intel® Distribution for Python* and as Conda packages §  Most optimizations eventually upstreamed to home open source projects 3. Provide Python interfaces for Intel® DAAL (a.k.a PyDAAL) More cores à More Threads à Wider vectors
  • 50. Numpy & Scipy optimizations with Intel® MKL Linear Algebra •  BLAS •  LAPACK •  ScaLAPACK •  Sparse BLAS •  Sparse Solvers •  Iterative •  PARDISO SMP & Cluster Fast Fourier Transforms •  Multidimensional •  FFTW interfaces •  Cluster FFT Vector Math •  Trigonometric •  Hyperbolic •  Exponential •  Log •  Power •  Root Vector RNGs •  Multiple BRNG •  Support methods for independent streams creation •  Support all key probability distributions Summary Statistics •  Kurtosis •  Variation coefficient •  Order statistics •  Min/max •  Variance-covariance And More •  Splines •  Interpolation •  Trust Region •  Fast Poisson Solver 50 Up to 100x faster Up to 10x faster! Up to 10x faster! Up to 60x faster! Configuration info: - Versions: Intel® Distribution for Python 2017 Beta, icc 15.0; Hardware: Intel® Xeon® CPU E5-2698 v3 @ 2.30GHz (2 sockets, 16 cores each, HT=OFF), 64 GB of RAM, 8 DIMMS of 8GB@2133MHz; Operating System: Ubuntu 14.04 LTS.
  • 51. Near native performance on Intel® Xeon™ and Intel® Xeon Phi™ 51 •  Runs out-of-the-box with any Python •  Intel Distribution for Python delivers much greater efficiency than “system” Python •  Potential for future multi-threaded performance tunings in numpy and scipy
  • 52. Roadmap & Reviews Available as free standalone download Commercial support through Intel® Parallel Studio 2017 Download at https://software.intel.com/en-us/python-distribution “I expected Intel’s numpy to be fast but it is significant that plain old python code is much faster with the Intel version too.“ Dr. Donald Kinghorn, Puget Systems Review HPC Podcast Looks at Intel’s Pending Distribution of Python Yes, Intel is doing their own Python build! It is still in beta but I think it’s a great idea. ……….Yeah, it’s important! Intel's Python distribution provides a major math boost The still-in-beta Python distribution uses Math Kernel Library to speed up processing on Intel hardware The distribution's main touted advantage is speed -- but not a PyPy-style general speedup via a JIT. Instead, the MKL speeds up certain math operations so that they run faster on one thread and multiple threads.
  • 54. 54 Summary I.  Deep Learning framework optimizations on Xeon, Xeon Phi – Session # II.  Intel DAAL III.  Intel MKL, MKL-DNN IV.  Intel Python optimizations – Session # V.  Intel Deep Learning SDK Learn more at www.intel.com/machinelearning Cost reduction scenarios described are intended as examples of how a given Intel- based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.
  • 55. Legal Disclaimer & Optimization Notice INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Copyright © 2016, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. Optimization Notice Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 5555