© Copyright 2017 Intel Corporation
Atanas Atanasov
Intel Confidential | NDA Required
Intel technologies may require enabled hardware, specific software, or services activation. Performance varies depending on system configuration. Check
with your system manufacturer or retailer.
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any
change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating
your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit
http://www.intel.com/performance.
Cost reduction scenarios described are intended as examples of how a given Intel- based product, in the specified circumstances and configurations, may
affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
For more information go to http://www.intel.com/performance.
All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and
roadmaps.
No computer system can be absolutely secure.
Statements in this document that refer to Intel’s plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve
a number of risks and uncertainties. A detailed discussion of the factors that could affect Intel’s results and plans is included in Intel’s SEC filings, including
the annual report on Form 10-K.
Intel, the Intel logo, Xeon, Intel vPro, Intel Xeon Phi, Look Inside., are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.
© 2017 Intel Corporation.
Legal Disclaimers
Disclaimers
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced
data are accurate.
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the
OEM or retailer.
The cost reduction scenarios described are intended to enable you to get a better understanding of how the purchase of a given Intel based product, combined with a number of
situation-specific variables, might affect future costs and savings. Circumstances will vary and there may be unaccounted-for costs related to the use and deployment of a given
product. Nothing in this document should be interpreted as either a promise of or contract for a given level of costs or cost reduction.
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors.
These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any
optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information
regarding the specific instruction sets covered by this notice. Notice Revision #20110804.
No computer system can be absolutely secure.
Intel® Advanced Vector Extensions (Intel® AVX)* provides higher throughput to certain processor operations. Due to varying processor power characteristics, utilizing AVX
instructions may cause a) some parts to operate at less than the rated frequency and b) some parts with Intel® Turbo Boost Technology 2.0 to not achieve any or maximum turbo
frequencies. Performance varies depending on hardware, software, and system configuration and you can learn more at http://www.intel.com/go/turbo.
Intel processors of the same SKU may vary in frequency or power as a result of natural variability in the production process.
SPEC, SPECfp and SPECint are registered trademarks of the Standard Performance Evaluation Corporation (SPEC).
© 2016 Intel Corporation. Intel, the Intel logo, Xeon, Xeon Phi, Xeon Phi logos and Xeon logos are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names
and brands may be claimed as the property of others.
Agenda
• Challenges in HPC/AI and SSF
• Compute: Xeon Scalable Family
• Fabric: Omni-Path
• Storage: Optane
• AI: Nervana
HPC is Foundational to Insight
Aerospace | Biology | Brain Modeling | Chemistry/Chemical Engineering | Climate | Computer Aided Engineering | Cosmology | Cybersecurity | Defense
Pharmacology | Particle Physics | Metallurgy | Manufacturing / Design | Life Sciences | Government Lab | Geosciences / Oil & Gas | Genomics | Fluid Dynamics
Digital Content Creation | EDA | Economics/Financial Services | Fraud Detection
Social Sciences; Literature, linguistics, marketing | University Academic | Weather
1 Source: IDC HPC and ROI Study Update (September 2015)
2 Source: IDC 2015 Q1 World Wide x86 Server Tracker vs IDC 2015 Q1 World Wide HPC Server Tracker
Business Innovation
High ROI: $515 Average Return Per $1 of HPC Investment1
A New Science Paradigm
Data-Driven Analytics Joins Theory, Experimentation, and Computational Science
Fundamental Discovery
Advancing Science And Our Understanding of the Universe
Growing Challenges in HPC
“The Walls”: System Bottlenecks
Memory | I/O | Storage | Energy Efficient Performance | Space | Resiliency | Unoptimized Software
Divergent Infrastructure
Resources Split Among Modeling and Simulation | Big Data Analytics | Machine Learning | Visualization
Barriers to Extending Usage
Democratization at Every Scale | Cloud Access | Exploration of New Parallel Programming Models
[Diagram labels: HPC Optimized; Big Data; HPC; Machine Learning; Visualization]
What Makes a Great HPC Solution?
[System diagram; labels grouped by component]
• Compute Nodes (Intel® Compute): Intel® Xeon Phi™ Processors | Intel® Xeon® Processors | Intel® Optane™ Technology | Intel® Omni-Path Fabric
• Login and Management Nodes: 1GbE for administration
• Switch Fabric (Intel® Networking): Intel® Omni-Path Fabric | Intel® Silicon Photonics
• Gateways: IBA | 10/40 GbE Networking
• I/O Nodes and Burst Buffer: Intel® Xeon® Processors | Intel® Omni-Path Fabric | Intel® Optane™ Technology
• Parallel File System (Intel® Solutions for Lustre*): Intel® Enterprise Edition for Lustre* | Intel® Foundation Edition for Lustre* | Intel® Cloud Edition for Lustre*
• Intel® Software Tools: Intel® Parallel Studio | Intel® Node Manager | Intel® Trace Analyzer
• Reference Architecture: Intel® Cluster Ready
Actual configurations depend on specific OEM offerings and implementation.
Intel® Scalable System Framework
A Holistic Architectural Approach is Required
Compute
Memory
Fabric
Storage
PERFORMANCE | CAPABILITY
TIME
System
Software
Innovative Technologies Tighter Integration
Application
Modernized Code
Community
ISV
Proprietary
System
Memory
Cores
Graphics
Fabric
FPGA
I/O
Intel® Scalable System Framework
A Holistic Design Solution for All HPC Needs
Small Clusters Through Supercomputers
Compute and Data-Centric Computing
Standards-Based Programmability
On-Premise and Cloud-Based
Compute: Intel® Xeon® Processors | Intel® Xeon Phi™ Processors | Intel® Xeon Phi™ Coprocessors | Intel® Server Boards and Platforms
Memory/Storage: Intel® Solutions for Lustre* | Intel® Optane™ Technology | 3D XPoint™ Technology | Intel® SSDs
Fabric: Intel® Omni-Path Architecture | Intel® True Scale Fabric | Intel® Ethernet | Intel® Silicon Photonics
Software: HPC System Software Stack | Intel® Software Tools | Intel® Cluster Ready Program | Intel Supported SDVis
Compute: Xeon Scalable Family
Intel® Xeon® Scalable Platform
The Foundation of Data Center Innovation: Agile & Trusted Infrastructure
Delivers 1.65x average performance boost over prior generation1
1 Up to 1.65x Geomean based on Normalized Generational Performance going from Intel® Xeon® processor E5-26xx v4 to Intel® Xeon® Scalable processor (estimated based on Intel internal testing of OLTP
Brokerage, SAP SD 2-Tier, HammerDB, Server-side Java, SPEC*int_rate_base2006, SPEC*fp_rate_base2006, Server Virtualization, STREAM* triad, LAMMPS, DPDK L3 Packet Forwarding, Black-Scholes, Intel
Distribution for LINPACK).
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer
systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating
your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance Intel does not control or audit the design or
implementation of third party benchmark data or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmark data are
reported and confirm whether the referenced benchmark data are accurate and reflect performance of systems available for purchase.
Performance
Pervasive through compute,
storage, and network
Agility
Rapid service delivery
Security
Pervasive data security with near
zero performance overhead
Typical 2-Socket Configuration
Intel Xeon E5 v4 (2016)
• Four DDR4 memory channels
• Up to 24 DIMMs
• Up to 80 PCIe* lanes
• Two Intel® QPI links (up to 9.6 GT/s)
Intel Xeon Scalable (Purley, 2017)
• Six DDR4 memory channels
• Up to 24 DIMMs
• Up to 96 PCIe* lanes
• Two Intel® UPI links (up to 10.4 GT/s); up to 3 UPI links in 4S and 8S configurations
• Integrated Intel® Omni-Path Architecture (Fabric)
[Diagram labels: CPU; Intel® QPI; Intel® UPI; DMI 2; DMI; LBG; DDR4 DIMMs; x4; x8; 3x16 PCIe*; 1x100G Intel® OP Fabric]
** PCIe* uplink connection for Intel® QuickAssist Technology and Intel® Ethernet
Intel® Xeon® Scalable Processors
The Foundation for Agile, Secure, Workload-Optimized Hybrid Cloud
[Processor tier graphic. Recoverable labels: up to 28 cores; up to 22 cores; 2 & 4 socket support; 2, 4 & 8 socket support; up to 3 UPI links; DDR4 2666 MHz; 1.5TB topline memory channel bandwidth; advanced reliability, availability and serviceability (RAS); standard RAS; hardware-enhanced security; highest accelerator throughput; Intel® Turbo Boost Technology and Intel® Hyper-Threading Technology; mainstream: good scalable performance for moderate tasks and workloads; entry: efficient scalable performance at low power for light tasks and workloads; entry performance, price sensitive]
Designed for Next-Generation Data Centers
Ring Architecture (2009-2017+) | Mesh Architecture (New in 2017)
• Maximizes performance
• Enables consistent, low latencies
• Optimized for data sharing and memory access between all CPU cores/threads for ideal memory bandwidth and capacity
• Data flows scale efficiently for 2, 4 & 8+ socket configurations
• Designed for modern virtualized and hybrid cloud implementations
Re-Architected L2 & L3 Cache Hierarchy
Architecture | L2 per core | Shared L3
Previous Architectures | 256KB private | 2.5MB/core (inclusive)
Intel® Xeon® Scalable Processor Architecture | 1MB private | 1.375MB/core (non-inclusive)
• On-chip cache balance shifted from shared-distributed (prior architectures) to private-local (Skylake architecture):
  • Shared-distributed → shared-distributed L3 is primary cache
  • Private-local → private L2 becomes primary cache with shared L3 used as overflow cache
• Shared L3 changed from inclusive to non-inclusive:
  • Inclusive (prior architectures) → L3 has copies of all lines in L2
  • Non-inclusive (Skylake architecture) → lines in L2 may not exist in L3
Skylake-SP cache hierarchy architected specifically for data center use case
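The capacity effect of moving from an inclusive to a non-inclusive L3 can be worked through with the per-core sizes quoted above. This is a minimal sketch under a simplified model that is an assumption, not something stated on the slide: a fully inclusive L3 duplicates every L2 line, and a non-inclusive L3 is treated as duplicating none (an upper bound, since lines "may not" exist in L3). The helper name `unique_cache_per_core_mb` is hypothetical.

```python
def unique_cache_per_core_mb(l2_mb, l3_mb, inclusive):
    """Upper bound on distinct cached data per core, in MB.

    Assumed model: an inclusive L3 holds a copy of every L2 line, so L2
    adds no unique capacity; a non-inclusive L3 is treated as holding no
    duplicates of L2 lines.
    """
    return l3_mb if inclusive else l2_mb + l3_mb


# Per-core sizes taken from the cache hierarchy comparison above.
previous = unique_cache_per_core_mb(l2_mb=0.25, l3_mb=2.5, inclusive=True)
skylake = unique_cache_per_core_mb(l2_mb=1.0, l3_mb=1.375, inclusive=False)
private_l2_growth = 1.0 / 0.25

print(f"Previous architectures: up to {previous} MB unique per core")
print(f"Skylake-SP:             up to {skylake} MB unique per core")
print(f"Private L2 growth:      {private_l2_growth:.0f}x")
```

Under this model the unique capacity per core stays roughly the same, while the share of it that is private and local to the core grows fourfold, which is the shift the bullets describe.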
Intel® Xeon® Scalable Processors for Technical Computing (HPC)
Powerful and balanced performance for diverse HPC workloads
Powerful performance
• Up to 28 cores vs. 24 cores/22 cores (on Intel® Xeon® processor E7 v4 / Intel Xeon processor E5-2600 v4 families)
• Intel® AVX-512 delivers up to 2X FLOPs/clock-cycle peak performance capability optimized for HPC, data analytics, and cryptography workloads1
• New Intel® Mesh architecture with 3 Intel® Ultra Path Interconnect links provides greater inter-CPU bandwidth for the most data-hungry, latency-sensitive applications
Significantly increased memory and I/O bandwidth
• Up to 1.5x gen-to-gen memory bandwidth increase per CPU (6 memory channels) for extremely large compute- and data-intensive workloads
• More I/O bandwidth with 48 PCIe 3.0 lanes vs. 40 lanes on Intel Xeon processor E5-2600 v4
• Intel® Optane™ and Intel® 3D NAND solid state drives deliver industry-leading combination of high throughput, low latency, high quality of service (QoS), and ultra high endurance6 to break data access bottlenecks
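The generation-to-generation ratios in this list follow directly from the counts on the slides. The sketch below recomputes them and adds a theoretical peak memory bandwidth figure; it assumes a 64-bit (8-byte) DDR4 channel and the DDR4-2666 speed that appears elsewhere in this deck, and the function names are hypothetical.

```python
def ratio(new, old):
    """Simple generation-to-generation ratio."""
    return new / old


def peak_ddr4_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s.

    Assumes each channel moves 8 bytes (64 bits) per transfer; real
    sustained bandwidth is lower.
    """
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


core_ratio = ratio(28, 22)            # vs. Xeon E5-2600 v4
memory_channel_ratio = ratio(6, 4)    # six vs. four DDR4 channels per CPU
pcie_lane_ratio = ratio(48, 40)       # PCIe 3.0 lanes per CPU
skylake_peak = peak_ddr4_bandwidth_gbs(channels=6, mega_transfers_per_s=2666)

print(f"Cores:           {core_ratio:.2f}x")
print(f"Memory channels: {memory_channel_ratio:.2f}x")
print(f"PCIe lanes:      {pcie_lane_ratio:.2f}x")
print(f"Peak DDR4-2666 bandwidth, 6 channels: {skylake_peak:.1f} GB/s per CPU")
```

The 1.5x memory figure is the channel-count ratio alone; any additional gain from faster DIMMs is not included in that number.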
Integrated interconnect for compelling efficiency
Integrated Intel® Omni-Path Architecture designed for today’s HPC systems
• Provides 100Gbps high-bandwidth and low-latency fabric for HPC clusters
• Reduces number of required switches and lowers fabric costs7, freeing up budget for up to 24% more compute nodes8
• Denser 48-port switch chip delivers a 33 percent increase9 over traditional InfiniBand switch, resulting in power, space and maintenance savings
Converged parallel programming environment for Intel® Xeon® Scalable processors & Intel® Xeon Phi™ processors
Highly integrated portfolio of superior technologies and optimized software tools ensures code portability across IA solutions
• Intel AVX-512 enables converged programming environment for Intel Xeon Scalable Processor and Intel® Xeon Phi™ Processor compute nodes
• Intel® Modern Code Developer Program enables the next decade of discovery
• Intel® Parallel Studio XE 2017 upgrades developer toolkit for HPC and technical computing
• Intel® HPC Orchestrator simplifies installation and ongoing maintenance of HPC system software stack
For footnotes and configurations, see slides 29-30.
Intel® Advanced Vector Extensions-512 (AVX-512)
End Customer Value: Workload-optimized performance, throughput increases, and H/W-enhanced security improvements for familiar analytics, HPC, video transcode, cryptography, and compression software.
Problems Solved:
1. Achieve more work per cycle (doubles width of data registers)
2. Minimize latency & overhead (doubles the number of registers) with ultra-wide (512-bit) vector processing capabilities (note that 2x FMA processing engines are available on Intel® Xeon® Platinum and Intel® Xeon® Gold Processors)
Value pillars: performance | security
Proof points: Up to 2x FLOPS/clock cycle1 | Up to 4x greater throughput2
Accelerates performance for your most demanding computational tasks
Segments: Cloud Service Providers | Comms Service Providers | Enterprise
* FLOPs = Floating Point Operations
1 Peak performance vs. Intel® AVX2. As measured by Intel® Xeon® Processor Scalable Family with Intel® AVX-512 compared to an Intel® Xeon® E5 v4 with Intel® AVX2
2 Vectorized floating-point throughput. As measured by Intel® Xeon® Processor Scalable Family with Intel® AVX-512 compared to an Intel® Xeon® E5 v4 with Intel® AVX2
• 512-bit wide vectors
• 32 operand registers
• 8 64b mask registers
• Embedded broadcast
• Embedded rounding
Microarchitecture | Instruction Set | SP FLOPs / cycle | DP FLOPs / cycle
Skylake | Intel® AVX-512 & FMA | 64 | 32
Haswell / Broadwell | Intel AVX2 & FMA | 32 | 16
Sandybridge | Intel AVX (256b) | 16 | 8
Nehalem | SSE (128b) | 8 | 4
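The FLOPs-per-cycle figures in the table can be reproduced from vector width and execution resources. The sketch below is illustrative only: the per-core unit counts are assumptions based on commonly published microarchitecture descriptions (two FMA units where FMA is listed, otherwise one multiply unit plus one add unit), and the function and table names are hypothetical.

```python
def flops_per_cycle(vector_bits, element_bits, units, flops_per_lane_per_unit):
    """Peak floating-point operations per core per clock cycle."""
    lanes = vector_bits // element_bits
    return lanes * units * flops_per_lane_per_unit


# name: (vector width in bits, execution units, FLOPs per lane per unit)
# An FMA counts as 2 FLOPs (multiply + add); a plain multiply or add as 1.
ARCHS = {
    "Skylake (AVX-512 & FMA)": (512, 2, 2),
    "Haswell / Broadwell (AVX2 & FMA)": (256, 2, 2),
    "Sandybridge (AVX 256b)": (256, 2, 1),
    "Nehalem (SSE 128b)": (128, 2, 1),
}

results = {}
for name, (bits, units, flops) in ARCHS.items():
    sp = flops_per_cycle(bits, 32, units, flops)  # single precision, 32-bit
    dp = flops_per_cycle(bits, 64, units, flops)  # double precision, 64-bit
    results[name] = (sp, dp)
    print(f"{name}: SP {sp}, DP {dp}")
```

Each doubling in the table comes from one of two changes: a wider vector (128b to 256b to 512b) or the move from separate multiply and add units to fused multiply-add.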
Intel AVX-512 Instruction Types
AVX-512-F | AVX-512 Foundation Instructions
AVX-512-VL | Vector Length Orthogonality: ability to operate on sub-512 vector sizes
AVX-512-BW | 512-bit Byte/Word support
AVX-512-DQ | Additional D/Q/SP/DP instructions (converts, transcendental support, etc.)
AVX-512-CD | Conflict Detect: used in vectorizing loops with potential address conflicts
Powerful instruction set for data-parallel computation
Intel® Advanced Vector Extensions-512 (AVX-512)
Performance and Efficiency with Intel® AVX-512
LINPACK Performance
Instruction set | GFLOPs | System power (W) | Core frequency (GHz) | GFLOPs / Watt (normalized to SSE4.2) | GFLOPs / GHz (normalized to SSE4.2)
SSE4.2 | 669 | 760 | 3.1 | 1.00 | 1.00
AVX | 1178 | 768 | 2.8 | 1.74 | 1.95
AVX2 | 2034 | 791 | 2.5 | 2.92 | 3.77
AVX512 | 3259 | 767 | 2.1 | 4.83 | 7.19
Intel® AVX-512 delivers significant performance and efficiency gains
Source as of June 2017: Intel internal measurements on platform with Xeon Platinum 8180, Turbo enabled, UPI=10.4, SNC1, 6x32GB DDR4-2666 per CPU, 1 DPC. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
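The normalized efficiency figures on this slide can be recomputed from the raw LINPACK chart values. The sketch below reads the GFLOPs, power, and frequency series in SSE4.2, AVX, AVX2, AVX512 order (that ordering is an interpretation of the chart data) and normalizes each derived metric to the SSE4.2 result; the variable and function names are hypothetical.

```python
ISAS = ["SSE4.2", "AVX", "AVX2", "AVX512"]
gflops = [669, 1178, 2034, 3259]
power_w = [760, 768, 791, 767]
freq_ghz = [3.1, 2.8, 2.5, 2.1]


def normalized(values):
    """Scale a series so that its first entry (SSE4.2) equals 1.00."""
    base = values[0]
    return [round(v / base, 2) for v in values]


gflops_per_watt = normalized([g / p for g, p in zip(gflops, power_w)])
gflops_per_ghz = normalized([g / f for g, f in zip(gflops, freq_ghz)])

for isa, per_watt, per_ghz in zip(ISAS, gflops_per_watt, gflops_per_ghz):
    print(f"{isa:7s} GFLOPs/W {per_watt:.2f}  GFLOPs/GHz {per_ghz:.2f}")
```

The recomputed ratios match the slide's normalized series, which shows that the efficiency gain comes from higher throughput at roughly constant system power despite the lower core frequency under wider vectors.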
Fabric: Omni-Path
Intel® Omni-Path Architecture in 30 Seconds
The Interconnect Landscape: Why Intel® OPA?
Performance
I/O struggling to keep up with CPU innovation
Increasing Scale
From 10K nodes… to 200K+
Previous solutions reaching limits of scalability, manageability and reliability
Fabric: Cluster Budget1
Fabric an increasing % of HPC hardware costs
Today: 20%-30%
Tomorrow: 30% to 40%
Goal: Keep cluster costs in check → maximize COMPUTE power per dollar
[Figure: cluster layout of scalable units SU01 through SU18]
1 Source: Internal analysis based on 256-node to 2048-node clusters configured with Mellanox FDR and EDR InfiniBand products. Mellanox component pricing from www.kernelsoftware.com. Prices as of November 3, 2015. Compute node pricing based on Dell PowerEdge R730 server from www.dell.com. Prices as of May 26, 2015. Intel® OPA (x8) utilizes a 2-1 over-subscribed Fabric. Intel® OPA pricing based on estimated reseller pricing using projected Intel MSRP pricing on day of launch.
Intel® Omni-Path Architecture
The Future of High Performance Fabrics
Better Scaling vs EDR
48 Radix Chip Ports
Up to 26% More Servers than InfiniBand* EDR within the Same Budget1
Up to 60% Lower Power and Cooling Costs2
Configurable / Resilient
Job Prioritization (Traffic Flow Optimization)
No-Compromise Resiliency (Packet Integrity Protection and Dynamic Lane Scaling)
Market Adoption
>100 OEM and HPC Storage Vendor Offerings Expected for Platforms, Switches,
and Adapters3
1. Assumes a 750-node cluster, and number of switch chips required is based on a full bisectional bandwidth (FBB) Fat-Tree configuration. Intel® OPA uses one fully-populated 768-port director switch, and Mellanox EDR solution uses a combination of 648-port director switches and 36-port edge switches. Mellanox component pricing from www.kernelsoftware.com, with prices as of November 3, 2015. Compute node pricing based on Dell PowerEdge R730 server from www.dell.com, with prices as of May 26, 2015. Intel® OPA pricing based on estimated reseller pricing based on Intel MSRP pricing on ark.intel.com.
2. Assumes a 750-node cluster, and number of switch chips required is based on a full bisectional bandwidth (FBB) Fat-Tree configuration. Intel® OPA uses one fully-populated 768-port director switch, and Mellanox EDR solution uses a combination of director switches and edge switches. Mellanox power data based on Mellanox CS7500 Director Switch, Mellanox SB7700/SB7790 Edge switch, and Mellanox ConnectX-4 VPI adapter card installation documentation posted on www.mellanox.com as of November 1, 2015. Intel OPA power data based on product briefs posted on www.intel.com as of November 16, 2015. Intel® OPA pricing based on estimated reseller pricing based on Intel MSRP pricing on ark.intel.com.
3. Intel internal information. Design win count based on OEM and HPC storage vendors who are planning to offer either Intel-branded or custom switch products, along with the total number of OEM platforms that are currently planned to support custom and/or standard Intel® OPA adapters. Design win count as of November 1, 2015 and subject to change without notice based on vendor product plans. *Other names and brands may be claimed as property of others.
[Chart: switch chips required (0 to 600) vs. nodes, Intel® OPA 48-port switch compared with InfiniBand* 36-port switch. Fewer switches required.]
1. Assumes a 750-node cluster, and number of switch chips required is based on a full bisectional bandwidth (FBB) Fat-Tree configuration. Intel® OPA uses one fully-populated 768-port director switch, and Mellanox EDR solution uses a combination of 648-port director switches and 36-port edge switches. Mellanox
component pricing from www.kernelsoftware.com, with prices as of November 3, 2015. Compute node pricing based on Dell PowerEdge R730 server from www.dell.com, with prices as of May 26, 2015. Intel® OPA pricing based on estimated reseller pricing based on Intel MSRP pricing on ark.intel.com. 2. Assumes a 750-
node cluster, and number of switch chips required is based on a full bisectional bandwidth (FBB) Fat-Tree configuration. Intel® OPA uses one fully-populated 768-port director switch, and Mellanox EDR solution uses a combination of director switches and edge switches. Mellanox power data based on Mellanox CS7500
Director Switch, Mellanox SB7700/SB7790 Edge switch, and Mellanox ConnectX-4 VPI adapter card installation documentation posted on www.mellanox.com as of November 1, 2015. Intel OPA power data based on product briefs posted on www.intel.com as of November 16, 2015. Intel® OPA pricing based on estimated
reseller pricing based on Intel MSRP pricing on ark.intel.com. 3. Number of switch chips required, switch density, and fabric scalability are based on a full bisectional bandwidth (FBB) Fat-Tree configuration, using a 48-port switch for Intel® Omni-Path Architecture and 36-port switch ASIC for either Mellanox or Intel® True Scale Fabric. *Other names and brands may be claimed as the property of others. 2.3X fabric scalability based on a 27,648-node cluster configured with the Intel® Omni-Path Architecture using 48-port switch ASICs, as compared with a 36-port switch chip that can support up to 11,664 nodes.
26% More Servers than EDR1
60% Lower Cooling Costs2
2.3X Greater Fabric Scalability3
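The node counts behind the 2.3X scalability claim are consistent with the standard capacity formula for a full-bisectional-bandwidth fat tree built from fixed-radix switches. The sketch below assumes a three-tier tree in which each switch splits its ports evenly between downlinks and uplinks; the three-tier assumption and the function name are not stated in the slides, but the results match the footnote's figures.

```python
def max_nodes_fat_tree(radix, tiers):
    """Maximum end nodes in a full-bisection fat tree of `radix`-port switches.

    Each non-top switch uses half its ports as downlinks, giving
    2 * (radix / 2) ** tiers end nodes.
    """
    return 2 * (radix // 2) ** tiers


opa_nodes = max_nodes_fat_tree(radix=48, tiers=3)
ib_nodes = max_nodes_fat_tree(radix=36, tiers=3)
scalability_ratio = opa_nodes / ib_nodes
port_increase_pct = (48 / 36 - 1) * 100

print(f"48-port switch ASIC: up to {opa_nodes:,} nodes")
print(f"36-port switch ASIC: up to {ib_nodes:,} nodes")
print(f"Scalability ratio:   {scalability_ratio:.2f}x")
print(f"Port count increase: {port_increase_pct:.0f}%")
```

Because capacity grows with the cube of half the radix, a 33 percent increase in ports per switch chip yields more than twice the maximum cluster size at the same number of tiers.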
Intel® Omni-Path Architecture
HPC’s Next-Generation Fabric
Maximizing Support for Heterogeneous Clusters
Greater flexibility for creating compute islands depending on user requirements
[Diagram: Intel® Omni-Path Architecture fabric connecting the following node types]
• Intel Xeon Processor (HSW, BDW & SKL) with PCI card HFI
• Intel Xeon Processor-F (SKL-F) with WFR HFI
• Xeon Phi™ Processor (KNL) with PCI card HFI
• Xeon Phi™ Processor-F (KNL-F) with HFI
• Intel Xeon Processor (SKL) with GPUs and GPU memory on the PCI bus and PCI card WFR HFI
GPU Direct v3 provided in Intel® OPA 10.3 release
Next Up for Intel® OPA: Artificial Intelligence
Intel offers a complete AI Portfolio
• From CPUs to software to computer vision to libraries and tools
Intel® OPA offers breakthrough performance on scale-out apps
• Low latency
• High bandwidth
• High message rate
• GPU Direct RDMA support
• Xeon Phi Integration
World-class interconnect solution for shorter time to train
[Diagram labels: Things & devices; Cloud; Data Center; Accelerant Technologies]
NVMe* over OPA
Intel® OPA + Intel® SSD and Optane™ Technology
• High Endurance
• Low latency
• High Efficiency
• Complete NVMe over Fabric Solution
NVMe-over-OPA status
• Supported in 10.4.3 IFS release
• Compliant with NVMeF spec 1.0
Only Intel is delivering a total NVMe over Fabric solution!
~1.5M 4k Random IOPS
99% Bandwidth Efficiency
[Diagram: Host (NVMe Host Driver, RDMA Transport, Intel® OPA HFI) connected to Target (NVMe Target Driver, RDMA Transport, Intel® OPA HFI, NVMe Host Driver, PCIe Transport, NVMe Storage)]
Target and Host system configuration: 2 x Intel® Xeon® CPU E5-2699 v3 @ 2.30GHz, Intel® Server Board S2600WT, 128GB DDR4, CentOS 7.3.1611, kernel 4.10.12, IFS 10.4.1, NULL-BLK, FIO 2.19, options hfi1 krcvqs=8 sge_copy_mode=2 wss_threshold=70
*Other names and brands may be claimed as the property of others.
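To put the ~1.5M 4k random IOPS figure in context, it can be converted into payload throughput and compared with the 100Gbps link rate quoted earlier for Intel® OPA. This is a rough sketch that assumes "4k" means 4096-byte blocks and ignores protocol overhead; the function name is hypothetical, and the 99% bandwidth efficiency figure on the slide is a separate measurement that is not derived here.

```python
def iops_to_gbps(iops, block_bytes):
    """Payload throughput in gigabits per second for a given IOPS rate."""
    return iops * block_bytes * 8 / 1e9


LINK_RATE_GBPS = 100  # Intel OPA link rate quoted in this deck

throughput_gbps = iops_to_gbps(iops=1.5e6, block_bytes=4096)
link_share = throughput_gbps / LINK_RATE_GBPS

print(f"Payload throughput: {throughput_gbps:.1f} Gbps")
print(f"Share of a 100 Gbps link: {link_share:.0%}")
```

At small block sizes the workload is bounded by operation rate rather than link bandwidth, which is why IOPS and bandwidth efficiency are reported as separate results.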
Storage: Optane
Tighter System-Level Integration
Innovative Memory-Storage Hierarchy
[Diagram axis: Higher Bandwidth. Lower Latency and Capacity]
Today
• Compute node (processor, memory bus): Caches | Local Memory
• I/O node: Local Storage (SSD Storage)
• Remote storage: Parallel File System (Hard Drive Storage)
Future
• Compute node: Caches | On-Package High Bandwidth Memory* | Local Memory (Intel® DIMMs based on 3D XPoint™ Technology) | Intel® Optane™ Technology SSDs
• I/O node: Burst Buffer Node with Intel® Optane™ Technology SSDs
• Remote storage: Parallel File System (Hard Drive Storage)
What changes
• Local memory is now faster & in processor package
• Much larger memory capacities keep data in local memory
• I/O Node storage moves to compute node
• Some remote data moves onto I/O node
*cache, memory or hybrid mode
Bridging the Memory-Storage Gap
Intel® Optane™ Technology Based on 3D XPoint™
SSD
Intel® Optane™ SSDs 5-7x Current Flagship
NAND-Based SSDs (IOPS)1
DRAM-like performance
Intel® DIMMs Based on 3D-XPoint™
1,000x Faster than NAND1
1,000x the Endurance of NAND2
Hard drive capacities
10x More Dense than Conventional
Memory3
1 Performance difference based on comparison between 3D XPoint™ Technology and other industry NAND
2 Endurance difference based on comparison between 3D XPoint™ Technology and other industry NAND
3 Density difference based on comparison between 3D XPoint™ Technology and other industry DRAM
Intel® Optane™ SSDs for Data Center
• Breakthrough Performance: IOPS
• Predictably Fast Service: QoS
• Responsive Under Load: Low Latency
• Ultra-high Endurance
Technology claims are based on comparisons of latency, density and write cycling metrics amongst memory technologies recorded on published specifications of in-market memory products against internal Intel specifications. Intel® Optane™ SSD prototype compared to the Intel® SSD DC P3700 Series (NAND)
NVM Solutions Group
Intel® Optane™ SSD Use Cases
Fast Storage and Cache
[Diagram: Intel® Xeon® with DRAM; Intel® Optane™ SSD and Intel® 3D NAND SSDs attached over PCIe*]
Extend Memory
[Diagram: Intel® Xeon® with a ‘memory pool’ made of DRAM (DDR) and Intel® Optane™ SSD (PCIe); Intel® 3D NAND SSDs attached over PCIe]
*Other names and brands may be claimed as the property of others
Breakthrough Performance
5-8x faster at low Queue Depths1
Vast majority of applications generate low QD storage workloads
1. Common Configuration - Intel 2U Server System, OS CentOS 7.2, kernel 3.10.0-327.el7.x86_64, CPU 2 x Intel® Xeon® E5-2699 v4 @ 2.20GHz (22 cores), RAM 396GB DDR @ 2133MHz. Configuration – Intel® Optane™ SSD DC P4800X 375GB and Intel® SSD DC P3700 1600GB. Performance – measured under 4K 70-30 workload at QD1-16 using fio-2.15.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance.
Predictably Fast Service
Up to 60x better at 99% QoS1
Ideal for critical applications with aggressive latency requirements
1. Common Configuration – Intel 2U Server System, OS CentOS 7.2, kernel 3.10.0-327.el7.x86_64, CPU 2 x Intel® Xeon® E5-2699 v4 @ 2.20GHz (22 cores), RAM 396GB DDR @ 2133MHz. Configuration – Intel® Optane™ SSD DC P4800X 375GB and Intel® SSD DC P3700 1600GB. QoS – measures 99% QoS under 4K 70-30 workload at QD1 using fio-2.15.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance.
Ultra Endurance
[Chart: endurance in drive writes per day (DWPD). MLC/TLC 2D/3D NAND SSD: 0.5 and 3. Intel® Optane™ SSD: 30.]
Up to 10x more Total Bytes Written at similar capacity1
Architected for endurance scaling
• ‘Write in place’ technology
• Non-destructive write process
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance.
1. Comparing projected Intel® Optane™ SSD 750GB specifications to actual Intel® SSD DC P4600 1.6TB specifications.
Total Bytes Written (TBW) calculated by multiplying specified or projected DWPD x specified or projected warranty duration x 365 days/year.
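The TBW calculation described in the footnote can be sketched as follows. Drive capacity is made explicit here because drive writes per day only translate into bytes once multiplied by capacity. The 1.0 TB capacity and 5-year warranty are hypothetical placeholders chosen to compare drives "at similar capacity"; the DWPD values of 3 and 30 are read from the endurance chart, and the function name is an assumption.

```python
def total_bytes_written_tb(dwpd, capacity_tb, warranty_years):
    """Total Bytes Written in TB: DWPD x capacity x warranty years x 365."""
    return dwpd * capacity_tb * warranty_years * 365


CAPACITY_TB = 1.0      # hypothetical, identical for both drives
WARRANTY_YEARS = 5     # hypothetical warranty duration

nand_tbw = total_bytes_written_tb(3, CAPACITY_TB, WARRANTY_YEARS)
optane_tbw = total_bytes_written_tb(30, CAPACITY_TB, WARRANTY_YEARS)
tbw_ratio = optane_tbw / nand_tbw

print(f"NAND SSD at 3 DWPD:    {nand_tbw:,.0f} TB written")
print(f"Optane SSD at 30 DWPD: {optane_tbw:,.0f} TB written")
print(f"Ratio: {tbw_ratio:.0f}x")
```

With capacity and warranty held equal, the TBW ratio reduces to the DWPD ratio, which is where the "up to 10x" figure comes from.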
AI: Nervana
The Coming Flood of Data
By 2020…
• The average internet user will generate ~1.5 GB of traffic per day
• Smart hospitals will generate over 3,000 GB per day
• Self driving cars will be generating over 4,000 GB per day… each
• A connected plane will generate over 40,000 GB per day
• A connected factory will generate over 1,000,000 GB per day
Self driving car sensor data rates
• radar ~10-100 KB per second
• sonar ~10-100 KB per second
• gps ~50 KB per second
• lidar ~10-70 MB per second
• cameras ~20-40 MB per second
All numbers are approximated
http://www.cisco.com/c/en/us/solutions/service-provider/vni-network-traffic-forecast/infographic.html
http://www.cisco.com/c/en/us/solutions/collateral/service-provider/global-cloud-index-gci/Cloud_Index_White_Paper.html
https://datafloq.com/read/self-driving-cars-create-2-petabytes-data-annually/172
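The per-sensor rates and the 4,000 GB per day figure for a self driving car can be cross-checked against each other. The sketch below sums the low and high ends of the sensor ranges and compares them with the sustained rate implied by 4,000 GB per day. It assumes decimal units (1 GB = 1,000 MB) and data generated evenly over 24 hours; both are simplifications, and the variable names are hypothetical.

```python
# (low, high) data rates in MB per second, from the sensor list above
SENSOR_MB_PER_S = {
    "radar": (0.01, 0.1),
    "sonar": (0.01, 0.1),
    "gps": (0.05, 0.05),
    "lidar": (10, 70),
    "cameras": (20, 40),
}

SECONDS_PER_DAY = 24 * 60 * 60

low_total = sum(low for low, _ in SENSOR_MB_PER_S.values())
high_total = sum(high for _, high in SENSOR_MB_PER_S.values())
implied_rate = 4000 * 1000 / SECONDS_PER_DAY  # MB/s for 4,000 GB per day

print(f"Sensor total: {low_total:.2f} to {high_total:.2f} MB/s")
print(f"Rate implied by 4,000 GB/day: {implied_rate:.1f} MB/s")
```

The implied sustained rate falls inside the range spanned by the sensor totals, so the daily figure is plausible for a vehicle whose lidar and cameras dominate the data volume.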
Analytics Needs AI
Stage | Focus
Descriptive Analytics | Hindsight: What Happened
Diagnostic Analytics | Insight: What Happened and Why
Predictive Analytics | Foresight: What Will Happen, When, and Why
Prescriptive Analytics | Simulation-Driven Analysis and Decision-Making
Cognitive Analytics | Self-Learning and Completely Automated Enterprise
[Diagram labels: Operational Analytics; Advanced Analytics; Mature Data Lake; Computerized Human Thought Simulation and Actions; Towards Autonomic Enterprise; Today; Emerging]
AI is a large category all on its own, and a vital tool for reaching higher maturity & scale data analytics
The Next Big Wave
AI compute cycles will grow 12X by 2020
Mainframes → Standards-based servers → Cloud computing → Artificial intelligence
Data deluge | Compute breakthrough | Innovation surge
Source: Intel forecast
MACHINE/DEEPLEARNING
REASONINGSYSTEMS
TOOLS&STANDARDS
COMPUTERVISION
Programmablesolutions
Memory/storage
Networking
communications 5G
Things
&devices
Cloud
DATACenter
Accelerant
Technologies
…
End-to-endai
Intel Has a Complete End-to-End Portfolio
Intel Strategy: Optimized Deep Learning Environment
Fuel the development of vertical solutions
Deliver best single node and multi-node performance
Accelerate design, training, and deployment
Drive optimizations across open source machine learning frameworks
Nervana Cloud™
Maximum performance on Intel architecture
Intel® Math Kernel Library (Intel® MKL)
Training, Inference
Intel® MKL-DNN
Intel® Nervana™ Graph
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
✝Codename for product that is coming soon
All performance positioning claims are relative to other processor technologies in Intel’s AI datacenter portfolio
*Knights Mill (KNM); select = single-precision highly-parallel workloads generally scale to >100 threads and benefit from more vectorization, and may also benefit from greater memory bandwidth e.g. energy (reverse time migration), deep learning training, etc.
All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
AI Datacenter
All purpose: Intel® Xeon® Processor Family
Training & Inference
Scalable performance for widest variety of AI & other datacenter workloads, including deep learning training & inference
Highly-parallel: Intel® Xeon Phi™ Processor (Knights Mill✝)
Faster DL Training
Scalable performance optimized for even faster deep learning training and select highly-parallel datacenter workloads*
Flexible acceleration: Intel® FPGA
Enhanced DL Inference
Scalable acceleration for deep learning inference in real-time with higher efficiency, and wide range of workloads & configurations
Deep Learning: Crest Family✝
Deep learning by design
Scalable acceleration with best performance for intensive deep learning training & inference
Intel® Xeon® Scalable processors for AI
Most agile AI platform
Scalable performance for widest variety of AI & other datacenter workloads, including deep learning
Built-in ROI: Begin your AI journey today using existing, familiar infrastructure
Potent performance: Train in hours instead of days, with up to 113X² perf vs. Intel Xeon E5 v3 (2.2x excluding optimized SW¹)
Production-ready: Robust support for full range of AI deployments
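The two speedup figures on this slide (up to 113X overall, 2.2x excluding optimized software) can be separated into a hardware part and a software part. The sketch below assumes the two contributions multiply, which the slide does not state. The function name is hypothetical and the result is an illustration, not an Intel-published number.

```python
def implied_software_factor(total_speedup, hardware_only_speedup):
    """Return the software factor implied by total = hardware * software.

    Assumption: hardware and software gains compose multiplicatively.
    """
    if hardware_only_speedup <= 0:
        raise ValueError("hardware_only_speedup must be positive")
    return total_speedup / hardware_only_speedup


if __name__ == "__main__":
    # Figures quoted on the slide: up to 113X overall, 2.2x excluding optimized SW.
    factor = implied_software_factor(113.0, 2.2)
    print(f"Implied software contribution: about {factor:.1f}x")
```

Read this way, most of the quoted gain would come from the optimized software stack rather than from the newer processor alone.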
1,2Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components,
software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the
performance of that product when combined with other products. For more complete information visit: http://www.intel.com/performance Source: Intel measured as of November 2016. Optimization Notice: Intel's compilers may or may not
optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the
availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not
specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice Revision #20110804. See slide 15 for configuration details.
Intel® Xeon® inference & training performance
Advance previous generation AI workload performance with Intel® Xeon® Scalable Processors
INFERENCE THROUGHPUT: Up to 2.4x higher Neon ResNet 18 inference throughput on Intel® Xeon® Platinum 8180 Processor compared to Intel® Xeon® Processor E5-2699 v4
TRAINING THROUGHPUT: Up to 2.2x higher Neon ResNet 18 training throughput on Intel® Xeon® Platinum 8180 Processor compared to Intel® Xeon® Processor E5-2699 v4
Inference throughput batch size: 1. Training throughput batch size: 256. Configuration details on slides 18 and 20. Source: Intel measured as of June 2017.
Inference and training throughput measured with FP32 instructions. Inference with INT8 will be higher.
Intel® Xeon® Platform Performance
INFERENCE THROUGHPUT: Up to 138x higher Intel optimized Caffe GoogleNet v1 with Intel® MKL inference throughput on Intel® Xeon® Platinum 8180 Processor compared to Intel® Xeon® Processor E5-2699 v3 with BVLC-Caffe
Inference using FP32. Batch size: Caffe GoogleNet v1 256, AlexNet 256. Configuration details on slides 18 and 25.
Source: Intel measured as of June 2017.
TRAINING THROUGHPUT: Up to 113x higher Intel Optimized Caffe AlexNet with Intel® MKL training throughput on Intel® Xeon® Platinum 8180 Processor compared to Intel® Xeon® Processor E5-2699 v3 with BVLC-Caffe
Deliver significant AI performance with hardware and software optimizations on Intel® Xeon® Scalable Processors
Hardware plus optimized software: Optimized Frameworks, Optimized Intel® MKL Libraries
Inference and training throughput measured with FP32 instructions. Inference with INT8 will be higher.
Intel® Xeon Phi™ processor (Knights Mill)
Scalable performance optimized for even faster deep learning training and select highly-parallel datacenter workloads*
Faster time-to-train | Efficient scaling | Future ready
• Delivers up to 4X deep learning performance over Knights Landing✝
• New instruction sets deliver enhanced lower precision performance
• Time-to-train reduction is the primary benchmark to judge deep learning training performance
• Direct access of up to 400 GB of memory with no PCIe performance lag (vs. GPU: 16 GB)
• Efficient scaling further reduces time-to-train when utilizing scaled Knights Mill systems
• Up to 400X deep learning performance on existing HW via Intel SW optimization
• Share deep learning software investments across Intel platforms via Intel deep learning software tools
• Binary-compatible with Intel® Xeon® processor
✝Knights Landing is the former codename for the Intel® Xeon Phi™ processor family that was released in 2016
Configuration details on final slides
*Knights Mill (KNM); select = single-precision highly-parallel workloads generally scale to >100 threads and benefit from more vectorization, and may also benefit from greater memory bandwidth e.g. energy (reverse time migration), deep learning training, etc.
Source: Intel measured as of November 2016
Faster DL Training
Highly-parallel
Crest family
Deep learning by design
Scalable acceleration with best performance for intensive deep learning training & inference, period
Custom hardware | Blazing data access | High-speed scalability
• Unprecedented compute density
• Large reduction in time-to-train
• 32 GB of in-package memory via HBM2 technology
• 8 terabits/s of memory access speed
• 12 bi-directional high-bandwidth links
• Seamless data transfer via interconnects
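The memory figures on this slide are easier to compare once they share units. The sketch below converts the quoted 8 terabits/s into bytes per second and estimates how long one full pass over the 32 GB of in-package memory would take. It assumes decimal units and treats the quoted rate as a sustained peak, which a real system may not reach. The function names are hypothetical.

```python
BITS_PER_BYTE = 8


def bandwidth_bytes_per_s(terabits_per_s):
    """Convert terabits per second to bytes per second (decimal units)."""
    return terabits_per_s * 1e12 / BITS_PER_BYTE


def full_sweep_seconds(capacity_gb, terabits_per_s):
    """Time to stream capacity_gb once at the given peak rate."""
    return capacity_gb * 1e9 / bandwidth_bytes_per_s(terabits_per_s)


if __name__ == "__main__":
    rate = bandwidth_bytes_per_s(8)          # figure quoted on the slide
    sweep = full_sweep_seconds(32, 8)        # 32 GB of HBM2 from the slide
    print(f"Peak rate: {rate / 1e12:.1f} TB/s")
    print(f"One pass over 32 GB: {sweep * 1e3:.0f} ms")
```

Under these assumptions the quoted rate is about 1 TB/s, so a single pass over the whole memory would take on the order of 32 ms.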
1Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance
Source: Intel measured as of November 2016
2017
AI frameworks
Optimized for Intel architecture
BigDL, MLlib
and more frameworks enabled via Intel® Nervana™ Graph (future)
See Roadmap for availability
Other names and brands may be claimed as the property of others.
Intel®'s reference deep learning framework committed to best performance on all hardware
intelnervana.com/neon
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Architecture
NVM Solutions Group
Legal Notices and Disclaimers
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation.
Learn more at intel.com, or from the OEM or retailer.
No computer system can be absolutely secure.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will
affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase.
Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and
configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
This document contains information on products, services and/or processes in development. All information provided here is subject to change
without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web
site and confirm whether referenced data are accurate.
Intel, the Intel logo, Xeon, Intel Optane, and 3D XPoint are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
© 2017 Intel Corporation.

Weitere ähnliche Inhalte

Was ist angesagt?

Computação de Alto Desempenho - Fator chave para a competitividade do País, d...
Computação de Alto Desempenho - Fator chave para a competitividade do País, d...Computação de Alto Desempenho - Fator chave para a competitividade do País, d...
Computação de Alto Desempenho - Fator chave para a competitividade do País, d...Igor José F. Freitas
 
OpenPOWER/POWER9 Webinar from MIT and IBM
OpenPOWER/POWER9 Webinar from MIT and IBM OpenPOWER/POWER9 Webinar from MIT and IBM
OpenPOWER/POWER9 Webinar from MIT and IBM Ganesan Narayanasamy
 
Covid-19 Response Capability with Power Systems
Covid-19 Response Capability with Power SystemsCovid-19 Response Capability with Power Systems
Covid-19 Response Capability with Power SystemsGanesan Narayanasamy
 
Application Report: Big Data - Big Cluster Interconnects
Application Report: Big Data - Big Cluster InterconnectsApplication Report: Big Data - Big Cluster Interconnects
Application Report: Big Data - Big Cluster InterconnectsIT Brand Pulse
 
Intel Itanium Hotchips 2011 Overview
Intel Itanium Hotchips 2011 OverviewIntel Itanium Hotchips 2011 Overview
Intel Itanium Hotchips 2011 OverviewPauline Nist
 
Ibm symp14 referentin_barbara koch_power_8 launch bk
Ibm symp14 referentin_barbara koch_power_8 launch bkIbm symp14 referentin_barbara koch_power_8 launch bk
Ibm symp14 referentin_barbara koch_power_8 launch bkIBM Switzerland
 
INTEL® XEON® SCALABLE PROCESSORS
INTEL® XEON® SCALABLE PROCESSORSINTEL® XEON® SCALABLE PROCESSORS
INTEL® XEON® SCALABLE PROCESSORSTyrone Systems
 
HPC Market Update and Observations on Big Memory
HPC Market Update and Observations on Big MemoryHPC Market Update and Observations on Big Memory
HPC Market Update and Observations on Big MemoryMemVerge
 
IBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERIBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERinside-BigData.com
 
Intel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing GuideIntel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing GuideIntel IT Center
 
SUPERMICRO Innovative Computing Architecture
SUPERMICRO Innovative Computing ArchitectureSUPERMICRO Innovative Computing Architecture
SUPERMICRO Innovative Computing ArchitectureIntel IT Center
 
DDN: Protecting Your Data, Protecting Your Hardware
DDN: Protecting Your Data, Protecting Your HardwareDDN: Protecting Your Data, Protecting Your Hardware
DDN: Protecting Your Data, Protecting Your Hardwareinside-BigData.com
 
Application Optimized Performance: Choosing the Right Instance (CPN212) | AWS...
Application Optimized Performance: Choosing the Right Instance (CPN212) | AWS...Application Optimized Performance: Choosing the Right Instance (CPN212) | AWS...
Application Optimized Performance: Choosing the Right Instance (CPN212) | AWS...Amazon Web Services
 
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...Anand Haridass
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloadsinside-BigData.com
 
2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit Mumbai2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit MumbaiAnand Haridass
 
Accelerating analytics workloads with Alluxio data orchestration and Intel® O...
Accelerating analytics workloads with Alluxio data orchestration and Intel® O...Accelerating analytics workloads with Alluxio data orchestration and Intel® O...
Accelerating analytics workloads with Alluxio data orchestration and Intel® O...Alluxio, Inc.
 

Was ist angesagt? (20)

Computação de Alto Desempenho - Fator chave para a competitividade do País, d...
Computação de Alto Desempenho - Fator chave para a competitividade do País, d...Computação de Alto Desempenho - Fator chave para a competitividade do País, d...
Computação de Alto Desempenho - Fator chave para a competitividade do País, d...
 
OpenPOWER/POWER9 Webinar from MIT and IBM
OpenPOWER/POWER9 Webinar from MIT and IBM OpenPOWER/POWER9 Webinar from MIT and IBM
OpenPOWER/POWER9 Webinar from MIT and IBM
 
FPGAs and Machine Learning
FPGAs and Machine LearningFPGAs and Machine Learning
FPGAs and Machine Learning
 
OpenPOWER/POWER9 AI webinar
OpenPOWER/POWER9 AI webinar OpenPOWER/POWER9 AI webinar
OpenPOWER/POWER9 AI webinar
 
Covid-19 Response Capability with Power Systems
Covid-19 Response Capability with Power SystemsCovid-19 Response Capability with Power Systems
Covid-19 Response Capability with Power Systems
 
Application Report: Big Data - Big Cluster Interconnects
Application Report: Big Data - Big Cluster InterconnectsApplication Report: Big Data - Big Cluster Interconnects
Application Report: Big Data - Big Cluster Interconnects
 
Intel Itanium Hotchips 2011 Overview
Intel Itanium Hotchips 2011 OverviewIntel Itanium Hotchips 2011 Overview
Intel Itanium Hotchips 2011 Overview
 
Ibm symp14 referentin_barbara koch_power_8 launch bk
Ibm symp14 referentin_barbara koch_power_8 launch bkIbm symp14 referentin_barbara koch_power_8 launch bk
Ibm symp14 referentin_barbara koch_power_8 launch bk
 
INTEL® XEON® SCALABLE PROCESSORS
INTEL® XEON® SCALABLE PROCESSORSINTEL® XEON® SCALABLE PROCESSORS
INTEL® XEON® SCALABLE PROCESSORS
 
HPC Market Update and Observations on Big Memory
HPC Market Update and Observations on Big MemoryHPC Market Update and Observations on Big Memory
HPC Market Update and Observations on Big Memory
 
IBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERIBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWER
 
Intel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing GuideIntel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing Guide
 
SUPERMICRO Innovative Computing Architecture
SUPERMICRO Innovative Computing ArchitectureSUPERMICRO Innovative Computing Architecture
SUPERMICRO Innovative Computing Architecture
 
DDN: Protecting Your Data, Protecting Your Hardware
DDN: Protecting Your Data, Protecting Your HardwareDDN: Protecting Your Data, Protecting Your Hardware
DDN: Protecting Your Data, Protecting Your Hardware
 
Application Optimized Performance: Choosing the Right Instance (CPN212) | AWS...
Application Optimized Performance: Choosing the Right Instance (CPN212) | AWS...Application Optimized Performance: Choosing the Right Instance (CPN212) | AWS...
Application Optimized Performance: Choosing the Right Instance (CPN212) | AWS...
 
Deeplearningusingcloudpakfordata
DeeplearningusingcloudpakfordataDeeplearningusingcloudpakfordata
Deeplearningusingcloudpakfordata
 
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloads
 
2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit Mumbai2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit Mumbai
 
Accelerating analytics workloads with Alluxio data orchestration and Intel® O...
Accelerating analytics workloads with Alluxio data orchestration and Intel® O...Accelerating analytics workloads with Alluxio data orchestration and Intel® O...
Accelerating analytics workloads with Alluxio data orchestration and Intel® O...
 

Andere mochten auch

LinuxKit and OpenOverlay
LinuxKit and OpenOverlayLinuxKit and OpenOverlay
LinuxKit and OpenOverlayMoby Project
 
Libnetwork updates
Libnetwork updatesLibnetwork updates
Libnetwork updatesMoby Project
 
Model Simulation, Graphical Animation, and Omniscient Debugging with EcoreToo...
Model Simulation, Graphical Animation, and Omniscient Debugging with EcoreToo...Model Simulation, Graphical Animation, and Omniscient Debugging with EcoreToo...
Model Simulation, Graphical Animation, and Omniscient Debugging with EcoreToo...Benoit Combemale
 
HPC DAY 2017 | Prometheus - energy efficient supercomputing
HPC DAY 2017 | Prometheus - energy efficient supercomputingHPC DAY 2017 | Prometheus - energy efficient supercomputing
HPC DAY 2017 | Prometheus - energy efficient supercomputingHPC DAY
 
Java on the GPU: Where are we now?
Java on the GPU: Where are we now?Java on the GPU: Where are we now?
Java on the GPU: Where are we now?Dmitry Alexandrov
 
Database Security Threats - MariaDB Security Best Practices
Database Security Threats - MariaDB Security Best PracticesDatabase Security Threats - MariaDB Security Best Practices
Database Security Threats - MariaDB Security Best PracticesMariaDB plc
 
Latency tracing in distributed Java applications
Latency tracing in distributed Java applicationsLatency tracing in distributed Java applications
Latency tracing in distributed Java applicationsConstantine Slisenka
 
GPU databases - How to use them and what the future holds
GPU databases - How to use them and what the future holdsGPU databases - How to use them and what the future holds
GPU databases - How to use them and what the future holdsArnon Shimoni
 
Design patterns in Java - Monitis 2017
Design patterns in Java - Monitis 2017Design patterns in Java - Monitis 2017
Design patterns in Java - Monitis 2017Arsen Gasparyan
 
Getting Started with Embedded Python: MicroPython and CircuitPython
Getting Started with Embedded Python: MicroPython and CircuitPythonGetting Started with Embedded Python: MicroPython and CircuitPython
Getting Started with Embedded Python: MicroPython and CircuitPythonAyan Pahwa
 
세션1. block chain as a platform
세션1. block chain as a platform세션1. block chain as a platform
세션1. block chain as a platformJay JH Park
 
Scylla Summit 2017: Repair, Backup, Restore: Last Thing Before You Go to Prod...
Scylla Summit 2017: Repair, Backup, Restore: Last Thing Before You Go to Prod...Scylla Summit 2017: Repair, Backup, Restore: Last Thing Before You Go to Prod...
Scylla Summit 2017: Repair, Backup, Restore: Last Thing Before You Go to Prod...ScyllaDB
 
Пиксельные шейдеры для Web-разработчиков. Программируем GPU / Денис Радин (Li...
Пиксельные шейдеры для Web-разработчиков. Программируем GPU / Денис Радин (Li...Пиксельные шейдеры для Web-разработчиков. Программируем GPU / Денис Радин (Li...
Пиксельные шейдеры для Web-разработчиков. Программируем GPU / Денис Радин (Li...Ontico
 
Key transparency: Blockchain meets NoiseSocket / Алексей Ермишкин (Virgil)
Key transparency: Blockchain meets NoiseSocket / Алексей Ермишкин (Virgil)Key transparency: Blockchain meets NoiseSocket / Алексей Ермишкин (Virgil)
Key transparency: Blockchain meets NoiseSocket / Алексей Ермишкин (Virgil)Ontico
 
Logging and ranting / Vytis Valentinavičius (Lamoda)
Logging and ranting / Vytis Valentinavičius (Lamoda)Logging and ranting / Vytis Valentinavičius (Lamoda)
Logging and ranting / Vytis Valentinavičius (Lamoda)Ontico
 
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Ontico
 

Andere mochten auch (20)

LinuxKit and OpenOverlay
LinuxKit and OpenOverlayLinuxKit and OpenOverlay
LinuxKit and OpenOverlay
 
Libnetwork updates
Libnetwork updatesLibnetwork updates
Libnetwork updates
 
Raspberry home server
Raspberry home serverRaspberry home server
Raspberry home server
 
Model Simulation, Graphical Animation, and Omniscient Debugging with EcoreToo...
Model Simulation, Graphical Animation, and Omniscient Debugging with EcoreToo...Model Simulation, Graphical Animation, and Omniscient Debugging with EcoreToo...
Model Simulation, Graphical Animation, and Omniscient Debugging with EcoreToo...
 
HPC DAY 2017 | Prometheus - energy efficient supercomputing
HPC DAY 2017 | Prometheus - energy efficient supercomputingHPC DAY 2017 | Prometheus - energy efficient supercomputing
HPC DAY 2017 | Prometheus - energy efficient supercomputing
 
Java on the GPU: Where are we now?
Java on the GPU: Where are we now?Java on the GPU: Where are we now?
Java on the GPU: Where are we now?
 
Database Security Threats - MariaDB Security Best Practices
Database Security Threats - MariaDB Security Best PracticesDatabase Security Threats - MariaDB Security Best Practices
Database Security Threats - MariaDB Security Best Practices
 
Latency tracing in distributed Java applications
Latency tracing in distributed Java applicationsLatency tracing in distributed Java applications
Latency tracing in distributed Java applications
 
GPU databases - How to use them and what the future holds
GPU databases - How to use them and what the future holdsGPU databases - How to use them and what the future holds
GPU databases - How to use them and what the future holds
 
Design patterns in Java - Monitis 2017
Design patterns in Java - Monitis 2017Design patterns in Java - Monitis 2017
Design patterns in Java - Monitis 2017
 
Getting Started with Embedded Python: MicroPython and CircuitPython
Getting Started with Embedded Python: MicroPython and CircuitPythonGetting Started with Embedded Python: MicroPython and CircuitPython
Getting Started with Embedded Python: MicroPython and CircuitPython
 
An Introduction to OMNeT++ 5.1
An Introduction to OMNeT++ 5.1An Introduction to OMNeT++ 5.1
An Introduction to OMNeT++ 5.1
 
Drive into calico architecture
Drive into calico architectureDrive into calico architecture
Drive into calico architecture
 
Vertx
VertxVertx
Vertx
 
세션1. block chain as a platform
세션1. block chain as a platform세션1. block chain as a platform
세션1. block chain as a platform
 
Scylla Summit 2017: Repair, Backup, Restore: Last Thing Before You Go to Prod...
Scylla Summit 2017: Repair, Backup, Restore: Last Thing Before You Go to Prod...Scylla Summit 2017: Repair, Backup, Restore: Last Thing Before You Go to Prod...
Scylla Summit 2017: Repair, Backup, Restore: Last Thing Before You Go to Prod...
 
Пиксельные шейдеры для Web-разработчиков. Программируем GPU / Денис Радин (Li...
Пиксельные шейдеры для Web-разработчиков. Программируем GPU / Денис Радин (Li...Пиксельные шейдеры для Web-разработчиков. Программируем GPU / Денис Радин (Li...
Пиксельные шейдеры для Web-разработчиков. Программируем GPU / Денис Радин (Li...
 
Key transparency: Blockchain meets NoiseSocket / Алексей Ермишкин (Virgil)
Key transparency: Blockchain meets NoiseSocket / Алексей Ермишкин (Virgil)Key transparency: Blockchain meets NoiseSocket / Алексей Ермишкин (Virgil)
Key transparency: Blockchain meets NoiseSocket / Алексей Ермишкин (Virgil)
 
Logging and ranting / Vytis Valentinavičius (Lamoda)
Logging and ranting / Vytis Valentinavičius (Lamoda)Logging and ranting / Vytis Valentinavičius (Lamoda)
Logging and ranting / Vytis Valentinavičius (Lamoda)
 
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
 

HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Architecture

  • 1. © Copyright 2017 Intel Corporation Atanas Atanasov
  • 2. Legal Disclaimers. Intel Confidential | NDA Required. Intel technologies may require enabled hardware, specific software, or services activation. Performance varies depending on system configuration. Check with your system manufacturer or retailer. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance. Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction. For more information go to http://www.intel.com/performance. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps. No computer system can be absolutely secure. Statements in this document that refer to Intel’s plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve a number of risks and uncertainties. A detailed discussion of the factors that could affect Intel’s results and plans is included in Intel’s SEC filings, including the annual report on Form 10-K. Intel, the Intel logo, Xeon, Intel vPro, Intel Xeon Phi, Look Inside., are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries. © 2017 Intel Corporation.
  • 3. Disclaimers Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. The cost reduction scenarios described are intended to enable you to get a better understanding of how the purchase of a given Intel based product, combined with a number of situation-specific variables, might affect future costs and savings. Circumstances will vary and there may be unaccounted-for costs related to the use and deployment of a given product. Nothing in this document should be interpreted as either a promise of or contract for a given level of costs or cost reduction. Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804. No computer system can be absolutely secure. Intel® Advanced Vector Extensions (Intel® AVX)* provides higher throughput to certain processor operations. 
Due to varying processor power characteristics, utilizing AVX instructions may cause a) some parts to operate at less than the rated frequency and b) some parts with Intel® Turbo Boost Technology 2.0 to not achieve any or maximum turbo frequencies. Performance varies depending on hardware, software, and system configuration and you can learn more at http://www.intel.com/go/turbo. Intel processors of the same SKU may vary in frequency or power as a result of natural variability in the production process. SPEC, SPECfp and SPECint are registered trademarks of the Standard Performance Evaluation Corporation (SPEC). © 2016 Intel Corporation. Intel, the Intel logo, Xeon, Xeon Phi, Xeon Phi logos and Xeon logos are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.
  • 4. Agenda
      • Challenges in HPC/AI and SSF
      • Compute: Xeon Scalable Family
      • Fabric: Omni-Path
      • Storage: Optane
      • AI: Nervana
  • 5. HPC is Foundational to Insight
      • Business Innovation: high ROI, $515 average return per $1 of HPC investment [1].
      • Fundamental Discovery: advancing science and our understanding of the universe.
      • A New Science Paradigm: data-driven analytics joins theory, experimentation, and computational science.
      • Application domains: Aerospace; Biology; Brain Modeling; Chemistry/Chemical Engineering; Climate; Computer Aided Engineering; Cosmology; Cybersecurity; Defense; Digital Content Creation; EDA; Economics/Financial Services; Fraud Detection; Fluid Dynamics; Genomics; Geosciences / Oil & Gas; Government Lab; Life Sciences; Manufacturing / Design; Metallurgy; Particle Physics; Pharmacology; Social Sciences (literature, linguistics, marketing); University/Academic; Weather.
      • [1] Source: IDC HPC and ROI Study Update (September 2015).
      • [2] Source: IDC 2015 Q1 World Wide x86 Server Tracker vs. IDC 2015 Q1 World Wide HPC Server Tracker.
  • 6. Growing Challenges in HPC
      • “The Walls”: system bottlenecks (memory | I/O | storage); energy-efficient performance; space | resiliency | unoptimized software.
      • Divergent Infrastructure: resources split among modeling and simulation | big data analytics | machine learning | visualization.
      • Barriers to Extending Usage: democratization at every scale | cloud access | exploration of new parallel programming models.
      • [Figure labels: HPC Optimized; HPC, Big Data, Machine Learning, Visualization.]
  • 7. What Makes a Great HPC Solution?
      • [Cluster diagram components: Compute Nodes; I/O Nodes; Login and Management Nodes; Parallel File System; Burst Buffer; Switch Fabric (Intel® Omni-Path Fabric); 1GbE for administration; Networking Gateways (IBA, 10/40 GbE).]
      • Intel® Software Tools: Intel® Parallel Studio; Intel® Node Manager; Intel® Trace Analyzer.
      • Intel® Networking: Intel® Omni-Path Fabric; Intel® Silicon Photonics.
      • Burst Buffer: Intel® Xeon® Processors; Intel® Omni-Path Fabric; Intel® Optane™ Technology.
      • Intel® Compute: Intel® Xeon Phi™ Processors; Intel® Xeon® Processors; Intel® Optane™ Technology; Intel® Omni-Path Fabric.
      • Intel® Solutions for Lustre*: Intel® Enterprise Edition for Lustre*; Intel® Foundation Edition for Lustre*; Intel® Cloud Edition for Lustre*.
      • Reference Architecture: Intel® Cluster Ready; Intel® Scalable System Framework.
      • Actual configurations depend on specific OEM offerings and implementation.
  • 8. A Holistic Architectural Approach is Required
      • [Figure: performance | capability over time. Labels, in extracted order: Compute, Memory, Fabric, Storage; System Software; Innovative Technologies; Tighter Integration; Application; Modernized Code; Community; ISV; Proprietary; System; Memory; Cores; Graphics; Fabric; FPGA; I/O.]
  • 9. Intel® Scalable System Framework: A Holistic Design Solution for All HPC Needs
      • Scope: small clusters through supercomputers; compute and data-centric computing; standards-based programmability; on-premise and cloud-based.
      • Compute: Intel® Xeon® Processors; Intel® Xeon Phi™ Processors; Intel® Xeon Phi™ Coprocessors; Intel® Server Boards and Platforms.
      • Memory/Storage: Intel® Solutions for Lustre*; Intel® Optane™ Technology; 3D XPoint™ Technology; Intel® SSDs.
      • Fabric: Intel® Omni-Path Architecture; Intel® True Scale Fabric; Intel® Ethernet; Intel® Silicon Photonics.
      • Software: HPC System Software Stack; Intel® Software Tools; Intel® Cluster Ready Program; Intel Supported SDVis.
  • 11. Intel® Xeon® Scalable Platform. The foundation of data center innovation: agile and trusted infrastructure. Delivers a 1.65x average performance boost over the prior generation [1].
      • Performance: pervasive through compute, storage, and network.
      • Agility: rapid service delivery.
      • Security: pervasive data security with near-zero performance overhead.
      • [1] Up to 1.65x geomean based on normalized generational performance going from Intel® Xeon® processor E5-26xx v4 to Intel® Xeon® Scalable processor (estimated based on Intel internal testing of OLTP Brokerage, SAP SD 2-Tier, HammerDB, Server-side Java, SPEC*int_rate_base2006, SPEC*fp_rate_base2006, Server Virtualization, STREAM* triad, LAMMPS, DPDK L3 Packet Forwarding, Black-Scholes, Intel Distribution for LINPACK). Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance. Intel does not control or audit the design or implementation of third party benchmark data or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmark data are reported and confirm whether the referenced benchmark data are accurate and reflect performance of systems available for purchase.
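
The 1.65x figure on this slide is described as a geometric mean of normalized per-workload results. As a rough illustration of how such an aggregate is formed, here is a minimal Python sketch. The per-workload ratios are invented placeholders, not Intel measurements, and `geomean` is a hypothetical helper name.

```python
import math


def geomean(ratios):
    """Geometric mean of positive ratios, computed as exp(mean(log(r)))."""
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical new/old performance ratios, one per workload (placeholders only).
example_speedups = [1.2, 1.5, 2.0, 1.8]

overall = geomean(example_speedups)
arithmetic = sum(example_speedups) / len(example_speedups)
print(f"geomean speedup: {overall:.3f}x")
print(f"arithmetic mean: {arithmetic:.3f}x")
```

A geometric mean is the usual choice for normalized ratios because it does not depend on which system is taken as the baseline: inverting every ratio simply inverts the result.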
  • 12. Typical 2-Socket Configuration
      • Intel® Xeon® E5 v4 (2016): four DDR4 memory channels; up to 24 DIMMs; up to 80 PCIe lanes; two QPI links (up to 9.6 GT/s).
      • Intel® Xeon® Scalable (2017, Purley): six DDR4 memory channels; up to 24 DIMMs; up to 96 PCIe lanes; two UPI links (up to 10.4 GT/s), with up to 3 UPI links in 4S and 8S configurations; integrated Intel® Omni-Path Architecture (fabric).
      • ** PCIe* uplink connection for Intel® QuickAssist Technology and Intel® Ethernet.
      • [Diagram labels: CPU; DDR4 DIMMs; PCIe* x4 and x8; DMI; Intel® QPI; Intel® UPI; LBG; 3x16 PCIe* and 1x100G Intel® OP Fabric per CPU.]
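
The platform figures on this slide lend themselves to a quick back-of-the-envelope comparison. The Python sketch below takes the channel counts, lane totals, and link rates from the slide. The DDR4 speeds (2400 MT/s and 2666 MT/s) and the 8 bytes per transfer per channel are assumptions made for illustration. The results are theoretical peaks, not measured bandwidth.

```python
BYTES_PER_TRANSFER = 8  # one DDR4 channel moves 64 bits (8 bytes) per transfer


def peak_mem_bw_gbs(channels, mega_transfers_per_s):
    """Theoretical peak memory bandwidth of one socket, in GB/s."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000.0


# Channel counts, 2-socket PCIe lane totals, and link rates come from the slide.
# The DDR4 speeds (MT/s) are assumptions for illustration only.
PLATFORMS = {
    "Xeon E5 v4 (2016)": {"channels": 4, "ddr4_mts": 2400, "pcie_lanes_2s": 80, "link_gts": 9.6},
    "Xeon Scalable (2017)": {"channels": 6, "ddr4_mts": 2666, "pcie_lanes_2s": 96, "link_gts": 10.4},
}

for name, p in PLATFORMS.items():
    bw = peak_mem_bw_gbs(p["channels"], p["ddr4_mts"])
    lanes_per_socket = p["pcie_lanes_2s"] // 2
    print(f"{name}: {bw:.1f} GB/s peak per socket, {lanes_per_socket} PCIe lanes per socket")

old = PLATFORMS["Xeon E5 v4 (2016)"]
new = PLATFORMS["Xeon Scalable (2017)"]
channel_ratio = new["channels"] / old["channels"]       # from channel count alone
link_rate_ratio = new["link_gts"] / old["link_gts"]     # UPI vs. QPI transfer rate
print(f"memory channel ratio: {channel_ratio:.2f}x, link rate ratio: {link_rate_ratio:.3f}x")
```

The channel count alone accounts for a 1.5x ratio; any difference in DIMM speed would add to it.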
  • 13. Intel® Xeon® Scalable Processors: The Foundation for Agile, Secure, Workload-Optimized Hybrid Cloud
      • [Figure: processor tiers. The extracted text does not preserve which label belongs to which tier; the recoverable labels are listed in approximate source order.]
      • Mainstream; Good; Entry; Efficient; Entry performance, price sensitive.
      • Light tasks: scalable performance at low power; for light workloads; standard RAS.
      • Moderate tasks: scalable performance; hardware-enhanced security; Intel® Turbo Boost Technology and Intel® Hyper-Threading Technology for moderate workloads; standard RAS.
      • Up to 22 cores; 2 & 4 socket support; up to 3 UPI links; advanced reliability, availability and serviceability.
      • Up to 28 cores; 2, 4 & 8 socket support; up to 3 UPI links; 1.5TB topline memory channel bandwidth; up to 2666 MHz DDR4; highest accelerator throughput.
  • 14. Designed for Next-Generation Data Centers
      • [Figure: Ring Architecture (2009-2017+) and Mesh Architecture (new in 2017).]
      • Maximizes performance
      • Enables consistent, low latencies
      • Optimized for data sharing and memory access between all CPU cores/threads for ideal memory bandwidth and capacity
      • Data flows scale efficiently for 2, 4 & 8+ socket configurations
      • Designed for modern virtualized and hybrid cloud implementations
  • 15. Re-Architected L2 & L3 Cache Hierarchy
      • Previous architectures: shared L3, 2.5 MB/core (inclusive); private L2, 256 KB per core.
      • Intel® Xeon® Scalable processor architecture: shared L3, 1.375 MB/core (non-inclusive); private L2, 1 MB per core.
      • On-chip cache balance shifted from shared-distributed (prior architectures) to private-local (Skylake architecture):
          • Shared-distributed: the shared-distributed L3 is the primary cache.
          • Private-local: the private L2 becomes the primary cache, with the shared L3 used as an overflow cache.
      • Shared L3 changed from inclusive to non-inclusive:
          • Inclusive (prior architectures): L3 has copies of all lines in L2.
          • Non-inclusive (Skylake architecture): lines in L2 may not exist in L3.
      • The Skylake-SP cache hierarchy is architected specifically for the data center use case.
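
One consequence of the inclusive to non-inclusive change can be sketched with simple arithmetic. Under an inclusive policy every L2 line is also held in L3, so the distinct data cached per core is bounded by the L3 share. Under a non-inclusive policy L2 and L3 can hold different lines, so the bound is their sum. The Python sketch below computes these upper bounds from the per-core sizes on the slide. It is an idealized model that ignores sharing between cores and real replacement behavior, and the function name is hypothetical.

```python
def max_unique_kb_per_core(l2_kb, l3_kb_per_core, inclusive):
    """Upper bound on distinct cached data per core, in KB (idealized model)."""
    if inclusive:
        # L3 duplicates every L2 line, so L2 adds no extra unique capacity.
        return l3_kb_per_core
    # Non-inclusive: in the best case L2 and L3 hold disjoint lines.
    return l2_kb + l3_kb_per_core


# Sizes from the slide: 2.5 MB = 2560 KB, 1.375 MB = 1408 KB, 1 MB = 1024 KB.
prior_kb = max_unique_kb_per_core(l2_kb=256, l3_kb_per_core=2560, inclusive=True)
skylake_kb = max_unique_kb_per_core(l2_kb=1024, l3_kb_per_core=1408, inclusive=False)

print(f"prior architectures: up to {prior_kb} KB unique per core, private L2 = 256 KB")
print(f"Skylake-SP:          up to {skylake_kb} KB unique per core, private L2 = 1024 KB")
print(f"private L2 grows by {1024 // 256}x")
```

In this simplified view the total per-core capacity stays in the same range, while the portion that is private and close to the core grows fourfold, which matches the slide's description of L2 becoming the primary cache.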
  • 16. Intel® Xeon® Scalable Processors for Technical Computing (HPC)
      • Powerful and balanced performance for diverse HPC workloads
          • Powerful performance:
              • Up to 28 cores vs. 24 cores / 22 cores (on Intel® Xeon® processor E7 v4 / Intel Xeon processor E5-2600 v4 families).
              • Intel® AVX-512 delivers up to 2X FLOPs/clock-cycle peak performance capability, optimized for HPC, data analytics, and cryptography workloads [1].
              • New Intel® Mesh architecture with 3 Intel® Ultra Path Interconnect links provides greater inter-CPU bandwidth for the most data-hungry, latency-sensitive applications.
          • Significantly increased memory and I/O bandwidth:
              • Up to 1.5x gen-to-gen memory bandwidth increase per CPU (6 memory channels) for extremely large compute- and data-intensive workloads.
              • More I/O bandwidth with 48 PCIe 3.0 lanes vs. 40 lanes on Intel Xeon processor E5-2600 v4.
              • Intel® Optane™ and Intel® 3D NAND solid state drives deliver an industry-leading combination of high throughput, low latency, high quality of service (QoS), and ultra-high endurance [6] to break data access bottlenecks.
      • Integrated interconnect for compelling efficiency
          • Integrated Intel® Omni-Path Architecture designed for today’s HPC systems:
              • Provides a 100 Gbps high-bandwidth, low-latency fabric for HPC clusters.
              • Reduces the number of required switches and lowers fabric costs [7], freeing up budget for up to 24% more compute nodes [8].
              • Denser 48-port switch chip delivers a 33 percent increase [9] over a traditional InfiniBand switch, resulting in power, space and maintenance savings.
      • Converged parallel programming environment for Intel® Xeon® Scalable processors & Intel® Xeon Phi™ processors
          • Highly integrated portfolio of superior technologies and optimized software tools ensures code portability across IA solutions:
              • Intel AVX-512 enables a converged programming environment for Intel Xeon Scalable Processor and Intel® Xeon Phi™ Processor compute nodes.
              • Intel® Modern Code Developer Program enables the next decade of discovery.
              • Intel® Parallel Studio XE 2017 upgrades the developer toolkit for HPC and technical computing.
              • Intel® HPC Orchestrator simplifies installation and ongoing maintenance of the HPC system software stack.
      • For footnotes and configurations, see slides 29-30.
  • 17. Intel® Advanced Vector Extensions 512 (AVX-512)
    Accelerates performance for your most demanding computational tasks with ultra-wide (512-bit) vector processing capabilities (note that 2x FMA processing engines are available on Intel® Xeon® Platinum and Intel® Xeon® Gold Processors)
    End Customer Value: Workload-optimized performance, throughput increases, and H/W-enhanced security improvements for familiar analytics, HPC, video transcode, cryptography, and compression software.
    Problems Solved:
    1. Achieve more work per cycle (doubles width of data registers)
    2. Minimize latency & overhead (doubles the number of registers)
    Proof points: Up to 2x FLOPs/clock cycle [1]; up to 4x greater throughput [2]
    Value pillars: performance, security
    Segments: Cloud Service Providers, Comms Service Providers, Enterprise
    * FLOPs = Floating Point Operations
    [1] Peak performance vs. Intel® AVX2. As measured by Intel® Xeon® Processor Scalable Family with Intel® AVX-512 compared to an Intel® Xeon® E5 v4 with Intel® AVX2
    [2] Vectorized floating-point throughput. As measured by Intel® Xeon® Processor Scalable Family with Intel® AVX-512 compared to an Intel® Xeon® E5 v4 with Intel® AVX2
  • 18. Intel® Advanced Vector Extensions 512 (AVX-512): Powerful instruction set for data-parallel computation
    - 512-bit wide vectors
    - 32 operand registers
    - 8 64-bit mask registers
    - Embedded broadcast
    - Embedded rounding
    Microarchitecture | Instruction Set | SP FLOPs / cycle | DP FLOPs / cycle
    Skylake | Intel® AVX-512 & FMA | 64 | 32
    Haswell / Broadwell | Intel AVX2 & FMA | 32 | 16
    Sandy Bridge | Intel AVX (256b) | 16 | 8
    Nehalem | SSE (128b) | 8 | 4
    Intel AVX-512 Instruction Types
    - AVX-512-F: AVX-512 Foundation Instructions
    - AVX-512-VL: Vector Length Orthogonality (ability to operate on sub-512 vector sizes)
    - AVX-512-BW: 512-bit Byte/Word support
    - AVX-512-DQ: Additional D/Q/SP/DP instructions (converts, transcendental support, etc.)
    - AVX-512-CD: Conflict Detect (used in vectorizing loops with potential address conflicts)
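The FLOPs-per-cycle figures on slide 18 follow from the vector width multiplied by the floating-point work each lane can do per cycle. The Python sketch below reproduces them under stated assumptions: two FMA units per core on Skylake and Haswell/Broadwell (each FMA counts as a multiply plus an add), and one add port plus one multiply port on Sandy Bridge and Nehalem. These port counts and all names in the code are illustrative assumptions, not taken from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorUnit:
    name: str
    vector_bits: int          # SIMD register width in bits
    flops_per_lane: int       # assumed FLOPs per lane per cycle per core


# Assumption: 2 FMA units x 2 FLOPs (multiply + add) = 4 FLOPs per lane.
# Assumption: 1 add port + 1 multiply port = 2 FLOPs per lane (no FMA).
UNITS = [
    VectorUnit("Skylake (AVX-512 & FMA)", 512, 4),
    VectorUnit("Haswell / Broadwell (AVX2 & FMA)", 256, 4),
    VectorUnit("Sandy Bridge (AVX 256b)", 256, 2),
    VectorUnit("Nehalem (SSE 128b)", 128, 2),
]


def flops_per_cycle(unit: VectorUnit, element_bits: int) -> int:
    """Peak FLOPs per cycle per core for 32-bit (SP) or 64-bit (DP) elements."""
    lanes = unit.vector_bits // element_bits
    return lanes * unit.flops_per_lane


for unit in UNITS:
    sp = flops_per_cycle(unit, 32)
    dp = flops_per_cycle(unit, 64)
    print(f"{unit.name}: SP={sp}, DP={dp}")
```

These are per-core peak numbers; sustained throughput also depends on clock frequency under vector load, which slide 19 addresses.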
  • 19. Performance and Efficiency with Intel® AVX-512: Intel® AVX-512 delivers significant performance and efficiency gains
    LINPACK performance (chart data):
    Instruction set | GFLOPs | System power (W) | Core frequency (GHz) | GFLOPs/Watt (normalized to SSE4.2) | GFLOPs/GHz (normalized to SSE4.2)
    SSE4.2 | 669 | 760 | 3.1 | 1.00 | 1.00
    AVX | 1178 | 768 | 2.8 | 1.74 | 1.95
    AVX2 | 2034 | 791 | 2.5 | 2.92 | 3.77
    AVX512 | 3259 | 767 | 2.1 | 4.83 | 7.19
    Source as of June 2017: Intel internal measurements on platform with Xeon Platinum 8180, Turbo enabled, UPI=10.4, SNC1, 6x32GB DDR4-2666 per CPU, 1 DPC.
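The two normalized charts on slide 19 can be rederived from the raw LINPACK values (GFLOPs, system power, core frequency). A small Python sketch of that arithmetic follows; the variable and function names are illustrative, and the pairing of each number with its instruction set follows the chart order SSE4.2, AVX, AVX2, AVX512.

```python
# Raw LINPACK chart values from slide 19, ordered SSE4.2, AVX, AVX2, AVX512.
ISA = ["SSE4.2", "AVX", "AVX2", "AVX512"]
GFLOPS = [669, 1178, 2034, 3259]
POWER_W = [760, 768, 791, 767]
FREQ_GHZ = [3.1, 2.8, 2.5, 2.1]


def normalized(numerators, denominators):
    """Divide element-wise, then scale so the first entry (SSE4.2) equals 1."""
    ratios = [n / d for n, d in zip(numerators, denominators)]
    return [r / ratios[0] for r in ratios]


gflops_per_watt = normalized(GFLOPS, POWER_W)
gflops_per_ghz = normalized(GFLOPS, FREQ_GHZ)

for name, per_watt, per_ghz in zip(ISA, gflops_per_watt, gflops_per_ghz):
    print(f"{name:7s} GFLOPs/Watt={per_watt:.2f}  GFLOPs/GHz={per_ghz:.2f}")
```

The GFLOPs/GHz gain exceeds the GFLOPs/Watt gain because core frequency drops from 3.1 GHz to 2.1 GHz as the vectors widen, while system power stays roughly flat.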
  • 21. Intel® Omni-Path Architecture in 30 secs. The Interconnect Landscape: Why Intel® OPA?
    - Performance: I/O struggling to keep up with CPU innovation
    - Increasing scale: from 10K nodes to 200K+. Previous solutions reaching limits of scalability, manageability and reliability
    - Fabric: cluster budget [1]. Fabric an increasing % of HPC hardware costs: today 20%-30%, tomorrow 30% to 40%
    Goal: Keep cluster costs in check → maximize COMPUTE power per dollar
    [1] Source: Internal analysis based on 256-node to 2048-node clusters configured with Mellanox FDR and EDR InfiniBand products. Mellanox component pricing from www.kernelsoftware.com. Prices as of November 3, 2015. Compute node pricing based on Dell PowerEdge R730 server from www.dell.com. Prices as of May 26, 2015. Intel® OPA (x8) utilizes a 2-1 over-subscribed Fabric. Intel® OPA pricing based on estimated reseller pricing using projected Intel MSRP pricing on day of launch.
  • 22. Intel® Omni-Path Architecture: HPC's Next-Generation Fabric. The Future of High Performance Fabrics
    Better Scaling vs EDR
    - 48 Radix Chip Ports
    - Up to 26% More Servers than InfiniBand* EDR within the Same Budget [1]
    - Up to 60% Lower Power and Cooling Costs [2]
    Configurable / Resilient
    - Job Prioritization (Traffic Flow Optimization)
    - No-Compromise Resiliency (Packet Integrity Protection and Dynamic Lane Scaling)
    Market Adoption
    - >100 OEM and HPC Storage Vendor Offerings Expected for Platforms, Switches, and Adapters [3]
    [1] Assumes a 750-node cluster, and number of switch chips required is based on a full bisectional bandwidth (FBB) Fat-Tree configuration. Intel® OPA uses one fully-populated 768-port director switch, and Mellanox EDR solution uses a combination of 648-port director switches and 36-port edge switches. Mellanox component pricing from www.kernelsoftware.com, with prices as of November 3, 2015. Compute node pricing based on Dell PowerEdge R730 server from www.dell.com, with prices as of May 26, 2015. Intel® OPA pricing based on estimated reseller pricing based on Intel MSRP pricing on ark.intel.com.
    [2] Assumes a 750-node cluster, and number of switch chips required is based on a full bisectional bandwidth (FBB) Fat-Tree configuration. Intel® OPA uses one fully-populated 768-port director switch, and Mellanox EDR solution uses a combination of director switches and edge switches. Mellanox power data based on Mellanox CS7500 Director Switch, Mellanox SB7700/SB7790 Edge switch, and Mellanox ConnectX-4 VPI adapter card installation documentation posted on www.mellanox.com as of November 1, 2015. Intel OPA power data based on product briefs posted on www.intel.com as of November 16, 2015. Intel® OPA pricing based on estimated reseller pricing based on Intel MSRP pricing on ark.intel.com.
    [3] Intel internal information. Design win count based on OEM and HPC storage vendors who are planning to offer either Intel-branded or custom switch products, along with the total number of OEM platforms that are currently planned to support custom and/or standard Intel® OPA adapters. Design win count as of November 1, 2015 and subject to change without notice based on vendor product plans.
    *Other names and brands may be claimed as property of others.
    Intel® Scalable System Framework
  • 23. Intel® Omni-Path Architecture: HPC's Next-Generation Fabric. Fewer switches required
    [Chart: switch chips required (0 to 600) vs. nodes, Intel® OPA 48-port switch compared with InfiniBand* 36-port switch]
    - 26% More Servers than EDR [1]
    - 60% Lower Cooling Costs [2]
    - 2.3X Greater Fabric Scalability [3]
    [1], [2] Same assumptions and sources as footnotes 1 and 2 on slide 22 (750-node cluster, full bisectional bandwidth Fat-Tree configuration).
    [3] Number of switch chips required, switch density, and fabric scalability are based on a full bisectional bandwidth (FBB) Fat-Tree configuration, using a 48-port switch for Intel® Omni-Path Architecture and 36-port switch ASIC for either Mellanox or Intel® True Scale Fabric. 2.3X fabric scalability based on a 27,648-node cluster configured with the Intel® Omni-Path Architecture using 48-port switch ASICs, as compared with a 36-port switch chip that can support up to 11,664 nodes.
    *Other names and brands may be claimed as the property of others.
    Intel® Scalable System Framework
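The node counts quoted in footnote 3 of slide 23 (27,648 for 48-port ASICs, 11,664 for 36-port ASICs) match the standard capacity formula for a full-bisection fat tree. The Python sketch below assumes a three-tier tree in which every switch below the top tier uses half its ports downward and half upward. The slide does not state the tier count, so that is an assumption, as is the function name.

```python
def max_nodes_fat_tree(radix: int, tiers: int) -> int:
    """Maximum end nodes in a full-bisection fat tree of `radix`-port switches.

    Each tier below the top splits its ports half down, half up, which
    gives 2 * (radix / 2) ** tiers end nodes.
    """
    if radix <= 0 or radix % 2 != 0:
        raise ValueError("radix must be a positive even number")
    if tiers < 1:
        raise ValueError("tiers must be at least 1")
    return 2 * (radix // 2) ** tiers


opa_nodes = max_nodes_fat_tree(48, tiers=3)   # 48-port switch ASIC
ib_nodes = max_nodes_fat_tree(36, tiers=3)    # 36-port switch ASIC

print(opa_nodes, ib_nodes, round(opa_nodes / ib_nodes, 2))
```

Because capacity grows with the cube of half the radix in a three-tier tree, a 33% increase in ports per chip (48 vs. 36) yields roughly 2.37 times as many nodes, consistent with the slide's "2.3X" figure.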
  • 24. Intel® Omni-Path Architecture: Maximizing Support for Heterogeneous Clusters
    Greater flexibility for creating compute islands depending on user requirements
    [Diagram labels: Intel Xeon Processor (HSW, BDW & SKL); Xeon Phi™ Processor (KNL); Xeon Phi™ Processor-F (KNL-F); Intel Xeon Processor-F (SKL-F); Intel Xeon Processor (SKL) with GPUs and GPU memory on the PCI bus; HFI; WFR HFI PCI cards]
    GPU Direct v3 provided in Intel® OPA 10.3 release
  • 25. Intel® Omni-Path Architecture. Next Up for Intel® OPA: Artificial Intelligence
    Intel offers a complete AI portfolio
    - From CPUs to software to computer vision to libraries and tools
    Intel® OPA offers breakthrough performance on scale-out apps
    - Low latency
    - High bandwidth
    - High message rate
    - GPU Direct RDMA support
    - Xeon Phi integration
    [Diagram labels: Things & devices, Cloud, Data Center, Accelerant Technologies]
    World-class interconnect solution for shorter time to train
  • 26. Intel® Omni-Path Architecture: NVMe* over OPA
    Only Intel is delivering a total NVMe over Fabric solution!
    Intel® OPA + Intel® SSD and Optane™ Technology
    - High endurance
    - Low latency
    - High efficiency
    - Complete NVMe over Fabric solution
    NVMe-over-OPA status
    - Supported in 10.4.3 IFS release
    - Compliant with NVMeF spec 1.0
    Results: ~1.5M 4k random IOPS, 99% bandwidth efficiency
    [Diagram: Host (NVMe Host Driver, RDMA Transport, Intel® OPA HFI) connected to Target (Intel® OPA HFI, RDMA Transport, NVMe Target Driver, NVMe Host Driver, PCIe Transport, NVMe Storage)]
    Target and Host system configuration: 2 x Intel® Xeon® CPU E5-2699 v3 @ 2.30GHz, Intel® Server Board S2600WT, 128GB DDR4, CentOS 7.3.1611, kernel 4.10.12, IFS 10.4.1, NULL-BLK, FIO 2.19; module options: options hfi1 krcvqs=8 sge_copy_mode=2 wss_threshold=70
    *Other names and brands may be claimed as the property of others.
  • 28. Tighter System-Level Integration: Innovative Memory-Storage Hierarchy
    Toward the top of the hierarchy: higher bandwidth, lower latency and capacity
    Compute today
    - Compute node: processor caches; local memory (memory bus); local storage
    - I/O node and remote storage: parallel file system (hard drive storage)
    Compute future
    - Compute node: caches; on-package high bandwidth memory*; local memory (Intel® DIMMs based on 3D XPoint™ Technology); SSD storage (Intel® Optane™ Technology SSDs)
    - I/O node: burst buffer node with Intel® Optane™ Technology SSDs
    - Remote storage: parallel file system (hard drive storage)
    What changes
    - Local memory is now faster & in processor package
    - Much larger memory capacities keep data in local memory
    - I/O node storage moves to compute node
    - Some remote data moves onto I/O node
    * cache, memory or hybrid mode
    Intel® Scalable System Framework
  • 29. Bridging the Memory-Storage Gap: Intel® Optane™ Technology Based on 3D XPoint™
    - SSD: Intel® Optane™ SSDs, 5-7x current flagship NAND-based SSDs (IOPS) [1]
    - DRAM-like performance: Intel® DIMMs based on 3D XPoint™, 1,000x faster than NAND [1], 1,000x the endurance of NAND [2]
    - Hard drive capacities: 10x more dense than conventional memory [3]
    [1] Performance difference based on comparison between 3D XPoint™ Technology and other industry NAND
    [2] Endurance difference based on comparison between 3D XPoint™ Technology and other industry NAND
    [3] Density difference based on comparison between 3D XPoint™ Technology and other industry DRAM
    Intel® Scalable System Framework
  • 30. NVM Solutions Group. Intel® Optane™ SSDs for Data Center
    - Breakthrough Performance: IOPS
    - Predictably Fast Service: QoS
    - Responsive Under Load: low latency
    - Ultra-high Endurance
    Technology claims are based on comparisons of latency, density and write cycling metrics amongst memory technologies recorded on published specifications of in-market memory products against internal Intel specifications. Intel® Optane™ SSD prototype compared to the Intel® SSD DC P3700 Series (NAND).
  • 31. NVM Solutions Group. Intel® Optane™ SSD Use Cases
    - Fast Storage and Cache: Intel® Xeon® with DRAM on DDR; Intel® Optane™ SSD and Intel® 3D NAND SSDs attached over PCIe*
    - Extend Memory: Intel® Xeon® with a 'memory pool' of DRAM (DDR) and Intel® Optane™ SSD (PCIe); Intel® 3D NAND SSDs attached over PCIe
    *Other names and brands may be claimed as the property of others
  • 32. NVM Solutions Group 32 5-8x faster at low Queue Depths1 Vast majority of applications generate low QD storage workloads 1. Common Configuration - Intel 2U Server System, OS CentOS 7.2, kernel 3.10.0-327.el7.x86_64, CPU 2 x Intel® Xeon® E5-2699 v4 @ 2.20GHz (22 cores), RAM 396GB DDR @ 2133MHz. Configuration – Intel® Optane™ SSD DC P4800X 375GB and Intel® SSD DC P3700 1600GB. Performance – measured under 4K 70-30 workload at QD1-16 using fio-2.15. Breakthrough Performance Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance.
  • 33. NVM Solutions Group 33 up to 60x better at 99% QoS1 Ideal for critical applications with aggressive latency requirements 1. Common Configuration – Intel 2U Server System, OS CentOS 7.2, kernel 3.10.0-327.el7.x86_64, CPU 2 x Intel® Xeon® E5-2699 v4 @ 2.20GHz (22 cores), RAM 396GB DDR @ 2133MHz. Configuration – Intel® Optane™ SSD DC P4800X 375GB and Intel® SSD DC P3700 1600GB. QoS – measures 99% QoS under 4K 70-30 workload at QD1 using fio-2.15. Predictably Fast Service Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance.
  • 34. NVM Solutions Group. Ultra Endurance
    Up to 10x more Total Bytes Written at similar capacity [1]
    [Chart: Endurance (DWPD). MLC/TLC 2D/3D NAND SSD: 0.5 and 3; Intel® Optane™ SSD: 30]
    Architected for endurance scaling
    - 'Write in place' technology
    - Non-destructive write process
    Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance.
    [1] Comparing projected Intel® Optane™ SSD 750GB specifications to actual Intel® SSD DC P4600 1.6TB specifications. Total Bytes Written (TBW) calculated by multiplying specified or projected DWPD x specified or projected warranty duration x 365 days/year.
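The TBW formula in the slide 34 footnote can be written out directly. One step is implicit there: drive writes per day (DWPD) must be multiplied by drive capacity to obtain bytes. The Python sketch below makes that explicit. The 5-year warranty is a placeholder assumption (the slide does not state the duration), the DWPD values are the chart values, and the function name is illustrative.

```python
def total_bytes_written_tb(dwpd: float, capacity_tb: float, warranty_years: float) -> float:
    """Total Bytes Written in TB: DWPD x capacity x warranty years x 365 days/year."""
    return dwpd * capacity_tb * warranty_years * 365


WARRANTY_YEARS = 5  # assumption for illustration only; not stated on the slide

# Equal capacity and warranty: the TBW ratio reduces to the DWPD ratio.
optane_tbw = total_bytes_written_tb(30, capacity_tb=1.0, warranty_years=WARRANTY_YEARS)
nand_tbw = total_bytes_written_tb(3, capacity_tb=1.0, warranty_years=WARRANTY_YEARS)

print(optane_tbw, nand_tbw, optane_tbw / nand_tbw)
```

At equal capacity and warranty duration, the capacity and time terms cancel, so the "up to 10x" claim corresponds to the ratio of the chart's 30 DWPD and 3 DWPD values.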
  • 36. The coming flood of data. By 2020…
    - The average internet user will generate ~1.5 GB of traffic per day
    - Smart hospitals will generate over 3,000 GB per day
    - Self driving cars will generate over 4,000 GB per day… each
      - radar ~10-100 KB per second
      - sonar ~10-100 KB per second
      - gps ~50 KB per second
      - lidar ~10-70 MB per second
      - cameras ~20-40 MB per second
    - A connected plane will generate over 40,000 GB per day
    - A connected factory will generate over 1,000,000 GB per day
    All numbers are approximated. Sources:
    http://www.cisco.com/c/en/us/solutions/service-provider/vni-network-traffic-forecast/infographic.html
    http://www.cisco.com/c/en/us/solutions/collateral/service-provider/global-cloud-index-gci/Cloud_Index_White_Paper.html
    https://datafloq.com/read/self-driving-cars-create-2-petabytes-data-annually/172
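To put the per-sensor rates on slide 36 on the same scale as the per-day totals, they can be summed and converted to GB per hour. The Python sketch below assumes decimal units (1 MB = 1000 KB, 1 GB = 1000 MB), one stream per sensor type, and all sensors running at once. None of these assumptions is stated on the slide, and the names are illustrative.

```python
# (low, high) data rates in KB per second, taken from slide 36.
SENSOR_RATES_KB_PER_S = {
    "radar": (10, 100),
    "sonar": (10, 100),
    "gps": (50, 50),
    "lidar": (10_000, 70_000),
    "cameras": (20_000, 40_000),
}


def aggregate_rate_mb_per_s(rates):
    """Sum the low and high ends of every sensor range, in MB per second."""
    low = sum(lo for lo, _ in rates.values()) / 1000
    high = sum(hi for _, hi in rates.values()) / 1000
    return low, high


def gb_per_hour(mb_per_s: float) -> float:
    """Convert a sustained MB/s rate into GB generated per hour."""
    return mb_per_s * 3600 / 1000


low, high = aggregate_rate_mb_per_s(SENSOR_RATES_KB_PER_S)
print(f"aggregate: {low:.2f} to {high:.2f} MB/s")
print(f"per hour:  {gb_per_hour(low):.1f} to {gb_per_hour(high):.1f} GB")
```

Lidar and cameras dominate the total; radar, sonar and GPS together contribute well under 1 MB per second.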
  • 37. Analytics needs AI
    AI is a large category all on its own, and a vital tool for reaching higher maturity & scale data analytics
    Analytics maturity, from operational analytics (today) to advanced analytics (emerging):
    - Descriptive Analytics: Hindsight (What Happened)
    - Diagnostic Analytics: Insight (What Happened and Why)
    - Predictive Analytics: Foresight (What Will Happen, When, and Why)
    - Prescriptive Analytics: Simulation-Driven Analysis and Decision-Making
    - Cognitive Analytics: Self-Learning and Completely Automated Enterprise
    [Additional diagram labels: Mature Data Lake; Computerized Human Thought Simulation and Actions; Towards Autonomic Enterprise]
  • 38. The next big wave: AI compute cycles will grow 12X by 2020
    Waves of computing: mainframes, standards-based servers, cloud computing, artificial intelligence
    Data deluge, COMPUTE breakthrough, innovation surge
    Source: Intel forecast
  • 40. Intel Strategy: Optimized Deep Learning Environment
    - Fuel the development of vertical solutions
    - Deliver best single node and multi-node performance
    - Accelerate design, training, and deployment
    - Drive optimizations across open source machine learning frameworks
    - Maximum performance on Intel architecture
    [Diagram labels: Nervana Cloud™; Intel® Math Kernel Library (Intel® MKL); Intel® MKL-DNN; Intel® Nervana™ Graph; Training; Inference]
    © 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.
  • 41. AI Datacenter
    - All purpose: Intel® Xeon® Processor Family. Training & inference. Scalable performance for widest variety of AI & other datacenter workloads, including deep learning training & inference
    - Highly-parallel: Intel® Xeon Phi™ Processor (Knights Mill✝). Faster DL training. Scalable performance optimized for even faster deep learning training and select highly-parallel datacenter workloads*
    - Flexible acceleration: Intel® FPGA. Enhanced DL inference. Scalable acceleration for deep learning inference in real-time with higher efficiency, and wide range of workloads & configurations
    - Deep learning: Crest Family✝. Deep learning by design. Scalable acceleration with best performance for intensive deep learning training & inference
    ✝ Codename for product that is coming soon
    All performance positioning claims are relative to other processor technologies in Intel's AI datacenter portfolio
    * Knights Mill (KNM); select = single-precision highly-parallel workloads generally scale to >100 threads and benefit from more vectorization, and may also benefit from greater memory bandwidth e.g. energy (reverse time migration), deep learning training, etc.
    All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
  • 42. Most agile AI platform: Intel® Xeon® Scalable processors for AI
    Scalable performance for widest variety of AI & other datacenter workloads, including deep learning
    - Built-in ROI: Begin your AI journey today using existing, familiar infrastructure
    - Potent performance: Train in HOURS (not days) with up to 113X [2] perf vs. Intel Xeon E5 v3 (2.2x excluding optimized SW [1])
    - Production-ready: Robust support for full range of AI deployments
    [1,2] Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: http://www.intel.com/performance Source: Intel measured as of November 2016. See slide 15 for configuration details.
    Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804.
  • 43. Intel® Xeon® inference & training performance: Advance previous generation AI workload performance with Intel® Xeon® Scalable Processors
    - INFERENCE THROUGHPUT: Up to 2.4x higher Neon ResNet 18 inference throughput on Intel® Xeon® Platinum 8180 Processor compared to Intel® Xeon® Processor E5-2699 v4 (batch size: 1)
    - TRAINING THROUGHPUT: Up to 2.2x higher Neon ResNet 18 training throughput on Intel® Xeon® Platinum 8180 Processor compared to Intel® Xeon® Processor E5-2699 v4 (batch size: 256)
    Configuration details on slides 18, 20. Source: Intel measured as of June 2017. For more complete information visit http://www.intel.com/performance
Inference and training throughput measured with FP32 instructions. Inference with INT8 will be higher.
  • 44. Intel® Xeon® platform performance: hardware plus optimized software
    Deliver significant AI performance with hardware and software optimizations on Intel® Xeon® Scalable Processors
    - INFERENCE THROUGHPUT: Up to 138x higher Intel optimized Caffe GoogleNet v1 with Intel® MKL inference throughput on Intel® Xeon® Platinum 8180 Processor compared to Intel® Xeon® Processor E5-2699 v3 with BVLC-Caffe
    - TRAINING THROUGHPUT: Up to 113x higher Intel Optimized Caffe AlexNet with Intel® MKL training throughput on Intel® Xeon® Platinum 8180 Processor compared to Intel® Xeon® Processor E5-2699 v3 with BVLC-Caffe
    Optimized Frameworks; Optimized Intel® MKL Libraries
    Inference using FP32. Batch size: Caffe GoogleNet v1 256, AlexNet 256
    Configuration details on slides 18, 25. Source: Intel measured as of June 2017. For more complete information visit: http://www.intel.com/performance
    Inference and training throughput measured with FP32 instructions. Inference with INT8 will be higher.
  • 45. Intel® Xeon Phi™ processor (Knights Mill): Faster DL training, highly-parallel
    Scalable performance optimized for even faster deep learning training and select highly-parallel datacenter workloads*
    Themes: Faster time-to-train, Efficient scaling, Future ready
    - Delivers up to 4X deep learning performance over Knights Landing✝
    - New instruction sets deliver enhanced lower precision performance
    - Time-to-train reduction is the primary benchmark to judge deep learning training performance
    - Direct access of up to 400 GB of memory with no PCIe performance lag (vs. GPU: 16GB)
    - Efficient scaling further reduces time-to-train when utilizing scaled Knights Mill systems
    - Up to 400X deep learning performance on existing HW via Intel SW optimization
    - Share deep learning software investments across Intel Platforms via Intel deep learning software tools
    - Binary-compatible with Intel® Xeon® processor
    ✝ Knights Landing is the former codename for the Intel® Xeon Phi™ processor family that was released in 2016
    * Knights Mill (KNM); select = single-precision highly-parallel workloads generally scale to >100 threads and benefit from more vectorization, and may also benefit from greater memory bandwidth e.g. energy (reverse time migration), deep learning training, etc.
    Configuration details on final slides. Source: Intel measured as of November 2016. For more complete information visit: http://www.intel.com/performance
  • 46. Crest family: Deep learning by design
    Scalable acceleration with best performance for intensive deep learning training & inference, period
    - Custom hardware: unprecedented compute density; large reduction in time-to-train
    - Blazing data access: 32 GB of in-package memory via HBM2 technology; 8 Tera-bits/s of memory access speed
    - High-speed scalability: 12 bi-directional high-bandwidth links; seamless data transfer via interconnects
    [1] Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.
    Source: Intel measured as of November 2016. For more complete information visit: http://www.intel.com/performance
    2017
  • 47. AI frameworks optimized for Intel architecture
    BigDL, MLlib, and more frameworks enabled via Intel® Nervana™ Graph (future). See Roadmap for availability.
    Intel®'s reference deep learning framework committed to best performance on all hardware: intelnervana.com/neon
    Other names and brands may be claimed as the property of others.
  • 49. NVM Solutions Group 49 Legal Notices and Disclaimers Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. No computer system can be absolutely secure. Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate. Intel, the Intel logo, Xeon, Intel Optane, and 3D XPoint are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. © 2017 Intel Corporation.