© 2017 Arm Limited
Peter Greenhalgh
VP and GM of Central Technology
Arm DynamIQ: Intelligent Solutions Using Cluster Based Multiprocessing
Technology innovations of 2013
• Drones
• Wearable technology
• Smartwatch
• 3D printing
• Voice recognition
• Social media
Looking ahead from edge to cloud
The future requires a new approach to CPU design
• Safe and autonomous
• Hyper-efficient
• Secure private compute
• Cortex beyond mobile
• Mixed reality
Arm DynamIQ
Rearchitecting the compute experience
• Multi-core redefined for broad market
• Massive system performance uplift
• More intelligent systems
Innovating for the scalable future
2013
• Up to 8 CPUs ('Octacore' smartphones)
• Dual cluster
• Heterogeneous processing
2017
• Nearly "unlimited" design spectrum
• Covers all existing use cases
• DynamIQ cluster
• Dynamic flexibility
Expanding Arm technology processor architecture for broad market
Key Arm technologies: Arm AMBA, Arm big.LITTLE, Arm CoreLink, Arm TrustZone, Arm NEON
DSU – Broadening the reach of technology
DynamIQ: New cluster design for new cores
DynamIQ big.LITTLE systems:
• Greater product differentiation and scalability
• Improved energy efficiency and performance
• SW compatibility with Energy Aware Scheduling (EAS)
Private L2 and shared L3 caches
• Local cache close to processors
• L3 cache shared between all cores
DynamIQ Shared Unit (DSU)
• Contains L3, Snoop Control Unit (SCU) and all cluster
interfaces
Example DynamIQ big.LITTLE configurations: 1b+2L, 1b+3L, 1b+4L, 1b+7L, 2b+6L, 4b+4L
[Figure: Cortex-A75 and Cortex-A55 cores (32b/64b), each with a private L2 cache, connect through async bridges to the DynamIQ Shared Unit (DSU). The DSU contains the SCU and the shared L3 cache, and exposes AMBA4 ACE, ACP, and Peripheral Port interfaces.]
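The configuration shorthand on this slide can be read mechanically. The sketch below is illustrative only: the names `ClusterConfig` and `parse_config` are hypothetical, and the eight-core limit is an assumption drawn from the deck's Core 0 to Core 7 cluster diagrams rather than from a specification.

```python
import re
from dataclasses import dataclass

# Assumption: a cluster holds at most eight cores (Core 0 to Core 7 in the deck).
MAX_CORES_PER_CLUSTER = 8


@dataclass(frozen=True)
class ClusterConfig:
    big: int     # number of "big" cores, e.g. Cortex-A75
    little: int  # number of "LITTLE" cores, e.g. Cortex-A55

    @property
    def total(self) -> int:
        return self.big + self.little


def parse_config(text: str) -> ClusterConfig:
    """Parse shorthand such as '1b+7L' into 1 big core plus 7 LITTLE cores."""
    match = re.fullmatch(r"(\d+)b\+(\d+)L", text.strip())
    if match is None:
        raise ValueError(f"not a big.LITTLE shorthand: {text!r}")
    config = ClusterConfig(big=int(match.group(1)), little=int(match.group(2)))
    if not 1 <= config.total <= MAX_CORES_PER_CLUSTER:
        raise ValueError(
            f"{text!r} needs {config.total} cores; limit is {MAX_CORES_PER_CLUSTER}"
        )
    return config


if __name__ == "__main__":
    for name in ["1b+2L", "1b+3L", "1b+4L", "1b+7L", "2b+6L", "4b+4L"]:
        cfg = parse_config(name)
        print(f"{name}: {cfg.big} big + {cfg.little} LITTLE = {cfg.total} cores")
```

Every example configuration listed on the slide passes the check; a string such as `4b+8L` would be rejected under the assumed limit.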
DynamIQ Shared Unit (DSU)
[Figure: DynamIQ cluster with Core 0 to Core 7 connected through asynchronous bridges to the DSU, which contains the snoop filter, L3 cache, power management, bus I/F, and ACP and peripheral port I/F.]
• Streamlines traffic across bridges
• Advanced power management features
• Latency and bandwidth optimizations
• Support for multiple performance domains
• Scalable interfaces for edge to cloud applications
• Supports large amounts of local memory
• Low latency interfaces for closely coupled accelerators
Level 3 cache memory system
New memory system for Cortex-A clusters
Integrated snoop filter to improve efficiency
Enabling lower cache latencies
[Figure: DynamIQ cluster block diagram. Core0 to Core7, each with L1 and L2 caches, connect through asynchronous bridges to the DynamIQ Shared Unit (DSU), which contains the snoop filter, L3 cache, power management, bus I/F, and ACP and peripheral port I/F.]
Load-to-use cycles*      Cortex-A53  Cortex-A55  Cortex-A73  Cortex-A75
L1 hit                   3           2           3           3
L2 hit                   13          6           19          8
L3 hit                   -           21          -           25
Interconnect boundary    20          21          26          25
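To see what the load-to-use figures above could mean for a workload, the sketch below computes a weighted average over L1 and L2 hits. The cycle counts are copied from the slide; the 90/10 hit mix is a hypothetical assumption chosen only for illustration, and misses beyond L2 are ignored.

```python
# Load-to-use cycles for L1 and L2 hits, copied from the slide's table.
LOAD_TO_USE_CYCLES = {
    "Cortex-A53": {"L1": 3, "L2": 13},
    "Cortex-A55": {"L1": 2, "L2": 6},
    "Cortex-A73": {"L1": 3, "L2": 19},
    "Cortex-A75": {"L1": 3, "L2": 8},
}

# Hypothetical share of loads served by each level (not from the slides).
HYPOTHETICAL_HIT_MIX = {"L1": 0.90, "L2": 0.10}


def average_load_to_use(latencies, hit_mix):
    """Return the hit-mix-weighted average load-to-use latency in cycles."""
    if abs(sum(hit_mix.values()) - 1.0) > 1e-9:
        raise ValueError("hit mix must sum to 1")
    return sum(share * latencies[level] for level, share in hit_mix.items())


if __name__ == "__main__":
    for core, latencies in LOAD_TO_USE_CYCLES.items():
        cycles = average_load_to_use(latencies, HYPOTHETICAL_HIT_MIX)
        print(f"{core}: {cycles:.2f} cycles on average")
```

Changing the hit mix shows how strongly the shorter L2 latency of the newer cores matters for workloads that miss in L1 more often.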
Level 3 cache partition
Infrastructure
• Process 1 = data plane
• Process 2 = control plane
• Packet processing data sent through low latency ACP interface
[Figure: Example configuration with two Core groups in a DynamIQ cluster. Core group 1 runs Process 1 and Core group 2 runs Process 2 across Core0 to Core7. The L3 cache is split into Group 1, Group 2, and a portion reserved for external accelerators via ACP, which connects sensors or I/O agents.]
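The partitioning idea on this slide can be sketched as a simple bookkeeping exercise. Everything specific below is an assumption for illustration: the four-partition L3, the split of two partitions for the data plane and one each for the control plane and ACP, the core-to-group mapping, and the function names are all hypothetical and do not describe the DSU's actual programming interface.

```python
from typing import Dict, List


def assign_partitions(total_partitions: int,
                      requests: Dict[str, int]) -> Dict[str, List[int]]:
    """Hand out contiguous partition indices to each owner, in request order."""
    if any(count < 0 for count in requests.values()):
        raise ValueError("partition counts must be non-negative")
    if sum(requests.values()) > total_partitions:
        raise ValueError("requests exceed the number of L3 partitions")
    layout: Dict[str, List[int]] = {}
    next_free = 0
    for owner, count in requests.items():
        layout[owner] = list(range(next_free, next_free + count))
        next_free += count
    return layout


# Hypothetical mapping: cores 0-3 form group 1, cores 4-7 form group 2.
CORE_TO_GROUP = {core: ("group1" if core < 4 else "group2") for core in range(8)}


def partitions_for_core(core: int, layout: Dict[str, List[int]]) -> List[int]:
    """Return the L3 partitions the given core may allocate into."""
    return layout[CORE_TO_GROUP[core]]


if __name__ == "__main__":
    # group1 = data plane, group2 = control plane, acp = external accelerators
    layout = assign_partitions(4, {"group1": 2, "group2": 1, "acp": 1})
    print(layout)
    print("Core5 allocates into partitions", partitions_for_core(5, layout))
```

The point of the model is isolation: a process in one group cannot evict lines held in a partition owned by another group or reserved for ACP traffic.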
Level 3 cache partition
Automotive
• Each process could represent an independent ADAS algorithm
• Sensors linked through low latency ACP interface
[Figure: Example configuration with four Core groups in a DynamIQ cluster. Core group 1 (Process 1): Core0, Core1. Core group 2 (Process 2): Core2, Core3. Core group 3 (Process 3): Core4, Core5. Core group 4 (Process 4): Core6, Core7. The L3 cache is split into Group 1,2, Group 3, Group 4, and a portion reserved for external accelerators via ACP, which connects sensors or I/O agents.]
Increasing performance through cache stashing
Enables reads/writes into the shared L3 cache or
per-core L2 cache
Allows closely coupled accelerators and I/O agents
to gain access to core memory
AMBA 5 CHI and Accelerator Coherency Port (ACP)
can be used for cache stashing
More throughput with Peripheral Port (PP) for acceleration, network, storage use-cases
Stash critical data to any cache level
[Figure: Two Cortex-A clusters, each with per-CPU L2 caches and a shared L3 cache, attach to a CoreLink CMN-600 with Agile System Cache, two DMC-620 controllers driving DDR4, and an accelerator or I/O agent. Alongside, the DynamIQ cluster block diagram: Core0 to Core7 with L1 and L2 caches, asynchronous bridges, and the DSU (snoop filter, L3 cache, power management, bus I/F, ACP and peripheral port I/F).]
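The effect of cache stashing described on this slide can be illustrated with a toy model. This is a sketch under simplifying assumptions: one core, caches modeled as unbounded sets of line addresses, and hypothetical names (`ToyCluster`, `io_write`, `core_load`). It does not model the real coherency protocol.

```python
class ToyCluster:
    """One core's private L2 plus a shared L3, tracked as sets of line addresses."""

    def __init__(self):
        self.l2 = set()
        self.l3 = set()

    def io_write(self, line, stash_to=None):
        """An I/O agent writes a line; optionally stash it into L2 or L3.

        Without stashing, the data is assumed to land in DRAM only.
        """
        if stash_to == "L2":
            self.l2.add(line)
        elif stash_to == "L3":
            self.l3.add(line)
        elif stash_to is not None:
            raise ValueError(f"unknown stash target: {stash_to!r}")

    def core_load(self, line):
        """Return the level that serves the load, then keep the line in L2."""
        if line in self.l2:
            return "L2"
        source = "L3" if line in self.l3 else "DRAM"
        self.l2.add(line)
        return source


if __name__ == "__main__":
    cluster = ToyCluster()
    cluster.io_write(0x1000)                 # no stash
    cluster.io_write(0x2000, stash_to="L3")  # stash into the shared L3
    cluster.io_write(0x3000, stash_to="L2")  # stash into the core's L2
    for line in (0x1000, 0x2000, 0x3000):
        print(hex(line), "first load served from", cluster.core_load(line))
```

In this model the first load of un-stashed data goes all the way to DRAM, while stashed data is already waiting in the chosen cache level.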
Increasing performance through tight integration
Offload acceleration
Example application: Offload crypto acceleration
(1) Configure registers for task
(2) Fetches data from Core memory
(3) Carries out acceleration
(4) Writes result into Core memory
[Figure: An accelerator attached to the DynamIQ cluster through ACP and PP.]
I/O processing
Example application: Packet processing in network systems
(1) Writes data into Core memory
(2) Carries out computation
(3) Processing completed
(4) Reads result from Core memory or sends data for onward processing
[Figure: An I/O agent attached to the DynamIQ cluster through ACP and PP.]
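The four offload steps above can be walked through in a small simulation. This is a sketch, not a driver: `ToyAccelerator` and its methods are hypothetical, a dictionary stands in for Core memory, and a byte-wise XOR stands in for the crypto operation. The split of register access over PP and data access over ACP is an assumption based on the slide's diagram labels.

```python
class ToyAccelerator:
    """Walks through the slide's four offload steps on a dictionary 'memory'."""

    def __init__(self, memory):
        self.memory = memory   # stands in for Core memory (assumed reached via ACP)
        self.registers = {}    # stands in for task registers (assumed written via PP)

    def configure(self, src, dst, key):
        # Step 1: the core configures registers for the task.
        self.registers = {"src": src, "dst": dst, "key": key}

    def run(self):
        # Step 2: the accelerator fetches data from Core memory.
        data = self.memory[self.registers["src"]]
        # Step 3: it carries out the acceleration (XOR is a placeholder, not real crypto).
        result = bytes(byte ^ self.registers["key"] for byte in data)
        # Step 4: it writes the result into Core memory.
        self.memory[self.registers["dst"]] = result


if __name__ == "__main__":
    memory = {"packet": b"hello"}
    accelerator = ToyAccelerator(memory)
    accelerator.configure(src="packet", dst="scrambled", key=0x5A)
    accelerator.run()
    print(memory["scrambled"].hex())
```

The core only touches the registers; all bulk data movement happens between the accelerator and memory, which is the property the slide attributes to tight integration.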
Automotive and industrial safety and reliability
ADAS and IVI compute performance
• DynamIQ provides performance required for
autonomous cars
• Faster responsiveness
DynamIQ: Functional Safety
• Following ASIL D systematic flow
• Provides higher safety integrity
Industry’s broadest functional safety capable CPU portfolio
[Figure: Autonomous system stages: Sense, Perceive, Decide, Actuate. The SoC takes sensor input and combines application cores (Cortex-A CPUs, each with an L2 cache, sharing an L3 cache) with a safety island. Other labels: Cortex-M, Cortex-R, lock-step core.]
Cortex-A75 – Increasing Performance
Cortex-A55 – Improving Efficiency
New levels of performance for smart solutions
All comparisons at ISO process and frequency
                  Cortex-A75               Cortex-A55
                  (baseline Cortex-A73)    (baseline Cortex-A53)
SPECINT2006       1.22x                    1.21x
SPECFP2006        1.33x                    1.42x
LMBench memcpy    1.16x                    1.97x
Octane 2.0        1.48x                    1.14x
Geekbench v4      1.34x                    1.22x
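One common way to summarize a set of speedups such as those on this slide is the geometric mean. The sketch below applies it to the slide's figures. The pairing of each set of five figures with a core is an assumption: it takes the set containing 1.48x on Octane 2.0 as Cortex-A75 and the set containing 1.97x on LMBench memcpy as Cortex-A55. The slide itself reports no aggregate number.

```python
from math import prod

# Speedup figures from the slide; the core-to-set pairing is an assumption.
SPEEDUPS = {
    "Cortex-A75 vs Cortex-A73": {
        "SPECINT2006": 1.22,
        "SPECFP2006": 1.33,
        "LMBench memcpy": 1.16,
        "Octane 2.0": 1.48,
        "Geekbench v4": 1.34,
    },
    "Cortex-A55 vs Cortex-A53": {
        "SPECINT2006": 1.21,
        "SPECFP2006": 1.42,
        "LMBench memcpy": 1.97,
        "Octane 2.0": 1.14,
        "Geekbench v4": 1.22,
    },
}


def geometric_mean(values):
    """Return the n-th root of the product of n positive values."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one positive value")
    return prod(values) ** (1.0 / len(values))


if __name__ == "__main__":
    for comparison, results in SPEEDUPS.items():
        print(f"{comparison}: {geometric_mean(results.values()):.2f}x geometric mean")
```

The geometric mean is used here because the inputs are ratios; an arithmetic mean would overweight the single large memcpy result.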
Architecture and Pipelines
Common features
• Armv8.2-A Architecture
• DynamIQ big.LITTLE
Cortex-A75 – performance focussed
• Out-of-Order, 11-13 stage integer pipeline
Cortex-A55 – efficiency focussed
• In-order, 8 stage integer pipeline
[Figure: Pipeline diagrams.
Cortex-A55: Instruction Fetch, Decode, Issue, then ALU/INT (MAC), ALU/INT (DIV), Branch, AGU Load, AGU Store, NEON/FP F0, and NEON/FP F1 pipes, then Writeback.
Cortex-A75: Instruction Fetch, Instruction Queue, Decode, Rename, Dispatch, issue queues (IsQ) with 12, 12, 8, 8, 20, 8, 8, and 8 entries, then Int I0 (MUL), Int I1 (DIV), Branch B, two AGU LD/ST pipes, NEON/FP F0, NEON/FP F1, and NEON/FP Store, then Writeback.]
Instruction fetch
[Figure: Instruction fetch block diagram: L1 Instruction Cache, Fill Buffer, Instruction Extraction & Parsing, Instruction Queue, AGU, and the Branch, Conditional, and Indirect Predictors.]
Common features
• 4-way set associative
• Virtually indexed, physically tagged (VIPT)
• Decoupled from Cores through instruction queue
Cortex-A75
• 64KB
• 4-wide instruction fetch
Cortex-A55
• 16KB / 32KB / 64KB
• 2-wide instruction fetch
Branch prediction
[Figure: Instruction fetch block diagram showing the L1 Instruction Cache, Fill Buffer, Instruction Queue, AGU, and the Branch, Conditional, and Indirect Predictors.]
Cortex-A75
• Fine-tuned 0-cycle prediction
• State of the art, mobile focussed, table based
conditional prediction
Cortex-A55
• Brand new 0-cycle predictors
• New main conditional predictor - Neural network
based
• New loop predictors
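The slide says only that the Cortex-A55 conditional predictor is neural network based and gives no design details. As a general illustration of that family of techniques, the sketch below implements a textbook perceptron branch predictor. It is not Arm's design: the history length, table size, and class name are hypothetical, and the training threshold uses the heuristic from the perceptron predictor literature.

```python
class PerceptronPredictor:
    """Textbook perceptron branch predictor (illustrative, not the Cortex-A55 design)."""

    def __init__(self, history_length=8, table_size=64):
        self.table_size = table_size
        # Training threshold heuristic from the perceptron predictor literature.
        self.threshold = int(1.93 * history_length + 14)
        # One weight vector per table entry: a bias plus one weight per history bit.
        self.weights = [[0] * (history_length + 1) for _ in range(table_size)]
        # Global history: +1 means taken, -1 means not taken.
        self.history = [1] * history_length

    def _output(self, pc):
        weights = self.weights[pc % self.table_size]
        return weights[0] + sum(w * h for w, h in zip(weights[1:], self.history))

    def predict(self, pc):
        """Predict taken when the weighted sum is non-negative."""
        return self._output(pc) >= 0

    def update(self, pc, taken):
        """Train on a misprediction or a low-confidence output, then shift history."""
        output = self._output(pc)
        target = 1 if taken else -1
        if (output >= 0) != taken or abs(output) <= self.threshold:
            weights = self.weights[pc % self.table_size]
            weights[0] += target
            for i, bit in enumerate(self.history, start=1):
                weights[i] += target * bit
        self.history = self.history[1:] + [target]


if __name__ == "__main__":
    predictor = PerceptronPredictor()
    hits = 0
    for i in range(300):
        taken = (i % 2 == 0)  # a strictly alternating branch
        hits += predictor.predict(0x400) == taken
        predictor.update(0x400, taken)
    print(f"{hits} of 300 predictions correct")
```

The alternating branch in the usage example depends linearly on the previous outcome, which is the kind of correlation a perceptron learns after a short warm-up.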
Cortex-A75: Datapaths
3-way superscalar high-performance pipeline
• Single-cycle decode with instruction fusing and
micro-ops
7 independent high-performance issue queues
• 2x Load/Store, 2x NEON/FPU, 1x Branch and 2x
Integer core
Increased capacity to sustain operation under
L1 miss / L2 hit
• 12 entries for integer core to maximise in-flight instructions and out-of-order capabilities
• 8 entries for Load/Store and NEON/FPU
[Figure: Cortex-A75 block diagram. Labels: Instruction Fetch, Branch Prediction, 64k I-Cache, Decode, Rename, Dispatch, Issue, Arm Register File, D.E. Register File, ALUs, iDIV, MAC, AGUs, Advanced NEON, Floating Point, Load/Store, 64k D-Cache, STB, Main TLB, Writeback, Private L2 Cache.]
Cortex-A55: Datapaths
Dual issue of loads and stores
Improved latency for forwarding ALU
results to the AGU
• Reduced by one cycle for many common ALU
operations
Reduced L1 cache load-to-use latency for
pointer chasing to two cycles
[Figure: Cortex-A55 datapath. Labels: Decode, Integer Register File, NEON-FP Regfile, NEON Pipe, two ALU Pipes (Shift ALU), Integer Pipe (Mult Acc), Divide Pipe, Load Pipe, Store Pipe, AGU, x2, Data Cache Address, Data Cache Output.]
L1 memory system
Common features
• 4-way set associative
• VIPT with PIPT programmer’s view
• Improved prefetchers
Cortex-A75
• 64KB
• Wider load-store than Cortex-A73
• Support Read-after-Write OoO with filtering
Cortex-A55
• 16KB / 32KB / 64KB
• Improved store buffer bandwidth to L1
• Larger 16-entry L1-TLB
[Figure: L1 memory system: Store Buffer, L1 Data Cache, Prefetcher, L1 TLB, L2 TLB, L2 Cache.]
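The "VIPT with PIPT programmer's view" bullet can be motivated with some index arithmetic. The sketch below counts how many virtual index bits fall above the page offset for each L1 size on the slide. The 64-byte line size and 4KB page size are assumptions not stated in the deck, and the function names are hypothetical.

```python
def log2_exact(value):
    """Return log2 of a power of two, rejecting anything else."""
    if value <= 0 or value & (value - 1):
        raise ValueError(f"{value} is not a power of two")
    return value.bit_length() - 1


def cache_geometry(size_bytes, ways, line_bytes=64):
    """Return (sets, offset_bits, index_bits) for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    return sets, log2_exact(line_bytes), log2_exact(sets)


def alias_bits(size_bytes, ways, line_bytes=64, page_bytes=4096):
    """Count virtual index bits that lie above the page offset."""
    _, offset_bits, index_bits = cache_geometry(size_bytes, ways, line_bytes)
    return max(0, offset_bits + index_bits - log2_exact(page_bytes))


if __name__ == "__main__":
    for size_kb in (16, 32, 64):
        sets, offset_bits, index_bits = cache_geometry(size_kb * 1024, ways=4)
        extra = alias_bits(size_kb * 1024, ways=4)
        print(f"{size_kb}KB 4-way: {sets} sets, {index_bits} index bits, "
              f"{extra} index bit(s) above the page offset")
```

Under these assumptions the 32KB and 64KB configurations index with bits that translation can change, so the hardware has to resolve aliases itself for software to see physically indexed behavior.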
L2 memory system
[Figure: Store Buffer, L1 Data Cache, Prefetcher, L1 TLB, L2 TLB, L2 Cache.]
Common features
• Private L2 cache in each Core
• Running at Core speed
• Exclusive data cache
• Cache stashing into the L2
• Non-blocking 1024-entry TLB for hit-under-miss
Cortex-A75
• 256KB / 512KB
Cortex-A55
• 0KB / 64KB / 128KB / 256KB
Next-generation features
Dot product and half-precision float for AI/ML processing
Virtualization Host Extensions (VHE) offering Type-2 hypervisor (KVM) performance improvements
Cache stashing and atomic operations improve multicore networking performance and latency
Cache clean to persistence to support storage class memory
Infrastructure class RAS enhancement including data poisoning
and improved error management
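The dot product feature listed above targets the multiply-accumulate pattern common in AI/ML inference. The sketch below models one plausible form of such an operation in plain Python: each 32-bit accumulator lane gains the sum of four signed 8-bit products. It is a behavioral illustration with hypothetical function names, not a description of the instruction encoding.

```python
def wrap_int32(value):
    """Wrap an integer to signed 32-bit two's complement."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def dot4_accumulate(acc, a, b):
    """For each lane, add the dot product of four int8 pairs to the accumulator."""
    if len(a) != len(b) or len(a) != 4 * len(acc):
        raise ValueError("need four 8-bit elements per 32-bit lane")
    for element in (*a, *b):
        if not -128 <= element <= 127:
            raise ValueError(f"{element} does not fit in a signed 8-bit value")
    result = []
    for lane, base in enumerate(acc):
        products = sum(a[4 * lane + k] * b[4 * lane + k] for k in range(4))
        result.append(wrap_int32(base + products))
    return result


if __name__ == "__main__":
    accumulators = [10, 0]
    a = [1, 2, 3, 4, -1, -1, -1, -1]
    b = [5, 6, 7, 8, 1, 2, 3, 4]
    print(dot4_accumulate(accumulators, a, b))
```

Doing four narrow multiplies and their sum per lane in one step is what makes 8-bit quantized workloads attractive on hardware with this kind of support.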
Innovating for the scalable future
2013-2017: The nature of compute is changing the landscape
Expanding Arm technologies for broad market applicability
New cluster design with new DynamIQ cores:
• Cortex-A75: Breakthrough performance
• Cortex-A55: Efficiency redefined
Functional safety for industrial and automotive applications
New features expanding microarchitecture capabilities:
• DynamIQ Shared Unit, new cache features, new branch prediction
Thank You!
Danke!
Merci!
谢谢!
ありがとう!
Gracias!
Kiitos!
The Arm trademarks featured in this
presentation are registered trademarks or
trademarks of Arm Limited (or its
subsidiaries) in the US and/or elsewhere. All
rights reserved. All other marks featured may
be trademarks of their respective owners.
www.arm.com/company/policies/trademarks

 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 

Kürzlich hochgeladen (20)

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 

Arm DynamIQ: Intelligent Solutions Using Cluster Based Multiprocessing

  • 1. Arm DynamIQ: Intelligent Solutions Using Cluster Based Multiprocessing. Peter Greenhalgh, VP and GM of Central Technology. © 2017 Arm Limited
  • 2. Technology innovations of 2013: drones, wearable technology, smartwatch, 3D printing, voice recognition, social media.
  • 3. Looking ahead from edge to cloud. The future requires a new approach to CPU design: safe and autonomous; hyper-efficient; secure private compute; Cortex beyond mobile; mixed reality.
  • 4. Arm DynamIQ: Rearchitecting the compute experience. Multi-core redefined for broad market; massive system performance uplift; more intelligent systems.
  • 5. Innovating for the scalable future. 2013: up to 8 CPUs; ‘Octacore’ smartphones; dual cluster; heterogeneous processing. 2017: nearly “unlimited” design spectrum; covers all existing use cases; DynamIQ cluster; dynamic flexibility. Expanding Arm technology processor architecture for broad market. Key Arm technologies: Arm AMBA, Arm big.LITTLE, Arm CoreLink, Arm TrustZone, Arm NEON.
  • 6. DSU – Broadening the reach of technology
  • 7. DynamIQ: New cluster design for new cores. DynamIQ big.LITTLE systems: greater product differentiation and scalability; improved energy efficiency and performance; SW compatibility with Energy Aware Scheduling (EAS). Private L2 and shared L3 caches: local cache close to processors; L3 cache shared between all cores. DynamIQ Shared Unit (DSU): contains L3, Snoop Control Unit (SCU) and all cluster interfaces. Example DynamIQ big.LITTLE configurations: 1b+2L, 1b+3L, 1b+4L, 1b+7L, 2b+6L, 4b+4L. [Figure labels: Cortex-A75 and Cortex-A55 32b/64b cores with private L2 caches; async bridges; DSU with SCU and shared L3 cache; ACP, peripheral port and AMBA4 ACE interfaces.]
  • 8. DynamIQ Shared Unit (DSU): streamlines traffic across bridges; advanced power management features; latency and bandwidth optimizations; support for multiple performance domains; scalable interfaces for edge to cloud applications; supports large amounts of local memory; low latency interfaces for closely coupled accelerators. [Figure labels: DynamIQ cluster, Core 0 to Core 7, asynchronous bridges, DSU with snoop filter, power management, L3 cache, bus I/F, ACP and peripheral port I/F.]
  • 9. Level 3 cache memory system. New memory system for Cortex-A clusters; integrated snoop filter to improve efficiency; enabling lower cache latencies. [Figure: DynamIQ cluster diagram as on slide 8, with per-core L1 and L2 caches.]
       Load to use cycles*      Cortex-A53  Cortex-A55  Cortex-A73  Cortex-A75
       L1 hit                   3           2           3           3
       L2 hit                   13          6           19          8
       L3 hit                   -           21          -           25
       Interconnect boundary    20          21          26          25
  • 10. Level 3 cache partition: infrastructure. Process 1 = data plane; Process 2 = control plane; packet processing data sent through low latency ACP interface. Example configuration with two core groups in a DynamIQ cluster. [Figure labels: Core0 to Core7; Core group 1 (Process 1) and Core group 2 (Process 2); L3 cache partitions Group 1, Group 2, and a partition reserved for external accelerators via ACP; sensors or I/O agents attached through ACP.]
  • 11. Level 3 cache partition: automotive. Each process could represent an independent ADAS algorithm; sensors linked through low latency ACP interface. Example configuration with four core groups in a DynamIQ cluster. [Figure labels: Core group 1 (Process 1), Core group 2 (Process 2), Core group 3 (Process 3), Core group 4 (Process 4); Core0 to Core7; L3 cache partitions Group 1,2, Group 3, Group 4, and a partition reserved for external accelerators via ACP; sensors or I/O agents attached through ACP.]
  • 12. Increasing performance through cache stashing. Enables reads/writes into the shared L3 cache or per-core L2 cache. Allows closely coupled accelerators and I/O agents to gain access to core memory. AMBA 5 CHI and Accelerator Coherency Port (ACP) can be used for cache stashing. More throughput with Peripheral Port (PP) for acceleration, network, storage use-cases. Stash critical data to any cache level. [Figure labels: accelerator or I/O; CoreLink CMN-600 with Agile System Cache; DMC-620 and DDR4; Cortex-A clusters with L3 cache and per-CPU L2 caches; DynamIQ cluster diagram as on slide 8.]
  • 13. Increasing performance through tight integration. Offload acceleration (example application: offload crypto acceleration), between the DynamIQ cluster and an accelerator over ACP and PP: (1) configure registers for task; (2) accelerator fetches data from Core memory; (3) accelerator carries out acceleration; (4) accelerator writes result into Core memory. I/O processing (example application: packet processing in network systems), between the DynamIQ cluster and an I/O agent over ACP and PP: (1) I/O agent writes data into Core memory; (2) cluster carries out computation; (3) processing completed; (4) I/O agent reads result from Core memory or sends data for onward processing.
  • 14. Automotive and industrial safety and reliability. ADAS and IVI compute performance: DynamIQ provides performance required for autonomous cars; faster responsiveness. DynamIQ functional safety: following ASIL D systematic flow; provides higher safety integrity. Industry’s broadest functional safety capable CPU portfolio. [Figure labels: autonomous system (sense, perceive, decide, actuate); sensors; SoC with Cortex-A application cores (L3 cache, per-CPU L2 caches); safety island with Cortex-R and Cortex-M; lock-step core.]
  • 15. Cortex-A75 – Increasing Performance. Cortex-A55 – Improving Efficiency.
  • 16. New levels of performance for smart solutions. All comparisons at ISO process and frequency. Cortex-A75 (baseline: Cortex-A73): SPECINT2006 1.22x; SPECFP2006 1.33x; LMBench memcpy 1.16x; Octane 2.0 1.48x; Geekbench v4 1.34x. Cortex-A55 (baseline: Cortex-A53): SPECINT2006 1.21x; SPECFP2006 1.42x; LMBench memcpy 1.97x; Octane 2.0 1.14x; Geekbench v4 1.22x.
  • 17. Architecture and pipelines. Common features: Armv8.2-A architecture; DynamIQ big.LITTLE. Cortex-A75, performance focussed: out-of-order, 11-13 stage integer pipeline. Cortex-A55, efficiency focussed: in-order, 8 stage integer pipeline. [Figure: Cortex-A55 pipeline (instruction fetch, decode, issue, ALU/INT (MAC), ALU/INT (DIV), branch, AGU load, AGU store, NEON/FP F0 and F1, writeback) and Cortex-A75 pipeline (instruction fetch, instruction queue, decode, rename, dispatch, issue queues, Int I0 (MUL), Int I1 (DIV), Branch B, two AGU LD/ST, NEON/FP F0, F1 and store, writeback).]
  • 18. Instruction fetch. Common features: 4-way set associative; virtually indexed, physically tagged (VIPT); decoupled from Cores through instruction queue. Cortex-A75: 64KB; 4-wide instruction fetch. Cortex-A55: 16KB / 32KB / 64KB; 2-wide instruction fetch. [Figure labels: L1 instruction cache, fill buffer, instruction extraction and parsing, instruction queue, branch predictor, conditional predictor, indirect predictor, AGU.]
  • 19. Branch prediction. Cortex-A75: fine-tuned 0-cycle prediction; state of the art, mobile focussed, table based conditional prediction. Cortex-A55: brand new 0-cycle predictors; new main conditional predictor, neural network based; new loop predictors. [Figure: instruction fetch block diagram as on slide 18.]
  • 20. Cortex-A75: Datapaths. 3-way superscalar high-performance pipeline: single-cycle decode with instruction fusing and micro-ops. 7 independent high-performance issue queues: 2x Load/Store, 2x NEON/FPU, 1x Branch and 2x Integer core. Increased capacity to sustain operation under L1 miss / L2 hit: 12 entries for integer core to maximise in-flight instructions and out-of-order capabilities; 8 entries for Load/Store and NEON/FPU. [Figure labels: instruction fetch, branch prediction, 64k I-cache, decode, rename, dispatch, issue, Arm register file, D.E. register file, ALUs, iDIV, MAC, AGUs, load/store, Advanced NEON, floating point, writeback, 64k D-cache, STB, main TLB, private L2 cache.]
  • 21. Cortex-A55: Datapaths. Dual issue of loads and stores. Improved latency for forwarding ALU results to the AGU: reduced by one cycle for many common ALU operations. Reduced L1 cache load-to-use latency for pointer chasing to two cycles. [Figure labels: decode, integer register file, NEON-FP regfile, NEON pipe, two ALU pipes (shift, ALU), integer pipe (mult, acc), divide pipe, load pipe, store pipe x2, AGU, data cache address, data cache output.]
  • 22. L1 memory system. Common features: 4-way set associative; VIPT with PIPT programmer’s view; improved prefetchers. Cortex-A75: 64KB; wider load-store than Cortex-A73; support Read-after-Write OoO with filtering. Cortex-A55: 16KB / 32KB / 64KB; improved store buffer bandwidth to L1; larger 16-entry L1-TLB. [Figure labels: store buffer, L1 data cache, prefetcher, L1 TLB, L2 TLB, L2 cache.]
  • 23. L2 memory system. Common features: private L2 cache in each Core; running at Core speed; exclusive data cache; cache stashing into the L2; non-blocking 1024-entry TLB for hit-under-miss. Cortex-A75: 256KB / 512KB. Cortex-A55: 0KB / 64KB / 128KB / 256KB. [Figure: memory system block diagram as on slide 22.]
  • 24. Next-generation features. Dot product and half-precision float for AI/ML processing. Virtualization Host Extensions (VHE) offering Type-2 hypervisor (KVM) performance improvements. Cache stashing and atomic operations improve multicore networking performance and latency. Cache clean to persistence to support storage class memory. Infrastructure class RAS enhancement including data poisoning and improved error management.
  • 25. Innovating for the scalable future. 2013-2017: the nature of compute is changing the landscape. Expanding Arm technologies for broad market applicability. New cluster design with new DynamIQ cores: Cortex-A75, breakthrough performance; Cortex-A55, efficiency redefined. Functional safety for industrial and automotive applications. New features expanding microarchitecture capabilities: DynamIQ Shared Unit, new cache features, new branch prediction.
  • 27. The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. www.arm.com/company/policies/trademarks
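
As a worked example of the load-to-use cycle counts on slide 9, the short Python sketch below computes how much each cache level's latency shrinks between core generations. The cycle counts are copied from the slide. The dictionary layout and the helper name `latency_reduction` are illustrative assumptions and not part of any Arm tool or API.

```python
from typing import Optional

# Load-to-use latencies in cycles, copied from the slide 9 table.
# None marks an entry the slide shows as "-".
LOAD_TO_USE_CYCLES = {
    "Cortex-A53": {"L1 hit": 3, "L2 hit": 13, "L3 hit": None, "Interconnect boundary": 20},
    "Cortex-A55": {"L1 hit": 2, "L2 hit": 6, "L3 hit": 21, "Interconnect boundary": 21},
    "Cortex-A73": {"L1 hit": 3, "L2 hit": 19, "L3 hit": None, "Interconnect boundary": 26},
    "Cortex-A75": {"L1 hit": 3, "L2 hit": 8, "L3 hit": 25, "Interconnect boundary": 25},
}


def latency_reduction(old_core: str, new_core: str, level: str) -> Optional[float]:
    """Fractional drop in load-to-use cycles from old_core to new_core.

    Hypothetical helper for illustration. Returns None when either core
    has no figure for that level on the slide.
    """
    before = LOAD_TO_USE_CYCLES[old_core][level]
    after = LOAD_TO_USE_CYCLES[new_core][level]
    if before is None or after is None:
        return None
    return (before - after) / before


if __name__ == "__main__":
    pairs = (("Cortex-A53", "Cortex-A55"), ("Cortex-A73", "Cortex-A75"))
    for old_core, new_core in pairs:
        for level in ("L1 hit", "L2 hit", "L3 hit"):
            change = latency_reduction(old_core, new_core, level)
            text = "n/a" if change is None else f"{change:.0%} fewer cycles"
            print(f"{old_core} -> {new_core}, {level}: {text}")
```

On these figures the L2 hit latency falls from 13 to 6 cycles (about 54%) between Cortex-A53 and Cortex-A55, and from 19 to 8 cycles (about 58%) between Cortex-A73 and Cortex-A75. The slide gives no L3 figure for the older cores, so no L3 comparison is possible.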