Arm DynamIQ: Intelligent Solutions Using Cluster Based Multiprocessing
Peter Greenhalgh, VP and GM of Central Technology, Arm.
1.
Arm DynamIQ: Intelligent Solutions Using Cluster Based Multiprocessing
Peter Greenhalgh, VP and GM of Central Technology
© 2017 Arm Limited
2.
Technology innovations of 2013
• Drones
• Wearable technology
• Smartwatch
• 3D printing
• Voice recognition
• Social media
3.
Looking ahead from edge to cloud
The future requires a new approach to CPU design
• Safe and autonomous
• Hyper-efficient
• Secure private compute
• Cortex beyond mobile
• Mixed reality
4.
Arm DynamIQ: Rearchitecting the compute experience
• Multi-core redefined for broad market
• Massive system performance uplift
• More intelligent systems
5.
Innovating for the scalable future
Expanding Arm technology processor architecture for broad market
2013: Dual cluster, heterogeneous processing
• Up to 8 CPUs
• 'Octacore' smartphones
2017: DynamIQ cluster, dynamic flexibility
• Nearly "unlimited" design spectrum
• Covers all existing use cases
Key Arm technologies: Arm AMBA, Arm big.LITTLE, Arm CoreLink, Arm TrustZone, Arm NEON
6.
DSU – Broadening the reach of technology
7.
DynamIQ: New cluster design for new cores
DynamIQ big.LITTLE systems:
• Greater product differentiation and scalability
• Improved energy efficiency and performance
• SW compatibility with Energy Aware Scheduling (EAS)
Private L2 and shared L3 caches
• Local cache close to processors
• L3 cache shared between all cores
DynamIQ Shared Unit (DSU)
• Contains L3, Snoop Control Unit (SCU) and all cluster interfaces
Example DynamIQ big.LITTLE configurations: 1b+2L, 1b+3L, 1b+4L, 1b+7L, 2b+6L, 4b+4L, ...
[Diagram labels: Cortex-A75 32b/64b core with private L2 cache; Cortex-A55 32b/64b core with private L2 cache; DynamIQ Shared Unit (DSU) with SCU, shared L3 cache, ACP, peripheral port, async bridges and AMBA4 ACE interface.]
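The configuration labels on this slide follow a simple "<big>b+<LITTLE>L" pattern. As a minimal sketch, the Python below parses such labels and checks them against an eight-core cluster limit. The limit is an assumption taken from the cluster diagrams in this deck (Core0 to Core7), and the function name `parse_config` is invented for illustration.

```python
import re

# Assumption: a DynamIQ cluster holds at most 8 cores, as suggested by the
# Core0..Core7 diagrams elsewhere in the deck.
MAX_CORES = 8

def parse_config(label: str) -> tuple[int, int]:
    """Parse a label such as '1b+7L' into (big, little) core counts."""
    match = re.fullmatch(r"(\d+)b\+(\d+)L", label)
    if match is None:
        raise ValueError(f"not a big.LITTLE label: {label!r}")
    big, little = int(match.group(1)), int(match.group(2))
    if big + little > MAX_CORES:
        raise ValueError(f"{label} exceeds {MAX_CORES} cores")
    return big, little

# The example configurations listed on the slide.
EXAMPLES = ["1b+2L", "1b+3L", "1b+4L", "1b+7L", "2b+6L", "4b+4L"]

for label in EXAMPLES:
    big, little = parse_config(label)
    print(f"{label}: {big} big + {little} LITTLE = {big + little} cores")
```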
8.
DynamIQ Shared Unit (DSU)
• Streamlines traffic across bridges
• Advanced power management features
• Latency and bandwidth optimizations
• Support for multiple performance domains
• Scalable interfaces for edge to cloud applications
• Supports large amounts of local memory
• Low latency interfaces for closely coupled accelerators
[Diagram labels: DynamIQ cluster, "0 - 7 Cores", Core 0 to Core 7, asynchronous bridges; DynamIQ Shared Unit (DSU) with snoop filter, power management, L3 cache, bus I/F, ACP and peripheral port I/F.]
9.
Level 3 cache memory system
• New memory system for Cortex-A clusters
• Integrated snoop filter to improve efficiency
• Enabling lower cache latencies

Load-to-use cycles*       Cortex-A53   Cortex-A55   Cortex-A73   Cortex-A75
L1 hit                    3            2            3            3
L2 hit                    13           6            19           8
L3 hit                    -            21           -            25
Interconnect boundary     20           21           26           25

[Diagram labels: DynamIQ cluster, "0–7 Cores", Core0 to Core7, each with L1 and L2 cache, asynchronous bridges; DynamIQ Shared Unit (DSU) with snoop filter, power management, L3 cache, bus I/F, ACP and peripheral port I/F.]
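To make the latency table more concrete, the sketch below computes the L2-hit improvement between core generations and a weighted average load-to-use latency. The hit fractions are hypothetical values chosen only for illustration; they do not come from the slides, and the function names are invented.

```python
# Load-to-use cycles copied from the table on this slide.
# None marks designs without an L3 entry in the table.
LATENCY = {
    "Cortex-A53": {"L1": 3, "L2": 13, "L3": None},
    "Cortex-A55": {"L1": 2, "L2": 6, "L3": 21},
    "Cortex-A73": {"L1": 3, "L2": 19, "L3": None},
    "Cortex-A75": {"L1": 3, "L2": 8, "L3": 25},
}

def l2_speedup(old: str, new: str) -> float:
    """Ratio of L2-hit load-to-use cycles between two cores."""
    return LATENCY[old]["L2"] / LATENCY[new]["L2"]

def average_hit_latency(core: str, fractions: dict[str, float]) -> float:
    """Weighted average latency for loads that hit in L1, L2 or L3.

    `fractions` maps a cache level to the share of loads served there;
    the shares must sum to 1.
    """
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    total = 0.0
    for level, share in fractions.items():
        cycles = LATENCY[core][level]
        if cycles is None:
            raise ValueError(f"{core} has no {level} latency in the table")
        total += share * cycles
    return total

# Hypothetical hit distribution, not taken from the slides.
MIX = {"L1": 0.90, "L2": 0.08, "L3": 0.02}

print(f"L2 hit, A53 -> A55: {l2_speedup('Cortex-A53', 'Cortex-A55'):.2f}x fewer cycles")
print(f"L2 hit, A73 -> A75: {l2_speedup('Cortex-A73', 'Cortex-A75'):.2f}x fewer cycles")
for core in ("Cortex-A55", "Cortex-A75"):
    print(f"{core}: {average_hit_latency(core, MIX):.2f} cycles on average")
```

Under this made-up mix the average is dominated by the L1 latency, which is why the one-cycle L1 difference between the two cores matters more than the larger L3 gap.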
10.
Level 3 cache partition: Infrastructure
• Process 1 = data plane
• Process 2 = control plane
• Packet processing data sent through low latency ACP interface
Example configuration with two Core groups in a DynamIQ cluster
[Diagram labels: Core0 to Core7 split into Core group 1 (Process 1) and Core group 2 (Process 2); L3 cache divided into Group 1, Group 2 and a portion reserved for external accelerators via ACP; sensors or I/O agents attached through ACP.]
11.
Level 3 cache partition: Automotive
• Each process could represent an independent ADAS algorithm
• Sensors linked through low latency ACP interface
Example configuration with four Core groups in a DynamIQ cluster
[Diagram labels: Core group 1 (Process 1): Core0, Core1; Core group 2 (Process 2): Core2, Core3; Core group 3 (Process 3): Core4, Core5; Core group 4 (Process 4): Core6, Core7; L3 cache divided into Group 1,2, Group 3, Group 4 and a portion reserved for external accelerators via ACP; sensors or I/O agents attached through ACP.]
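The four-group automotive example can be written down as a small lookup table. The sketch below is a plain data model, not a DSU programming interface: the core-to-group mapping is read from the flattened diagram and is therefore an assumption, and all names (`CORE_GROUPS`, `L3_PARTITIONS`, `partition_of`) are invented for illustration.

```python
# Assumed reading of the slide's diagram: four core groups of two cores each.
CORE_GROUPS = {
    "Core group 1": ("Process 1", [0, 1]),
    "Core group 2": ("Process 2", [2, 3]),
    "Core group 3": ("Process 3", [4, 5]),
    "Core group 4": ("Process 4", [6, 7]),
}

# L3 partition labels as they appear on the slide. Groups 1 and 2 share a
# partition; one partition is reserved for external accelerators via ACP.
L3_PARTITIONS = {
    "Group 1,2": ["Core group 1", "Core group 2"],
    "Group 3": ["Core group 3"],
    "Group 4": ["Core group 4"],
    "Reserved (ACP)": [],
}

def validate(core_groups, partitions, num_cores=8):
    """Check that every core and every core group is assigned exactly once."""
    cores = [core for _, members in core_groups.values() for core in members]
    if sorted(cores) != list(range(num_cores)):
        raise ValueError("each core must belong to exactly one core group")
    owners = [group for groups in partitions.values() for group in groups]
    if sorted(owners) != sorted(core_groups):
        raise ValueError("each core group must map to exactly one L3 partition")

def partition_of(core: int) -> str:
    """Return the L3 partition label used by the given core."""
    group = next(name for name, (_, members) in CORE_GROUPS.items()
                 if core in members)
    return next(label for label, groups in L3_PARTITIONS.items()
                if group in groups)

validate(CORE_GROUPS, L3_PARTITIONS)
for core in range(8):
    print(f"Core{core} -> {partition_of(core)}")
```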
12.
Increasing performance through cache stashing
• Enables reads/writes into the shared L3 cache or per-core L2 cache
• Allows closely coupled accelerators and I/O agents to gain access to core memory
• AMBA 5 CHI and Accelerator Coherency Port (ACP) can be used for cache stashing
• More throughput with Peripheral Port (PP) for acceleration, network, storage use-cases
Stash critical data to any cache level
[Diagram labels: accelerator or I/O agent; CoreLink CMN-600 with Agile System Cache; two DMC-620 controllers, each to DDR4; two Cortex-A clusters, each with CPUs, per-CPU L2 caches and an L3 cache; DynamIQ cluster with Core0 to Core7 (L1 and L2 caches), asynchronous bridges and the DSU.]
13.
Increasing performance through tight integration

Offload acceleration
Example application: Offload crypto acceleration
(1) Configure registers for task
(2) Fetches data from Core memory
(3) Carries out acceleration
(4) Writes result into Core memory
[Diagram labels: DynamIQ cluster and accelerator connected via ACP and PP.]

I/O processing
Example application: Packet processing in network systems
(1) Writes data into Core memory
(2) Carries out computation
(3) Processing completed
(4) Reads result from Core memory or sends data for onward processing
[Diagram labels: DynamIQ cluster and I/O agent connected via ACP and PP.]
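The four offload steps can be walked through with a toy software model. This is only a sketch of the sequence: the `Accelerator` class, its register names and the XOR "cipher" are all invented stand-ins, and real hardware would reach core memory through ACP rather than a shared Python `bytearray`.

```python
class Accelerator:
    """Toy model of a closely coupled accelerator (all names hypothetical)."""

    def __init__(self, memory: bytearray):
        self.memory = memory  # stands in for core memory reachable via ACP
        self.regs = {"src": 0, "dst": 0, "length": 0, "key": 0}

    def configure(self, **regs: int) -> None:
        # Step (1): the core configures the task registers.
        self.regs.update(regs)

    def run(self) -> None:
        src, dst = self.regs["src"], self.regs["dst"]
        length, key = self.regs["length"], self.regs["key"]
        # Step (2): fetch the input data from core memory.
        data = bytes(self.memory[src:src + length])
        # Step (3): carry out the "acceleration" (a placeholder XOR, not real crypto).
        result = bytes(byte ^ key for byte in data)
        # Step (4): write the result back into core memory.
        self.memory[dst:dst + length] = result


memory = bytearray(64)
memory[0:5] = b"hello"

accelerator = Accelerator(memory)
accelerator.configure(src=0, dst=32, length=5, key=0x5A)
accelerator.run()

scrambled = bytes(memory[32:37])
print("input    :", bytes(memory[0:5]))
print("scrambled:", scrambled.hex())
```

The I/O processing flow on the same slide reverses the roles: the agent writes first, and the cores do the computation.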
14.
Automotive and industrial safety and reliability
ADAS and IVI compute performance
• DynamIQ provides performance required for autonomous cars
• Faster responsiveness
DynamIQ: Functional Safety
• Following ASIL D systematic flow
• Provides higher safety integrity
Industry's broadest functional safety capable CPU portfolio
[Diagram labels: autonomous system stages Sense, Perceive, Decide, Actuate; SoC with sensors, Cortex-A application cores (CPUs with L2 caches and an L3 cache), and a safety island with Cortex-R, Cortex-M and a lock-step core.]
15.
Cortex-A75 – Increasing Performance
Cortex-A55 – Improving Efficiency
16.
New levels of performance for smart solutions
All comparisons at ISO process and frequency

Benchmark          Cortex-A75 (baseline: Cortex-A73)   Cortex-A55 (baseline: Cortex-A53)
SPECINT2006        1.22x                               1.21x
SPECFP2006         1.33x                               1.42x
LMBench memcpy     1.16x                               1.97x
Octane 2.0         1.48x                               1.14x
Geekbench v4       1.34x                               1.22x
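One common way to summarise a set of per-benchmark speedups is the geometric mean. The sketch below applies it to the figures on this slide. The assignment of each set of figures to a core follows a reading of the flattened chart (the 1.97x memcpy set taken as Cortex-A55) and should be treated as an assumption; the summary itself is not something the slides state.

```python
from statistics import geometric_mean

# Speedups relative to the previous-generation core, as read from the slide.
# Assumption: the set containing 1.97x (LMBench memcpy) belongs to Cortex-A55.
SPEEDUPS = {
    "Cortex-A75 vs Cortex-A73": {
        "SPECINT2006": 1.22,
        "SPECFP2006": 1.33,
        "LMBench memcpy": 1.16,
        "Octane 2.0": 1.48,
        "Geekbench v4": 1.34,
    },
    "Cortex-A55 vs Cortex-A53": {
        "SPECINT2006": 1.21,
        "SPECFP2006": 1.42,
        "LMBench memcpy": 1.97,
        "Octane 2.0": 1.14,
        "Geekbench v4": 1.22,
    },
}

def summary(name: str) -> float:
    """Geometric mean of the per-benchmark speedups for one comparison."""
    return geometric_mean(SPEEDUPS[name].values())

for name in SPEEDUPS:
    print(f"{name}: geometric mean {summary(name):.2f}x")
```

The geometric mean is used instead of the arithmetic mean because the inputs are ratios, so a single large outlier such as the memcpy result does not dominate the summary as strongly.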
17.
Architecture and Pipelines
Common features
• Armv8.2-A Architecture
• DynamIQ big.LITTLE
Cortex-A75 – performance focussed
• Out-of-Order, 11-13 stage integer pipeline
Cortex-A55 – efficiency focussed
• In-order, 8 stage integer pipeline
[Diagram labels, Cortex-A55 pipeline: Instruction Fetch, Decode, Issue, ALU/INT (MAC), ALU/INT (DIV), Branch, AGU Load, AGU Store, NEON/FP F0, NEON/FP F1, Writeback.]
[Diagram labels, Cortex-A75 pipeline: Instruction Fetch, Instruction Queue, Decode, Rename, Dispatch, issue queues (IsQ) of 8, 12 and 20 entries, Int I0 (MUL), Int I1 (DIV), Branch B, two AGU LD/ST, NEON/FP F0, NEON/FP F1, NEON/FP Store, Writeback.]
18.
Instruction fetch
Common features
• 4-way set associative
• Virtually indexed, physically tagged (VIPT)
• Decoupled from Cores through instruction queue
Cortex-A75
• 64KB
• 4-wide instruction fetch
Cortex-A55
• 16KB / 32KB / 64KB
• 2-wide instruction fetch
[Diagram labels: L1 Instruction Cache, Fill Buffer, Instruction Extraction & Parsing, Instruction Queue, AGU, Branch Predictor, Conditional Predictor, Indirect Predictor.]
19.
Branch prediction
Cortex-A75
• Fine-tuned 0-cycle prediction
• State of the art, mobile focussed, table based conditional prediction
Cortex-A55
• Brand new 0-cycle predictors
• New main conditional predictor: neural network based
• New loop predictors
[Diagram labels: L1 Instruction Cache, Fill Buffer, Instruction Extraction & Parsing, Instruction Queue, AGU, Branch Predictor, Conditional Predictor, Indirect Predictor.]
20.
Cortex-A75: Datapaths
3-way superscalar high-performance pipeline
• Single-cycle decode with instruction fusing and micro-ops
7 independent high-performance issue queues
• 2x Load/Store, 2x NEON/FPU, 1x Branch and 2x Integer core
Increased capacity to sustain operation under L1 miss / L2 hit
• 12 entries for integer core to maximise in-flight instructions and out-of-order capabilities
• 8 entries for Load/Store and NEON/FPU
[Diagram labels: Cortex-A75, Private L2 Cache, Instruction Fetch, 64k I-Cache, Branch Prediction, Main TLB, Decode, Rename, Dispatch, Issue, Arm Register File, D.E. Register File, Load/Store, AGUs, 64k D-Cache, STB, Advanced NEON, Floating Point, ALUs, iDIV, MAC, Writeback.]
21.
Cortex-A55: Datapaths
Dual issue of loads and stores
Improved latency for forwarding ALU results to the AGU
• Reduced by one cycle for many common ALU operations
Reduced L1 cache load-to-use latency for pointer chasing to two cycles
[Diagram labels: Cortex-A55, Decode, Integer Register File, NEON-FP Regfile, NEON Pipe, Store Pipe x2, Load Pipe, AGU, two ALU Pipes (Shift, ALU), Integer Pipe, Divide Pipe, Mult Acc, Data Cache Address, Data Cache Output.]
22.
L1 memory system
Common features
• 4-way set associative
• VIPT with PIPT programmer's view
• Improved prefetchers
Cortex-A75
• 64KB
• Wider load-store than Cortex-A73
• Support Read-after-Write OoO with filtering
Cortex-A55
• 16KB / 32KB / 64KB
• Improved store buffer bandwidth to L1
• Larger 16-entry L1-TLB
[Diagram labels: Store Buffer, L1 Data Cache, Prefetcher, L1 TLB, L2 TLB, L2 Cache.]
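The combination of "4-way set associative" and "VIPT with PIPT programmer's view" can be illustrated with a little cache-geometry arithmetic. The sketch below computes the number of sets and how many index bits lie above the page offset for each L1 size on this slide. The 64-byte line size and 4KB page size are assumptions not stated in the deck, and `cache_geometry` is an invented helper.

```python
def cache_geometry(size_kb: int, ways: int = 4,
                   line_bytes: int = 64, page_bytes: int = 4096):
    """Return (sets, index_bits, alias_bits) for a set-associative cache.

    line_bytes and page_bytes are assumed values, not taken from the slides.
    alias_bits counts index bits that come from the virtual page number,
    i.e. bits that a virtually indexed cache cannot take from the page offset.
    """
    sets = size_kb * 1024 // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2 of the line size
    index_bits = sets.bit_length() - 1          # log2 of the set count
    page_bits = page_bytes.bit_length() - 1     # log2 of the page size
    alias_bits = max(0, offset_bits + index_bits - page_bits)
    return sets, index_bits, alias_bits

for size_kb in (16, 32, 64):
    sets, index_bits, alias_bits = cache_geometry(size_kb)
    print(f"{size_kb}KB, 4-way: {sets} sets, {index_bits} index bits, "
          f"{alias_bits} index bit(s) above the 4KB page offset")
```

Under these assumptions only the 16KB configuration is indexed entirely by page-offset bits. The larger sizes use virtual-address bits for indexing, which is presumably why the slide stresses that software still sees PIPT behaviour.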
23.
L2 memory system
Common features
• Private L2 cache in each Core
• Running at Core speed
• Exclusive data cache
• Cache stashing into the L2
• Non-blocking 1024-entry TLB for hit-under-miss
Cortex-A75
• 256KB / 512KB
Cortex-A55
• 0KB / 64KB / 128KB / 256KB
[Diagram labels: Store Buffer, L1 Data Cache, Prefetcher, L1 TLB, L2 TLB, L2 Cache.]
24.
Next-generation features
• Dot product and half-precision float for AI/ML processing
• Virtualization Host Extensions (VHE) offering Type-2 hypervisor (KVM) performance improvements
• Cache stashing and atomic operations improve multicore networking performance and latency
• Cache clean to persistence to support storage class memory
• Infrastructure class RAS enhancement including data poisoning and improved error management
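The dot-product feature mentioned above is typically used for quantised ML inference, where four 8-bit products are summed into each 32-bit accumulator lane. The sketch below models that operation in plain Python. It is an illustration of the arithmetic, not Arm's definition of any instruction, and the function name `sdot` and the wrap-around behaviour are modelling choices.

```python
def wrap_int32(value: int) -> int:
    """Wrap an integer into the signed 32-bit range (modelling choice)."""
    return (value + 2**31) % 2**32 - 2**31

def sdot(acc: list[int], a: list[int], b: list[int]) -> list[int]:
    """Accumulate 4-element signed 8-bit dot products into 32-bit lanes.

    Lane i receives acc[i] + sum(a[4*i + k] * b[4*i + k] for k in 0..3).
    """
    if len(a) != len(b) or len(a) != 4 * len(acc):
        raise ValueError("a and b need four 8-bit elements per accumulator lane")
    if any(not -128 <= v <= 127 for v in (*a, *b)):
        raise ValueError("inputs must fit in signed 8 bits")
    result = []
    for lane, base in enumerate(acc):
        products = (a[4 * lane + k] * b[4 * lane + k] for k in range(4))
        result.append(wrap_int32(base + sum(products)))
    return result

# Two accumulator lanes, eight int8 inputs per operand.
lanes = sdot([0, 10],
             [1, 2, 3, 4, -1, -2, -3, -4],
             [1, 1, 1, 1, 2, 2, 2, 2])
print(lanes)
```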
25.
Innovating for the scalable future
2013-2017: The nature of compute is changing the landscape
Expanding Arm technologies for broad market applicability
New cluster design with new DynamIQ cores:
• Cortex-A75: Breakthrough performance
• Cortex-A55: Efficiency redefined
Functional safety for industrial and automotive applications
New features expanding microarchitecture capabilities:
• DynamIQ Shared Unit, new cache features, new branch prediction
26.
Thank You! Danke! Merci! 谢谢! ありがとう! Gracias! Kiitos!
27.
The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. www.arm.com/company/policies/trademarks