SlideShare ist ein Scribd-Unternehmen logo
1 von 24
Downloaden Sie, um offline zu lesen
Deterministic Memory Abstraction and
Supporting Multicore System Architecture
Farzad Farshchi$, Prathap Kumar Valsan^, Renato Mancuso*, Heechul Yun$
$ University of Kansas, ^ Intel, * Boston University
Multicore Processors in CPS
• Provide high computing performance needed for intelligent CPS
• Allow consolidation reducing cost, size, weight, and power.
2
Challenge: Shared Memory Hierarchy
3
• Memory performance varies widely due to interference
• Task WCET can be extremely pessimistic
Core1 Core2 Core3 Core4
Memory Controller (MC)
Shared Cache
DRAM
Task 1 Task 2 Task 3 Task 4
I D I D I D I D
Challenge: Shared Memory Hierarchy
• Many shared resources: cache space, MSHRs, dram banks, MC buffers, …
• Each optimized for performance with no high-level insight
• Very poor worst-case behavior: >10X, >100X observed in real platforms
• even after cache partitioning is applied.
4
DRAM
LLC
Core1 Core2 Core3 Core4
subject co-runner(s)
IsolBench EEMBC, SD-VBS on Tegra TK1
- P. Valsan, et al., “Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems.” RTAS, 2016
- P. Valsan, et al., “Addressing Isolation Challenges of Non-blocking Caches for Multicore Real-Time Systems.” Real-time Systems Journal. 2017
The Virtual Memory Abstraction
• Program’s logical view of memory
• No concept of timing
• OS maps any available physical memory blocks
• Hardware treats all memory requests as same
• But some memory may be more important
• E.g., code and data memory in a time critical control loop
• Prevents OS and hardware from making informed allocation and
scheduling decisions that affect memory timing
5
New memory abstraction is needed!
Silberschatz et al.,
“Operating System Concepts.”
Outline
• Motivation
• Our Approach
• Deterministic Memory Abstraction
• DM-Aware Multicore System Design
• Implementation
• Evaluation
• Conclusion
6
Deterministic Memory (DM) Abstraction
• Cross-layer memory abstraction with bounded worst-case timing
• Focusing on tightly bounded inter-core interference
• Best-effort memory = conventional memory abstraction
7
Inter-core
interference
Inherent
timing
legend
Best-effort
memory
Deterministic
memory
Worst-case
memory
delay
DM-Aware Resource Management Strategies
• Deterministic memory: Optimize for time predictability
• Best effort memory: Optimize for average performance
8
Space allocation Request scheduling WCET bounds
Deterministic
memory
Dedicated resources Predictability
focused
Tight
Best-effort memory Shared resources Performance
focused
Pessimistic
Space allocation: e.g., cache space, DRAM banks
Request scheduling: e.g., DRAM controller, shared buses
System Design: Overview
• Declare all or part of RT task’s address space as deterministic memory
• End-to-end DM-aware resource management: from OS to hardware
• Assumption: partitioned fixed-priority real-time CPU scheduling
9
Application view (logical) System-level view (physical)
Deterministic
memory
Best-effort
memory
Deterministic Memory-Aware Memory Hierarchy
Core1 Core2 Core3 Core4
W
5
W
1
W
2
W
3
W
4
I D I D I D I D
B
1
B
2
B
3
B
4
B
5
B
6
B
7
B
8
DRAM banks
Cache ways
OS and Architecture Extensions for DM
• OS updates the DM bit in each page table entry (PTE)
• MMU/TLB and bus carries the DM bit.
• Shared memory hierarchy (cache, memory ctrl.) uses the DM bit.
10
OS
MMU/TLB
Page table entry
DM bit=1|0
LLC MCPaddr, DM Paddr, DM
Paddr, DM
Vaddr DRAM cmd/addr.
hardware
software
DM-Aware Shared Cache
• Based on cache way partitioning
• Improve space utilization via DM-aware cache replacement algorithm
• Deterministic memory: allocated on its own dedicated partition
• Best-effort memory: allocated on any non-DM lines in any partition.
• Cleanup: DM lines are recycled as best-effort lines at each OS context switch
11
Way 0 Way 1 Way 2 Way 3
Core 0
partition
Core 1
partition
best-effort line (any core, DM=0)
deterministic line (Core0, DM=1)
deterministic line (Core1, DM=1)
Way 4
shared
partition
Set 0
Set 1
Set 2
Set 3
0
DM Tag Line data
DM-Aware Memory (DRAM) Controller
• OS allocates DM pages on reserved banks, others on shared banks
• Memory-controller implements two-level scheduling (*)
12
Determinism
focused scheduling
(Round-Robin)
Throughput focused
scheduling
(FR-FCFS)
yes noDeterministic
memory request
exist?
(a) Memory controller (MC) architecture (b) Scheduling algorithm
(*) P. Valsan, et al., “MEDUSA: A Predictable and High-Performance DRAM Controller for Multicore based Embedded Systems.” CPSNA, 2015
Implementation
• Fully functional end-to-end implementation in Linux and Gem5.
• Linux kernel extensions
• Use an unused bit combination in ARMv7 PTE to indicate a DM page.
• Modified the ELF loader and system calls to declare DM memory regions
• DM-aware page allocator, replacing the buddy allocator, based on PALLOC (*)
• Dedicated DRAM banks for DM are configured through Linux’s CGROUP interface.
ARMv7 page table entry (PTE)
13
(*) H. Yun et. al, “PALLOC: DRAM Bank-Aware Memory Allocator for Performance Isolation on Multicore Platforms.” RTAS’14
Implementation
• Gem5 (a cycle-accurate full-system simulator) extensions
• MMU/TLB: Add DM bit support
• Bus: Add the DM bit in each bus transaction
• Cache: implement way-partitioning and DM-aware replacement algorithm
• DRAM controller: implement DM-aware two-level scheduling algorithm.
• Hardware implementability
• MMU/TLB, Bus: adding 1 bit is not difficult. (e.g., Use AXI bus QoS bits)
• Cache and DRAM controllers: logic change and additional storage are
minimal.
14
Outline
• Motivation
• Our Approach
• Deterministic Memory Abstraction
• DM-Aware Multicore System Design
• Implementation
• Evaluation
• Conclusion
15
Simulation Setup
• Baseline Gem5 full-system simulator
• 4 out-of-order cores, 2 GHz
• 2 MB Shared L2 (16 ways)
• LPDDR2@533MHz, 1 rank, 8 banks
• Linux 3.14 (+DM support)
• Workload
• EEMBC, SD-VBS, SPEC2006, IsolBench
16
Real-Time Benchmark Characteristics
• Critical pages: pages accessed in the main loop
• Critical (T98/T90): pages accounting 98/90% L1 misses of all L1 misses of the critical pages.
• Only 38% of pages are critical pages
• Some pages contribute more to L1 misses (hence shared L2 accesses)
• Not all memory is equally important
svm L1 miss distribution
17
Effects of DM-Aware Cache
• Questions
• Does it provide strong cache isolation for real-time tasks using DM?
• Does it improve cache space utilization for best-effort tasks?
• Setup
• RT: EEMBC, SD-VBS, co-runners: 3x Bandwidth
• Comparisons
• NoP: free-for-all sharing. No partitioning
• WP: cache way partitioning (4 ways/core)
• DM(A): all critical pages of a benchmark are marked as DM
• DM(T98): critical pages accounting 98% L1 misses are marked as DM
• DM(T90): critical pages accounting 98% L1 misses are marked as DM
18
DRAM
LLC
Core1 Core2 Core3 Core4
RT co-runner(s)
Effects of DM-Aware Cache
• Does it provide strong cache isolation for real-time tasks using DM?
• Yes. DM(A) and WP (way-partitioning) offer equivalent isolation.
• Does it improve cache space utilization for best-effort tasks?
• Yes. DM(A) uses only 50% cache partition of WP.
19
Effects of DM-Aware DRAM Controller
• Questions
• Does it provide strong isolation for real-time tasks using DM?
• Does it reduce reserved DRAM bank space?
• Setup
• RT: SD-VBS (input: CIF), co-runners: 3x Bandwidth
• Comparisons
• BA & FR-FCFS: Linux’s default buddy allocator + FR-FCFS scheduling in MC
• DM(A): DM on private DRAM banks + two-level scheduling in MC
• DM(T98): same as DM(A), except pages accounting 98% L1 misses are DM
• DM(T90): same as DM(A), except pages accounting 90% L1 misses are DM
20
Effects of DM-Aware DRAM Controller
• Does it provide strong isolation for real-time tasks using DM?
• Yes. BA&FR-FCFS suffers 5.7X slowdown.
• Does it reduce reserved DRAM bank space?
• Yes. Only 51% of pages are marked deterministic in DM(T90)
21
Conclusion
• Challenge
• Balancing performance and predictability in multicore real-time systems
• Memory timing is important to WCET
• The current memory abstraction is limiting: no concept of timing.
• Deterministic Memory Abstraction
• Memory with tightly bounded worst-case timing.
• Enable predictable and high-performance multicore systems
• DM-aware multicore system designs
• OS, MMU/TLB, bus support
• DM-aware cache and DRAM controller designs
• Implemented and evaluated in Linux kernel and gem5
• Availability
• https://github.com/CSL-KU/detmem
22
Ongoing/Future Work
• SoC implementation in FPGA
• Based on open-source RISC-V quad-core SoC
• Basic DM support in bus protocol and Linux
• Implementing DM-aware cache and DRAM controllers
• Tool support and other applications
• Finding “optimal” deterministic memory blocks
• Better timing analysis integration (initial work in the paper)
• Closing micro-architectural side-channels.
23
Thank You
Disclaimer:
Our research has been supported by the National Science
Foundation (NSF) under the grant number CNS 1718880
24

Weitere ähnliche Inhalte

Was ist angesagt?

Computer architecture virtual memory
Computer architecture virtual memoryComputer architecture virtual memory
Computer architecture virtual memoryMazin Alwaaly
 
Operating Systems - memory management
Operating Systems - memory managementOperating Systems - memory management
Operating Systems - memory managementMukesh Chinta
 
Operating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management ConceptsOperating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management ConceptsPeter Tröger
 
Windows memory management
Windows memory managementWindows memory management
Windows memory managementTech_MX
 
Paging and Segmentation in Operating System
Paging and Segmentation in Operating SystemPaging and Segmentation in Operating System
Paging and Segmentation in Operating SystemRaj Mohan
 
Overview of Distributed Systems
Overview of Distributed SystemsOverview of Distributed Systems
Overview of Distributed Systemsvampugani
 
Ch9 OS
Ch9 OSCh9 OS
Ch9 OSC.U
 
Operating Systems Part III-Memory Management
Operating Systems Part III-Memory ManagementOperating Systems Part III-Memory Management
Operating Systems Part III-Memory ManagementAjit Nayak
 
Unix Memory Management - Operating Systems
Unix Memory Management - Operating SystemsUnix Memory Management - Operating Systems
Unix Memory Management - Operating SystemsDrishti Bhalla
 

Was ist angesagt? (20)

Computer architecture virtual memory
Computer architecture virtual memoryComputer architecture virtual memory
Computer architecture virtual memory
 
Memory management
Memory managementMemory management
Memory management
 
Operating Systems - memory management
Operating Systems - memory managementOperating Systems - memory management
Operating Systems - memory management
 
Operating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management ConceptsOperating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management Concepts
 
Windows memory management
Windows memory managementWindows memory management
Windows memory management
 
Paging and Segmentation in Operating System
Paging and Segmentation in Operating SystemPaging and Segmentation in Operating System
Paging and Segmentation in Operating System
 
ppt
pptppt
ppt
 
cache
cachecache
cache
 
OSCh9
OSCh9OSCh9
OSCh9
 
Memory Management
Memory ManagementMemory Management
Memory Management
 
Chapter 8 - Main Memory
Chapter 8 - Main MemoryChapter 8 - Main Memory
Chapter 8 - Main Memory
 
Overview of Distributed Systems
Overview of Distributed SystemsOverview of Distributed Systems
Overview of Distributed Systems
 
Memory management OS
Memory management OSMemory management OS
Memory management OS
 
Virtual memory
Virtual memoryVirtual memory
Virtual memory
 
Linux Memory
Linux MemoryLinux Memory
Linux Memory
 
Ch9 OS
Ch9 OSCh9 OS
Ch9 OS
 
VIRTUAL MEMORY
VIRTUAL MEMORYVIRTUAL MEMORY
VIRTUAL MEMORY
 
Memory managment
Memory managmentMemory managment
Memory managment
 
Operating Systems Part III-Memory Management
Operating Systems Part III-Memory ManagementOperating Systems Part III-Memory Management
Operating Systems Part III-Memory Management
 
Unix Memory Management - Operating Systems
Unix Memory Management - Operating SystemsUnix Memory Management - Operating Systems
Unix Memory Management - Operating Systems
 

Ähnlich wie Deterministic Memory Abstraction and Supporting Multicore System Architecture

Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time SystemsTaming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time SystemsHeechul Yun
 
Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems
Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore SystemsParallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems
Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore SystemsHeechul Yun
 
CPU Caches - Jamie Allen
CPU Caches - Jamie AllenCPU Caches - Jamie Allen
CPU Caches - Jamie Allenjaxconf
 
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3Hsien-Hsin Sean Lee, Ph.D.
 
Memory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer OrganizationMemory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer Organization2022002857mbit
 
Windows memory manager internals
Windows memory manager internalsWindows memory manager internals
Windows memory manager internalsSisimon Soman
 
Memory access control in multiprocessor for real-time system with mixed criti...
Memory access control in multiprocessor for real-time system with mixed criti...Memory access control in multiprocessor for real-time system with mixed criti...
Memory access control in multiprocessor for real-time system with mixed criti...Heechul Yun
 
Improving Real-Time Performance on Multicore Platforms using MemGuard
Improving Real-Time Performance on Multicore Platforms using MemGuardImproving Real-Time Performance on Multicore Platforms using MemGuard
Improving Real-Time Performance on Multicore Platforms using MemGuardHeechul Yun
 
Spectrum Scale Memory Usage
Spectrum Scale Memory UsageSpectrum Scale Memory Usage
Spectrum Scale Memory UsageTomer Perry
 
MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isola...
MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isola...MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isola...
MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isola...Heechul Yun
 
isca22-feng-menda_for sparse transposition and dataflow.pptx
isca22-feng-menda_for sparse transposition and dataflow.pptxisca22-feng-menda_for sparse transposition and dataflow.pptx
isca22-feng-menda_for sparse transposition and dataflow.pptxssuser30e7d2
 
Virtualization for Emerging Memory Devices
Virtualization for Emerging Memory DevicesVirtualization for Emerging Memory Devices
Virtualization for Emerging Memory DevicesTakahiro Hirofuchi
 
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDK
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDKlecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDK
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDKofficeaiotfab
 
network ram parallel computing
network ram parallel computingnetwork ram parallel computing
network ram parallel computingNiranjana Ambadi
 
Memory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationMemory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationBigstep
 

Ähnlich wie Deterministic Memory Abstraction and Supporting Multicore System Architecture (20)

Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time SystemsTaming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
 
Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems
Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore SystemsParallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems
Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems
 
CPU Caches - Jamie Allen
CPU Caches - Jamie AllenCPU Caches - Jamie Allen
CPU Caches - Jamie Allen
 
Cpu Caches
Cpu CachesCpu Caches
Cpu Caches
 
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
 
Memory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer OrganizationMemory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer Organization
 
Windows memory manager internals
Windows memory manager internalsWindows memory manager internals
Windows memory manager internals
 
Memory access control in multiprocessor for real-time system with mixed criti...
Memory access control in multiprocessor for real-time system with mixed criti...Memory access control in multiprocessor for real-time system with mixed criti...
Memory access control in multiprocessor for real-time system with mixed criti...
 
Improving Real-Time Performance on Multicore Platforms using MemGuard
Improving Real-Time Performance on Multicore Platforms using MemGuardImproving Real-Time Performance on Multicore Platforms using MemGuard
Improving Real-Time Performance on Multicore Platforms using MemGuard
 
Memory
MemoryMemory
Memory
 
Spectrum Scale Memory Usage
Spectrum Scale Memory UsageSpectrum Scale Memory Usage
Spectrum Scale Memory Usage
 
MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isola...
MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isola...MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isola...
MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isola...
 
CPU Caches
CPU CachesCPU Caches
CPU Caches
 
isca22-feng-menda_for sparse transposition and dataflow.pptx
isca22-feng-menda_for sparse transposition and dataflow.pptxisca22-feng-menda_for sparse transposition and dataflow.pptx
isca22-feng-menda_for sparse transposition and dataflow.pptx
 
Virtualization for Emerging Memory Devices
Virtualization for Emerging Memory DevicesVirtualization for Emerging Memory Devices
Virtualization for Emerging Memory Devices
 
Memory (Computer Organization)
Memory (Computer Organization)Memory (Computer Organization)
Memory (Computer Organization)
 
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDK
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDKlecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDK
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDK
 
network ram parallel computing
network ram parallel computingnetwork ram parallel computing
network ram parallel computing
 
Memory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationMemory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and Virtualization
 
Kinetic basho public
Kinetic basho publicKinetic basho public
Kinetic basho public
 

Kürzlich hochgeladen

SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 

Kürzlich hochgeladen (20)

SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 

Deterministic Memory Abstraction and Supporting Multicore System Architecture

  • 1. Deterministic Memory Abstraction and Supporting Multicore System Architecture Farzad Farshchi$, Prathap Kumar Valsan^, Renato Mancuso*, Heechul Yun$ $ University of Kansas, ^ Intel, * Boston University
  • 2. Multicore Processors in CPS • Provide high computing performance needed for intelligent CPS • Allow consolidation reducing cost, size, weight, and power. 2
  • 3. Challenge: Shared Memory Hierarchy 3 • Memory performance varies widely due to interference • Task WCET can be extremely pessimistic Core1 Core2 Core3 Core4 Memory Controller (MC) Shared Cache DRAM Task 1 Task 2 Task 3 Task 4 I D I D I D I D
  • 4. Challenge: Shared Memory Hierarchy • Many shared resources: cache space, MSHRs, dram banks, MC buffers, … • Each optimized for performance with no high-level insight • Very poor worst-case behavior: >10X, >100X observed in real platforms • even after cache partitioning is applied. 4 DRAM LLC Core1 Core2 Core3 Core4 subject co-runner(s) IsolBench EEMBC, SD-VBS on Tegra TK1 - P. Valsan, et al., “Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems.” RTAS, 2016 - P. Valsan, et al., “Addressing Isolation Challenges of Non-blocking Caches for Multicore Real-Time Systems.” Real-time Systems Journal. 2017
  • 5. The Virtual Memory Abstraction • Program’s logical view of memory • No concept of timing • OS maps any available physical memory blocks • Hardware treats all memory requests as same • But some memory may be more important • E.g., code and data memory in a time critical control loop • Prevents OS and hardware from making informed allocation and scheduling decisions that affect memory timing 5 New memory abstraction is needed! Silberschatz et al., “Operating System Concepts.”
  • 6. Outline • Motivation • Our Approach • Deterministic Memory Abstraction • DM-Aware Multicore System Design • Implementation • Evaluation • Conclusion 6
  • 7. Deterministic Memory (DM) Abstraction • Cross-layer memory abstraction with bounded worst-case timing • Focusing on tightly bounded inter-core interference • Best-effort memory = conventional memory abstraction 7 Inter-core interference Inherent timing legend Best-effort memory Deterministic memory Worst-case memory delay
  • 8. DM-Aware Resource Management Strategies • Deterministic memory: Optimize for time predictability • Best effort memory: Optimize for average performance 8 Space allocation Request scheduling WCET bounds Deterministic memory Dedicated resources Predictability focused Tight Best-effort memory Shared resources Performance focused Pessimistic Space allocation: e.g., cache space, DRAM banks Request scheduling: e.g., DRAM controller, shared buses
  • 9. System Design: Overview • Declare all or part of RT task’s address space as deterministic memory • End-to-end DM-aware resource management: from OS to hardware • Assumption: partitioned fixed-priority real-time CPU scheduling 9 Application view (logical) System-level view (physical) Deterministic memory Best-effort memory Deterministic Memory-Aware Memory Hierarchy Core1 Core2 Core3 Core4 W 5 W 1 W 2 W 3 W 4 I D I D I D I D B 1 B 2 B 3 B 4 B 5 B 6 B 7 B 8 DRAM banks Cache ways
  • 10. OS and Architecture Extensions for DM • OS updates the DM bit in each page table entry (PTE) • MMU/TLB and bus carries the DM bit. • Shared memory hierarchy (cache, memory ctrl.) uses the DM bit. 10 OS MMU/TLB Page table entry DM bit=1|0 LLC MCPaddr, DM Paddr, DM Paddr, DM Vaddr DRAM cmd/addr. hardware software
  • 11. DM-Aware Shared Cache • Based on cache way partitioning • Improve space utilization via DM-aware cache replacement algorithm • Deterministic memory: allocated on its own dedicated partition • Best-effort memory: allocated on any non-DM lines in any partition. • Cleanup: DM lines are recycled as best-effort lines at each OS context switch 11 Way 0 Way 1 Way 2 Way 3 Core 0 partition Core 1 partition best-effort line (any core, DM=0) deterministic line (Core0, DM=1) deterministic line (Core1, DM=1) Way 4 shared partition Set 0 Set 1 Set 2 Set 3 0 DM Tag Line data
  • 12. DM-Aware Memory (DRAM) Controller • OS allocates DM pages on reserved banks, others on shared banks • Memory-controller implements two-level scheduling (*) 12 Determinism focused scheduling (Round-Robin) Throughput focused scheduling (FR-FCFS) yes noDeterministic memory request exist? (a) Memory controller (MC) architecture (b) Scheduling algorithm (*) P. Valsan, et al., “MEDUSA: A Predictable and High-Performance DRAM Controller for Multicore based Embedded Systems.” CPSNA, 2015
  • 13. Implementation • Fully functional end-to-end implementation in Linux and Gem5. • Linux kernel extensions • Use an unused bit combination in ARMv7 PTE to indicate a DM page. • Modified the ELF loader and system calls to declare DM memory regions • DM-aware page allocator, replacing the buddy allocator, based on PALLOC (*) • Dedicated DRAM banks for DM are configured through Linux’s CGROUP interface. ARMv7 page table entry (PTE) 13 (*) H. Yun et. al, “PALLOC: DRAM Bank-Aware Memory Allocator for Performance Isolation on Multicore Platforms.” RTAS’14
  • 14. Implementation • Gem5 (a cycle-accurate full-system simulator) extensions • MMU/TLB: Add DM bit support • Bus: Add the DM bit in each bus transaction • Cache: implement way-partitioning and DM-aware replacement algorithm • DRAM controller: implement DM-aware two-level scheduling algorithm. • Hardware implementability • MMU/TLB, Bus: adding 1 bit is not difficult. (e.g., Use AXI bus QoS bits) • Cache and DRAM controllers: logic change and additional storage are minimal. 14
  • 15. Outline • Motivation • Our Approach • Deterministic Memory Abstraction • DM-Aware Multicore System Design • Implementation • Evaluation • Conclusion 15
  • 16. Simulation Setup • Baseline Gem5 full-system simulator • 4 out-of-order cores, 2 GHz • 2 MB Shared L2 (16 ways) • LPDDR2@533MHz, 1 rank, 8 banks • Linux 3.14 (+DM support) • Workload • EEMBC, SD-VBS, SPEC2006, IsolBench 16
  • 17. Real-Time Benchmark Characteristics • Critical pages: pages accessed in the main loop • Critical (T98/T90): pages accounting 98/90% L1 misses of all L1 misses of the critical pages. • Only 38% of pages are critical pages • Some pages contribute more to L1 misses (hence shared L2 accesses) • Not all memory is equally important svm L1 miss distribution 17
  • 18. Effects of DM-Aware Cache • Questions • Does it provide strong cache isolation for real-time tasks using DM? • Does it improve cache space utilization for best-effort tasks? • Setup • RT: EEMBC, SD-VBS, co-runners: 3x Bandwidth • Comparisons • NoP: free-for-all sharing. No partitioning • WP: cache way partitioning (4 ways/core) • DM(A): all critical pages of a benchmark are marked as DM • DM(T98): critical pages accounting 98% L1 misses are marked as DM • DM(T90): critical pages accounting 98% L1 misses are marked as DM 18 DRAM LLC Core1 Core2 Core3 Core4 RT co-runner(s)
  • 19. Effects of DM-Aware Cache • Does it provide strong cache isolation for real-time tasks using DM? • Yes. DM(A) and WP (way-partitioning) offer equivalent isolation. • Does it improve cache space utilization for best-effort tasks? • Yes. DM(A) uses only 50% cache partition of WP. 19
  • 20. Effects of DM-Aware DRAM Controller • Questions • Does it provide strong isolation for real-time tasks using DM? • Does it reduce reserved DRAM bank space? • Setup • RT: SD-VBS (input: CIF), co-runners: 3x Bandwidth • Comparisons • BA & FR-FCFS: Linux’s default buddy allocator + FR-FCFS scheduling in MC • DM(A): DM on private DRAM banks + two-level scheduling in MC • DM(T98): same as DM(A), except pages accounting 98% L1 misses are DM • DM(T90): same as DM(A), except pages accounting 90% L1 misses are DM 20
  • 21. Effects of DM-Aware DRAM Controller • Does it provide strong isolation for real-time tasks using DM? • Yes. BA&FR-FCFS suffers 5.7X slowdown. • Does it reduce reserved DRAM bank space? • Yes. Only 51% of pages are marked deterministic in DM(T90) 21
  • 22. Conclusion • Challenge • Balancing performance and predictability in multicore real-time systems • Memory timing is important to WCET • The current memory abstraction is limiting: no concept of timing. • Deterministic Memory Abstraction • Memory with tightly bounded worst-case timing. • Enable predictable and high-performance multicore systems • DM-aware multicore system designs • OS, MMU/TLB, bus support • DM-aware cache and DRAM controller designs • Implemented and evaluated in Linux kernel and gem5 • Availability • https://github.com/CSL-KU/detmem 22
  • 23. Ongoing/Future Work • SoC implementation in FPGA • Based on open-source RISC-V quad-core SoC • Basic DM support in bus protocol and Linux • Implementing DM-aware cache and DRAM controllers • Tool support and other applications • Finding “optimal” deterministic memory blocks • Better timing analysis integration (initial work in the paper) • Closing micro-architectural side-channels. 23
  • 24. Thank You Disclaimer: Our research has been supported by the National Science Foundation (NSF) under the grant number CNS 1718880 24