Improving Real-Time Performance on
Multicore Platforms Using MemGuard
University of Kansas
Dr. Heechul Yun
10/28/2013
Multicore
• Server
• Desktop
• Mobile
• RT/Embedded

2
Challenges: Shared Resources
[Figure: Unicore: tasks T1 and T2 share one CPU and its memory hierarchy. Multicore: tasks T1 to T8 run on Core 1 to Core 4, which share one memory hierarchy. Label: Performance Impact]
3
Case Study
• HRT
– Synthetic real-time video capture
– P=20ms, D=13ms
– Cache-insensitive

• X-server
– Scrolling text on a gnome-terminal

• Hardware platform
– Intel Xeon 3530
– 8MB shared L3 cache
– 4GB DDR3 1333MHz DIMM (1ch)

• CPU cores are isolated

[Figure: A desktop PC (Intel Xeon 3530): HRT on Core1 and Xsrv. on Core2, sharing the L3 (8MB) and DRAM]
4
HRT Time Distribution
[Figure: HRT time distribution, solo vs. w/ Xserver]
• solo: 99pct: 10.2ms
• w/ Xserver: 99pct: 14.3ms
• 28% deadline violations
• Due to contention in DRAM
5
Outline
• Motivation
• Background
– DRAM basics
– Worst-case memory performance
– MemGuard [RTAS’13]

• Improving Real-Time Performance with MemGuard
6
Background: DRAM Organization
[Figure: Core1 to Core4, L3, Memory Controller (MC), DRAM DIMM with Bank 1 to Bank 4]
• Have multiple banks
• Different banks can be accessed in parallel
Best-case
[Figure: Core1 to Core4, L3, Memory Controller (MC), DRAM DIMM with Bank 1 to Bank 4; label: Fast]
• Peak = 10.6 GB/s
– DDR3 1333MHz
• Out-of-order processors
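
The 10.6 GB/s peak can be checked with a short calculation. The sketch below assumes a single channel with a 64-bit data bus and 1333 million transfers per second; the function name `peak_bandwidth_gb_s` is made up for illustration.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: float, bus_width_bits: int = 64) -> float:
    """Theoretical peak of one DRAM channel in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8  # assumption: 64-bit DIMM data bus
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


# DDR3 1333MHz, one channel, as on the case-study platform
peak = peak_bandwidth_gb_s(1333)
print(f"DDR3-1333, one channel: {peak:.2f} GB/s")
```

This gives roughly 10.66 GB/s, which the slide quotes as 10.6 GB/s.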
Most-cases
[Figure: Core1 to Core4, L3, Memory Controller (MC), DRAM DIMM with Bank 1 to Bank 4; label: Mess]
• Performance = ??

(*) Intel® 64 and IA-32 Architectures Optimization Reference Manual
Worst-case
[Figure: Core1 to Core4, L3, Memory Controller (MC), DRAM DIMM with Bank 1 to Bank 4; label: Slow]
• 1 bank b/w
– Less than peak b/w
– How much?

(*) Intel® 64 and IA-32 Architectures Optimization Reference Manual
Background: DRAM Operation
[Figure: READ (Bank 1, Row 3, Col 7) on Bank 1 (Row 1 to Row 5): activate, read/write of Col7 through the Row Buffer, precharge]

• Stateful per-bank access time
– Row miss: 19 cycles
– Row hit: 9 cycles
(*) PC6400-DDR2 with 5-5-5 (RAS-CAS-CL latency setting)
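
The stateful access time can be illustrated with a toy model of one bank. This is only a sketch: the `Bank` class and `total_cycles` helper are hypothetical, and the first access to an idle bank is charged the full row-miss cost as a simplification.

```python
ROW_HIT_CYCLES = 9    # from the slide
ROW_MISS_CYCLES = 19  # from the slide


class Bank:
    """Toy DRAM bank that remembers which row sits in its row buffer."""

    def __init__(self):
        self.open_row = None

    def access(self, row: int) -> int:
        """Return the cost in cycles of accessing `row`."""
        if row == self.open_row:
            return ROW_HIT_CYCLES
        # Simplification: any other case (idle bank or different row) is a miss.
        self.open_row = row
        return ROW_MISS_CYCLES


def total_cycles(rows) -> int:
    """Cost of a sequence of accesses to a single, initially idle bank."""
    bank = Bank()
    return sum(bank.access(r) for r in rows)


print(total_cycles([3, 3, 3, 3]))  # one miss, then three hits
print(total_cycles([1, 2, 1, 2]))  # every access is a miss
```

The same four accesses cost 46 cycles when they stay in one row and 76 cycles when they alternate between two rows.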
Real Worst-case
[Figure: Core 1 to Core 4 issue requests over time (request order: Row 1, Row 2, Row 3, Row 4, Row 1, Row 2, …) through the L3 and Memory Controller (MC) to the DRAM DIMM (Bank 1 to Bank 4)]

1 bank & always row miss → ~1.2GB/s
Each core = ¼ x 1.2GB/s = 300MB/s ?

(*) Intel® 64 and IA-32 Architectures Optimization Reference Manual
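
The arithmetic on this slide can be sketched as follows. The per-core split uses the slide's own numbers. The rough single-bank estimate is an assumption-laden back-of-the-envelope calculation (64-byte requests, a 400MHz bus clock as on PC6400 DDR2, and 19 cycles per row miss); the slides do not say how the ~1.2GB/s figure was obtained.

```python
CACHE_LINE_BYTES = 64   # assumption: each request transfers one 64-byte line
BUS_CLOCK_HZ = 400e6    # assumption: PC6400 (DDR2-800) bus clock
ROW_MISS_CYCLES = 19    # row-miss cost from the DRAM Operation slide


def serialized_bandwidth_gb_s(cycles_per_request: int,
                              clock_hz: float = BUS_CLOCK_HZ,
                              line_bytes: int = CACHE_LINE_BYTES) -> float:
    """Bandwidth if requests to one bank are served strictly one after another."""
    seconds_per_request = cycles_per_request / clock_hz
    return line_bytes / seconds_per_request / 1e9


estimate = serialized_bandwidth_gb_s(ROW_MISS_CYCLES)
print(f"rough single-bank, always-miss estimate: {estimate:.2f} GB/s")

GUARANTEED_MB_S = 1200  # the ~1.2GB/s quoted on the slide
CORES = 4
per_core_mb_s = GUARANTEED_MB_S / CORES
print(f"equal split over {CORES} cores: {per_core_mb_s:.0f} MB/s each")
```

The rough estimate lands in the same range as the quoted ~1.2GB/s, and the equal split reproduces the 300MB/s per core.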
Background: Memory Controller (MC)
[Figure: Bruce Jacob et al., “Memory Systems: Cache, DRAM, Disk”, Fig. 13.1]

• Request queue(s)
– Not fair (open-row first → re-ordering)
– Unpredictable queuing delay
14
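
Why open-row-first re-ordering is unfair can be seen in a small simulation. This is a sketch under simplifying assumptions (a single bank, all requests already queued); the function `schedule_open_row_first` and the core names are hypothetical and do not describe any particular controller.

```python
def schedule_open_row_first(requests):
    """Serve (core, row) requests for one bank, preferring open-row hits.

    `requests` is in arrival order. At each step the oldest request that hits
    the currently open row is served; if none hits, the oldest request is.
    """
    pending = list(requests)
    open_row = None
    served = []
    while pending:
        # Index of the oldest request to the open row, or 0 (oldest) if none.
        idx = next((i for i, (_, row) in enumerate(pending) if row == open_row), 0)
        core, row = pending.pop(idx)
        open_row = row
        served.append((core, row))
    return served


arrivals = [("A", 1), ("B", 2), ("A", 1), ("A", 1)]
served = schedule_open_row_first(arrivals)
print(served)
```

Core B's request arrives second but is served last, because core A keeps hitting the open row. Its queuing delay depends on what the other core does.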
Challenges for Real-Time Systems
• Multiple parallel resources (banks)
• Stateful bank access latency
• Queuing delay

• Unpredictable memory performance

15
MemGuard [RTAS’13]
[Figure: MemGuard in the Operating System: a Reclaim Manager and one BW Regulator per core (0.6GB/s, 0.2GB/s, 0.2GB/s, 0.2GB/s), each paired with the PMC of Core1 to Core4 in the Multicore Processor; below them the Memory Controller and the DRAM DIMM]

• Goal: guarantee minimum memory b/w for each core
• How: b/w reservation + best effort sharing
16
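
The four reservations in the diagram add up to 1.2GB/s, the same value as the worst-case figure quoted earlier. A minimal sketch of that check follows, assuming the total reservation should not exceed the guaranteed bandwidth; the name `fits_guaranteed` is hypothetical.

```python
GUARANTEED_MB_S = 1200  # the ~1.2GB/s worst-case figure quoted earlier

# Per-core reservations from the diagram, in MB/s
reservations_mb_s = {"Core1": 600, "Core2": 200, "Core3": 200, "Core4": 200}


def fits_guaranteed(reservations: dict, guaranteed_mb_s: int = GUARANTEED_MB_S) -> bool:
    """True if the reservations together stay within the guaranteed bandwidth."""
    return sum(reservations.values()) <= guaranteed_mb_s


print(sum(reservations_mb_s.values()), fits_guaranteed(reservations_mb_s))
```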
Reservation
• Idea
– Scheduler regulates per-core memory b/w using h/w counters
– Period = 1 scheduler tick (e.g., 1ms)
[Figure: Budget (0 to 2) and core activity (computation, memory fetch) over time (1ms, 2ms), with the events “Schedule a RT idle task” and “Suspend the RT idle task”]
17
Reservation

18
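
The reservation idea can be modeled in a few lines. This is a simplified sketch, not the MemGuard implementation: it assumes each counted event moves one 64-byte cache line, and it reduces the hardware counter and the RT idle task to a `throttled` flag. The names `budget_in_lines` and `Regulator` are hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumption: each counted memory fetch moves one 64-byte line


def budget_in_lines(bandwidth_mb_s: float, period_ms: float = 1.0) -> int:
    """Convert a bandwidth reservation into cache-line fetches per period."""
    bytes_per_period = bandwidth_mb_s * 1e6 * period_ms / 1e3
    return int(bytes_per_period // CACHE_LINE_BYTES)


class Regulator:
    """Per-core budget: throttle once it is used up, refill every period."""

    def __init__(self, bandwidth_mb_s: float, period_ms: float = 1.0):
        self.budget = budget_in_lines(bandwidth_mb_s, period_ms)
        self.used = 0
        self.throttled = False  # stands in for "RT idle task is scheduled"

    def on_period_start(self) -> None:
        """New period: refill the budget and let the core run again."""
        self.used = 0
        self.throttled = False

    def on_memory_fetches(self, lines: int) -> int:
        """Account for `lines` fetches; return how many fit in the budget."""
        if self.throttled:
            return 0
        served = min(lines, self.budget - self.used)
        self.used += served
        if self.used >= self.budget:
            self.throttled = True
        return served


reg = Regulator(bandwidth_mb_s=200)  # 200MB/s over a 1ms period
print(reg.budget)                    # cache lines allowed per period
print(reg.on_memory_fetches(3000), reg.on_memory_fetches(500), reg.throttled)
reg.on_period_start()
print(reg.throttled)
```

With a 200MB/s reservation and a 1ms period, the budget is 3125 cache lines. The second burst only partly fits, after which the core stays throttled until the next period.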
Best-Effort Sharing
[Figure: timeline over time(ms) 0, 1, 2 for Core0 (900MB/s) and Core1 (300MB/s), showing guaranteed b/w, best-effort b/w, and the throttled and reschedule events]

• Spare Sharing [RTAS’13]
• Proportional Sharing [Unpublished TR]
19
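
The slides name the two sharing policies without defining them. As one plausible reading of "proportional", the sketch below splits a given amount of spare bandwidth in proportion to each core's reservation. Both the rule and the 400MB/s spare value are assumptions for illustration, and `proportional_share` is a hypothetical helper.

```python
def proportional_share(reserved_mb_s: dict, spare_mb_s: float) -> dict:
    """Split spare bandwidth in proportion to each core's reservation."""
    total = sum(reserved_mb_s.values())
    return {core: spare_mb_s * r / total for core, r in reserved_mb_s.items()}


# Reservations from the figure; the spare amount is made up.
extra = proportional_share({"Core0": 900, "Core1": 300}, spare_mb_s=400)
print(extra)
```

With a 900:300 reservation, the assumed 400MB/s of spare bandwidth is split 3:1.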
Case Study
• HRT
– Synthetic real-time video capture
– P=20ms, D=13ms
– Cache-insensitive

• X-server
– Scrolling text on a gnome-terminal

• Hardware platform
– Intel Xeon 3530
– 8MB shared cache
– 4GB DDR3 1333MHz DIMM

[Figure: A desktop PC (Intel Xeon 3530): HRT on Core1 and Xsrv. on Core2, sharing the L3 (8MB) and DRAM]
20
w/o MemGuard

HRT (solo)
HRT’s 99pct: 10.2ms

HRT (w/ Xserver)
HRT’s 99pct: 14.3ms
X’s CPU util: 78%

21
MemGuard
reserve only (HRT=900MB/s, X=300MB/s)

HRT (solo)
HRT’s 99pct: 10.7ms

HRT (w/ Xserver)
HRT’s 99pct: 11.2ms
X’s CPU util: 4%

22
MemGuard
reserve (HRT=900MB/s, X=300MB/s)+ best-effort sharing

HRT (solo)
HRT’s 99pct: 10.7ms

HRT (w/ Xserver)
HRT’s 99pct: 10.7ms
X’s CPU util: 48%

23
MemGuard
reserve (HRT=600MB/s, X=600MB/s)+ best-effort sharing

HRT (solo)
HRT’s 99pct: 10.9ms

HRT (w/ Xserver)
HRT’s 99pct: 12.1ms
X’s CPU util: 61%

24
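
The four co-run results can be lined up against the D=13ms deadline. The sketch below only compares the reported 99th-percentile times with the deadline. It says nothing about the remaining 1% of samples, and the helper name `meets_deadline` is hypothetical.

```python
DEADLINE_MS = 13.0  # D=13ms from the case study

# (configuration, HRT 99pct in ms w/ Xserver, X's CPU util in %), from the slides
results = [
    ("w/o MemGuard", 14.3, 78),
    ("reserve only (HRT=900MB/s, X=300MB/s)", 11.2, 4),
    ("reserve (HRT=900MB/s, X=300MB/s) + best-effort sharing", 10.7, 48),
    ("reserve (HRT=600MB/s, X=600MB/s) + best-effort sharing", 12.1, 61),
]


def meets_deadline(p99_ms: float, deadline_ms: float = DEADLINE_MS) -> bool:
    """True if the 99th-percentile time is within the deadline."""
    return p99_ms <= deadline_ms


for name, p99_ms, x_util in results:
    verdict = "within" if meets_deadline(p99_ms) else "beyond"
    print(f"{name}: 99pct {p99_ms}ms is {verdict} D, X util {x_util}%")

within = [name for name, p99_ms, _ in results if meets_deadline(p99_ms)]
```

Only the run without MemGuard has a 99th percentile beyond the deadline. Among the MemGuard runs, X-server utilization ranges from 4% (reserve only) to 61%.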
Real-Time Performance Improvement
[Figure: panels for HRT and X-server]

• Using MemGuard, we can achieve
– No deadline miss for HRT
– Good X-server performance
25
Conclusion
• Unpredictable memory performance
– multiple resources (banks), per-bank state, unpredictable queueing delay

• MemGuard
– Guarantee minimum memory bandwidth for each core
– b/w reservation (guaranteed part) + best-effort sharing

• Case-study
– On Intel Xeon multicore platform, using HRT + X-server
– MemGuard can improve real-time performance efficiently

• Limitations and Future Work
– Coarse-grained (one OS tick) enforcement
– Small guaranteed b/w → DRAM bank partitioning (submitted to RTAS’14)

https://github.com/heechul/memguard
26
Thank you.

27
Evaluation on Intel Core2
• T1: Synthetic video capture task (HRT)
– Period=20ms (50Hz)
– Deadline=14ms
– Metrics: ACET, WCET, stdev, deadline miss ratio (out of 1000 periods)

• T2: Xserver, update screen (SRT)
– Metric: CPU utilization
– Higher CPU utilization → faster screen update

• Platform
– Intel Core2Quad 8400, 2MB L2 cache x 2, tunable H/W prefetchers
– PC6400 DDR2 DRAM DIMM x 1

• Three platform configurations
– Exp1: Private L2, Prefetch=off
– Exp2: Private L2, Prefetch=on
– Exp3: Shared L2, Prefetch=on

[Figure: Intel Core2Quad based PC: Core0 to Core3, two L2 caches (pref.), and DRAM]
28
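
The T1 metrics can be computed from a list of per-period execution times. In this sketch the sample values are made up, WCET means the largest observed value, and the function name `summarize` is hypothetical.

```python
import statistics


def summarize(exec_times_ms, deadline_ms: float) -> dict:
    """ACET, observed WCET, sample stdev, and deadline miss ratio."""
    misses = sum(1 for t in exec_times_ms if t > deadline_ms)
    return {
        "ACET": statistics.mean(exec_times_ms),
        "WCET": max(exec_times_ms),  # largest observed value, not a proven bound
        "stdev": statistics.stdev(exec_times_ms),
        "miss_ratio": misses / len(exec_times_ms),
    }


samples_ms = [10.0, 11.0, 12.0, 15.0]  # made-up samples, not measured data
summary = summarize(samples_ms, deadline_ms=14.0)
print(summary)
```

In the evaluation described above, the list would hold 1000 entries, one per period.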
Experiment 1
[Figure: T1’s exec. time (ms), solo vs. corun, axis 0 to 18, with the deadline marked, for three setups; Private L2, Prefetch=off]
• Original: T2 at 92%
• MemGuard (Reserve only), 550M/s per core: T2 at 38%
• MemGuard (reclaim + share), 550M/s per core: T2 at 78%
• Annotations across the build steps: Performance guarantee; 30% (next to the WCET and ACET markers); Performance target
33
Experiment 2: Prefetcher
[Figure: T1's exec. time (ms), solo vs. corun, axis 0 to 24, with the deadline marked, for three setups; Private L2, Prefetch=ON]
• Original: T2 at 94%
• MemGuard (Reserve only), 550M/s per core: T2 at 33%
• MemGuard (reclaim + share), 550M/s per core: T2 at 82%
• Annotations: Not enough reserv.; More slowdown; 60%; Deadline violation
34
Experiment 2-2
[Figure: T1's exec. time (ms), solo vs. corun, axis 0 to 18, for three setups; Private L2, Prefetch=ON]
• Original: T2 at 94%
• MemGuard (Reserve only), 900M/s and 200M/s: T2 at 14%
• MemGuard (reclaim + share), 900M/s and 200M/s: T2 at 69%
• Annotations: Enough reserv.; 60%; No deadline violation
35
Experiment 3: Shared Cache
[Figure: T1's exec. time (ms), solo vs. corun, axis 0 to 24, for three setups; Shared L2, Prefetch=ON]
• Original: T2 at 92%
• MemGuard (Reserve only), 900M/s and 200M/s: T2 at 11%
• MemGuard (reclaim + share), 900M/s and 200M/s: T2 at 63%
• Annotations: Even more slowdown; 108%; Minimum reserv.; No deadline violation
36

From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 

Improving Real-Time Performance on Multicore Platforms using MemGuard

  • 1. Improving Real-Time Performance on Multicore Platforms Using MemGuard University of Kansas Dr. Heechul Yun 10/28/2013
  • 3. Challenges: Shared Resources [Figure: on a unicore, tasks T1 and T2 share one CPU and its memory hierarchy; on a multicore, tasks T1 to T8 run on Cores 1 to 4 and share the memory hierarchy] Performance Impact
  • 4. Case Study • HRT – Synthetic real-time video capture – P=20, D=13ms – Cache-insensitive • X-server – Scrolling text on a gnome-terminal • Hardware platform – Intel Xeon 3530 – 8MB shared L3 cache – 4GB DDR3 1333MHz DIMM (1ch) • CPU cores are isolated [Figure: a desktop PC (Intel Xeon 3530) with HRT on Core1 and the X-server on Core2, sharing the L3 (8MB) and DRAM]
  • 5. HRT Time Distribution [Figure: execution-time distributions] • solo: 99pct 10.2ms • w/ Xserver: 99pct 14.3ms • 28% deadline violations • Due to contention in DRAM
  • 6. Outline • Motivation • Background – DRAM basics – Worst-case memory performance – MemGuard [RTAS’13] • Improving Real-Time Performance with MemGuard
  • 7. Background: DRAM Organization [Figure: Core1 to Core4 share the L3 and the Memory Controller (MC), which connects to a DRAM DIMM with Banks 1 to 4] • Have multiple banks • Different banks can be accessed in parallel
  • 8. Best-case [Figure: same system diagram as slide 7] Fast • Peak = 10.6 GB/s – DDR3 1333MHz
  • 9. Best-case [Figure: same system diagram as slide 7] Fast • Peak = 10.6 GB/s – DDR3 1333MHz • Out-of-order processors
  • 10. Most-cases [Figure: same system diagram as slide 7] Mess • Performance = ?? (*) Intel® 64 and IA-32 Architectures Optimization Reference Manual
  • 11. Worst-case [Figure: same system diagram as slide 7] Slow • 1 bank b/w – Less than peak b/w – How much? (*) Intel® 64 and IA-32 Architectures Optimization Reference Manual
  • 12. Background: DRAM Operation [Figure: Bank 1 with Rows 1 to 5 and a row buffer; READ (Bank 1, Row 3, Col 7) shown with its activate, read/write, and precharge steps] • Stateful per-bank access time – Row miss: 19 cycles – Row hit: 9 cycles (*) PC6400-DDR2 with 5-5-5 (RAS-CAS-CL latency setting)
  • 13. Real Worst-case [Figure: request order over time from Cores 1 to 4 through the L3 and MC to the DRAM DIMM: Row 1, Row 2, Row 3, Row 4, Row 1, Row 2, …] • 1 bank & always row miss → ~1.2GB/s • Each core = ¼ × 1.2GB/s = 300MB/s? (*) Intel® 64 and IA-32 Architectures Optimization Reference Manual
  • 14. Background: Memory Controller (MC) [Figure: Bruce Jacob et al., “Memory Systems: Cache, DRAM, Disk”, Fig. 13.1] • Request queue(s) – Not fair (open-row first → re-ordering) – Unpredictable queuing delay
  • 15. Challenges for Real-Time Systems • Multiple parallel resources (banks) • Stateful bank access latency • Queuing delay • Unpredictable memory performance
  • 16. MemGuard [RTAS’13] [Figure: MemGuard sits in the operating system and consists of a reclaim manager plus one b/w regulator per core (0.6GB/s, 0.2GB/s, 0.2GB/s, 0.2GB/s); each regulator is paired with the PMC of Core1 to Core4 on the multicore processor, above the memory controller and DRAM DIMM] • Goal: guarantee minimum memory b/w for each core • How: b/w reservation + best effort sharing
  • 17. Reservation • Idea – Scheduler regulates per-core memory b/w using h/w counters – Period = 1 scheduler tick (e.g., 1ms) [Figure: timeline of budget and core activity (computation, memory fetch) over 0, 1ms, 2ms, with the events “Schedule a RT idle task” and “Suspend the RT idle task”]
  • 19. Best-Effort Sharing [Figure: timeline over 0 to 2 ms for Core0 (900MB/s) and Core1 (300MB/s), marking guaranteed b/w, best-effort b/w, throttled, and reschedule] • Spare Sharing [RTAS’13] • Proportional Sharing [Unpublished TR]
  • 20. Case Study • HRT – Synthetic real-time video capture – P=20, D=13ms – Cache-insensitive • X-server – Scrolling text on a gnome-terminal • Hardware platform – Intel Xeon 3530 – 8MB shared cache – 4GB DDR3 1333MHz DIMM [Figure: a desktop PC (Intel Xeon 3530) with HRT on Core1 and the X-server on Core2, sharing the L3 (8MB) and DRAM]
  • 21. w/o MemGuard • HRT (solo): HRT’s 99pct: 10.2ms • HRT (w/ Xserver): HRT’s 99pct: 14.3ms, X’s CPU util: 78%
  • 22. MemGuard reserve only (HRT=900MB/s, X=300MB/s) • HRT (solo): HRT’s 99pct: 10.7ms • HRT (w/ Xserver): HRT’s 99pct: 11.2ms, X’s CPU util: 4%
  • 23. MemGuard reserve (HRT=900MB/s, X=300MB/s) + best-effort sharing • HRT (solo): HRT’s 99pct: 10.7ms • HRT (w/ Xserver): HRT’s 99pct: 10.7ms, X’s CPU util: 48%
  • 24. MemGuard reserve (HRT=600MB/s, X=600MB/s) + best-effort sharing • HRT (solo): HRT’s 99pct: 10.9ms • HRT (w/ Xserver): HRT’s 99pct: 12.1ms, X’s CPU util: 61%
  • 25. Real-Time Performance Improvement [Figure: HRT and X-server panels] • Using MemGuard, we can achieve – No deadline miss for HRT – Good X-server performance
  • 26. Conclusion • Unpredictable memory performance – multiple resources (banks), per-bank state, unpredictable queueing delay • MemGuard – Guarantee minimum memory bandwidth for each core – b/w reservation (guaranteed part) + best-effort sharing • Case-study – On Intel Xeon multicore platform, using HRT + X-server – MemGuard can improve real-time performance efficiently • Limitations and Future Work – Coarse grain (an OS tick) enforcement – Small guaranteed b/w → DRAM bank partitioning (submitted to RTAS’14) https://github.com/heechul/memguard
  • 28. Evaluation on Intel Core2 • T1: Synthetic video capture task (HRT) – Period=20ms (50Hz) – Deadline=14ms – Metrics: ACET, WCET, stdev, deadline miss ratio (out of 1000 periods) • T2: Xserver, update screen (SRT) – Metric: CPU utilization • Higher CPU utilization → faster screen update • Platform – Intel Core2Quad 8400, 2MB L2 cache x 2, tunable H/W prefetchers – PC6400 DDR2 DRAM DIMM x 1 • Three platform configurations – Exp1: Private L2, Prefetch=off – Exp2: Private L2, Prefetch=on – Exp3: Shared L2, Prefetch=on [Figure: Intel Core2Quad based PC with Core0 to Core3, two L2 caches (with prefetchers), and DRAM]
  • 29. Experiment 1 (Private L2, Prefetch=off) [Figure: T1’s exec. time (ms), solo vs. corun, with the deadline marked, for three configurations: Original (T2: 92%), MemGuard reserve only (550M/s per core, T2: 38%), and MemGuard reclaim + share (550M/s per core, T2: 78%)] Annotation: Performance guarantee
  • 30. Experiment 1 (Private L2, Prefetch=off) [Figure: same chart as slide 29] Annotations: 30%, WCET, ACET, Performance guarantee
  • 31. Experiment 1 (Private L2, Prefetch=off) [Figure: same chart as slide 29]
  • 32. Experiment 1 (Private L2, Prefetch=off) [Figure: same chart as slide 29]
  • 33. Experiment 1 (Private L2, Prefetch=off) [Figure: same chart as slide 29] Annotation: Performance target
  • 34. Experiment 2: Prefetcher (Private L2, Prefetch=ON) [Figure: T1's exec. time (ms), solo vs. corun, with the deadline marked, for three configurations: Original (T2: 94%), MemGuard reserve only (550M/s per core, T2: 33%), and MemGuard reclaim + share (550M/s per core, T2: 82%)] Annotations: More slowdown, 60%, Not enough reserv., Deadline violation
  • 35. Experiment 2-2 (Private L2, Prefetch=ON) [Figure: T1's exec. time (ms), solo vs. corun, for three configurations: Original (T2: 94%), MemGuard reserve only (900M/s and 200M/s, T2: 14%), and MemGuard reclaim + share (900M/s and 200M/s, T2: 69%)] Annotations: 60%, Enough reserv., No deadline violation
  • 36. Experiment 3: Shared Cache (Shared L2, Prefetch=ON) [Figure: T1's exec. time (ms), solo vs. corun, for three configurations: Original (T2: 92%), MemGuard reserve only (900M/s and 200M/s, T2: 11%), and MemGuard reclaim + share (900M/s and 200M/s, T2: 63%)] Annotations: Even more slowdown, 108%, Minimum reserv., No deadline violation
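
The bandwidth figures on the best-case, DRAM operation, and real worst-case slides can be checked with some quick arithmetic. The sketch below is only a back-of-the-envelope aid: the 8-byte bus width and the function names are assumptions, not taken from the slides. The row-miss and row-hit cycle counts are the PC6400-DDR2 figures quoted on slide 12, not measurements of the DDR3 platform.

```python
# Back-of-the-envelope DRAM bandwidth figures quoted on the slides.
# Assumption (not on the slides): one channel with a 64-bit (8-byte) data bus.
BUS_BYTES = 8


def peak_bandwidth_gbs(mega_transfers_per_s, bus_bytes=BUS_BYTES):
    """Peak bandwidth of one channel in GB/s (1 GB = 10**9 bytes)."""
    return mega_transfers_per_s * 1e6 * bus_bytes / 1e9


def equal_share_mbs(total_gbs, cores):
    """Even split of a total bandwidth across cores, in MB/s."""
    return total_gbs * 1000 / cores


def miss_to_hit_ratio(miss_cycles, hit_cycles):
    """How many times slower a row miss is than a row hit."""
    return miss_cycles / hit_cycles


if __name__ == "__main__":
    print(f"DDR3-1333 peak: {peak_bandwidth_gbs(1333):.2f} GB/s")
    print(f"Worst-case share per core: {equal_share_mbs(1.2, 4):.0f} MB/s")
    print(f"Row miss vs. row hit: {miss_to_hit_ratio(19, 9):.1f}x")
```

Under these assumptions the peak comes out at roughly 10.66 GB/s, which the slides quote as 10.6 GB/s, and the even four-way split of the ~1.2GB/s worst case gives the 300MB/s per core that slide 13 questions.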

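The evaluation slides report ACET, WCET, stdev, the 99th percentile, and the deadline miss ratio for the HRT task. A minimal sketch of how such metrics could be computed from a list of measured execution times follows. It is not the author's measurement tooling: the function name, the nearest-rank percentile definition, the use of the population standard deviation, and counting a miss as "execution time strictly above the deadline" are all assumptions. WCET here means the worst observed time, not a proven bound.

```python
import statistics


def summarize(exec_times_ms, deadline_ms):
    """Summarize measured execution times (hypothetical helper, see lead-in)."""
    if not exec_times_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(exec_times_ms)
    n = len(ordered)
    # Nearest-rank 99th percentile: ceil(0.99 * n), computed with integers.
    rank = -(-99 * n // 100)
    misses = sum(1 for t in ordered if t > deadline_ms)
    return {
        "acet": statistics.fmean(ordered),    # average-case execution time
        "wcet": ordered[-1],                  # worst observed execution time
        "stdev": statistics.pstdev(ordered),  # population standard deviation
        "p99": ordered[rank - 1],
        "miss_ratio": misses / n,
    }


if __name__ == "__main__":
    # Made-up samples for illustration only; not data from the slides.
    samples = [10.0] * 72 + [14.5] * 28
    print(summarize(samples, deadline_ms=13.0))
```
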
Editor's notes

  1. Soon more rt/embedded systems will use multicore as well.
  2. In the unicore systems, CPU time is the most important shared resource determining application’s performance. In the multicore systems, however, memory performance is also very important as multiple cores can concurrently access the memory and affect performance in significant ways.
  3. 5
  4. Problem 1: co-ordinate memory slot with tasks → require program modification (PREM). Problem 2: only 1 core can access memory at a time → do not fully utilize memory level parallelism.
  5. First, let me explain how b/w regulator works.
  6. Why we want to regulate the request rates?
  7. 5
  8. Problem: DRAM
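
The b/w regulator mentioned in the notes and on the Reservation slide can be illustrated with a small user-space simulation. This is a sketch under stated assumptions, not the MemGuard kernel code: the class and function names are hypothetical, a 64-byte cache line is assumed, the `used` counter stands in for the per-core hardware counter (PMC), and best-effort sharing and reclaiming are not modeled. Each core gets a budget per 1 ms period; once the budget is used up the core is throttled (on the slides, by scheduling an RT idle task) until the next period starts.

```python
from dataclasses import dataclass

CACHE_LINE_BYTES = 64  # assumed cache-line size, not stated on the slides
PERIOD_US = 1000       # regulation period: one 1 ms scheduler tick (slide 17)


def budget_lines(bandwidth_mb_s):
    """Cache-line fetches allowed per period for a reservation in MB/s.

    With 1 MB = 10**6 bytes, MB/s times microseconds gives bytes.
    """
    return bandwidth_mb_s * PERIOD_US // CACHE_LINE_BYTES


@dataclass
class CoreRegulator:
    budget: int              # fetches allowed per period
    used: int = 0            # fetches counted this period (stands in for the PMC)
    throttled: bool = False

    def on_period_start(self):
        """Replenish the budget and let the core run again."""
        self.used = 0
        self.throttled = False

    def on_memory_fetch(self, lines=1):
        """Account for requested fetches; return how many may proceed."""
        if self.throttled:
            return 0
        allowed = min(lines, self.budget - self.used)
        self.used += allowed
        if self.used >= self.budget:
            # Budget exhausted: the slides schedule an RT idle task here.
            self.throttled = True
        return allowed


if __name__ == "__main__":
    xserver = CoreRegulator(budget=budget_lines(300))  # X=300MB/s reservation
    print(xserver.budget, xserver.on_memory_fetch(5000), xserver.throttled)
```

With the 900MB/s and 300MB/s reservations from the case study, this conversion gives per-period budgets of 14062 and 4687 cache lines respectively, under the 64-byte line assumption.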