Improving Real-Time Performance on
Multicore Platforms Using MemGuard
University of Kansas
Dr. Heechul Yun
10/28/2013
Multicore
• Server
• Desktop
• Mobile
• RT/Embedded
2
Challenges: Shared Resources
[Figure: Unicore vs. Multicore. Unicore: tasks T1 and T2 run on one CPU with its own memory hierarchy. Multicore: tasks T1 to T8 run on Cores 1 to 4, which share one memory hierarchy.]
Performance Impact
3
Case Study
• HRT
– Synthetic real-time video capture
– P=20ms, D=13ms
– Cache-insensitive

• X-server
– Scrolling text on a gnome-terminal

• Hardware platform
– Intel Xeon 3530
– 8MB shared L3 cache
– 4GB DDR3 1333MHz DIMM (1ch)

• CPU cores are isolated

[Figure: a desktop PC (Intel Xeon 3530). HRT runs on Core1 and Xsrv. on Core2; both cores share the L3 (8MB) and DRAM.]
4
HRT Time Distribution

• solo: 99pct = 10.2ms
• w/ Xserver: 99pct = 14.3ms

• 28% deadline violations
• Due to contention in DRAM
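
The "99pct" values on this slide are 99th-percentile execution times. A minimal sketch of how such a value could be computed, assuming the nearest-rank percentile definition (the slides do not say which definition was used). The helper name and the sample values are hypothetical and synthetic, not the measured data.

```python
import math

def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile using the nearest-rank definition."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank into the sorted list
    return ordered[max(rank, 1) - 1]

# Synthetic execution times in ms (illustration only, not measured data).
exec_ms = [10.0, 10.5, 11.0, 12.0, 13.5, 14.0, 9.8, 10.2, 12.9, 13.1]
p99 = percentile_nearest_rank(exec_ms, 99)
print(f"99pct = {p99}ms")
```

With only ten samples the 99th percentile is simply the largest value; a real run would use many more periods.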
5
Outline
• Motivation
• Background
– DRAM basics
– Worst-case memory performance
– MemGuard [RTAS’13]

• Improving Real-Time Performance with
MemGuard
6
Background: DRAM Organization
[Figure: Core1 to Core4 share an L3 and a memory controller (MC), which connects to a DRAM DIMM with Banks 1 to 4.]

• Have multiple banks
• Different banks can be
accessed in parallel
Best-case
[Figure: Core1 to Core4, L3, memory controller (MC), and a DRAM DIMM with Banks 1 to 4; labeled "Fast".]
• Peak = 10.6 GB/s
– DDR3 1333MHz
• Out-of-order processors
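
The peak figure on this slide can be reproduced with simple arithmetic. A small sketch, assuming one channel with a 64-bit (8-byte) data bus and 1333 million transfers per second; the function name is hypothetical.

```python
def peak_bandwidth_gb_s(transfers_per_s, bus_bytes=8, channels=1):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return transfers_per_s * bus_bytes * channels / 1e9

# DDR3 1333MHz, one 64-bit channel (bus width is an assumption of this sketch).
peak = peak_bandwidth_gb_s(1333e6)
print(f"peak = {peak:.3f} GB/s")
```

The result is just above the 10.6 GB/s quoted on the slide; sustaining it would require every transfer slot on the bus to be used.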
Most-cases
[Figure: Core1 to Core4, L3, memory controller (MC), and a DRAM DIMM with Banks 1 to 4; labeled "Mess".]
• Performance = ??

(*) Intel® 64 and IA-32 Architectures Optimization Reference Manual
Worst-case
[Figure: Core1 to Core4, L3, memory controller (MC), and a DRAM DIMM with Banks 1 to 4; labeled "Slow".]
• 1 bank b/w
– Less than peak b/w
– How much?

(*) Intel® 64 and IA-32 Architectures Optimization Reference Manual
Background: DRAM Operation
[Figure: Bank 1 with Rows 1 to 5 and a row buffer, showing READ (Bank 1, Row 3, Col 7). Labeled steps: activate (Row 3 into the row buffer), read/write (Col 7 in the row buffer), precharge.]


• Stateful per-bank access time
– Row miss: 19 cycles
– Row hit: 9 cycles
(*) PC6400-DDR2 with 5-5-5 (RAS-CAS-CL latency setting)
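
The stateful access time can be illustrated with a tiny model of one bank. This is a sketch only: it uses the two cycle counts from the slide, assumes the row stays open after each access, and treats the first access to an idle bank as a row miss. The class and constant names are hypothetical.

```python
ROW_HIT_CYCLES = 9    # from the slide
ROW_MISS_CYCLES = 19  # from the slide

class Bank:
    """One DRAM bank with a single row buffer (open-row policy assumed)."""

    def __init__(self):
        self.open_row = None  # no row in the row buffer yet

    def access(self, row):
        """Return the access latency in cycles and update the row buffer."""
        if row == self.open_row:
            return ROW_HIT_CYCLES
        self.open_row = row  # precharge the old row, activate the new one
        return ROW_MISS_CYCLES

bank = Bank()
latencies = [bank.access(row) for row in (3, 3, 3, 5, 3)]
print(latencies)
```

The same row number costs 9 or 19 cycles depending on what was accessed before it, which is why the latency of a request depends on the requests of other cores.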
Real Worst-case
[Figure: requests from Cores 1 to 4 pass through the L3 and the memory controller (MC) to a DRAM DIMM with Banks 1 to 4. Request order over time: Row 1, Row 2, Row 3, Row 4, Row 1, Row 2, …]

1 bank & always row miss → ~1.2GB/s
Each core = ¼ × 1.2GB/s = 300MB/s ?

(*) Intel® 64 and IA-32 Architectures Optimization Reference Manual
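
The arithmetic behind these numbers, as a short sketch. It assumes the four cores split the worst-case bandwidth evenly, which the slide itself marks with a question mark; both bandwidth values are taken from the slides.

```python
PEAK_GB_S = 10.6   # best case, from the earlier slide
WORST_GB_S = 1.2   # one bank, always row miss, from this slide
CORES = 4

per_core_mb_s = WORST_GB_S * 1000 / CORES        # even split (assumption)
worst_fraction_of_peak = WORST_GB_S / PEAK_GB_S  # how far below peak the worst case is

print(f"per core: {per_core_mb_s:.0f} MB/s")
print(f"worst case is {worst_fraction_of_peak:.0%} of peak")
```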
Background: Memory Controller (MC)
[Figure: Bruce Jacob et al., “Memory Systems: Cache, DRAM, Disk”, Fig. 13.1]

• Request queue(s)
– Not fair (open-row first → re-ordering)
– Unpredictable queuing delay
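
Why open-row-first re-ordering is unfair can be seen in a small sketch. It models a single bank and a simplified policy that serves the oldest request hitting the open row, and otherwise the oldest request overall; real controllers are more elaborate. The function name and the core labels "A" and "B" are hypothetical.

```python
def open_row_first_order(requests, open_row=None):
    """Return the service order for (core, row) requests given in arrival order."""
    pending = list(requests)
    served = []
    while pending:
        # Prefer the oldest request that hits the open row; else take the oldest one.
        idx = next((i for i, (_, row) in enumerate(pending) if row == open_row), 0)
        core, row = pending.pop(idx)
        open_row = row  # the served row is now in the row buffer
        served.append((core, row))
    return served

arrivals = [("A", 1), ("B", 2), ("A", 1), ("A", 1)]
order = open_row_first_order(arrivals)
print(order)
```

Core B's request arrives second but is served last, because core A keeps hitting the open row. B's queuing delay therefore depends on A's access pattern.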
14
Challenges for Real-Time Systems
• Multiple parallel resources (banks)
• Stateful bank access latency
• Queuing delay

• Unpredictable memory performance

15
MemGuard [RTAS’13]
[Figure: MemGuard sits in the operating system and consists of a reclaim manager and four per-core BW regulators (0.6GB/s, 0.2GB/s, 0.2GB/s, 0.2GB/s). Each regulator is paired with the PMC of one core (Core1 to Core4) of the multicore processor, whose memory controller connects to the DRAM DIMM.]

• Goal: guarantee minimum memory b/w for each core
• How: b/w reservation + best effort sharing
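
A sketch of the admission check that a reservation scheme like this implies. It assumes that the per-core reservations must fit within the worst-case bandwidth of about 1.2GB/s from the earlier slide; the slides do not state this rule explicitly. The budgets are the values shown in the figure, and their assignment to particular cores is also an assumption.

```python
GUARANTEED_MB_S = 1200  # ~1.2GB/s worst-case DRAM b/w from the earlier slide

def reservations_fit(budgets_mb_s, guaranteed_mb_s=GUARANTEED_MB_S):
    """True if the sum of per-core reservations stays within the guaranteed b/w."""
    return sum(budgets_mb_s.values()) <= guaranteed_mb_s

# Budgets from the figure; the core assignment is an assumption.
budgets = {"Core1": 600, "Core2": 200, "Core3": 200, "Core4": 200}
ok = reservations_fit(budgets)
print(ok)
```

Any bandwidth the memory system delivers beyond the guaranteed amount is what the best-effort sharing part distributes.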
16
Reservation
• Idea
– Scheduler regulates per-core memory b/w using h/w counters
– Period = 1 scheduler tick (e.g., 1ms)
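[Figure: budget and core activity (computation vs. memory fetch) over 0 to 2ms. When the budget runs out within a period, an RT idle task is scheduled on the core; at the next period boundary the RT idle task is suspended.]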
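
The regulation idea can be sketched as a per-period budget of memory fetches. This is a simplified model, not the MemGuard implementation: a list of per-period demands stands in for the h/w counters, the 64-byte cache line is an assumption, demand above the budget is dropped rather than carried into the next period, and all names are hypothetical.

```python
LINE_BYTES = 64  # assumed cache-line size

def budget_lines(bw_mb_s, period_ms=1):
    """Cache-line fetches allowed per period for a reservation in MB/s."""
    return int(bw_mb_s * 1e6 * period_ms / 1e3 / LINE_BYTES)

def regulate(demand_per_period, budget):
    """For each period, return (lines served, whether the core was throttled)."""
    return [(min(demand, budget), demand > budget) for demand in demand_per_period]

budget = budget_lines(300)               # 300MB/s reservation, 1ms period
trace = regulate([1000, 6000], budget)   # demand in cache lines per period
print(budget, trace)
```

In the second period the core exhausts its budget, so it would be throttled (the RT idle task runs) until the next period starts.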
17
Reservation

18
Best-Effort Sharing
[Figure: timeline (ms) over periods 0 to 2 for Core0 (900MB/s) and Core1 (300MB/s), distinguishing guaranteed b/w from best-effort b/w and marking "throttled" and "reschedule" events.]

• Spare Sharing [RTAS’13]
• Proportional Sharing [Unpublished TR]
19
Case Study
• HRT
– Synthetic real-time video capture
– P=20ms, D=13ms
– Cache-insensitive

• X-server
– Scrolling text on a gnome-terminal

• Hardware platform
– Intel Xeon 3530
– 8MB shared cache
– 4GB DDR3 1333MHz DIMM

[Figure: a desktop PC (Intel Xeon 3530). HRT runs on Core1 and Xsrv. on Core2; both cores share the L3 (8MB) and DRAM.]
20
w/o MemGuard

HRT (solo)
HRT’s 99pct: 10.2ms

HRT (w/ Xserver)
HRT’s 99pct: 14.3ms
X’s CPU util: 78%

21
MemGuard
reserve only (HRT=900MB/s, X=300MB/s)

HRT (solo)
HRT’s 99pct: 10.7ms

HRT (w/ Xserver)
HRT’s 99pct: 11.2ms
X’s CPU util: 4%

22
MemGuard
reserve (HRT=900MB/s, X=300MB/s)+ best-effort sharing

HRT (solo)
HRT’s 99pct: 10.7ms

HRT (w/ Xserver)
HRT’s 99pct: 10.7ms
X’s CPU util: 48%

23
MemGuard
reserve (HRT=600MB/s, X=600MB/s)+ best-effort sharing

HRT (solo)
HRT’s 99pct: 10.9 ms

HRT (w/ Xserver)
HRT’s 99pct: 12.1ms
X’s CPU util: 61%
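
The four configurations can be compared against the deadline in a few lines. The numbers are copied from the slides above; the table layout and the variable names are just for illustration.

```python
DEADLINE_MS = 13  # D=13ms from the case study

# (configuration, HRT 99pct w/ Xserver in ms, X's CPU util in %), from the slides
results = [
    ("w/o MemGuard", 14.3, 78),
    ("reserve only (900/300)", 11.2, 4),
    ("reserve + best-effort (900/300)", 10.7, 48),
    ("reserve + best-effort (600/600)", 12.1, 61),
]

meets = {name: p99 <= DEADLINE_MS for name, p99, _ in results}
for name, p99, util in results:
    status = "within" if meets[name] else "over"
    print(f"{name}: 99pct={p99}ms ({status} D={DEADLINE_MS}ms), X util={util}%")
```

A 99th percentile within the deadline does not by itself rule out individual misses; it only shows that at most 1% of the periods can exceed that value.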

24
Real-Time Performance Improvement
[Figure: panels labeled HRT and X-server.]

• Using MemGuard, we can achieve
– No deadline miss for HRT
– Good X-server performance
25
Conclusion
• Unpredictable memory performance
– Multiple resources (banks), per-bank state, unpredictable queuing delay

• MemGuard
– Guarantee minimum memory bandwidth for each core
– b/w reservation (guaranteed part) + best-effort sharing

• Case-study
– On Intel Xeon multicore platform, using HRT + X-server
– MemGuard can improve real-time performance efficiently

• Limitations and Future Work
– Coarse-grained enforcement (one OS tick)
– Small guaranteed b/w → DRAM bank partitioning (submitted to RTAS’14)

https://github.com/heechul/memguard
26
Thank you.

27
Evaluation on Intel Core2
• T1: Synthetic video capture task (HRT)
– Period=20ms(50Hz)
– Deadline=14ms
– Metrics: ACET, WCET, stdev, deadline miss ratio (out of 1000 periods)

• T2: Xserver, update screen (SRT)
– Metric: CPU utilization
• Higher CPU utilization → faster screen update

• Platform
– Intel Core2Quad 8400, 2MB L2 cache x 2,
tunable H/W prefetchers
– PC6400 DDR2 DRAM DIMM x 1

• Three platform configurations
– Exp1: Private L2, Prefetch=off
– Exp2: Private L2, Prefetch=on
– Exp3: Shared L2, Prefetch=on
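
A sketch of how the T1 metrics listed above could be computed from a list of measured execution times. WCET here means the largest observed value, not an analytic bound, and the sample standard deviation is an assumption. The function name and the sample data are hypothetical and synthetic.

```python
import statistics

def summarize(exec_ms, deadline_ms):
    """Compute ACET, observed WCET, stdev, and deadline miss ratio."""
    return {
        "ACET": statistics.mean(exec_ms),
        "WCET": max(exec_ms),
        "stdev": statistics.stdev(exec_ms),  # sample standard deviation
        "miss_ratio": sum(t > deadline_ms for t in exec_ms) / len(exec_ms),
    }

samples = [10.0, 12.0, 14.0, 16.0]  # synthetic execution times in ms
metrics = summarize(samples, deadline_ms=14)
print(metrics)
```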

[Figure: Intel Core2Quad based PC with Core0 to Core3, two L2 caches (pref.), and DRAM.]
28
Experiment 1 (Private L2, Prefetch=off)
[Figure: T1’s exec. time (ms, axis 0 to 18), solo vs. corun, with the deadline marked, for three configurations: Original, MemGuard (Reserve only), and MemGuard (reclaim + share). T1 runs on Core1 and T2 on Core2, each with its own L2, in front of DRAM. Both MemGuard configurations show 550M/s per core. T2 labels, in order of appearance: 92%, 38%, 78%. Annotation: "Performance guarantee".]
29
Experiment 1 (Private L2, Prefetch=off)
[Figure: the Experiment 1 chart of T1’s exec. time (ms), solo vs. corun, for Original, MemGuard (Reserve only), and MemGuard (reclaim + share), with added labels ACET, WCET, and "30%".]
30
Experiment 1 (Private L2, Prefetch=off)
[Figure: the Experiment 1 chart of T1’s exec. time (ms), solo vs. corun, for Original, MemGuard (Reserve only), and MemGuard (reclaim + share), with a line labeled "Performance target".]
33
Experiment 2: Prefetcher (Private L2, Prefetch=ON)
[Figure: T1's exec. time (ms, axis 0 to 24), solo vs. corun, with the deadline marked, for three configurations: Original, MemGuard (Reserve only), and MemGuard (reclaim + share). Both MemGuard configurations show 550M/s per core. T2 labels, in order of appearance: 94%, 33%, 82%. Annotations: "60%", "Deadline violation", "Not enough reserv.", "More slowdown".]
34
Experiment 2-2 (Private L2, Prefetch=ON)
[Figure: T1's exec. time (ms, axis 0 to 18), solo vs. corun, for three configurations: Original, MemGuard (Reserve only), and MemGuard (reclaim + share). Both MemGuard configurations show 900M/s and 200M/s. T2 labels, in order of appearance: 94%, 14%, 69%. Annotations: "60%", "Enough reserv.", "No deadline violation".]
35
Experiment 3: Shared Cache (Shared L2, Prefetch=ON)
[Figure: T1's exec. time (ms, axis 0 to 24), solo vs. corun, for three configurations: Original, MemGuard (Reserve only), and MemGuard (reclaim + share). Core1 and Core2 share one L2 in front of DRAM. Both MemGuard configurations show 900M/s and 200M/s. T2 labels, in order of appearance: 92%, 11%, 63%. Annotations: "108%", "Even more slowdown", "Minimum reserv.", "No deadline violation".]
36

SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 

Improving Real-Time Performance on Multicore Platforms using MemGuard

  • 1. Improving Real-Time Performance on Multicore Platforms Using MemGuard. Dr. Heechul Yun, University of Kansas, 10/28/2013.
  • 3. Challenges: Shared Resources. [Figure: on a unicore, tasks T1 and T2 share one CPU and its memory hierarchy; on a multicore, tasks T1 to T8 run on Cores 1 to 4 and share one memory hierarchy.] Performance impact.
  • 4. Case Study. HRT: synthetic real-time video capture; P=20 ms, D=13 ms; cache-insensitive. X-server: scrolling text on a gnome-terminal. Hardware platform: a desktop PC with an Intel Xeon 3530, 8 MB shared L3 cache, 4 GB DDR3 1333 MHz DIMM (1 channel). HRT runs on Core1 and the X-server on Core2; the CPU cores are isolated, while the L3 cache and DRAM are shared.
  • 5. HRT Time Distribution. Solo: 99th percentile = 10.2 ms. With X-server: 99th percentile = 14.3 ms. 28% deadline violations, due to contention in DRAM.
  • 6. Outline. Motivation. Background: DRAM basics; worst-case memory performance; MemGuard [RTAS'13]. Improving real-time performance with MemGuard.
  • 7. Background: DRAM Organization. [Figure: Core1 to Core4 share an L3 cache and a memory controller (MC) in front of a DRAM DIMM with Banks 1 to 4.] DRAM has multiple banks; different banks can be accessed in parallel.
  • 8. Best-case. [Figure: same organization.] Fast: peak = 10.6 GB/s (DDR3 1333 MHz).
  • 9. Best-case. [Figure: same organization.] Fast: peak = 10.6 GB/s (DDR3 1333 MHz); out-of-order processors.
  • 10. Most-cases. [Figure: same organization.] Mess: performance = ?? (*) Intel® 64 and IA-32 Architectures Optimization Reference Manual.
  • 11. Worst-case. [Figure: same organization.] Slow: 1-bank b/w, less than peak b/w. How much? (*) Intel® 64 and IA-32 Architectures Optimization Reference Manual.
  • 12. Background: DRAM Operation. [Figure: Bank 1 with Rows 1 to 5 and a row buffer; labels: activate, precharge, read/write; example command READ (Bank 1, Row 3, Col 7).] Stateful per-bank access time: row miss = 19 cycles, row hit = 9 cycles. (*) PC6400-DDR2 with 5-5-5 (RAS-CAS-CL latency setting).
  • 13. Real Worst-case. [Figure: request order over time from Cores 1 to 4 through the L3 and MC to the DRAM DIMM (Banks 1 to 4): Row 1, Row 2, Row 3, Row 4, Row 1, Row 2, …] 1 bank & always row miss → ~1.2 GB/s. Each core = ¼ × 1.2 GB/s = 300 MB/s? (*) Intel® 64 and IA-32 Architectures Optimization Reference Manual.
  • 14. Background: Memory Controller (MC). [Figure: Bruce Jacob et al., "Memory Systems: Cache, DRAM, Disk", Fig. 13.1.] Request queue(s): not fair (open-row first → re-ordering); unpredictable queuing delay.
  • 15. Challenges for Real-Time Systems. Multiple parallel resources (banks); stateful bank access latency; queuing delay; unpredictable memory performance.
  • 16. MemGuard [RTAS'13]. [Figure: inside the operating system, MemGuard consists of a reclaim manager and one b/w regulator per core (0.6 GB/s, 0.2 GB/s, 0.2 GB/s, 0.2 GB/s), each attached to that core's PMC; Cores 1 to 4 of the multicore processor share the memory controller and the DRAM DIMM.] Goal: guarantee minimum memory b/w for each core. How: b/w reservation + best-effort sharing.
  • 17. Reservation. Idea: the scheduler regulates per-core memory b/w using h/w counters; period = 1 scheduler tick (e.g., 1 ms). [Figure: budget and core activity (computation vs. memory fetch) over 0 to 2 ms; labels: "Schedule a RT idle task", "Suspend the RT idle task".]
  • 19. Best-Effort Sharing. [Figure: timeline (ms) for Core0 (900 MB/s) and Core1 (300 MB/s) over periods 0 to 2, showing guaranteed b/w, best-effort b/w, and the "throttled" and "reschedule" events.] Spare sharing [RTAS'13]; proportional sharing [unpublished TR].
  • 20. Case Study. HRT: synthetic real-time video capture; P=20 ms, D=13 ms; cache-insensitive. X-server: scrolling text on a gnome-terminal. Hardware platform: a desktop PC with an Intel Xeon 3530, 8 MB shared L3 cache, 4 GB DDR3 1333 MHz DIMM. HRT runs on Core1 and the X-server on Core2.
  • 21. Without MemGuard. HRT (solo): 99th percentile = 10.2 ms. HRT (with X-server): 99th percentile = 14.3 ms; X-server CPU utilization: 78%.
  • 22. MemGuard, reserve only (HRT=900 MB/s, X=300 MB/s). HRT (solo): 99th percentile = 10.7 ms. HRT (with X-server): 99th percentile = 11.2 ms; X-server CPU utilization: 4%.
  • 23. MemGuard, reserve (HRT=900 MB/s, X=300 MB/s) + best-effort sharing. HRT (solo): 99th percentile = 10.7 ms. HRT (with X-server): 99th percentile = 10.7 ms; X-server CPU utilization: 48%.
  • 24. MemGuard, reserve (HRT=600 MB/s, X=600 MB/s) + best-effort sharing. HRT (solo): 99th percentile = 10.9 ms. HRT (with X-server): 99th percentile = 12.1 ms; X-server CPU utilization: 61%.
  • 25. Real-Time Performance Improvement. [Figure: HRT and X-server results.] Using MemGuard, we can achieve no deadline miss for HRT and good X-server performance.
  • 26. Conclusion. Unpredictable memory performance: multiple resources (banks), per-bank state, unpredictable queueing delay. MemGuard: guarantees minimum memory bandwidth for each core; b/w reservation (guaranteed part) + best-effort sharing. Case study: on an Intel Xeon multicore platform, using HRT + X-server, MemGuard can improve real-time performance efficiently. Limitations and future work: coarse-grained enforcement (one OS tick); small guaranteed b/w → DRAM bank partitioning (submitted to RTAS'14). https://github.com/heechul/memguard
  • 28. Evaluation on Intel Core2. T1: synthetic video capture task (HRT); period = 20 ms (50 Hz), deadline = 14 ms; metrics: ACET, WCET, stdev, deadline miss ratio (out of 1000 periods). T2: X-server, update screen (SRT); metric: CPU utilization (higher CPU utilization → faster screen update). Platform: Intel Core2Quad 8400-based PC, 2 MB L2 cache × 2, tunable H/W prefetchers; PC6400 DDR2 DRAM DIMM × 1. Three platform configurations: Exp1: private L2, prefetch=off; Exp2: private L2, prefetch=on; Exp3: shared L2, prefetch=on. [Figure: Core0 to Core3, two L2 caches with prefetchers, and DRAM.]
  • 29. Experiment 1 (private L2, prefetch=off). [Figure: T1's execution time (ms, axis 0 to 18, deadline marked), solo vs. corun, with T2's CPU utilization, in three panels. Original: T2 = 92%. MemGuard (reserve only), 550 MB/s per core: T2 = 38%. MemGuard (reclaim + share), 550 MB/s per core: T2 = 78%. Annotation: "Performance guarantee".]
  • 30. Experiment 1 (same figure). Annotations: ACET, WCET, "30% WCET", "Performance guarantee".
  • 31. Experiment 1 (same figure, no annotations).
  • 32. Experiment 1 (same figure, no annotations).
  • 33. Experiment 1 (same figure). Annotation: "Performance target".
  • 34. Experiment 2: Prefetcher (private L2, prefetch=on). [Figure: T1's execution time (ms, axis 0 to 24, deadline marked), solo vs. corun, with T2's CPU utilization, in three panels. Original: T2 = 94%. MemGuard (reserve only), 550 MB/s per core: T2 = 33%. MemGuard (reclaim + share), 550 MB/s per core: T2 = 82%. Annotations: "More slowdown", "60%", "Not enough reserv.", "Deadline violation".]
  • 35. Experiment 2-2 (private L2, prefetch=on). [Figure: T1's execution time (ms, axis 0 to 18), solo vs. corun, with T2's CPU utilization, in three panels. Original: T2 = 94%. MemGuard (reserve only), Core1 = 900 MB/s, Core2 = 200 MB/s: T2 = 14%. MemGuard (reclaim + share), same reservations: T2 = 69%. Annotations: "60%", "Enough reserv.", "No deadline violation".]
  • 36. Experiment 3: Shared Cache (shared L2, prefetch=on). [Figure: T1's execution time (ms, axis 0 to 24), solo vs. corun, with T2's CPU utilization, in three panels. Original: T2 = 92%. MemGuard (reserve only), Core1 = 900 MB/s, Core2 = 200 MB/s: T2 = 11%. MemGuard (reclaim + share), same reservations: T2 = 63%. Annotations: "Even more slowdown", "108%", "Minimum reserv.", "No deadline violation".]
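
The bandwidth figures quoted on the slides can be checked with a little arithmetic. The sketch below is only an illustration, not code from MemGuard: the function names are hypothetical, and it assumes a 64-bit (8-byte) DRAM channel, a 64-byte cache line, decimal units (1 MB = 10^6 bytes), and a hardware counter that counts cache-line-sized memory fetches.

```python
# Illustrative arithmetic only; names and constants are assumptions,
# not taken from the MemGuard source code.
CACHE_LINE_BYTES = 64   # assumed cache-line size
BUS_WIDTH_BYTES = 8     # assumed 64-bit DRAM channel


def peak_bandwidth_gbs(mega_transfers_per_s, channels=1):
    """Peak DRAM bandwidth in GB/s for a given transfer rate (MT/s)."""
    return mega_transfers_per_s * 1e6 * BUS_WIDTH_BYTES * channels / 1e9


def equal_share_mbs(worst_case_gbs, cores):
    """Per-core share (MB/s) if worst-case bandwidth is split evenly."""
    return worst_case_gbs * 1000 / cores


def budget_in_cache_lines(bandwidth_mbs, period_ms=1.0):
    """Memory fetches a core may issue per regulation period."""
    bytes_per_period = bandwidth_mbs * 1e6 * period_ms / 1e3
    return int(bytes_per_period // CACHE_LINE_BYTES)


if __name__ == "__main__":
    # DDR3-1333, one channel: about 10.66 GB/s (the slides round to 10.6)
    print(f"{peak_bandwidth_gbs(1333):.2f} GB/s")
    # ~1.2 GB/s worst case split over 4 cores: about 300 MB/s each
    print(f"{equal_share_mbs(1.2, 4):.0f} MB/s")
    # Reservations from the case study, per 1 ms period
    print(budget_in_cache_lines(900))  # 14062
    print(budget_in_cache_lines(300))  # 4687
```

Under these assumptions, a 900 MB/s reservation corresponds to roughly 14,000 cache-line fetches per 1 ms tick, which gives a feel for the counter values a per-core regulator would be programmed with.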

Editor's notes

  1. Soon, more RT/embedded systems will use multicore as well.
  2. In unicore systems, CPU time is the most important shared resource determining an application's performance. In multicore systems, however, memory performance is also very important, because multiple cores can access memory concurrently and affect each other's performance significantly.
  3. Problem 1: coordinating memory slots with tasks → requires program modification (PREM). Problem 2: only one core can access memory at a time → does not fully utilize memory-level parallelism.
  4. First, let me explain how the b/w regulator works.
  5. Why do we want to regulate the request rates?
  6. Problem: DRAM
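
The reservation mechanism on the Reservation slide (a per-core budget, replenished every scheduler tick, with the core throttled once the budget is used up) can be sketched as a small simulation. This is a toy model under stated assumptions, not the MemGuard kernel module: the `Regulator` class and `simulate` function are hypothetical names, demand is expressed in cache-line fetches per period, and throttling is modelled as a flag rather than by scheduling an RT idle task from a counter-overflow interrupt.

```python
from dataclasses import dataclass


@dataclass
class Regulator:
    """Toy per-core bandwidth regulator (hypothetical, for illustration)."""
    budget: int              # fetches allowed per period
    used: int = 0            # fetches consumed in the current period
    throttled: bool = False  # True once the budget is exhausted

    def on_period_start(self):
        # Replenish the budget and let the core run again.
        self.used = 0
        self.throttled = False

    def request(self, lines):
        """Serve up to `lines` fetches; return how many were served."""
        if self.throttled:
            return 0
        served = min(lines, self.budget - self.used)
        self.used += served
        if self.used >= self.budget:
            # In the real system the core would be stalled at this point.
            self.throttled = True
        return served


def simulate(budget, demand_per_period):
    """Return (served, throttled, backlog) for each period."""
    reg = Regulator(budget)
    backlog = 0
    trace = []
    for demand in demand_per_period:
        reg.on_period_start()
        backlog += demand            # unserved work carries over
        served = reg.request(backlog)
        backlog -= served
        trace.append((served, reg.throttled, backlog))
    return trace


if __name__ == "__main__":
    for period, row in enumerate(simulate(100, [50, 150, 30])):
        print(period, row)
```

With a budget of 100 fetches and demands of 50, 150, and 30, the second period exhausts the budget, so 50 fetches spill into the third period. This mirrors the reserve-only results on the slides, where a regulated core is held to its reservation even when the memory system is otherwise idle; best-effort sharing exists to hand that unused bandwidth back.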