28. Evaluation on Intel Core2
• T1: Synthetic video capture task (HRT)
– Period=20ms(50Hz)
– Deadline=14ms
– Metrics: ACET, WCET, stdev, deadline miss ratio (out of 1000 periods)
• T2: Xserver, update screen (SRT)
– Metric: CPU utilization
• Higher CPU utilization means faster screen updates
• Platform
– Intel Core2Quad 8400, 2MB L2 cache x 2, tunable H/W prefetchers
– PC6400 DDR2 DRAM DIMM x 1
• Three platform configurations
– Exp1: Private L2, Prefetch=off
– Exp2: Private L2, Prefetch=on
– Exp3: Shared L2, Prefetch=on
[Figure: Intel Core2Quad based PC: four cores (Core0-Core3) with two L2 caches (with prefetchers), connected to DRAM.]
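T1's metrics listed above are simple statistics over its per-period execution times. The Python sketch below shows one way to compute them. The function name `summarize`, the use of the population standard deviation, and the rule that a job finishing exactly at the deadline counts as met are assumptions, and the sample values are made up, not measurements from these experiments.

```python
import statistics

# Task parameters from the slide: T1 runs every 20 ms with a 14 ms deadline.
PERIOD_MS = 20.0
DEADLINE_MS = 14.0


def summarize(exec_times_ms, deadline_ms=DEADLINE_MS):
    """Return ACET, WCET, standard deviation and deadline miss ratio.

    Hypothetical helper: a job counts as a miss only if it runs strictly
    longer than the deadline.
    """
    if not exec_times_ms:
        raise ValueError("need at least one measured period")
    acet = statistics.mean(exec_times_ms)      # average-case execution time
    wcet = max(exec_times_ms)                  # observed worst case
    stdev = statistics.pstdev(exec_times_ms)   # population std. deviation (assumed)
    misses = sum(1 for t in exec_times_ms if t > deadline_ms)
    return {
        "acet": acet,
        "wcet": wcet,
        "stdev": stdev,
        "miss_ratio": misses / len(exec_times_ms),
    }


if __name__ == "__main__":
    # Made-up samples; the experiments use 1000 periods per run.
    print(summarize([10.0, 12.0, 15.0, 11.0]))
```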
29. Experiment 1
[Figure: T1's execution time (ms), solo vs. corun, against T1's deadline, for three settings: Original, MemGuard (Reserve only), and MemGuard (reclaim + share). T1 and T2 run on Core1 and Core2 with private L2 caches, Prefetch=off; in both MemGuard settings each core reserves 550M/s of DRAM bandwidth. T2's CPU utilization: 92% (Original), 38% (Reserve only), 78% (reclaim + share). Annotation: "Performance guarantee".]
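MemGuard's Reserve only setting gives each core a fixed bandwidth budget. As a rough illustration, the hypothetical sketch below converts a 550M/s reservation into a per-period budget and throttles a core once that budget is spent. It assumes "M/s" means MB/s, a 1 ms regulation period, and one 64-byte cache line per DRAM access; none of these values appear on the slides, and the class is not MemGuard's actual code.

```python
# Sketch of a "reserve only" per-core bandwidth regulator.
# Assumptions (not stated on the slides): "M/s" means MB/s, every DRAM access
# moves one 64-byte cache line, and budgets are refilled every 1 ms.
CACHE_LINE_BYTES = 64
PERIOD_US = 1000


def budget_lines(bandwidth_mb_s, period_us=PERIOD_US, line_bytes=CACHE_LINE_BYTES):
    """Cache-line accesses allowed per period for a reserved bandwidth.

    MB/s multiplied by microseconds gives bytes, so integer math is exact.
    """
    return bandwidth_mb_s * period_us // line_bytes


class ReserveOnlyCore:
    """Hypothetical regulator: a core is throttled once its budget is used up."""

    def __init__(self, bandwidth_mb_s):
        self.budget = budget_lines(bandwidth_mb_s)
        self.used = 0
        self.throttled = False

    def new_period(self):
        # Start of a regulation period: the full budget is available again.
        self.used = 0
        self.throttled = False

    def access(self, lines):
        """Charge up to `lines` accesses; return how many fit in the budget."""
        if self.throttled:
            return 0
        allowed = min(lines, self.budget - self.used)
        self.used += allowed
        if self.used >= self.budget:
            # A real system would stall the core until the next period.
            self.throttled = True
        return allowed


if __name__ == "__main__":
    core = ReserveOnlyCore(550)
    print(core.budget)          # 8593 lines per 1 ms period
    print(core.access(9000))    # 8593: the rest must wait for the next period
```

Under this model a core that spends its budget early idles until the next period even if the other core leaves bandwidth unused, which is consistent with T2's lower CPU utilization under Reserve only.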
30. Experiment 1
[Figure: the slide 29 chart of T1's execution time (Private L2, Prefetch=off; deadline and "Performance guarantee" marked), now with ACET and WCET labels and an added "30%" annotation.]
33. Experiment 1
[Figure: the slide 29 chart of T1's execution time (Private L2, Prefetch=off), with the deadline line relabeled "Performance target".]
34. Experiment 2: Prefetcher
[Figure: T1's execution time (ms), solo vs. corun, against T1's deadline, for Original, MemGuard (Reserve only), and MemGuard (reclaim + share). Private L2, Prefetch=ON; in both MemGuard settings each core reserves 550M/s. Annotations: "Not enough reservation", "More slowdown" with a 60% label, and "Deadline violation". T2's CPU utilization: 94% (Original), 33% (Reserve only), 82% (reclaim + share).]
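In the reclaim + share setting, bandwidth one core leaves unused need not go to waste. The sketch below models only a simplified reclaim step for one regulation period: each core first gets up to its own reservation, and the unused remainder is pooled and handed to cores whose demand exceeds their reservation. The function `reclaim_period`, the core-index hand-out order, and the assumption that each core's demand is known up front are illustrative; a run-time system would have to estimate usage instead, and the "share" part is not modeled.

```python
def reclaim_period(reservations, demands):
    """One regulation period of a simplified, hypothetical reclaim scheme.

    reservations[i] and demands[i] are cache-line accesses for core i.
    Each core first receives up to its own reservation; whatever the
    under-using cores leave unused goes into a global pool, which is then
    handed to cores with unmet demand in core-index order.
    """
    if len(reservations) != len(demands):
        raise ValueError("one reservation and one demand per core")
    # Step 1: every core is guaranteed up to its own reservation.
    granted = [min(r, d) for r, d in zip(reservations, demands)]
    # Step 2: unused reserved bandwidth becomes a reclaimable pool.
    pool = sum(r - g for r, g in zip(reservations, granted))
    # Step 3: cores that want more than they reserved draw from the pool.
    for i, d in enumerate(demands):
        extra = min(d - granted[i], pool)
        granted[i] += extra
        pool -= extra
    return granted


if __name__ == "__main__":
    reserved = [8593, 8593]   # 550M/s each, assuming 1 ms periods and 64 B lines
    demand = [5000, 12000]
    print([min(r, d) for r, d in zip(reserved, demand)])  # reserve only
    print(reclaim_period(reserved, demand))                # with reclaim
```

With these made-up demands, reserve only caps the second core at 8593 lines, while the reclaim model lets it complete all 12000 using 3407 of the first core's 3593 leftover lines. The total granted never exceeds the total reserved.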
35. Experiment 2-2
[Figure: T1's execution time (ms), solo vs. corun, for Original, MemGuard (Reserve only), and MemGuard (reclaim + share). Private L2, Prefetch=ON; in both MemGuard settings the reservations are 900M/s (Core1) and 200M/s (Core2). Annotations: "Enough reservation", a 60% label, and "No deadline violation". T2's CPU utilization: 94% (Original), 14% (Reserve only), 69% (reclaim + share).]
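Experiments 2 and 2-2 reserve the same total bandwidth, 550 + 550 = 900 + 200 = 1100 M/s; only the split between the cores changes. The sketch below checks such a split against a guaranteed DRAM bandwidth and derives per-core budgets. The guaranteed bandwidth is a hypothetical parameter (1100 in the usage lines is chosen only to match the total), and the sketch assumes MB/s units, a 1 ms period, 64-byte lines, and that the first reservation on each slide belongs to Core1.

```python
# Per-core reservations as read from the slides, in "M/s" (assumed MB/s).
# Assumption: the first value on each slide belongs to Core1, the second to Core2.
EXP2_1 = {"Core1": 550, "Core2": 550}   # Experiment 2: "Not enough reserv."
EXP2_2 = {"Core1": 900, "Core2": 200}   # Experiment 2-2: "Enough reserv."


def budget_lines(bandwidth_mb_s, period_us=1000, line_bytes=64):
    # MB/s times microseconds gives bytes per period (assumed 1 ms, 64 B lines).
    return bandwidth_mb_s * period_us // line_bytes


def check_split(reservations, guaranteed_mb_s):
    """Return total reserved bandwidth, whether it fits, and per-core budgets.

    `guaranteed_mb_s` is a hypothetical parameter: the DRAM bandwidth the
    system is assumed to sustain under any access pattern.
    """
    total = sum(reservations.values())
    budgets = {core: budget_lines(bw) for core, bw in reservations.items()}
    return total, total <= guaranteed_mb_s, budgets


if __name__ == "__main__":
    for name, split in (("Exp 2", EXP2_1), ("Exp 2-2", EXP2_2)):
        print(name, check_split(split, guaranteed_mb_s=1100))
```

Shifting bandwidth from Core2 to Core1 leaves the total unchanged, so a split that fits the assumed guarantee in Experiment 2 also fits in Experiment 2-2.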
36. Experiment 3: Shared Cache
[Figure: T1's execution time (ms), solo vs. corun, for Original, MemGuard (Reserve only), and MemGuard (reclaim + share). Shared L2, Prefetch=ON; in both MemGuard settings the reservations are 900M/s (Core1) and 200M/s (Core2). Annotations: "Even more slowdown" with a 108% label, "Minimum reservation", and "No deadline violation". T2's CPU utilization: 92% (Original), 11% (Reserve only), 63% (reclaim + share).]
Speaker notes
Soon, more real-time/embedded systems will use multicore processors as well.
In unicore systems, CPU time is the most important shared resource determining an application's performance. In multicore systems, however, memory performance is also very important, because multiple cores can access memory concurrently and significantly affect each other's performance.
Problem 1: coordinating memory slots with tasks requires program modification (PREM).
Problem 2: only one core can access memory at a time, so memory-level parallelism is not fully utilized.