Cache Memory

Rabi Mahapatra & Hank Walker
Adapted from lecture notes of Dr. Patterson and
Dr. Kubiatowicz of UC Berkeley
Revisiting Memory Hierarchy
• Facts
– Big is slow
– Fast is small

• Increase performance by having
“hierarchy” of memory subsystems
• “Temporal Locality” and “Spatial Locality”
are big ideas
Revisiting Memory Hierarchy
• Terms
– Cache Miss
– Cache Hit
– Hit Rate
– Miss Rate
– Index, Offset and Tag
Direct Mapped Cache
Direct Mapped Cache [contd…]
• What is the size of the cache? 4K
• If I read
0000 0000 0000 0000 0000 0000 1000 0001
• what index number is checked? 64
• If the tag was found, what are the
inputs to the comparator?
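The tag/index/offset split the slide asks about can be sketched in Python. The field widths depend on the cache geometry in the figure (not reproduced here); the block size and set count below are illustrative assumptions, so the index they produce need not match the slide's answer of 64.

```python
def split_address(addr, block_bytes=4, num_sets=1024):
    """Split a 32-bit address into (tag, index, byte offset).

    block_bytes and num_sets are assumed, illustrative parameters.
    """
    offset_bits = block_bytes.bit_length() - 1   # log2(block_bytes)
    index_bits = num_sets.bit_length() - 1       # log2(num_sets)
    offset = addr & (block_bytes - 1)            # low bits: byte within block
    index = (addr >> offset_bits) & (num_sets - 1)  # middle bits: cache row
    tag = addr >> (offset_bits + index_bits)     # remaining high bits
    return tag, index, offset

# The address from the slide: 0b...1000 0001 = 0x81
tag, index, offset = split_address(0x81)
```

On a hit, the stored tag for that index and the address tag are the two comparator inputs.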
Direct Mapped Cache [contd…]
Taking advantage of spatial locality, we read 4 bytes at a time
Direct Mapped Cache [contd…]
• Advantage
– Simple
– Fast

• Disadvantage
– Mapping is fixed !!!
Associative Caches
• Block 12 placed in an 8-block cache:
– Fully associative, direct mapped, 2-way set associative
– S.A. Mapping = Block Number Modulo Number Sets
[Figure: placing block 12 (block-frame address 12 out of 32) in an 8-block cache (blocks 0-7)]
– Fully associative: block 12 can go anywhere
– Direct mapped: block 12 can go only into block 4 (12 mod 8)
– 2-way set associative (sets 0-3): block 12 can go anywhere in set 0 (12 mod 4)
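The placement rule above (set = block number modulo number of sets, with sets = blocks / associativity) reduces to one line of arithmetic; a quick sketch:

```python
def placement_set(block_no, num_blocks, assoc):
    """Return the set a memory block maps to.

    assoc = 1 is direct mapped (each set is one block);
    assoc = num_blocks is fully associative (a single set).
    """
    num_sets = num_blocks // assoc
    return block_no % num_sets

# Block 12 in an 8-block cache, as in the figure:
direct = placement_set(12, 8, 1)    # 12 mod 8 = block 4
two_way = placement_set(12, 8, 2)   # 12 mod 4 = set 0
fully = placement_set(12, 8, 8)     # only one set: set 0
```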
Set Associative Cache
• N-way set associative: N entries for each Cache Index
– N direct mapped caches operate in parallel
• Example: Two-way set associative cache
– Cache Index selects a “set” from the cache
– The two tags in the set are compared to the input in parallel
– Data is selected based on the tag result

[Figure: two-way set associative cache. The Cache Index selects one set; each way's Cache Tag and Valid bit feed a Compare block in parallel, the compare results are ORed into Hit, and a MUX (Sel0/Sel1) selects the matching Cache Block from the two Cache Data arrays.]
Example: 4-way set associative Cache

What is the cache size in this case?
Disadvantages of Set Associative Cache
• N-way Set Associative Cache versus Direct Mapped Cache:
– N comparators vs. 1
– Extra MUX delay for the data
– Data comes AFTER Hit/Miss decision and set selection
• In a direct mapped cache, Cache Block is available BEFORE
Hit/Miss:
[Figure: the two-way set associative datapath repeated. The data MUX follows the tag compare, so the selected Cache Block is available only after the Hit/Miss decision.]
Fully Associative Cache

• Fully Associative Cache
– Forget about the Cache Index
– Compare the Cache Tags of all cache entries in parallel
– Example: Block Size = 32 B blocks, so we need N 27-bit comparators
• By definition: Conflict Miss = 0 for a fully associative cache
[Figure: fully associative cache. Address bits 31-5 form the 27-bit Cache Tag; bits 4-0 are the Byte Select (e.g. 0x01). Every entry's Cache Tag and Valid Bit feed their own comparator (=) in parallel to select among the Cache Data bytes (Byte 0 ... Byte 63).]
Cache Misses
• Compulsory (cold start or process migration, first reference): first
access to a block
– “Cold” fact of life: not a whole lot you can do about it
– Note: if you are going to run billions of instructions,
Compulsory Misses are insignificant
• Capacity:
– Cache cannot contain all blocks accessed by the program
– Solution: increase cache size
• Conflict (collision):
– Multiple memory locations mapped to the same cache location
– Solution 1: increase cache size
– Solution 2: increase associativity
• Coherence (Invalidation): another process (e.g., I/O) updates memory
Design Options at Constant Cost
                  Direct Mapped   N-way Set Associative   Fully Associative
Cache Size        Big             Medium                  Small
Compulsory Miss   Same            Same                    Same
Conflict Miss     High            Medium                  Zero
Capacity Miss     Low             Medium                  High
Coherence Miss    Same            Same                    Same
Four Questions for Cache Design
• Q1: Where can a block be placed in the upper level?
(Block placement)
• Q2: How is a block found if it is in the upper level?
(Block identification)
• Q3: Which block should be replaced on a miss?
(Block replacement)
• Q4: What happens on a write?
(Write strategy)
Where can a block be placed in the upper
level?
• Direct Mapped
• Set Associative
• Fully Associative
How is a block found if it is in the upper
level?
Block Address: [ Tag | Index | Block offset ]
– Index: set select; Block offset: data select

• Direct indexing (using index and block offset), tag
compares, or combination
• Increasing associativity shrinks index, expands tag
Which block should be replaced on a
miss?
• Easy for Direct Mapped
• Set Associative or Fully Associative:
– Random
– LRU (Least Recently Used)
Associativity:   2-way            4-way            8-way
Size       LRU    Random    LRU    Random    LRU    Random
16 KB      5.2%   5.7%      4.7%   5.3%      4.4%   5.0%
64 KB      1.9%   2.0%      1.5%   1.7%      1.4%   1.5%
256 KB     1.15%  1.17%     1.13%  1.13%     1.12%  1.12%
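A minimal sketch of LRU bookkeeping for one set, using Python's OrderedDict as the recency list (illustrative only; hardware typically uses approximations such as pseudo-LRU bits):

```python
from collections import OrderedDict

class LRUSet:
    """One N-way set with LRU replacement."""
    def __init__(self, ways):
        self.ways = ways
        self.tags = OrderedDict()   # ordered oldest-first

    def access(self, tag):
        """Return True on hit; on a miss, insert tag, evicting LRU if full."""
        if tag in self.tags:
            self.tags.move_to_end(tag)      # mark most recently used
            return True
        if len(self.tags) == self.ways:
            self.tags.popitem(last=False)   # evict least recently used
        self.tags[tag] = True
        return False

# Two-way set: A and B fit; C evicts B (A was touched more recently).
s = LRUSet(2)
hits = [s.access(t) for t in ["A", "B", "A", "C", "B"]]
```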
What happens on a write?
• Write through—The information is written to both the block in the
cache and to the block in the lower-level memory.
• Write back—The information is written only to the block in the cache.
The modified cache block is written to main memory only when it is
replaced.
– Is the block clean or dirty?
• Pros and cons of each?
– WT: read misses cannot result in writes
– WB: no writes of repeated writes
• WT is always combined with write buffers so that the processor
doesn’t wait for lower-level memory
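The "no writes of repeated writes" advantage of write-back can be seen by counting memory writes for the same store stream under each policy. A sketch with a hypothetical one-line cache (the stream is a sequence of block numbers being written):

```python
def memory_writes_wt(stores):
    """Write-through: every store is also written to memory."""
    return len(stores)

def memory_writes_wb(stores):
    """Write-back: a block reaches memory only when a dirty block is
    evicted. Simulated with a one-line cache for illustration."""
    resident, dirty, flushes = None, False, 0
    for block in stores:
        if block != resident:      # miss: replace the resident block
            if dirty:
                flushes += 1       # write the dirty block back first
            resident, dirty = block, False
        dirty = True               # the store marks the block dirty
    if dirty:                      # final flush when the block is evicted
        flushes += 1
    return flushes
```

Four repeated writes to one block cost four memory writes under write-through but a single write-back on eviction.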
Two types of Cache
• Instruction Cache
• Data Cache
[Figure: pipeline sketch. The PC feeds the I-Cache, whose output is latched into IR (an I-Cache miss injects an invalid instruction); the memory stage (IRm) accesses the D-Cache, and a miss in either cache stalls the pipeline.]
Improving Cache Performance
• Reduce Miss Rate
– Associativity
– Victim Cache
– Compiler Optimizations

• Reduce Miss Penalty
– Faster DRAM
– Write Buffers

• Reduce Hit Time
Reduce Miss Rate
[Figure: miss rate per type vs. cache size (1 KB to 128 KB). Conflict misses for 1-, 2-, 4-, and 8-way associativity sit above the Capacity component, with Compulsory misses at the bottom; total miss rate ranges from about 0.14 down toward 0.]
Reduce Miss Rate [contd…]
2:1 Cache Rule of Thumb:
miss rate of a 1-way associative cache of size X
≈ miss rate of a 2-way associative cache of size X/2
[Figure: the same miss-rate-per-type plot as on the previous slide, with the Conflict region labeled.]
3Cs Comparison
[Figure: the same 3Cs data plotted as a percentage of all misses (0% to 100%) vs. cache size (1 KB to 128 KB): Conflict (1-way through 8-way), Capacity, and Compulsory components.]
Compiler Optimizations
• Compiler Optimizations to reduce miss rate
– Loop Fusion
– Loop Interchange
– Merge Arrays
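Loop interchange can be illustrated in Python (the real cache payoff shows up with arrays in a compiled language; this only demonstrates the access-order idea the optimization exploits):

```python
# Loop interchange: visit a 2-D array in the order it is stored so that
# consecutive accesses land in the same cache block (spatial locality).

def sum_row_major(a):
    total = 0
    for row in a:          # inner loop walks one contiguous row: good
        for x in row:
            total += x
    return total

def sum_col_major(a):
    total = 0
    for j in range(len(a[0])):   # inner loop strides across rows: bad
        for row in a:
            total += row[j]
    return total

m = [[1, 2, 3], [4, 5, 6]]
```

Both orders compute the same result; interchanging the loops changes only the memory access pattern, which is exactly why the compiler may do it.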
Reducing Miss Penalty
• Faster RAM memories
– Driven by technology and cost !!!
– Eg: CRAY uses only SRAM
Reduce Hit Time
• Lower Associativity
– Add L1 and L2 caches
– L1 cache is small => Hit Time is critical
– L2 cache is large => Miss Penalty is critical
Reducing Miss Penalty [contd…]
[Figure: Processor -> Cache -> DRAM, with a Write Buffer between the Cache and DRAM.]

• A Write Buffer is needed between the Cache and Memory
– Processor: writes data into the cache and the write buffer
– Memory controller: write contents of the buffer to memory
• Write buffer is just a FIFO:
Designing the Memory System to
Support Caches

Eg: 1 clock cycle to send address and data, 15 for each DRAM access

Case 1: (Main memory of one word)
1 + 4 x 15 + 4 x 1 = 65 clk cycles, bytes/clock = 16/65 = 0.25
Case 2: (Main memory of two words)
1 + 2 x 15 + 2 x 1 = 33 clk cycles, bytes/clock = 16/33 = 0.48
Case 3: (Interleaved memory)
1 + 1 x 15 + 4 x 1 = 20 clk cycles, bytes/clock = 16/20 = 0.80
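The three cases reduce to simple cycle arithmetic; re-computing them for a 16-byte (4-word) block:

```python
addr_cycles = 1    # send the address (and data) to memory
dram_cycles = 15   # one DRAM access
xfer_cycles = 1    # transfer one word back
block_bytes = 16   # 4-word block

# Case 1: one-word-wide memory -> 4 sequential accesses, 4 transfers
case1 = addr_cycles + 4 * dram_cycles + 4 * xfer_cycles   # 65 cycles
# Case 2: two-word-wide memory -> 2 accesses, 2 transfers
case2 = addr_cycles + 2 * dram_cycles + 2 * xfer_cycles   # 33 cycles
# Case 3: 4-way interleaved -> the accesses overlap, the transfers don't
case3 = addr_cycles + 1 * dram_cycles + 4 * xfer_cycles   # 20 cycles

bandwidths = [round(block_bytes / c, 2) for c in (case1, case2, case3)]
```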
The possible R/W cases
• Read Hit
– Good for CPU !!!

• Read Miss
– Stall CPU, fetch, resume

• Write Hit
– Write Through
– Write Back

• Write Miss
– Write entire block into memory, Read into cache
Performance
• CPU time = (CPU execution clock cycles +
Memory stall clock cycles) x clock cycle time
• Memory stall clock cycles =
(Reads x Read miss rate x Read miss penalty +
Writes x Write miss rate x Write miss penalty)
• Memory stall clock cycles =
Memory accesses x Miss rate x Miss penalty
• Different measure: AMAT
Average Memory Access time (AMAT) =
Hit Time + (Miss Rate x Miss Penalty)
• Note: memory hit time is included in execution cycles.
Performance [contd…]
• Suppose a processor executes at
– Clock Rate = 200 MHz (5 ns per cycle)
– Base CPI = 1.1
– 50% arith/logic, 30% ld/st, 20% control
• Suppose that 10% of memory
operations get 50 cycle miss penalty
• Suppose that 1% of instructions get same miss penalty
• CPI = Base CPI + average stalls per instruction
1.1(cycles/ins) +
[ 0.30 (DataMops/ins)
x 0.10 (miss/DataMop) x 50 (cycle/miss)] +
[ 1 (InstMop/ins)
x 0.01 (miss/InstMop) x 50 (cycle/miss)]
= (1.1 + 1.5 + .5) cycle/ins = 3.1
• 58% of the time the proc is stalled waiting for memory!
• AMAT=(1/1.3)x[1+0.01x50]+(0.3/1.3)x[1+0.1x50]=2.54
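Re-checking the arithmetic on this slide:

```python
base_cpi = 1.1
ld_st_per_inst = 0.30      # 30% of instructions are loads/stores
data_miss_rate = 0.10      # 10% of memory operations miss
inst_miss_rate = 0.01      # 1% of instruction fetches miss
penalty = 50               # cycles per miss

cpi = (base_cpi
       + ld_st_per_inst * data_miss_rate * penalty   # data stalls: 1.5
       + 1.0 * inst_miss_rate * penalty)             # inst stalls: 0.5

# AMAT weights the 1.3 memory accesses per instruction
# (1 fetch + 0.3 data) by their respective miss costs.
accesses = 1.0 + ld_st_per_inst
amat = ((1.0 / accesses) * (1 + inst_miss_rate * penalty)
        + (ld_st_per_inst / accesses) * (1 + data_miss_rate * penalty))
```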
Summary
• The Principle of Locality:
– Programs tend to access a relatively small portion of the address space at
any instant of time.
• Temporal Locality: Locality in Time
• Spatial Locality: Locality in Space
• Three (+1) Major Categories of Cache Misses:
– Compulsory Misses: sad facts of life. Example: cold start misses.
– Conflict Misses: increase cache size and/or associativity.
Nightmare scenario: ping-pong effect!
– Capacity Misses: increase cache size
– Coherence Misses: caused by external processors or I/O devices
• Cache Design Space:
– total size, block size, associativity
– replacement policy
– write-hit policy (write-through, write-back)
– write-miss policy
Summary [contd…]
• Several interacting dimensions
– cache size
– block size
– associativity
– replacement policy
– write-through vs. write-back
– write allocation
• The optimal choice is a compromise
– depends on access characteristics
• workload
• use (I-cache, D-cache, TLB)
– depends on technology / cost
• Simplicity often wins
[Figure: the cache design space. Axes for Cache Size, Associativity, and Block Size; a sweet spot between Good and Bad as Factor A (less) trades against Factor B (more).]
