CACHE MEMORY
By Anand Goyal
2010C6PS648
Memory Hierarchy
 Computer memory is organized in a hierarchy. This is done to cope with the speed of the processor and thereby increase performance.
 Closest to the processor are the processing registers. Then comes the cache memory, followed by main memory.
SRAM and DRAM
 Both are random access memories and are volatile, i.e. a constant power supply is required to avoid data loss.
 DRAM :- made up of a capacitor and a transistor. The transistor acts as a switch, and the data, in the form of charge, is held on the capacitor. Requires periodic charge refreshing to maintain data storage. Lower cost per bit; used for large memories.
 SRAM :- made up of four transistors, which are cross-connected in an arrangement that produces a stable logic state. Higher cost per bit; used for small memories.
Principles of Locality
 Since programs access only a small portion of their address space at any given instant, two properties are exploited to increase performance :-
 A) Temporal Locality :- locality in time, i.e. if an item is referred to, it will tend to be referred to again soon.
 B) Spatial Locality :- locality in space, i.e. if an item is referred to, its neighboring items will tend to be referred to soon.
Mapping Functions
 There are three main types of memory
mapping functions :-
 1) Direct Mapped
 2) Fully Associative
 3) Set Associative
 For the coming explanations, let us
assume 1GB main memory, 128KB
Cache memory and Cache line size
32B.
Direct Mapping
Address layout: | TAG (s - r) | LINE or SLOT (r) | OFFSET (w) |
•Each memory block is mapped to a single cache line. For the purpose of cache access, each main memory address can be viewed as consisting of three fields.
•No two blocks that map to the same line have the same Tag field.
•Check the contents of the cache by finding the line from the LINE field, then comparing the stored Tag with the address Tag.
 For the given example, we have –
 1GB main memory = 2^30 bytes
 Cache size = 128KB = 2^17 bytes
 Block size = 32B = 2^5 bytes
 No. of cache lines = 2^17/2^5 = 2^12, thus 12 bits are required to locate 2^12 lines.
 Also, the block size is 2^5 bytes, and thus 5 bits are required to locate an individual byte.
 Thus Tag bits = 30 – 12 – 5 = 13 bits
13 12 5
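As an illustration, here is a minimal Python sketch of this field split, assuming the 30-bit byte address and the sizes above (split_address is a hypothetical helper, not from the original slides):

# Direct-mapped address decoding for the running example:
# 1GB memory (30-bit address), 128KB cache, 32B lines.
OFFSET_BITS = 5                      # 32B line  -> 2^5
LINE_BITS = 12                       # 2^17/2^5  -> 2^12 lines

def split_address(addr):
    """Return the (tag, line, offset) fields of a byte address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    line = (addr >> OFFSET_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (OFFSET_BITS + LINE_BITS)   # the remaining 13 bits
    return tag, line, offset

# Block j of main memory always lands in cache line j modulo 2^12,
# which is exactly what the LINE field computes.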
Summary
 Address length = (s + w) bits
 Number of addressable units = 2^(s+w) words or bytes
 Block size = line size = 2^w words or bytes
 No. of blocks in main memory = 2^(s+w)/2^w = 2^s
 Number of lines in cache = m = 2^r
 Size of tag = (s – r) bits
 Mapping Function
 The jth block of main memory maps to cache line i = j modulo m (m = no. of cache lines)
Pros and Cons
 Simple
 Inexpensive
 Fixed location for given block
 If a program accesses 2 blocks that
map to the same line repeatedly,
cache misses (conflict misses) are
very high
Fully Associative Mapping
 A main memory block can load into any line of the cache
 The memory address is interpreted as tag and word
 The tag uniquely identifies a block of memory
 Every line's tag is examined for a match
 Cache searching is expensive, and power consumption is higher, due to the parallel comparators
Address layout: | TAG (s) | OFFSET (w) |
Fully Associative Cache
Organization
 For the given example, we have –
 1GB main memory = 2^30 bytes
 Cache size = 128KB = 2^17 bytes
 Block size = 32B = 2^5 bytes
 Here, the block size is 2^5 bytes, and thus 5 bits are required to locate an individual byte.
 Thus Tag bits = 30 – 5 = 25 bits
25 5
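A minimal lookup sketch, assuming the cache's tags are held in a plain list (fa_lookup is a hypothetical helper); hardware compares all tags in parallel, the loop here is only for illustration:

def fa_lookup(cache_tags, addr, offset_bits=5):
    """Search every line's tag; None means a miss."""
    tag = addr >> offset_bits            # the 25-bit tag of the example
    for line, line_tag in enumerate(cache_tags):
        if line_tag == tag:
            return line                  # hit: any line may hold any block
    return None                          # miss: any line may be replaced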
Fully Associative Mapping Summary
 Address length = (s + w) bits
 Number of addressable units = 2^(s+w) words or bytes
 Block size = line size = 2^w words or bytes
 No. of blocks in main memory = 2^(s+w)/2^w = 2^s
 Number of lines in cache = total number of cache blocks
 Size of tag = s bits
Pros and Cons
 There is flexibility as to which block to
replace when a new block is read into
the cache
 The complex circuitry required for
parallel Tag comparison is however a
major disadvantage.
Set Associative Mapping
 The cache is divided into a number of sets
 Each set contains a number of lines
 A given block maps to any line in a given set, e.g. block B can be in any line of set i
 If there are 2 lines per set,
 2-way associative mapping
 A given block can be in one of 2 lines, in only one set
Address layout: | TAG (s - d) | SET (d) | OFFSET (w) |
K-Way Set Associative
Organization
 For the given example, we have –
 1GB main memory = 2^30 bytes
 Cache size = 128KB = 2^17 bytes
 Block size = 32B = 2^5 bytes
 Let it be a 2-way set associative cache,
 No. of sets = 2^17/(2*2^5) = 2^11, thus 11 bits are required to locate 2^11 sets, each set containing 2 lines
 Also, the block size is 2^5 bytes, and thus 5 bits are required to locate an individual byte.
 Thus Tag bits = 30 – 11 – 5 = 14 bits
14 11 5
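A sketch of the set-indexed lookup under the same assumptions (sa_lookup is a hypothetical helper; each set is modeled as a small list of tags):

SET_BITS = 11                            # 2^11 sets
OFFSET_BITS = 5                          # 32B lines

def sa_lookup(sets, addr):
    """sets: a list of 2^11 sets, each a list of 2 tags (2-way)."""
    tag = addr >> (SET_BITS + OFFSET_BITS)          # 14-bit tag
    index = (addr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    return tag in sets[index]            # only one set is searched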
Set Associative Mapping Summary
 Address length = (s + w) bits
 Number of addressable units = 2^(s+w) words or bytes
 Block size = line size = 2^w words or bytes
 Number of blocks in main memory = 2^s
 Number of lines in set = k
 Number of sets = v = 2^d
 Number of lines in cache = kv = k * 2^d
 Size of tag = (s – d) bits
 Mapping Function
 The jth block of main memory maps to set i = j modulo v (v = no. of sets)
 Within the set, the block can be mapped to any cache line.
Pros and Cons
 Simulating the hit ratio for direct mapped and (2, 4, 8-way) set associative caches shows a significant difference in performance up to a cache size of about 64KB, with set associative being the better one.
 Beyond that, however, the complexity of the cache grows in proportion to the associativity, and both mappings give approximately the same hit ratio.
N-way Set Associative Cache Vs. Direct Mapped Cache:
 N comparators vs. 1
 Extra mux delay for the data
 Data arrives only after hit/miss is resolved
 In a direct mapped cache, the cache block is available before hit/miss is resolved
 Number of misses
 DM > SA > FA
 Access latency : the time to perform a read or write operation, i.e. from the instant the address is presented to the memory to the instant the data have been stored or made available
 DM < SA < FA
Types of Misses
Compulsory Misses :-
 When a program is started, the cache is completely empty, and hence the first access to a block will always be a miss, as the block has to be brought into the cache from memory at least once.
 Also called first reference misses. Can't be avoided easily.
Capacity Misses
 The cache cannot hold all the blocks needed during the execution of a program.
 Thus this miss occurs due to blocks being discarded and later retrieved.
 They occur because the cache is limited in size.
 Capacity misses are the major miss type in a fully associative cache.
Conflict Misses
 They occur because multiple distinct memory locations map to the same cache location.
 Thus, in the case of DM or SA, a miss occurs because a block is discarded and later retrieved.
 In DM this is a repeated phenomenon, as two blocks which map to the same cache line can be accessed alternately, thereby decreasing the hit ratio.
 This phenomenon is called thrashing.
Solutions to reduce misses
 Capacity Misses :-
◦ Increase cache size
◦ Re-structure the program
 Conflict Misses :-
◦ Increase cache size
◦ Increase associativity
Coherence Misses
 Occur when other processors update
memory which in turn invalidates the
data block present in other
processor’s cache.
Replacement Algorithms
 For a Direct Mapped cache, since each block maps to only one line, we have no choice but to replace that line itself
 Hence there isn't any replacement policy for DM.
 For SA and FA, a few replacement policies :-
◦ Optimal
◦ Random
◦ Arrival
◦ Frequency
◦ Recently Used
Optimal
This is the ideal benchmarking
replacement strategy.
 All other policies are compared to it.
 This is not implemented, but used just
for comparison purposes.
Random
 Block to be replaced is randomly
picked
 Minimum hardware complexity – just a
pseudo random number generator
required.
 Access time is not affected by the
replacement circuit.
 Not suitable for high performance
systems
Arrival - FIFO
 For an N-way set associative cache
 Implementation 1
 Use an N-bit register per cache line to store arrival time information
 On a cache miss – the registers of all cache lines in the set are compared to choose the victim cache line
 Implementation 2
 Maintain a FIFO queue: a register with (log2 N) bits per cache line
 On a cache miss – the cache line whose register value is 0 will be the victim
 Decrement all other registers in the set by 1 and set the victim register to N-1
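A minimal sketch of Implementation 2, assuming each set's registers are kept as a Python list holding a permutation of 0..N-1 (fifo_victim is a hypothetical helper):

def fifo_victim(counters):
    """Pick and refresh the FIFO victim for one set on a miss."""
    n = len(counters)
    victim = counters.index(0)           # register value 0 = oldest line
    for i in range(n):
        counters[i] = n - 1 if i == victim else counters[i] - 1
    return victim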
FIFO : Advantages & Disadvantages
 Advantages
 Low hardware complexity
 Better cache hit performance than Random replacement
 The cache access time is not affected by the replacement strategy (not in the critical path)
 Disadvantages
 Cache hit performance is poor compared to LRU and frequency based replacement schemes
 Not suitable for high performance systems
 Replacement circuit complexity increases with increase in associativity
Frequency – Least Frequently Used
 Requires a register per cache line to save the number of references (frequency count)
 If a cache access is a hit, increase the frequency count of the corresponding register by 1
 On a cache miss, the victim cache line is the line with the minimum frequency count in the set
 Reset the register corresponding to the victim cache line to 0
 LFU cannot differentiate between blocks that were heavily referenced in the past and blocks that are being referenced now
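A minimal LFU sketch for one set, with the frequency registers modeled as a list (lfu_access is a hypothetical helper):

def lfu_access(freq, hit_line=None):
    """Update frequency counts; return the line that now holds the block."""
    if hit_line is not None:
        freq[hit_line] += 1              # hit: bump the line's count
        return hit_line
    victim = freq.index(min(freq))       # miss: minimum count in the set
    freq[victim] = 0                     # reset the victim's register
    return victim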
Least Frequently Used – Dynamic Aging (LFU-DA)
 When any frequency count register in the set reaches its maximum value, all the frequency count registers in that set are shifted one position right (divided by 2)
 The rest is the same as LFU
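The aging step as a sketch (age_if_saturated is a hypothetical helper; max_count would be 2^b - 1 for b-bit registers):

def age_if_saturated(freq, max_count):
    """LFU-DA: halve every count when any register saturates."""
    if max(freq) >= max_count:
        for i in range(len(freq)):
            freq[i] >>= 1                # shift right = divide by 2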
LFU : Advantages & Disadvantages
 Advantages
 For small and medium caches LFU works better than FIFO and Random replacement
 Suitable for high performance systems whose memory access pattern follows frequency order
 Disadvantages
 A register must be updated on every cache access
 Affects the critical path
 The replacement circuit becomes more complicated when associativity increases
Least Recently Used Policy
 Most widely used replacement
strategy
 Replaces the least recently used
cache line
 Implemented by two techniques :-
◦ Square Matrix Implementation
◦ Counter Implementation
Square Matrix Implementation
 N^2 bits per set (DFFs) to store the LRU information
 The cache line corresponding to the row with all zeros is the victim cache line for replacement
 On a cache hit, all the bits in the corresponding row are set to 1 and all the bits in the corresponding column are set to 0
 On a cache miss, a priority encoder selects the cache line corresponding to the row with all zeros for replacement
 Used when associativity is low
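A sketch of the matrix bookkeeping for one set (hypothetical helpers; a list of lists stands in for the N^2 flip-flops):

def matrix_touch(matrix, i):
    """Reference line i: set row i to all 1s, then column i to all 0s."""
    n = len(matrix)
    for j in range(n):
        matrix[i][j] = 1
    for r in range(n):
        matrix[r][i] = 0

def matrix_victim(matrix):
    """The LRU line is the one whose row is all zeros."""
    return next(r for r, row in enumerate(matrix) if not any(row))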
Matrix Implementation – 4 way
set Associative Cache
Counter Implementation
 N registers with log2 N bits each for an N-way set associative cache; thus N*log2 N bits are used
 One register per cache line
 The cache line whose counter is 0 is the victim cache line for replacement
 On a hit, all cache lines with a counter greater than the hit cache line's are decremented by 1, and the hit cache line's counter is set to N-1
 On a miss, the cache line whose count value is 0 is replaced; all counters in the set are decremented by 1 and the new line's counter is set to N-1
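The same policy as a sketch, with one list of counters per set (hypothetical helpers; the counters hold a permutation of 0..N-1, where N-1 marks the most recently used line):

def lru_hit(count, i):
    """Hit on line i: demote counters above it, make line i MRU."""
    c = count[i]
    for j in range(len(count)):
        if count[j] > c:
            count[j] -= 1
    count[i] = len(count) - 1

def lru_miss(count):
    """Miss: replace the counter-0 line and make it MRU."""
    victim = count.index(0)
    for j in range(len(count)):
        count[j] -= 1
    count[victim] = len(count) - 1
    return victim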
Look Policy
Look Through : access the cache; if the data is not found, access the lower level
Look Aside : send the request to the cache and its lower level at the same time
Write Policy
Need for a Write Policy :-
 A block in the cache might have been updated, but the corresponding update in main memory might not have been done
 Multiple CPUs have individual caches, so a write can invalidate the data in another processor's cache
 I/O may be able to read/write directly into main memory
Write Through
 In this technique, all write operations are made to main memory as well as to the cache, ensuring MM is always valid.
 Any other processor-cache module may monitor traffic to MM to maintain consistency.
DISADVANTAGE
 It generates memory traffic and may create a bottleneck.
 Bottleneck : a delay in the transmission of data due to insufficient bandwidth, so information is not relayed at the speed at which it is processed.
Pseudo Write Through
 Also called Write Buffer
 The processor writes data into the cache and the write buffer
 The memory controller writes the contents of the buffer to memory
 The buffer is a FIFO (typical number of entries: 4)
 After the write is complete, the buffer entry is flushed
Write Back
 In this technique, updates are made only in the cache.
 When an update is made, a dirty bit (use bit) associated with the line is set
 When a block is replaced, it is written back into main memory iff the dirty bit is set
 Thus it minimizes memory writes
DISADVANTAGE
 Portions of MM are still invalid, hence I/O should be allowed access only through the cache
 This requires complex circuitry and creates a potential bottleneck
Cache Coherency
This is required only in the case of multiprocessors, where each CPU has its own cache.
Why is it needed ?
 Whatever the write policy, if data is modified in one cache, it invalidates the copy in any other cache that holds the same data
 Hence we need to maintain cache coherency to obtain correct results
Approaches towards Cache
Coherency
1) Bus watching write through :
 Each cache controller monitors writes to shared memory locations that also reside in its cache
 If a write is made, the controller invalidates its cache entry
 This approach depends on the use of a write through policy
2) Hardware Transparency :-
 Additional hardware ensures that all updates to main memory via a cache are reflected in all caches
3) Non Cacheable Memory :-
 Only a portion of main memory is shared by more than one processor, and this is designated as non cacheable.
 Here, all accesses to shared memory are cache misses, as it is never copied into the cache
Cache Optimization
 Reducing the miss penalty
1. Multi level caches
2. Critical word first
3. Priority to Read miss over writes
4. Merging write buffers
5. Victim caches
Multilevel Cache
 The inclusion of an on-chip cache left open the question of whether another, external cache is still desirable.
 The answer is yes! The reasons are :
◦ If there is no L2 cache and the processor makes a request for a memory location not in the L1 cache, it accesses DRAM or ROM. Due to the relatively slower bus speed, performance degrades.
◦ Whereas, if an L2 SRAM cache is included, the frequently missed information can be quickly retrieved. Also, SRAM is fast enough to match the bus speed, giving zero-wait state transactions.
 The L2 cache does not use the system bus as the path for transfers between L2 and the processor, but a separate data path, to reduce the burden on the system bus
 A series of simulations has shown that the L2 cache is most efficient when it is at least double the size of the L1 cache, as otherwise its contents will be similar to those of L1
 Due to the continued shrinkage of processor components, many processors can accommodate the L2 cache on chip, giving rise to the opportunity to include an L3 cache
 The only disadvantage of a multilevel cache is that it complicates the design of the memory system (cache size, replacement policy, write policy)
Cache Performance
 Average memory access time = Hit time_L1 + Miss rate_L1 × (Hit time_L2 + Miss rate_L2 × Miss penalty_L2)
 Average memory stalls per instruction = Misses per instruction_L1 × Hit time_L2 + Misses per instruction_L2 × Miss penalty_L2
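With assumed, purely illustrative numbers (hit time L1 = 1 cycle, miss rate L1 = 5%, hit time L2 = 10 cycles, miss rate L2 = 20%, miss penalty L2 = 100 cycles), the first formula gives:
Average memory access time = 1 + 0.05 × (10 + 0.20 × 100) = 1 + 0.05 × 30 = 2.5 cycles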
Unified Vs Split Cache
 Earlier, the same cache was used for data as well as instructions, i.e. a Unified Cache
 Now we have separate caches for data and instructions, i.e. a Split Cache
 Thus, if the processor attempts to fetch an instruction from main memory, it first consults the instruction L1 cache, and similarly for data.
Advantages of Unified Cache
 It balances the load between data and instructions automatically.
 That is, if execution involves more instruction fetches, the cache will tend to fill up with instructions, and if execution involves more data fetches, the cache will tend to fill up with data.
 Only one cache needs to be designed.
Advantages of Split Cache
 Useful for parallel instruction execution and pre-fetching of predicted future instructions
 Eliminates contention between the instruction fetch/decode unit and the execution unit, thereby supporting pipelining
 The processor can fetch instructions ahead of time and fill the buffer, or pipeline
 E.g. superscalar machines such as the Pentium and PowerPC
Critical Word First
 This policy involves sending the requested word first and then transferring the rest, getting the critical data to the processor in the first cycle.
 Assume that 1 block = 16 bytes and 1 cycle transfers 4 bytes; thus at least 4 cycles are required to transfer the block.
 If the processor demands the 2nd byte, why wait for the entire block to be transferred? We can first send the word containing it and then the rest of the block.
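A small sketch of the resulting transfer order, assuming the 16B blocks and 4B-per-cycle bus of the example (transfer_order is a hypothetical helper):

def transfer_order(requested_byte, block_bytes=16, bus_bytes=4):
    """Start the burst at the word holding the requested byte."""
    first = (requested_byte // bus_bytes) * bus_bytes
    return [(first + i) % block_bytes
            for i in range(0, block_bytes, bus_bytes)]

print(transfer_order(9))   # [8, 12, 0, 4]: byte 9 arrives in cycle 1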
Priority to read miss over writes
Write Buffer:
 Using write buffers: RAW conflicts arise with reads on cache misses
 If we simply wait for the write buffer to empty, the read miss penalty increases (by about 50%)
 Check the contents of the write buffer on a read miss; if there is no conflict and the memory system is available, allow the read miss to continue. If there is a conflict, flush the buffer before the read
Write Back?
 Read miss replacing a dirty block
 Normal: write the dirty block to memory, and then do the read
 Instead, copy the dirty block to a write buffer, then do the read, and then the write
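A sketch of the read-miss check against the write buffer (read_miss is a hypothetical helper; the buffer is modeled as a list of (address, data) entries, newest last):

def read_miss(addr, write_buffer, memory):
    """Serve a read miss without waiting for the buffer to drain."""
    for a, data in reversed(write_buffer):   # newest entry wins
        if a == addr:
            return data                      # RAW conflict resolved here
    return memory[addr]                      # no conflict: read first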
Victim Cache
 How to combine the fast hit time of DM with reduced conflict misses?
 Add a small fully associative buffer (cache), the victim cache, to hold data discarded from the main cache
 A small fully associative cache is used for collecting spilled-out data
 Blocks that are discarded because of a miss (victims) are stored in the victim cache, which is checked on a cache miss.
 If found, swap the data block between the victim cache and the main cache
 Replacement always happens with the LRU block of the victim cache. The block that we want to transfer is made MRU.
 Then the block evicted from the main cache comes into the victim cache and is made MRU.
 The block which was transferred to the main cache is made LRU.
 If there is a miss in the victim cache as well, then MM is referred to.
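A sketch of the swap behaviour (hypothetical helpers; the victim cache is modeled as a deque whose left end is LRU and right end is MRU, the main cache as a dict from line index to tag):

from collections import deque

def access(main, victim, index, tag):
    """Return 'hit', 'victim hit' or 'miss' and update both caches."""
    if main.get(index) == tag:
        return "hit"
    evicted = main.get(index)
    main[index] = tag                    # requested block enters main cache
    if tag in victim:                    # found in the victim cache: swap
        victim.remove(tag)
        if evicted is not None:
            victim.append(evicted)       # swapped-out block becomes MRU
        return "victim hit"
    if evicted is not None:              # full miss: refill from memory,
        victim.append(evicted)           # LRU entry falls off the left
    return "miss"

main, victim = {}, deque(maxlen=4)       # e.g. a 4-entry victim cache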
Cache Optimization
 Reducing hit time
1. Small and simple caches
2. Way prediction cache
3. Trace cache
4. Avoid Address translation during
indexing of the cache
Cache Optimization
 Reducing miss rate
1) Changing cache configurations
2) Compiler optimization
Cache Optimization
 Reducing miss penalty or miss rate via parallelism
1) Hardware prefetching
2) Compiler prefetching
Cache Optimization
 Increasing cache bandwidth
1) Pipelined cache
2) Multi-banked cache
3) Non-blocking cache