This slide deck covers memory and its parameters, classification of memory, allocation policies, cache memory, virtual memory, paging, segmentation and pipelining.
2. 1. Location
Computer memory is placed in 3 different locations :
a. Processor (CPU) : In the form of CPU registers and its internal cache memory.
b. Internal (Main Memory) : Main memory, which the CPU directly accesses.
c. External (Secondary memory) : Secondary memory devices like magnetic disks and tapes.
2. Capacity : Expressed as word size and no. of words.
Common word sizes are 8, 16, 32, 64 bits.
No. of words specifies the no. of words available in a particular memory.
Eg: Memory capacity = 4K x 8 = (4 x 1024 words) x (8 bits/word) = 32768 bits
(no. of words x word size)
3. 3. Unit of transfer
Internal(MM)
◦ Usually governed by data bus width
External(secondary)
◦ Usually a block which is much larger than a word
4. 4. Access type
• Sequential – tape
o Start at the beginning and read through in order
o Access time depends on location of data and previous location
• Direct – disk
o Individual blocks have unique address
o Access is by jumping to vicinity plus sequential search
• Random - RAM
o Individual addresses identify location exactly
5. 5. Performance :
3 performance parameters are:
• Access time (latency)
The time between presenting an address and getting
access to valid data
• Memory Cycle time
Consists of access time plus any additional time required
before second access can begin
=Access time plus recovery time
• Transfer rate
The rate at which data can be transferred into or out of a memory
unit
6. 6. Physical type
Semiconductor – RAM
Magnetic – disk and tape
Optical – CD and DVD
7. Physical Characteristics
Volatile/non-volatile
Volatile : Information is lost when power is switched off (e.g. RAM).
Non-volatile : Information is retained without power (e.g. magnetic tapes/disks).
7. Semiconductor Memories
Read Only Memory (ROM) : Non-volatile memory; used to store system-level programs (Eg: BIOS). Types: PROM, UV-EPROM, EEPROM.
Random Access Memory (RAM) : Types: Static RAM (SRAM), Dynamic RAM (DRAM).
8. Programmable ROM
• One-time programmable non-volatile memory (OTP NVM).
Erasable PROM
• Possible to erase information on the ROM chip by exposing the chip to UV light.
• The chip has to be physically removed from the circuit to erase the info.
Electrically Erasable PROM
• ROM that can be erased and reprogrammed (written to) repeatedly through the application of an electrical voltage.
• Chip does not have to be removed for erasure.
9. RAM :
• Contents are lost when power is turned off.
• Volatile memory.
• Types – SRAM and DRAM
a) DRAM :
• Stores each bit of data in a separate capacitor.
• The charge leaks, so information is retained only for a few milliseconds.
• To store information for a longer time, the contents of the capacitors need to be refreshed periodically.
• This refresh overhead makes DRAMs slower.
• Used to implement main memory.
• Cheaper.
10. b) SRAM :
• Each bit in an SRAM is stored in a flip-flop built from four to six transistors.
• Hence a low-density device compared to DRAM.
• No capacitors are used, hence no refreshing is required.
• Hence faster as compared to DRAMs.
• Expensive.
• Used in cache memories.
11. SRAM vs DRAM

SRAM                              DRAM
Transistors used to store info.   Capacitors used to store data.
No refreshing required.           Refreshing is required.
Fast memory.                      Slower memory.
Expensive.                        Cheaper.
Low-density device.               High-density device.
Used in cache memories.           Used in main memories.
13. The processor fetches code and data from main memory (MM).
MM is implemented using DRAMs, which are slower devices.
This reduces the speed of execution.
Solution :
At any time a program works only on a small section of its code and data.
Hence add a small amount of SRAM alongside the DRAM; this is referred to as a cache memory.
1. Registers:
The fastest access is to data held in processor registers.
Eg: Program counter, stack pointer (SP), general purpose registers.
2. Level 1 cache (L1 or primary cache) :
Small amount of memory implemented directly on the processor chip, called the processor cache.
Holds copies of instructions and data stored in a much larger memory.
Implementation – Using SRAM chips.
14. 3. Level 2 cache (L2- secondary cache)
Placed between the primary cache and the rest of the memory.
Implementation – Using SRAM chips.
4. Main memory
Larger but slower than cache memory.
Access time (MM) = 10 times access time of L1 cache.
Implementation – Using DRAM chips.
5. Secondary memory
Provides huge amount of inexpensive storage.
Slower compared to semiconductor memories.
Hold programs and data not continually required by the CPU.
15. Main memory consists of 2^n addressable words, each word having a unique n-bit address.
MM is divided into blocks of k words each.
Total no. of blocks (M) = 2^n / k
Cache consists of C lines of k words each (C << M).
At any time, some subset of the blocks of main memory resides in the lines of the cache.
19. Cache memory – Small amount of fast memory placed between
processor and main memory
20. Cache memory overview
• Processor requests the contents of some memory location(word).
• The cache is checked for the requested data.
• If found, the requested word is delivered to the processor.
• If not found, the block of MM in which the word resides is read into the cache and the requested word is delivered to the processor.
• A whole block of data is brought into the cache even though the processor requires only one word of that block, since it is likely that there will be future references to the same memory location (word) or to other words in the block.
This is called the locality of reference principle.
21. It is observed that memory references by the processor for data and instructions tend to cluster.
Programs contain iterative loops and subroutines. Once a loop is entered, there are repeated references to a small set of instructions.
Operations on tables and arrays involve access to a clustered set of
data words.
22. 2 ways of introducing cache into a computer :
1. Look aside cache design
2. Look through cache design
24. CPU initiates a read operation by placing the address on the address bus.
Cache immediately checks to see if the data is a cache hit.
If HIT
Cache responds to the read operation.
MM is not involved.
If MISS
1. MM responds to the processor and the required word is
transferred to the CPU.
2. Block is transferred to the cache.
Adv:
Provides better response since data is directly transferred from
MM to the CPU
Disadv:
Processor cannot access the cache while another device is accessing
the MM (since there is a common bus that connects CPU, MM and other
devices in the system)
26. CPU communicates with the cache via a local bus isolated from the
system bus.
The CPU initiates a read operation by checking if data is in cache.
If HIT
Cache responds to the read operation.
MM is not involved.
If MISS
1. Block is first read into cache.
2. Word is transferred from cache to the processor.
27. 1. Cache size
Keep the cache small.
The larger the cache, the larger the number of gates involved in addressing it; as a result performance decreases, i.e. the cache becomes slower.
2. Mapping Function
Fewer cache lines than MM blocks.
Hence an algorithm is required to map main memory blocks into cache
lines.
3 techniques used are :
1. Direct mapping
2. Associative mapping
3. Set Associative mapping
28. 1. Direct mapping
No. of bits in MM address = 4 bits. Hence 16 possible addresses.
Cache size = 8 bytes.
Block size = 2 bytes(2 words). Assume 1 Word = 1 Byte
Hence total no. of cache lines = 8/2 = 4
Main memory address is split as :
T A G | L I N E | W O R D
(Cache diagram: lines numbered 0 to 3, each line storing a tag alongside its block.)
29. 1. No. of bits in the WORD field :
No. of bits in Word field depends on the block size.
Since Blk size = 2 Words.
Hence Word field is 1 bit.
2. No. of bits in the LINE field :
No. of bits in the Line field depends on the no. of cache lines.
Since no. of cache lines = 4, the Line field is 2 bits.
3. No. of bits in the TAG field :
Remaining 1 bit for TAG, since 4 - (2+1) = 1.
MM address : TAG (1 bit) | LINE (2 bits) | WORD (1 bit)
30. A block of MM goes to block no.(i) mod total no. of lines in
cache(c)
i.e. i mod c
Block 5 of MM will go to 5 mod 4 =1 i.e. line 1 of cache
31. How does the CPU access data from the cache ?
i.e. if the CPU needs data at address 1011 :
Address 1011 splits into Tag = 1, Line = 01, Word = 1.
1. The Line field is used as an index into the cache to access a particular line.
2. If the 1-bit tag no. in the address matches the tag no. currently stored in that line, then the 1-bit word field is used to select one of the 2 words in that line.
3. Otherwise the 3-bit (Tag + Line) field is used to fetch the block from the main memory.
32. Computer has a 32-bit MM address space.
Cache size = 8KB
Block size = 16 bytes
Since each block is of 16 bytes, hence 4 bits in the WORD field.
28 bits remain for TAG and LINE.
No. of cache lines = 8KB/16 = 512 lines
512 = 2^9
Hence 9 bits for the LINE field.
Hence TAG = 19 bits.
MM address (32 bits) : TAG (19) | LINE (9) | WORD (4)
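The field arithmetic on this slide can be checked mechanically. A minimal sketch (Python; the function name and argument names are illustrative, not from the slides):

```python
import math

def direct_mapped_fields(addr_bits, cache_bytes, block_bytes):
    """Split an address into (tag, line, word) bit widths for a
    direct-mapped, byte-addressable cache (1 word = 1 byte here)."""
    word_bits = int(math.log2(block_bytes))       # selects a byte within a block
    lines = cache_bytes // block_bytes            # number of cache lines
    line_bits = int(math.log2(lines))             # selects a cache line
    tag_bits = addr_bits - line_bits - word_bits  # remaining high-order bits
    return tag_bits, line_bits, word_bits

# Slide 32: 32-bit address, 8 KB cache, 16-byte blocks
print(direct_mapped_fields(32, 8 * 1024, 16))   # (19, 9, 4)
```

The same call with a 24-bit address, 64 KB cache and 4-byte blocks reproduces the LINE = 14, WORD = 2 split of the later example.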
34. Adv :
The cache controller resolves hit/miss quickly.
Disadv :
Not flexible, because a given block is mapped onto a fixed cache line.
35. 2. Associative mapping
Best and most expensive.
A block of MM can be mapped to any cache line.
MM address consists of 2 fields
When the CPU needs a word, the cache controller has to match the TAG field
in the address with the TAG contents of all the lines in cache.
Adv:
Offers flexibility since a block of MM can be moved to any cache
line. Hence replacement is needed only if cache is totally full.
Disadv:
Longer access time.
MM address : T A G | W O R D
36. No. of bits in the WORD field :
No. of bits in Word field depends on the block size.
No. of bits in the TAG field :
Total no. of bits in MM address – No. of bits in WORD field
Computer has a 32 bit MM add. Space
Cache size= 8KB
Block size =16 bytes
Since each block is of 16 bytes, hence 4 bits in the WORD field.
Remaining 28 bits for the TAG.
MM address : TAG (28) | WORD (4)
38. 3. Set associative mapping
Combination of direct mapping and associative mapping.
Cache is divided into v sets consisting of k lines each.
Hence total lines in cache = v x k (v = total no. of sets, k = no. of lines in each set).
k lines in each set ---- called k-way set associative.
MM block can be mapped into a specific set only.
Which set?
Block i of MM gets mapped into set -- i mod v
Within a set a block can be placed in any line.
MM address has 3 fields
T A G S E T W O R D
39. No. of bits in the WORD field :
No. of bits in Word field depends on the block size.
No. of bits in the SET field :
No. of bits in Set field depends on the number of sets.
No. of bits in the TAG field :
Total no. of bits in MM address – (No. of bits in WORD field +
No. of bits in Set field )
When the CPU needs to check the cache for accessing a word, the
cache controller uses the SET field to access the cache.
Within the set there are k lines, hence the TAG field in address is
matched with TAG contents of all k lines.
40. Computer has a 32 bit MM add. space
Cache size= 8KB
Block size =16 bytes
4 way set associative.
No. of cache lines = 8KB/16 = 512 lines
Total no. of sets = 512/4 = 128 = 2^7
Hence no. of bits in SET = 7
No. of bits in WORD = 4
No. of bits in TAG = 21
MM address : TAG (21) | SET (7) | WORD (4)
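The same style of calculation works for a set-associative cache, where the LINE field is replaced by a SET field. A sketch (Python; names are illustrative, not from the slides):

```python
import math

def set_assoc_fields(addr_bits, cache_bytes, block_bytes, ways):
    """(tag, set, word) bit widths for a k-way set-associative cache."""
    word_bits = int(math.log2(block_bytes))
    lines = cache_bytes // block_bytes
    sets = lines // ways                 # v = total lines / k
    set_bits = int(math.log2(sets))
    tag_bits = addr_bits - set_bits - word_bits
    return tag_bits, set_bits, word_bits

# Slide 40: 32-bit address, 8 KB cache, 16-byte blocks, 4-way
print(set_assoc_fields(32, 8 * 1024, 16, 4))   # (21, 7, 4)
```

With a 24-bit address, 64 KB cache, 4-byte blocks and 2 ways it likewise gives the (9, 13, 2) split of the later example.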
42. Adv:
Flexible compared to Direct mapping.
Multiple choices for mapping a MM block .
No. of options depend on k.
During searching , TAG matching is limited to the no. of lines in the
set.
43. 2. Cache can hold 64KB of data.
Data is transferred between MM and the cache in blocks of 4 bytes each.
MM consists of 16MB.
Find the distribution of the MM address in all 3 mapping techniques.
1. Direct Mapping
Since MM consists of 16MB : 2^4 x 2^20 = 2^24, i.e. 2^24 locations (addresses).
Hence no. of bits in the MM address is 24.
No. of cache lines = (2^6 x 2^10) / 2^2 = 2^14
To select one of the cache lines, 14 bits are required.
LINE = 14 bits, WORD = 2 bits (since block size = 4 bytes), TAG = 24 - (14+2) = 8 bits.
44. 2. Associative Mapping
Word = 2 bits (since block size = 4 bytes)
TAG = 24 - 2 = 22 bits
MM address : TAG (22) | WORD (2)
45. 3. Set Associative Mapping
Assume 2-way set associative, i.e. 2 cache lines make 1 set.
Hence no. of sets = 2^14 / 2 = 2^13
To select 1 of the 2^13 sets we require 13 bits.
Hence no. of bits in SET field = 13.
Hence no. of bits in WORD field = 2.
Hence no. of bits in TAG field = 24 - (13+2) = 9.
MM address : TAG (9) | SET (13) | WORD (2)
46. Eg:
A set associative cache consists of 64 lines, divided into 4-line sets. MM consists of 4K blocks of 128 words each. Show the format of the MM address.
No. of blocks in MM = 4K = 2^10 x 2^2 = 2^12
Total no. of words in MM = 2^12 x 128 = 2^19
Hence total no. of bits in MM address = 19 bits.
Total no. of lines in cache = 64
Total no. of sets in cache = 64/4 = 16 = 2^4
Hence total no. of bits in SET field = 4.
Block size = 128 words = 2^7
Hence total no. of bits in WORD field = 7.
TAG = 19 - (7+4) = 8
MM address : TAG (8) | SET (4) | WORD (7)
47. 3. Replacement Algorithm
Replacement Problem
Cache memory full.
New block of main memory has to be brought in cache.
Replace an existing block of cache memory with the new block.
Which block to replace?
Decision made by replacement algorithm.
For direct mapping- One possible cache line for any particular block
– no choice- no replacement needed.
For associative and set associative – options available –hence
replacement algorithm needed.
48. 4 common replacement algorithms used :
1. Least recently used(LRU):
Replace the line in the cache that has been in the cache the longest
with no reference to it.
Since it is assumed that the more recently used blocks are more
likely to be referenced again.
Can be implemented easily in a 2-way set associative mapping.
Each line includes a USE bit.
When a line is referenced, its USE bit is set to 1 and the USE bit of the other line in that set is set to 0.
When the set is full, the block whose USE bit is zero is considered for replacement.
49. 2. First In First Out :
Replace the block that has been in the cache the longest.
3.Least Frequently used (LFU)
Replace the block that has had the least references.
Requires a counter for each cache line.
4. Random Algorithm
Randomly replace a line in the cache.
50. 4. Write Policy
Before a block in the cache can be replaced it is necessary to consider
whether it has been altered in the cache by the processor.
If not altered : The block in the cache can be overwritten by the new block.
If the block in the cache has been altered : The copy of the block residing in MM must be updated before replacement.
Write Policies :
1. Write Through policy :
All write operations are made to main memory as well as to the cache,
ensuring that memory is always up-to-date.
Drawback: Requires time to write data in memory, with increase
in bus traffic.
51. 2. Write back Policy/Copy back policy
Updates are made only to the copy of the block in cache and not to
the copy of block in MM.
Update bit is associated with each line which is set when the block
in the cache is updated.
When a block in cache is considered for replacement, it is written
back in MM if and only if the UPDATE bit is set.
Minimizes MM writes
Adv: Saves time since writing in MM is not done for every write
operation on cache block.
52. 3. Buffered write through :
Improvement of write through policy.
When an update is made to a cache block the update is also written
into a buffer and from the buffer updates are made to the MM.
Processor can start a new write cycle to the cache before the
previous write cycle to MM is completed since writes to MM are
buffered.
53. Cache coherency:
The cache coherence problem occurs in a multiprocessor system.
There are multiple processors; each processor has its own cache, with a common memory.
Due to the presence of multiple caches, copies of one MM block can be present in multiple caches.
When any processor writes (updates) a word in its own cache, all other caches that contain a copy of that word will have the old, incorrect value.
Hence the caches that contain a copy of that word must be informed of the change.
The cache coherence problem occurs when a processor modifies its cache without making the updates available to other caches.
54. Cache Coherence is defined as a condition in which all the cache
blocks of a shared memory block contain same information at any
point of time.
55. [Jun 13, 2015, CPE 731, Snooping]
(Figure: processors P1, P2, P3, each with a private cache, share a bus with memory and I/O devices. Memory initially holds u = 5; two processors read u into their caches (events 1 and 2), P3 writes u = 7 (event 3), and subsequent reads of u (events 4 and 5) return stale values.)
◦ Processors see different values for u after event 3.
◦ The write done by P3 is not visible to P1 and P2.
56. Approaches to achieve cache coherency :
1. Bus watching with write through
Each cache controller monitors the address lines to detect write
operations to memory by other processors.
If another processor writes to a location in memory that also
resides in some other cache, then that cache controller
invalidates that entry.
2. Hardware transparency :
All updates are reflected in memory and caches that contain a
copy of that block.
If one processor modifies a word in its cache, this update is written to MM as well as to the other caches which contain a copy of that word.
A "big brother" watches all caches, and upon seeing an update to any processor's cache, it updates main memory AND all of the caches.
57. 3. Non cacheable memory
Only a portion of MM is shared by more than one processor, and this portion is designated as non-cacheable.
Here all accesses to the shared memory are cache misses, because the shared memory is never copied into any of the caches.
All processors access the MM for read and write operations, i.e. a shared block is never copied into any of the caches.
58. 5. Line Size (Block size)
As block size increases from very small to larger sizes, the hit ratio increases due to the principle of locality.
However, as blocks become even bigger, the hit ratio decreases since :
1. Larger blocks reduce the no. of blocks that fit into the cache.
2. As blocks become larger, each additional word is farther away from the requested word and therefore less likely to be needed in the near future.
6. No. of caches :
1. Single/two level
Originally a typical system had a single cache. Recently multiple caches have become the norm.
Multilevel caches
2-level caches – L1 and L2 cache.
59. 2. Unified/Split caches
Unified cache: Single cache is used to store data and
Instructions.
Split cache : Split the cache into two. One dedicated for
instructions and the other dedicated for data.
60. Main memory is divided into 2 parts : one for the operating system and the other for user programs.
In uniprogramming, the OS occupies one part and the other part holds the program currently being executed.
In a multiprogramming system, the user part of the memory must be further subdivided to accommodate multiple processes.
One of the basic functions of the OS is to bring processes into main memory for execution by the processor.
61. 1. Multiprogramming with Fixed Partitions (Multiprogramming with a Fixed number of Tasks - MFT)
Obsolete technique.
Supports multiprogramming.
Based on contiguous allocation.
Memory is partitioned into a fixed number of partitions of fixed sizes.
The first region is reserved for the OS and the remaining partitions are used for the user programs.
62. An example of a partitioned memory :
Memory is partitioned into 5 regions. The first is reserved for the OS.

Partition   Starting add. of partition   Size of partition   Status
1           0 K                          200 K               ALLOCATED
2           200 K                        100 K               FREE
3           300 K                        150 K               FREE
4           450 K                        250 K               ALLOCATED
5           700 K                        300 K               FREE
63. Once partitions are defined , the OS keeps track of
the status of each partition (allocated/free) through a
data structure called PARTITION TABLE.
For execution of a program, it is assigned to a
partition. Partition should be large enough to
accommodate the program to be executed.
After allocation some memory in the partition may
remain unused. This is known as internal
fragmentation.
64. Strategies used to allocate free partition to a program
are:
1. First-fit :
Allocates the first free partition large enough to accommodate the process.
Leads to fast allocation of memory space.
2. Best- fit
Allocates the smallest free partition that can
accommodate the process.
Results in least wasted space
Internal fragmentation reduced but not eliminated
66. Given memory partitions of 100 KB, 500 KB, 200 KB, 300 KB, and 600 KB (in order), how would each algorithm place processes of size 212 KB, 417 KB, 112 KB, and 426 KB (in order)?

Process:     212KB    417KB    112KB    426KB
First fit:   500KB    600KB    200KB    Cannot be executed
Best fit:    300KB    500KB    200KB    600KB
Worst fit:   600KB    500KB    300KB    Cannot be executed
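The three placement strategies can be simulated directly. A sketch under the MFT assumption that an allocated partition is consumed whole (Python; function and variable names are illustrative, not from the slides):

```python
def place(partitions, processes, strategy):
    """Assign each process to a free fixed partition, or None if no
    free partition is large enough. strategy: 'first', 'best', 'worst'."""
    free = list(partitions)                     # free partition sizes, in order
    result = []
    for size in processes:
        candidates = [p for p in free if p >= size]
        if not candidates:
            result.append(None)                 # cannot be executed
            continue
        if strategy == 'first':
            chosen = candidates[0]              # first large-enough partition
        elif strategy == 'best':
            chosen = min(candidates)            # smallest large-enough partition
        else:
            chosen = max(candidates)            # largest partition (worst fit)
        free.remove(chosen)                     # whole partition is consumed (MFT)
        result.append(chosen)
    return result

parts = [100, 500, 200, 300, 600]
procs = [212, 417, 112, 426]
print(place(parts, procs, 'first'))   # [500, 600, 200, None]
print(place(parts, procs, 'best'))    # [300, 500, 200, 600]
print(place(parts, procs, 'worst'))   # [600, 500, 300, None]
```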
68. Disadvantage of MFT :
The number of active processes is limited by the system, i.e. limited by the pre-determined number of partitions.
A large number of very small processes will not use the space efficiently, giving rise to internal fragmentation.
69. 2. Dynamic Partitioning (Multiprogramming with a Variable no. of Tasks - MVT)
Developed to overcome the difficulties of fixed partitioning.
Partitions are of variable length and number, dictated by the size of each process.
A process is allocated exactly as much memory as it requires.
Eg: 64 MB of MM. Initially MM is empty except for the OS.
70. Process 1 arrives, of size 20 M.
Next process 2 arrives, of size 14 M.
Next process 3 arrives, of size 18 M.
This leaves a hole of 4 M at the end of memory that is too small for process 4, of size 8 M.
74. MVT eventually leads to a situation in which there
are a lot of small holes in memory.
This is called external fragmentation.
External fragmentation – total memory space
exists to satisfy a request, but it is not contiguous;
storage is fragmented into a large number of small
holes.
76. Dynamic Partitioning Placement Algorithm (Allocation
Policy)
Satisfy request of size n from list of free holes – four basic
methods:
◦ First-fit: Allocate the first hole that is big enough.
◦ Next-fit: Same logic as first-fit, but always starts the search from the last allocated hole.
◦ Best-fit: Allocate the smallest hole that is big enough;
must search entire list, unless ordered by size.
Produces the smallest leftover hole.
◦ Worst-fit: Allocate the largest hole; must also search
entire list. Produces the largest leftover hole.
77. The last block that was used was a 22 MB block, from which 14 MB was used by the process, leaving a hole of 8 MB.
Assume now a program of size 16 MB has to be allocated memory.
78. Issues in modern computer systems:
Physical main memory is not as large as the size of the user programs.
In a multiprogramming environment more than one program is competing to be in MM.
The size of each may be larger than the size of MM.
Solution :
The parts of the program currently required by the processor are brought into the MM and the remainder is stored on the secondary devices.
Movement of programs and data between secondary storage and MM is handled by the OS.
The user does not need to be aware of limitations imposed by the available memory.
79. Def: Virtual memory
VM is a concept which enables a program to execute even when its entire code and data are not present in memory.
It permits execution of a program whose size exceeds the size of the MM.
VM is a concept implemented through a technique called paging.
Paging
Logical address : Location relative to the beginning of a program.
Physical address : Actual location in main memory.
The CPU, when executing a program, is aware of only the logical addresses.
80. When a program needs to be accessed from the MM, MM should be
presented with physical addresses.
Memory Management Unit (MMU) converts the logical addresses into
physical addresses.
A memory management unit (MMU), is a computer hardware component responsible for
converting the logical add. to physical add.
In paging scheme --
User programs are split into fixed sized blocks called pages.
Physical memory is also divided into fixed size slots called page
frames.
Page size --- 512 bytes, 1 K, 2K ,8K in length.
Allocation of memory to a program consists of finding a
sufficient no. of unused frames for loading of pages of a
program.
Each page requires one page frame.
81. Logical Address
The lower-order n bits designate the offset within the page.
The higher-order (m-n) bits designate the page no.
Logical address (m bits) : PAGE NO. (m-n bits) | OFFSET (n bits)
82. How are user programs mapped into physical memory?
(Figure: the logical memory of a user program holds bytes a to p, divided into four pages, 0 to 3, of 4 bytes each; physical memory is divided into page frames 0 to n-1.)
Let the logical memory of a program be 2^m bytes.
Hence the no. of bits in the logical address is m bits. Here m = 4 bits.
Let the size of a page be 2^n bytes.
Logical address : PAGE NO. (m-n bits) | OFFSET (n bits)
83. Paging Example - 32-byte physical memory with 4-byte pages.
(Figure: logical memory holds bytes a to p in pages 0 to 3. The page table maps page 0 to frame 5, page 1 to frame 6, page 2 to frame 1 and page 3 to frame 2; physical memory frames 0 to 7 hold the pages accordingly.)
The page table gives, for each page, the page frame in which it resides.
84. Mapping from logical address to physical address -
Assume the CPU wants to access data at logical address 14. In binary, 14 = 1110 : page no. = 11, offset = 10.
1. Use the page no. field (11, i.e. page 3) to index into the page table (PT).
2. Indexing the PT, we find page 3 is in frame no. 2 of MM.
3. Base address of frame 2 = 2 x 4 (page size) = 8.
4. Use the offset of the logical address to access the byte within the frame : offset = 10, i.e. 2.
5. Physical address = 8 + 2 = 10.
Hence logical address 14 maps to physical address 10.
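When the page size is a fixed power of two, the translation steps amount to a divide and a modulo. A minimal sketch (Python), reusing the page table of the 32-byte paging example (page 0 to frame 5, 1 to 6, 2 to 1, 3 to 2):

```python
PAGE_SIZE = 4                              # bytes per page, as in the example
page_table = {0: 5, 1: 6, 2: 1, 3: 2}     # page no. -> frame no.

def translate(logical_addr):
    """Map a logical address to a physical address via the page table."""
    page = logical_addr // PAGE_SIZE       # high-order bits: page number
    offset = logical_addr % PAGE_SIZE      # low-order bits: offset in page
    frame = page_table[page]               # look up the holding frame
    return frame * PAGE_SIZE + offset      # frame base + offset

print(translate(14))   # 10  (page 3 -> frame 2; 2*4 + 2)
```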
86. At a given point in time, some of the frames in memory are in use and some are free.
87. Do not load all pages of a program into memory; put into memory those pages which will be required, and keep the other pages on the backing store (secondary memory).
Structure of a page table :

Page No.   Frame No.   Valid   Dirty bit
0          5           1       1
1          2           1       0
2          3           1       1
3          -           0       0
4          -           0       0

Valid bit – Indicates whether a page is actually loaded in main memory.
Dirty bit – Indicates whether the page has been modified during its residency in the main memory.
88. What happens when the CPU tries to use a page that was not brought
into the memory ?
1. The MMU tries to translate the logical address into physical address by
using the page table.
2. During the translation procedure, the page no. of the logical address is
used to index the PT. The valid bit for that page table entry is checked to
indicate whether the page is in memory or not.
3. If the bit = 0 then the page is not in memory; this is a page fault, i.e. a request for a page that is not in MM, which causes an interrupt to the OS.
4. The OS searches MM to find a free frame to bring in the required page.
5. A disk operation is scheduled to bring the desired page into the newly allocated frame.
6. The page table is modified to indicate that the page is in MM.
7. The MMU uses the modified PT to translate the logical address to a physical address.
89. Page tables are stored in MM
Every reference to memory causes 2 physical memory accesses.
1. One to fetch the appropriate PT entry from the PT.
2. Second to fetch the desired data from MM once the frame no. is
obtained from the PT.
Hence implementing a virtual memory scheme has the effect of
doubling the memory access time.
Solution:
Use a special cache for Page Table Entries called the Translation Lookaside Buffer (TLB).
The TLB contains the most recently used Page Table Entries.
90. Operation of paging using a TLB:
1. The logical address is in the form : PAGE NO. | OFFSET
2. During the translation process the MMU consults the TLB to see if the matching page table entry is present.
3. If a match is found, the physical address is generated by combining the frame no. present at that entry in the TLB with the offset.
4. If a match is not found, the entry is accessed from the page table in memory.
92. When MM is full page replacement algorithm determines which
page in memory will be replaced to bring in the new page.
1. First-In-First-Out (FIFO)
Replaces the oldest page in memory i.e the page that has spent
longest time in memory.
Implemented using a FIFO queue. Replace the page at the head
of the queue.
93. Assume MM has 3 page frames. Execution of a program requires 5
distinct pages Pi where i= 1,2,3,4,5
Let the reference string be :
2 3 2 1 5 2 4 5 3 2 5 2
(Frame-by-frame trace omitted; with 3 frames, FIFO hits occur on the 3rd, 8th and 10th references.)
Hit ratio = 3/12
Drawback: Tends to throw away frequently used pages because it does not take into account the pattern of usage of a given page.
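The FIFO trace can be reproduced with a short simulation (Python; the function name is our own):

```python
from collections import deque

def fifo_page_replacement(refs, n_frames):
    """Simulate FIFO page replacement; return (hits, faults)."""
    frames = deque()                 # oldest resident page at the left
    hits = faults = 0
    for page in refs:
        if page in frames:
            hits += 1
        else:
            faults += 1
            if len(frames) == n_frames:
                frames.popleft()     # evict the page resident the longest
            frames.append(page)
    return hits, faults

refs = [2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2]
print(fifo_page_replacement(refs, 3))   # (3, 9), i.e. hit ratio 3/12
```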
94. Least Recently used Algorithm :
Replaces that page which has not been used for the longest period
of time.
2 3 2 1 5 2 4 5 3 2 5 2
(Frame-by-frame trace omitted; with 3 frames, LRU hits occur on the 3rd, 6th, 8th, 11th and 12th references.)
No. of page faults =7
Hit ratio = 5/12
Generates fewer page faults than FIFO.
A commonly used algorithm.
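A corresponding LRU simulation (Python; the function name is our own) reproduces the 7 faults quoted above:

```python
def lru_page_replacement(refs, n_frames):
    """Simulate LRU page replacement; return (hits, faults)."""
    frames = []                      # least recently used page at index 0
    hits = faults = 0
    for page in refs:
        if page in frames:
            hits += 1
            frames.remove(page)      # will be re-appended as most recent
        else:
            faults += 1
            if len(frames) == n_frames:
                frames.pop(0)        # evict the least recently used page
        frames.append(page)          # mark as most recently used
    return hits, faults

refs = [2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2]
print(lru_page_replacement(refs, 3))   # (5, 7), i.e. hit ratio 5/12
```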
95. 3. Optimal page Replacement algorithm
Replace that page which will not be used for the longest period of
time.
2 3 2 1 5 2 4 5 3 2 5 2
(Frame-by-frame trace omitted; with 3 frames, optimal replacement hits occur on the 3rd, 6th, 8th, 9th, 11th and 12th references.)
Hit Ratio=6/12=1/2
Drawback: Difficult to implement since it requires a future knowledge
of the reference string.
Used mainly for comparison studies.
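Since the whole reference string is known here, the optimal (Belady) policy can also be simulated (Python; names are our own):

```python
def optimal_page_replacement(refs, n_frames):
    """Simulate Belady's optimal page replacement; return (hits, faults)."""
    frames = []
    hits = faults = 0
    for i, page in enumerate(refs):
        if page in frames:
            hits += 1
            continue
        faults += 1
        if len(frames) == n_frames:
            # Evict the page whose next use lies farthest in the future
            # (a page never used again counts as farthest of all).
            future = refs[i + 1:]
            victim = max(frames,
                         key=lambda p: future.index(p) if p in future
                         else len(future))
            frames.remove(victim)
        frames.append(page)
    return hits, faults

refs = [2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2]
print(optimal_page_replacement(refs, 3))   # (6, 6), i.e. hit ratio 6/12
```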
96. Reference String : 7 0 1 2 0 3 0 4 2 3 0
Assume no. of page frames = 3
FIFO : only one hit (the 5th reference, to page 0).
Hit ratio = 1/11
99. Assumption: 2-level memory - cache and MM.
This 2-level memory provides improved performance over a 1-level memory due to a property called locality of reference.
2 types of locality of reference :
1. Spatial locality :
Tendency of a processor to access instructions sequentially.
Also reflects the tendency of a program to access data locations sequentially.
Eg: When processing a table of data.
2. Temporal locality :
Tendency of a processor to access memory locations that have been used recently.
Eg: When an iterative loop is executed, the processor executes the same set of instructions repeatedly.
100. The locality property can be exploited by the formation of a 2
level memory
Upper level memory (M1) is smaller, faster and more expensive
than the lower memory(M2)
M1 is used as a temporary store for part of the contents of the
larger M2.
When a memory reference is made, an attempt is made to
access the item in M1.
If succeeds -- Access is made via M1.
If not -- Block of memory locations is copied from M2 to M1.
101. Performance characteristics:
1. Average time to access an item
Eg: Access time of M1 is 0.01 microsec.
Access time of M2 is 0.1 microsec.
If 95% of the memory accesses are found in the cache.
Find the average time to access a word.
Average access time (Ts) =
(0.95)(0.01 microsec) + (0.05)(0.01 microsec + 0.1 microsec)
= 0.0095 + 0.0055 = 0.015 microsec.
In general -
Ts = H x T1 + (1-H)(T1+T2)
≈ H x T1 + (1-H) x T2, since T1 << T2
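The numeric example follows directly from the exact formula Ts = H x T1 + (1-H)(T1+T2). A minimal sketch (Python; the function name is our own):

```python
def avg_access_time(hit_ratio, t1, t2):
    """Ts = H*T1 + (1-H)*(T1+T2): on a miss we pay the M1 lookup
    plus the M2 access."""
    return hit_ratio * t1 + (1 - hit_ratio) * (t1 + t2)

# Slide 101: T1 = 0.01 microsec, T2 = 0.1 microsec, H = 0.95
ts = avg_access_time(0.95, 0.01, 0.1)
print(round(ts, 4))   # 0.015 (microseconds)
```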
102. 2. Average cost of a 2-level memory
Cs = average cost/bit of the 2-level memory
C1 = cost/bit of M1, C2 = cost/bit of M2
S1 = size of M1, S2 = size of M2
Cs = (C1 x S1 + C2 x S2)/(S1 + S2)
We would like Cs ≈ C2, hence S2 >> S1.
103. 3. Access efficiency
A measure of how close the average access time (Ts) is to the access time T1.
Access efficiency = T1/Ts
We have Ts = H x T1 + (1-H) x T2.
Dividing by T1 :
Ts/T1 = H + (1-H) x T2/T1
T1/Ts = 1/(H + (1-H) x T2/T1)
104. In a 2-level memory, tA1 = 10^-7 s and tA2 = 10^-2 s. What must the hit ratio (H) be for the access efficiency to be at least 90% of its maximum possible value?
T1 = 10^-7 s
T2 = 10^-2 s
The maximum possible efficiency is 1 (when H = 1), so we require T1/Ts = 0.9 :
T1/Ts = 1/(H + (1-H) x T2/T1)
0.9 x (H + (1-H) x 10^5) = 1
H + (1-H) x 10^5 = 1/0.9
(1-H) x (10^5 - 1) = 1/9
1-H ≈ 1.11 x 10^-6
H ≈ 0.9999989
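Solving the efficiency equation for H in closed form gives a quick numerical check. A sketch (Python; the function name is our own). Note that with T2/T1 = 10^5, the required hit ratio comes out extremely close to 1:

```python
def required_hit_ratio(efficiency, t1, t2):
    """Solve  efficiency = 1 / (H + (1-H)*T2/T1)  for H."""
    r = t2 / t1                           # speed ratio T2/T1
    return (r - 1 / efficiency) / (r - 1)

h = required_hit_ratio(0.9, 1e-7, 1e-2)
print(h)   # ~0.9999989
```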
105. A given memory module has a fixed memory bandwidth, i.e. no. of bytes transferred per second.
The CPU is capable of handling a much higher bandwidth than the memory.
Due to this speed mismatch the CPU has to idle frequently while waiting for instructions/data from the memory.
Memory interleaving is a technique of dividing the MM into multiple independent modules so that the overall bandwidth is increased.
The interleaving of MM addresses among 'm' memory modules is called m-way interleaving.
In interleaved memory, the MM of a computer is structured as a collection of physically separate modules, each with its own address buffer register (ABR) and data buffer register (DBR).
106. Adv : memory access operations may proceed in more than one
module at the same time; increasing the aggregate rate of
transmission of words to and from the MM.
2 ways of distributing individual addresses over the modules :
1. High Order Interleaving
2. Low Order Interleaving
107. 1. High Order Interleaving
Each module contains consecutive addresses.
Eg: MM address – 4 bits
Total 2^4 = 16 possible addresses.
Assume 4-way interleaving:
Module 0: 0000 0001 0010 0011
Module 1: 0100 0101 0110 0111
Module 2: 1000 1001 1010 1011
Module 3: 1100 1101 1110 1111
108. The higher order k bits of an n-bit MM address select the module, and the
remaining (n-k) = m bits select the address within the module.
[Fig: n-bit MM address split into k module-select bits followed by m
address-in-module bits; modules 0 to 2^k - 1, each with its own ABR and DBR.
Consecutive words lie in the same module.]
109. Adv: Permits easy memory expansion by the addition of one or more
modules.
Disadv:
Causes memory conflicts (e.g. in array processors) since
consecutive data is in the same module.
A previous memory access has to complete before the next request
can be served, resulting in delay.
110. 2. Low Order Interleaving
Consecutive words are located in consecutive modules.
Eg: MM address – 4 bits
Total 2^4 = 16 possible addresses.
Assume 4-way interleaving:
Module 0: 0000 0100 1000 1100
Module 1: 0001 0101 1001 1101
Module 2: 0010 0110 1010 1110
Module 3: 0011 0111 1011 1111
111. The lower order k bits of an n-bit MM address select the module, and the
higher order m bits select the address within the module.
[Fig: n-bit MM address split into m address-in-module bits followed by k
module-select bits; modules 0 to 2^k - 1, each with its own ABR and DBR.
Consecutive words lie in consecutive modules.]
112. Adv:
A request to access consecutive memory locations can keep several
modules busy at one time.
Hence it results in faster access to a block of data.
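The two address-distribution schemes can be sketched as simple bit-splitting functions (Python used for illustration; 4-bit addresses and 4 modules, as in the examples above):

```python
def high_order(addr, n_bits=4, k_bits=2):
    """High-order interleaving: top k bits pick the module,
    the remaining m = n-k bits pick the word within the module."""
    m_bits = n_bits - k_bits
    return addr >> m_bits, addr & ((1 << m_bits) - 1)   # (module, offset)

def low_order(addr, k_bits=2):
    """Low-order interleaving: bottom k bits pick the module,
    the higher-order bits pick the word within the module."""
    return addr & ((1 << k_bits) - 1), addr >> k_bits   # (module, offset)

# Consecutive addresses 0000..0011 land in one module under high-order
# interleaving, but spread across modules 0,1,2,3 under low-order interleaving.
print([high_order(a)[0] for a in range(4)])  # [0, 0, 0, 0]
print([low_order(a)[0] for a in range(4)])   # [0, 1, 2, 3]
```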
113. Paging is not (usually) visible to the programmer.
Segmentation is visible to the programmer.
Segmentation is a memory management scheme that supports the user's view
of memory.
Programmers rarely think of their programs as a linear array of words.
Rather, they think of their programs as a collection of logically related
entities, such as subroutines or procedures, functions, global or local data
areas, the stack etc.
i.e. they view programs as a collection of segments.
A program consists of a main program, functions/subroutines and the data
required for the execution of the program.
Each such logical entity is a segment.
Hence, for a program consisting of a main program, 2 functions and a call to
the built-in function sqrt, we have a segment for the main program, one for
each of the functions, a data segment and a segment for sqrt.
116. Each segment has a name and a length.
A logical address in segmentation is specified by:
1. A segment name (segment number)
2. An offset within the segment
The user therefore specifies each address by 2 quantities:
the segment number and the offset.
When a user program is compiled, the compiler automatically
constructs the different segments reflecting the input program.
117. Conversion of logical address to physical address :
This mapping is done with the help of a segment table.
Every process has its own segment table.
Each entry in the segment table has a segment base and a
segment limit.
The segment base contains the starting physical address where
the segment resides in memory.
Segment limit specifies the length of the segment.
118. [Fig: segmentation hardware – the segment table maps (segment no., offset) to a physical address.]
119. [Fig: example segment table with a base and limit for each segment.]
120. Eg: Segment 2 is 400 bytes long and begins at location(address)
4300.
A reference to byte no. 53 of segment 2 is mapped onto location
4300 + 53 =4353.
A reference to segment 3, byte 852, is mapped onto 3200 +
852=4052.
A reference to byte 1222 of segment 0 would result in a
trap(interrupt) to the OS as this segment is only 1000 bytes.
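The translation above can be sketched as follows (Python; segment 0's base and segment 3's limit are assumed values, since the slides give only the figures used in the example):

```python
# Segment table: segment number -> (base, limit).
# Segments 2 and 3 follow the example above; segment 0's base (1400)
# and segment 3's limit (1100) are assumptions for illustration.
segment_table = {
    0: (1400, 1000),
    2: (4300, 400),
    3: (3200, 1100),
}

def translate(seg, offset):
    """Map a logical address (segment no., offset) to a physical address.
    An offset beyond the segment limit traps to the OS."""
    base, limit = segment_table[seg]
    if offset >= limit:
        raise MemoryError(f"trap: offset {offset} exceeds limit of segment {seg}")
    return base + offset

print(translate(2, 53))    # 4353
print(translate(3, 852))   # 4052
# translate(0, 1222) would raise, as segment 0 is only 1000 bytes long.
```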
121. External Fragmentation – total memory space exists to
satisfy a request, but it is not contiguous.
Each segment is allocated a contiguous piece of physical memory.
As segments are of different sizes, swapping segments in/out
leads to holes in memory and hence causes external fragmentation.
Memory allocation to a segment is done using the first-fit, best-fit
or worst-fit strategy.
122. The associative memory is also known as content
addressable memory.
It differs from other memories such as RAM, ROM, disk etc.
in the way the memory is accessed and its content read.
For any other memory, an address is given as input for
reading the content of a location. In the case of associative
memory, a part of the content or the entire content
is used for accessing it.
The associative memory searches all the locations
simultaneously, in parallel.
It is useful in cache memory, where stored TAGs must be
matched against a given TAG, and in the TLB.
123. Associative Memory
- Accessed by the content of the data rather than by an address
- Also called Content Addressable Memory (CAM)
[Fig: hardware organization – an argument register (A) and a key register (K),
each n bits wide, feed an associative memory array of m words x n bits; an
m-bit match register marks matching words; Input, Read and Write control lines.]
- Each word in the CAM is compared in parallel with the
content of A (argument register)
- If CAM Word[i] = A, then M(i) = 1
- K (key register) provides a mask for choosing a
particular field or key in the argument in A
(only those bits of the argument that have 1's in
the corresponding positions of K are compared)
124. The figure shows the block diagram of an associative memory of m
locations of n bits each.
To perform a search operation, the argument is placed in the
argument register.
Two types of searches are possible:
◦ Searching on the entire argument
◦ Searching on a part of the argument
The key register is supplied a mask pattern that indicates the
bits of the argument to be included in the search.
If any bit is 0 in the mask pattern, the corresponding bit of
the argument register is not included in the search.
125. For an entire-argument search, the key register is
supplied all 1's.
While searching, the associative memory locates all the
words which match the given argument and marks them
by setting the corresponding bits in the match register.
Example
A = 10111100
K = 11100000
Word1 = 10011100 – no match
Word2 = 10110001 – match
(Only the three leftmost bits, where K has 1's, are compared.)
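The masked comparison can be sketched directly on bit strings (Python; the function name is mine):

```python
def cam_search(argument, key, words):
    """Return the match register: bit i is 1 when word i agrees with
    the argument in every position where the key has a 1."""
    def matches(word):
        return all(k == "0" or a == w
                   for a, k, w in zip(argument, key, word))
    return [1 if matches(w) else 0 for w in words]

# Example from above: only the three leftmost bits are compared.
match = cam_search("10111100", "11100000", ["10011100", "10110001"])
print(match)  # [0, 1] -> Word1 no match, Word2 match
```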
126. Associative memory – parallel search
TLB
◦ If the page number is in the associative registers, get the frame #
out
◦ Otherwise get the frame # from the page table in
memory
(Each TLB entry holds a Page # and the corresponding Frame #.)
127. Example: Associative memory (TLB)
Eg: page 2 (010) of pid 0001 is searched for in the TLB to
find a matching frame number. Each entry has the format
(page number | PID | frame number), so:
Input register: 010 0001 0000 = 01000010000
Mask register:  111 1111 0000 = 11111110000

Page table of process id 0001 (Ω = no frame allocated):
Page number | Frame number
000 | 1001
001 | Ω
010 | 1000
011 | 0001
100 | Ω
101 | 1101
110 | Ω
111 | Ω

TLB contents:
Page number | PID | Frame number
000 | 0001 | 1001
010 | 0001 | 1000
011 | 0001 | 0001
101 | 0100 | 1101
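A TLB lookup of this kind can be sketched as a masked search over (page, pid, frame) entries (Python; the entry values follow the tables above):

```python
# TLB entries as (page, pid, frame) bit strings, as in the table above.
tlb = [
    ("000", "0001", "1001"),
    ("010", "0001", "1000"),
    ("011", "0001", "0001"),
    ("101", "0100", "1101"),
]

def tlb_lookup(page, pid):
    """Search all entries 'in parallel' on the page and pid fields
    (the frame field is masked out); return the frame on a hit."""
    for e_page, e_pid, e_frame in tlb:
        if e_page == page and e_pid == pid:
            return e_frame
    return None  # TLB miss: fall back to the page table in memory

print(tlb_lookup("010", "0001"))  # '1000' (hit)
print(tlb_lookup("101", "0001"))  # None  (miss: the 101 entry belongs to pid 0100)
```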
129. Pipelining used in processors allows overlapped
execution of multiple instructions.
131. Space-time diagram for a non-pipelined processor
time:  1   2   3   4   5   6   7   8
IF:   I1              I2
ID:       I1              I2
OF:           I1              I2
EX:               I1              I2
132. Space-time diagram for a pipelined processor
time:  1   2   3   4   5   6   7
IF:   I1  I2  I3  I4
ID:       I1  I2  I3  I4
OF:           I1  I2  I3  I4
EX:               I1  I2  I3  I4
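The cycle counts visible in the two diagrams follow from the standard formulas (a sketch; k = number of stages, n = number of instructions):

```python
def non_pipelined_cycles(n, k):
    """Each instruction occupies all k stages before the next one starts."""
    return n * k

def pipelined_cycles(n, k):
    """The first instruction takes k cycles; each subsequent instruction
    finishes one cycle later, overlapping with those behind it."""
    return k + (n - 1)

# The diagrams above use a 4-stage (IF, ID, OF, EX) pipeline:
print(non_pipelined_cycles(2, 4))  # 8 cycles for 2 instructions (first diagram)
print(pipelined_cycles(4, 4))      # 7 cycles for 4 instructions (second diagram)
```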
133. Computers can be classified based on the instruction and
data streams
134. This classification was first studied and proposed by Michael Flynn in
1972.
He introduced the concept of instruction and data streams for
categorizing computers.
Instruction Stream and Data Stream
The term ‘stream’ refers to a sequence or flow of either instructions or
data operated on by the computer.
In the complete cycle of instruction execution, a flow of instructions from
main memory to the CPU is established. This flow of instructions is
called instruction stream.
Similarly, there is a flow of operands between processor and memory
bi-directionally. This flow of operands is called data stream .
135. Thus, it can be said that the sequence of instructions executed by CPU forms
the Instruction stream and the sequence of data (operands) required for
execution of instructions form the Data streams.
136. Flynn’s Classification
Flynn’s classification is based on multiplicity of instruction streams
and data streams observed by the CPU during program execution.
Let Is and Ds be the number of instruction and data streams flowing at any
point in the execution; then computer organizations can be categorized as follows:
1. SISD- Single Instruction and Single Data stream
2. SIMD- Single Instruction and Multiple Data stream
3. MISD- Multiple Instruction and Single Data stream
4. MIMD- Multiple Instruction and Multiple Data stream
138. Pipelined processor: IF → ID → OF → EX
Pipelined processors come under the category of SISD
computers.
139. SIMD
•There are multiple processing elements supervised by the same
control unit.
•All PE’s receive the same instruction broadcast from the same control
unit but operate on different data sets from distinct data streams.
• The shared memory subsystem may contain multiple modules.
140. Eg:
for (i = 1; i < 10; i++)
    c[i] = a[i] + b[i];
In this example the various iterations of the loop are
independent of each other, i.e.
c[1] = a[1] + b[1]
c[2] = a[2] + b[2] and so on...
In this case, if there are 10 processing elements, the same add
instruction can be executed on all of them on different data sets,
increasing the speed of execution.
141. MISD: Is > 1, Ds = 1
There are n processors, each receiving distinct instructions but operating
over the same data stream. The output of one processor becomes
the input to the next, and so on.
Practically no computers exist in this class.
143. Here each processing element is capable of
executing a different program, independent of the
other processing elements.
SIMD requires less hardware and memory as
compared to MIMD.
MIMD can be further classified as tightly coupled
and loosely coupled:
Tightly coupled – processors share memory.
Loosely coupled – each processor has its own
memory.