Cache Memory
Cache memory is random access memory (RAM) that a computer microprocessor can access
more quickly than it can access regular RAM. As the microprocessor processes data, it looks first
in the cache memory; if it finds the data there (from a previous read), it avoids the more
time-consuming read from the larger main memory.
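This look-in-cache-first behavior can be sketched in a few lines of C. The following is a minimal teaching model of a direct-mapped cache, not any real hardware interface; the sizes, the names, and the dram_read stand-in are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINES 256
#define LINE_BYTES  64

typedef struct {
    bool     valid;
    uint64_t tag;
    uint8_t  data[LINE_BYTES];
} cache_line_t;

static cache_line_t cache[CACHE_LINES];
static uint8_t      dram[1 << 20];         /* stand-in for main memory */

/* The slow path: fetch a whole line from "DRAM". */
static void dram_read(uint64_t addr, uint8_t *buf, size_t n)
{
    memcpy(buf, &dram[addr], n);
}

uint8_t load_byte(uint64_t addr)
{
    uint64_t line   = addr / LINE_BYTES;
    uint64_t index  = line % CACHE_LINES;   /* which cache slot          */
    uint64_t tag    = line / CACHE_LINES;   /* distinguishes addresses   */
    cache_line_t *l = &cache[index];

    if (!(l->valid && l->tag == tag)) {     /* miss: read from slow DRAM */
        dram_read(line * LINE_BYTES, l->data, LINE_BYTES);
        l->valid = true;
        l->tag   = tag;
    }
    return l->data[addr % LINE_BYTES];      /* hit: served from cache    */
}
```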
Cache memory is sometimes described in levels of closeness and accessibility to the
microprocessor. An L1 cache is on the same chip as the microprocessor. (For example,
the PowerPC 601 processor has a 32-kilobyte level-1 cache built into its chip.) L2 is usually a
separate static RAM (SRAM) chip. The main RAM is usually a dynamic RAM (DRAM) chip.
In addition to cache memory, one can think of RAM itself as a cache for hard disk
storage, since all of RAM's contents come from the hard disk initially, when you turn your
computer on and load the operating system (you are loading it into RAM), and later as you start
new applications and access new data. RAM can also contain a special area called a disk
cache that holds the data most recently read in from the hard disk.
A type of RAM (Random Access Memory) that computers check first when searching for the information
they need. The computer can access this more quickly than it can regular RAM, and it need look no
further, provided the information is in the cache memory.
A CPU cache is a cache used by the central processing unit (CPU) of a computer to reduce the
average time to access memory. The cache is a smaller, faster memory which stores copies of the
data from frequently used main memory locations. Most CPUs have different independent caches,
including instruction and data caches, where the data cache is usually organized as a hierarchy of
cache levels (L1, L2, etc.).
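The phrase "reduce the average time to access memory" can be made concrete with the standard average-memory-access-time (AMAT) calculation. The latencies and hit rates below are illustrative assumptions, not measurements of any particular CPU.

```c
/* Back-of-envelope AMAT for a two-level cache hierarchy. */
#include <stdio.h>

int main(void)
{
    double l1_hit_ns   = 1.0;    /* assumed L1 latency              */
    double l2_hit_ns   = 4.0;    /* assumed L2 latency              */
    double dram_ns     = 100.0;  /* assumed DRAM latency            */
    double l1_hit_rate = 0.95;   /* assumed                         */
    double l2_hit_rate = 0.90;   /* assumed, of the L1 misses       */

    /* AMAT = L1 hit time + L1 miss rate * (L2 hit time + L2 miss rate * DRAM time) */
    double amat = l1_hit_ns +
                  (1.0 - l1_hit_rate) * (l2_hit_ns + (1.0 - l2_hit_rate) * dram_ns);

    printf("AMAT = %.2f ns\n", amat);  /* 1 + 0.05 * (4 + 0.10 * 100) = 1.70 ns */
    return 0;
}
```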
Cache (pronounced cash) memory is extremely fast memory that is built into a computer's central
processing unit (CPU), or located next to it on a separate chip. The CPU uses cache memory to store
instructions that are repeatedly required to run programs, improving overall system speed. The
advantage of cache memory is that the CPU does not have to use the motherboard's system bus for
data transfer. Whenever data must be passed through the system bus, the data transfer speed slows
to the motherboard's capability. The CPU can process data much faster by avoiding the bottleneck
created by the system bus.
As it happens, once most programs are open and running, they use very few resources. When these
resources are kept in cache, programs can operate more quickly and efficiently. All else being equal,
cache is so effective for system performance that a computer running a fast CPU with little cache can
post lower benchmarks than a system running a somewhat slower CPU with more cache. Cache built
into the CPU itself is referred to as Level 1 (L1) cache. Cache that resides on a separate chip next to
the CPU is called Level 2 (L2) cache. Some CPUs have both L1 and L2 cache built in and designate
the separate cache chip as Level 3 (L3) cache.
Cache that is built into the CPU is faster than separate cache, running at the speed of
the microprocessor itself. However, separate cache is still roughly twice as fast as Random Access
Memory (RAM). Cache is more expensive than RAM, but it is well worth getting a CPU and
motherboard with built-in cache in order to maximize system performance.
Disk caching applies the same principle to the hard disk that memory caching applies to the CPU.
Frequently accessed hard disk data is stored in a separate segment of RAM in order to avoid having
to retrieve it from the hard disk over and over. In this case, RAM is faster than the platter technology
used in conventional hard disks. This situation will change, however, as hybrid hard disks become
ubiquitous. These disks have built-in flash memory caches. Eventually, hard drives may be entirely
flash-based; flash memory is far faster than spinning platters, though still slower than RAM, so it
reduces (rather than eliminates) the need for RAM disk caching.
Cache Coherence
(cash cōhēr´əns) (n.) A protocol for managing the caches of a multiprocessor system so that no data is lost or
overwritten before the data is transferred from a cache to the target memory. When two or more computer processors
work together on a single program, known as multiprocessing, each processor may have its own memory cache that
is separate from the larger RAM that the individual processors will access. A memory cache, sometimes called
a cache store or RAM cache, is a portion of memory made of high-speed static RAM (SRAM) instead of the slower
and cheaper dynamic RAM (DRAM) used for main memory. Memory caching is effective because
most programs access the same data or instructions over and over. By keeping as much of this information as
possible in SRAM, the computer avoids accessing the slower DRAM.
When multiple processors with separate caches share a common memory, it is necessary to keep the caches in a
state of coherence by ensuring that any shared operand that is changed in any cache is changed throughout the
entire system. This is done in either of two ways: through a directory-based or a snooping system. In a directory-
based system, the data being shared is placed in a common directory that maintains the coherence between caches.
The directory acts as a filter through which the processor must ask permission to load an entry from the primary
memory to its cache. When an entry is changed the directory either updates or invalidates the other caches with that
entry. In a snooping system, all caches on the bus monitor (or snoop) the bus to determine if they have a copy of the
block of data that is requested on the bus. Every cache has a copy of the sharing status of every block of physical
memory it has.
Cache misses and memory traffic due to shared data blocks limit the performance of parallel computing in
multiprocessor computers or systems. Cache coherence aims to solve the problems associated with sharing data.
Cache coherence
[Figure: Multiple Caches of a Shared Resource]
In computing, cache coherence is the consistency of shared resource data that ends up stored in
multiple local caches.
When clients in a system maintain caches of a common memory resource, problems may arise with
inconsistent data. This is particularly true of CPUs in a multiprocessing system. Referring to the
figure, if the top client has a copy of a memory block from a previous read and the
bottom client changes that memory block, the top client could be left with an invalid cache of
memory without any notification of the change. Cache coherence is intended to manage such
conflicts and maintain consistency between cache and memory.
Overview
In a shared memory multiprocessor system with a separate cache memory for each processor, it is
possible to have many copies of any one instruction operand: one copy in the main memory and one
in each cache memory. When one copy of an operand is changed, the other copies of the operand
must be changed also. Cache coherence is the discipline that ensures that changes in the values of
shared operands are propagated throughout the system in a timely fashion.[1]:30
There are three distinct levels of cache coherence:[2]
1. every write operation appears to occur instantaneously
2. all processors see exactly the same sequence of changes of values for each separate
operand
3. different processors may see different sequences of values for the same operand; this is
considered noncoherent behavior.
In both level 2 behavior and level 3 behavior, a program can observe stale data. Recently, computer
designers have come to realize that the programming discipline required to deal with level 2
behavior is sufficient to deal also with level 3 behavior. Therefore, at some point only
level 1 and level 3 behavior will be seen in machines.
Definition
Coherence defines the behavior of reads and writes to the same memory location. The coherence of
caches is obtained if the following conditions are met:[3]
1. A read made by a processor P to a location X that follows a write by the same processor P
to X, with no writes to X by another processor occurring between the write and the read
made by P, must always return the value written by P. This condition relates to
program order preservation, and it must hold even in uniprocessor
architectures.
2. A read made by a processor P1 to location X that happens after a write by another processor
P2 to X must return the value written by P2 if no other writes to X made by any
processor occur between the two accesses and the read and write are sufficiently
separated. This condition defines the concept of a coherent view of memory. If processors can
still read the old value after the write made by P2, we say that the memory is
incoherent.
3. Writes to the same location must be sequenced. In other words, if location X received two
different values A and B, in this order, from any two processors, the processors can never
read location X as B and then read it as A. The location X must be seen with values A and B
in that order.[2]
These conditions are defined supposing that the read and write operations are made
instantaneously. However, this doesn't happen in computer hardware given memory latency and
other aspects of the architecture. A write by processor P1 may not be seen by a read from processor
P2 if the read is made within a very small time after the write has been made. The memory
consistency model defines when a written value must be seen by a following read instruction made
by the other processors.
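As a concrete illustration of why a consistency model is needed, consider the classic store-buffering litmus test, sketched below with C11 atomics and pthreads (all names are ours, and relaxed atomics stand in for weakly ordered hardware). Each thread writes one variable and reads the other; under a weak model both can read the stale value 0, an outcome sequential consistency forbids.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

atomic_int x = 0, y = 0;
int r1, r2;

void *t1(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

void *t2(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, t1, NULL);
    pthread_create(&b, NULL, t2, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* r1 == 0 && r2 == 0 is possible under weak ordering: each CPU's
     * store may still sit in its store buffer when the other reads. */
    printf("r1=%d r2=%d\n", r1, r2);
    return 0;
}
```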
Rarely, and especially in algorithms, coherence can instead refer to the locality of reference.
Coherency mechanisms
Directory-based
In a directory-based system, the data being shared is placed in a common directory that
maintains the coherence between caches. The directory acts as a filter through which the
processor must ask permission to load an entry from the primary memory to its cache. When
an entry is changed the directory either updates or invalidates the other caches with that
entry.
Snooping
This is a process where the individual caches monitor address lines for accesses to memory
locations that they have cached.[4]
It is called a write invalidate protocol when a write
operation is observed to a location that a cache has a copy of and the cache controller
invalidates its own copy of the snooped memory location.[5]
Snarfing
It is a mechanism where a cache controller watches both address and data in an attempt to
update its own copy of a memory location when a second master modifies a location in main
memory. When a write operation is observed to a location that a cache has a copy of, the
cache controller updates its own copy of the snarfed memory location with the new data.
Distributed shared memory systems mimic these mechanisms in an attempt to maintain
consistency between blocks of memory in loosely coupled systems.
The two most common mechanisms of ensuring coherency are snooping and directory-
based, each having its own benefits and drawbacks. Snooping protocols tend to be
faster, if enough bandwidth is available, since all transactions are a request/response
seen by all processors. The drawback is that snooping isn't scalable. Every request
must be broadcast to all nodes in a system, meaning that as the system gets larger, the
size of the (logical or physical) bus and the bandwidth it provides must grow. Directories,
on the other hand, tend to have longer latencies (with a 3 hop request/forward/respond)
but use much less bandwidth since messages are point to point and not broadcast. For
this reason, many of the larger systems (>64 processors) use this type of cache
coherence.
For the snooping mechanism, a snoop filter reduces the snooping traffic by maintaining
a plurality of entries, each representing a cache line that may be owned by one or more
nodes. When replacement of one of the entries is required, the snoop filter selects for
replacement the entry representing the cache line or lines owned by the fewest nodes,
as determined from a presence vector in each of the entries. A temporal or other type of
algorithm is used to refine the selection if more than one cache line is owned by the
fewest number of nodes.[6]
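A directory or snoop-filter entry of the kind described above can be pictured as a tag plus a presence vector, one bit per node. The struct below is an illustrative sketch under our own naming, not a real controller's layout; __builtin_popcountll is a GCC/Clang intrinsic.

```c
#include <stdint.h>

#define MAX_NODES 64   /* one presence bit per node (assumed limit) */

typedef struct {
    uint64_t tag;       /* which cache line this entry tracks        */
    uint64_t presence;  /* bit i set => node i may hold a copy       */
    uint8_t  dirty;     /* some node holds the line in modified state */
} dir_entry_t;

/* Number of sharers: the replacement policy described above evicts
 * the entry whose line is owned by the fewest nodes. */
static inline int sharers(const dir_entry_t *e)
{
    return __builtin_popcountll(e->presence);
}
```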
Coherency protocol
A coherency protocol is a protocol which maintains the consistency between all the
caches in a system of distributed shared memory. The protocol maintains memory
coherence according to a specific consistency model. Older multiprocessors support
the sequential consistency model, while modern shared memory systems typically
support the release consistency or weak consistency models.
Transitions between states in any specific implementation of these protocols may vary.
For example, an implementation may choose different update and invalidation
transitions such as update-on-read, update-on-write, invalidate-on-read, or invalidate-
on-write. The choice of transition may affect the amount of inter-cache traffic, which in
turn may affect the amount of cache bandwidth available for actual work. This should be
taken into consideration in the design of distributed software that could cause strong
contention between the caches of multiple processors.
Various models and protocols have been devised for maintaining coherence, such
as MSI, MESI (aka Illinois), MOSI, MOESI, MERSI, MESIF, write-once,
and the Synapse, Berkeley, Firefly, and Dragon protocols.[1]:30–34
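The simplest of these, MSI, keeps each cache line in one of three states and moves between them on the processor's own requests and on snooped bus traffic. The C sketch below encodes the textbook transitions; it is a teaching model with names of our choosing, not a hardware implementation.

```c
/* Minimal MSI state machine for one cache line, as seen by a
 * snooping cache controller. */
typedef enum { INVALID, SHARED, MODIFIED } msi_state_t;

typedef enum {
    PR_READ, PR_WRITE,    /* requests from this cache's own processor */
    BUS_READ, BUS_WRITE   /* transactions snooped from other caches   */
} msi_event_t;

msi_state_t msi_next(msi_state_t s, msi_event_t e)
{
    switch (s) {
    case INVALID:
        if (e == PR_READ)  return SHARED;    /* fetch; others may share     */
        if (e == PR_WRITE) return MODIFIED;  /* fetch + invalidate others   */
        return INVALID;
    case SHARED:
        if (e == PR_WRITE)  return MODIFIED; /* broadcast an invalidate     */
        if (e == BUS_WRITE) return INVALID;  /* another cache took ownership */
        return SHARED;                       /* reads keep the line shared  */
    case MODIFIED:
        if (e == BUS_READ)  return SHARED;   /* write back, then share      */
        if (e == BUS_WRITE) return INVALID;  /* write back, then invalidate */
        return MODIFIED;
    }
    return INVALID; /* unreachable */
}
```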
Cache Coherence for Multiprocessing
In computing, cache coherence (also cache coherency) refers to the consistency of data stored in local caches of
a shared resource. Cache coherence is a special case of memory coherence. When clients in a system maintain
caches of a common memory resource, problems may arise with inconsistent data.
This is particularly true of CPUs in a multiprocessing system. Referring to the "Multiple Caches of Shared Resource"
figure, if the top client has a copy of a memory block from a previous read and the bottom client changes that
memory block, the top client could be left with an invalid cache of memory without any notification of the
change. Cache coherence is intended to manage such conflicts and maintain consistency between cache and
memory.
Definition - What does Cache Coherence mean?
Cache coherence is the regularity or consistency of data stored in cache memory. Maintaining cache
and memory consistency is imperative for multiprocessors or distributed shared memory (DSM)
systems. Cache management is structured to ensure that data is not overwritten or lost. Different
techniques may be used to maintain cache coherency, including directory based coherence, bus
snooping and snarfing. To maintain consistency, a DSM system imitates these techniques and uses
a coherency protocol, which is essential to system operations. Cache coherence is also known as
cache coherency or cache consistency.
Techopedia explains Cache Coherence
The majority of coherency protocols that support multiprocessors use a sequential consistency
standard. DSM systems use a weak or release consistency standard. The following methods are
used for cache coherence management and consistency in read/write (R/W) and instantaneous
operations:
1. Written data locations are sequenced.
2. Write operations occur instantaneously.
3. Program order preservation is maintained with R/W data.
4. A coherent memory view is maintained, where consistent values are provided through shared memory.
Several types of cache coherency may be utilized by different structures, as follows:
1. Directory-based coherence: references a filter through which memory data is made accessible to
all processors. When data in a memory area changes, the cache is updated or invalidated.
2. Bus snooping: monitors and manages all cache memory and notifies the processor when there is
a write operation. Used in smaller systems with fewer processors.
3. Snarfing: self-monitors and updates its own address and data versions. Requires large amounts
of bandwidth and resources compared to directory-based coherence and bus snooping.
Issues in Cache Memory
Memory Hierarchy Issues
We first illustrate the issues involved in optimizing memory system performance on
multiprocessors, and define the terms that are used in this paper. True sharing cache
misses occur whenever two processors access the same data word. True sharing
requires the processors involved to explicitly synchronize with each other to ensure
program correctness. A computation is said to have temporal locality if it re-uses
much of the data it has been accessing; programs with high temporal locality tend to
have less true sharing. The amount of true sharing in the program is a critical factor
for performance on multiprocessors; high levels of true sharing and synchronization
can easily overwhelm the advantage of parallelism.
It is important to take synchronization and sharing into consideration when deciding
on how to parallelize a loop nest and how to assign the iterations to processors.
Consider the code shown in Figure 1(a). While all the iterations in the first two-deep
loop nest can run in parallel, only the inner loop of the second loop nest is
parallelizable. To minimize synchronization and sharing, we should also parallelize
only the inner loop in the first loop nest. By assigning the ith iteration in each of the
inner loops to the same processor, each processor always accesses the same rows of
the arrays throughout the entire computation. Figure 1(b) shows the data accessed by
each processor in the case where each processor is assigned a block of rows. In this
way, no interprocessor communication or synchronization is necessary.
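The row-blocked assignment just described might look like the OpenMP sketch below. Figure 1's actual code is not reproduced in this text, so the arrays, sizes, and the specific computation are invented for illustration; the point is that static scheduling hands each thread the same block of rows in both loop nests.

```c
/* Compile with -fopenmp. Arrays and operations are placeholders. */
#include <omp.h>

#define N 1024
static double A[N][N], B[N][N];

void compute(void)
{
    /* First loop nest: fully parallel, but we parallelize only over
     * rows so that row ownership matches the second nest. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            A[i][j] = B[i][j] * 2.0;

    /* Second loop nest: a dependence on j-1 makes only the inner
     * (row) loop parallelizable; static scheduling gives each thread
     * the same block of rows again, so threads touch the same rows
     * throughout the computation. */
    for (int j = 1; j < N; j++) {
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < N; i++)
            A[i][j] += A[i][j - 1];
    }
}
```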
Figure 1: A simple example: (a) sample code, (b) original data mapping and (c)
optimized data mapping. The light grey arrows show the memory layout order.
Due to characteristics found in typical data caches, it is not sufficient to just minimize
sharing between processors. First, data are transferred in fixed-size units known
as cache lines, which are typically 4 to 128 bytes long[16]. A computation is said to
have spatial locality if it uses multiple words in a cache line before the line is
displaced from the cache. While spatial locality is a consideration for both uni- and
multiprocessors, false sharing is unique to multiprocessors. False sharing results when
different processors use different data that happen to be co-located on the same cache
line. Even if a processor re-uses a data item, the item may no longer be in the cache
due to an intervening access by another processor to another word in the same cache
line.
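A minimal experiment makes false sharing tangible: two threads increment adjacent counters that share a cache line, then counters padded onto separate lines. The sketch below assumes 64-byte lines and uses names of our own choosing; time each run externally (e.g., with time) to see the difference.

```c
/* Compile with -pthread. */
#include <pthread.h>
#include <stdio.h>

#define ITERS 100000000UL

/* Both counters on one 64-byte line (assumed line size). */
static _Alignas(64) struct { volatile long a, b; } same_line;

/* Padding pushes b onto a different line. */
static _Alignas(64) struct { volatile long a; char pad[64]; volatile long b; } padded;

static void *bump(void *p)
{
    volatile long *c = p;
    for (unsigned long i = 0; i < ITERS; i++)
        (*c)++;                 /* each ++ pulls the line into this core */
    return NULL;
}

static void race(volatile long *a, volatile long *b, const char *label)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, bump, (void *)a);
    pthread_create(&t2, NULL, bump, (void *)b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%s finished\n", label);
}

int main(void)
{
    race(&same_line.a, &same_line.b, "shared line (false sharing)");
    race(&padded.a, &padded.b, "padded (separate lines)");
    return 0;
}
```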
Assuming the FORTRAN convention that arrays are allocated in column-major order,
there is a significant amount of false sharing in our example, as shown in Figure 1(b).
If the number of rows accessed by each processor is smaller than the number of words
in a cache line, every cache line is shared by at least two processors. Each time one of
these lines is accessed, unwanted data are brought into the cache. Also, when one
processor writes part of the cache line, that line is invalidated in the other processor's
cache. This particular combination of computation mapping and data layout will result
in poor cache performance.
Another problematic characteristic of data caches is that they typically have a small
set-associativity; that is, each memory location can only be cached in a small number
of cache locations. Conflict misses occur whenever different memory locations
contend for the same cache location. Since each processor only operates on a subset of
the data, the addresses accessed by each processor may be distributed throughout the
shared address space.
Consider what happens to the example in Figure 1(b) if the arrays have 1024 rows
and the target machine has a direct-mapped cache of size 64KB.
Assuming that REALs are 4B long, each column occupies 4KB, so the elements in every 16th column will map to the
same cache location and cause conflict misses. This problem exists even if the caches
are set-associative, given that existing caches usually only have a small degree of
associativity.
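The mapping arithmetic can be checked directly. The program below assumes the values used in the text (a 64KB direct-mapped cache, 4-byte REALs, 1024 rows per column) and prints the cache offset of the first element of every 16th column; all collide at offset 0.

```c
/* Conflict-mapping arithmetic for a column-major array on a
 * direct-mapped cache; parameters taken from the text. */
#include <stdio.h>

int main(void)
{
    const unsigned cache_bytes  = 64 * 1024;          /* 64KB cache   */
    const unsigned rows         = 1024;               /* per column   */
    const unsigned elem_bytes   = 4;                  /* FORTRAN REAL */
    const unsigned column_bytes = rows * elem_bytes;  /* 4KB          */

    /* Element (0, j) sits at byte offset j * column_bytes in
     * column-major order; its cache slot is that offset modulo the
     * cache size. Columns 0, 16, 32, 48, ... all collide. */
    for (unsigned j = 0; j < 64; j += 16)
        printf("column %2u -> cache offset %u\n",
               j, (j * column_bytes) % cache_bytes);
    return 0;
}
```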
As shown above, the cache performance of multiprocessor code depends on how the
computation is distributed as well as how the data are laid out. Instead of simply
obeying the data layout convention used by the input language (e.g. column-major in
FORTRAN and row-major in C), we can improve the cache performance by
customizing the data layout for the specific program. We observe that multiprocessor
cache performance problems can be minimized by making the data accessed by each
processor contiguous in the shared address space, an example of which is shown in
Figure 1(c). Such a layout enhances spatial locality, minimizes false sharing and also
minimizes conflict misses.
The importance of optimizing memory subsystem performance for multiprocessors
has also been confirmed by several studies of hand optimizations on real applications.
Singh et al. explored performance issues on scalable shared address space
architectures; they improved cache behavior by transforming two-dimensional arrays
into four-dimensional arrays so that each processor's local data are contiguous in
memory[28]. Torrellas et al.[30] and Eggers et al.[11,12] also showed that improving
spatial locality and reducing false sharing resulted in significant speedups for a set of
programs on shared-memory machines. In summary, not only must we minimize
sharing to achieve efficient parallelization; we must also optimize for the
multi-word cache line and the small set associativity. The cache behavior depends on
both the computation mapping and the data layout. Thus, besides choosing a good
parallelization scheme and a good computation mapping, we may also wish to change
the data structures in the program.
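As a final sketch of the kind of data-structure change meant here, the two-dimensional-to-four-dimensional transformation credited to Singh et al. can be expressed in C as an array indexed first by processor-grid coordinates, so that each processor's tile is one contiguous block of memory. The sizes and names below are illustrative assumptions.

```c
#define N   1024
#define PI  2          /* processor grid: PI x PJ */
#define PJ  2
#define TI  (N / PI)   /* tile rows    */
#define TJ  (N / PJ)   /* tile columns */

/* Standard layout: one processor's tile is scattered across rows. */
static double flat[N][N];

/* Transformed layout: tiled[p][q] is processor (p, q)'s contiguous
 * tile, which enhances spatial locality and avoids false sharing
 * and many conflict misses. */
static double tiled[PI][PJ][TI][TJ];

/* Map a global element (i, j) onto the tiled layout. */
static inline double *tiled_addr(int i, int j)
{
    return &tiled[i / TI][j / TJ][i % TI][j % TJ];
}
```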