Multicore Computers
Mr. A. B. Shinde
Electronics Engineering
Contents…
 Hardware performance issues
 Software performance issues
 Multicore organization
 Intel x86 multicore organization
 ARM11 MPCore
Introduction
 A multicore computer combines two or more processors on a single computer chip.
 The use of more complex single-processor chips has reached a limit due to hardware performance issues (limits in instruction-level parallelism and power constraints).
 The multicore architecture poses challenges to software developers, who must exploit the capability for multithreading across multiple cores.
 The main variables in a multicore organization are the number of processors on the chip, the number of levels of cache memory, and the extent to which cache memory is shared.
Introduction
 A multicore computer, also known as a chip multiprocessor,
combines two or more processors (called cores) on a single piece of
silicon (called a die).
 Each core consists of all of the components of an independent
processor, such as registers, ALU, pipeline hardware, and control
unit, plus L1 instruction and data caches.
 In addition to the multiple cores, contemporary multicore chips also
include L2 cache and, in some cases, L3 cache.
Hardware Performance Issues
Hardware Performance Issues
 Microprocessor systems have experienced a steady, exponential
increase in execution performance for decades.
 This increase is due to refinements in the organization of the processor on the chip and to increases in clock frequency.
Intel Microprocessor Performance
Hardware Performance Issues
 Increase in Parallelism:
 The organizational changes in processor design have primarily been
focused on increasing instruction-level parallelism, so that more work
could be done in each clock cycle.
Superscalar
Simultaneous multithreading
Multicore
Hardware Performance Issues
 Increase in Parallelism:
 Pipelining: Individual instructions are executed through a pipeline of
stages so that while one instruction is executing in one stage of the
pipeline, another instruction is executing in another stage of the
pipeline.
 Superscalar: Multiple pipelines are constructed by replicating
execution resources. This enables parallel execution of instructions in
parallel pipelines, so long as hazards are avoided.
 Simultaneous multithreading (SMT): Register banks are replicated
so that multiple threads can share the use of pipeline resources.
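The three mechanisms above can be contrasted with a minimal timing model. This is an illustrative sketch, not from the slides: it assumes an idealized k-stage pipeline with no hazards or stalls, which finishes N instructions in (k + N − 1) cycles, and an idealized w-wide superscalar pipeline, which needs roughly k + ceil(N/w) − 1 cycles.

```python
from math import ceil

def pipeline_cycles(n_instructions: int, stages: int) -> int:
    """Cycles to push n instructions through one hazard-free pipeline."""
    return stages + n_instructions - 1

def superscalar_cycles(n_instructions: int, stages: int, width: int) -> int:
    """Idealized superscalar: up to `width` instructions issued per cycle."""
    return stages + ceil(n_instructions / width) - 1

n = 1000
print(pipeline_cycles(n, 5))        # 1004 cycles
print(superscalar_cycles(n, 5, 4))  # 254 cycles: ~4x throughput, hazards permitting
```

In practice hazards and resource dependencies keep real pipelines well below these ideal figures, which is exactly the limitation the following slides discuss.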
Hardware Performance Issues
 Increase in Parallelism:
 In the case of pipelining, simple three-stage pipelines were replaced
by pipelines with five stages, and then many more stages, with some
implementations having over a dozen stages.
 There is a practical limit to how far this trend can be taken, because with more stages there is a need for more logic (hardware), more interconnections, and more control signals.
Hardware Performance Issues
 Increase in Parallelism:
 With superscalar organization, performance increase can be
achieved by increasing the number of parallel pipelines.
 There are limitations, as the number of pipelines increases. More logic
is required to manage hazards and to stage instruction resources.
 A single thread of execution reaches the point where hazards and
resource dependencies prevent the full use of the multiple pipelines
available.
 The complexity of managing multiple threads over a set of shared pipelines limits the number of threads, and the number of pipelines, that can be effectively utilized.
Hardware Performance Issues
 Increase in Parallelism:
 The increase in complexity needed to deal with all of the logical issues related to very long pipelines, multiple superscalar pipelines, and multiple SMT register banks means that increasing amounts of the chip area are occupied with coordination and signal-transfer logic.
 This increases the difficulty of designing, fabricating, and
debugging the chips.
 Power issues are another major challenge.
Hardware Performance Issues
 Power Consumption:
 To maintain high performance, designers have increased both the number of transistors per chip and the clock frequency. Unfortunately, power requirements have grown exponentially as chip density and clock frequency have risen.
 One way to control power density is to use more of the chip area for
cache memory.
 Memory transistors are smaller and have a power density an order of
magnitude lower than that of logic.
Hardware Performance Issues
 Power Consumption:
Figure shows where the power consumption trend is leading.
Assuming about 50–60% of the chip area is devoted to memory, the chip
will support cache memory of about 100 MB and leave over 1 billion
transistors available for logic.
Hardware Performance Issues
 Power Consumption:
 In recent decades, Pollack's rule has been observed to hold: performance increase is roughly proportional to the square root of the increase in complexity.
 If you double the logic in a processor core, then it delivers only 40%
more performance.
 The use of multiple cores has the potential to provide near-linear
performance improvement with the increase in the number of cores.
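The arithmetic behind the two claims above can be checked directly. This is a quick illustrative sketch: under Pollack's rule, doubling core logic yields sqrt(2) ≈ 1.41x performance (the quoted ~40%), while doubling the number of cores can in principle double throughput.

```python
from math import sqrt

def pollack_speedup(complexity_ratio: float) -> float:
    """Single-core speedup predicted by Pollack's rule: sqrt of complexity increase."""
    return sqrt(complexity_ratio)

print(f"{pollack_speedup(2.0):.2f}")  # 1.41 -> about 40% more performance
print(pollack_speedup(4.0))           # 2.0: quadrupling logic only doubles speed
```

This diminishing return on single-core complexity is why spending the same transistors on additional cores is attractive.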
Hardware Performance Issues
 Power Consumption:
 Power considerations provide another motive for moving toward a
multicore organization.
 Because the chip holds such a huge amount of cache memory, it becomes unlikely that any one thread of execution can effectively use all that memory.
 With SMT, a number of relatively independent threads or processes have a greater opportunity to take full advantage of the cache memory.
Software Performance Issues
Software Performance Issues
 A detailed examination of the software performance issues related to multicore organization is a huge task.
Software Performance Issues
 Software on Multicore:
 The performance benefits of a multicore organization depend on the
ability to effectively exploit the parallel resources.
 Consider a single application running on a multicore system; Amdahl's law applies.
The law assumes a program in which a fraction (1 − f) of the execution time involves code that is inherently serial and a fraction f involves code that is infinitely parallelizable with no scheduling overhead. The resulting speedup on N cores is 1 / ((1 − f) + f/N).
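The law above can be evaluated with a few lines of code. This is an illustrative sketch: speedup(N) = 1 / ((1 − f) + f/N), where f is the parallelizable fraction.

```python
def amdahl_speedup(f: float, n_cores: int) -> float:
    """Speedup on n_cores when a fraction f of execution time is parallelizable."""
    return 1.0 / ((1.0 - f) + f / n_cores)

# Even a small serial fraction caps the benefit of adding cores:
print(round(amdahl_speedup(0.9, 8), 2))     # 4.71
print(round(amdahl_speedup(0.9, 1000), 2))  # 9.91 -- never exceeds 1/(1-f) = 10
```

The asymptote 1/(1 − f) is why the application classes listed next, which keep f high by running many independent threads or processes, scale best on multicore hardware.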
Software Performance Issues
 A number of classes of applications benefit directly from the ability to
scale throughput with the number of cores.
 Multithreaded native applications: Multithreaded applications are
characterized by having a small number of highly threaded processes.
 Examples of threaded applications include Lotus Domino and Siebel CRM (Customer Relationship Management).
 Multiprocess applications: Multiprocess applications are characterized
by the presence of many single-threaded processes.
 Examples of multi-process applications include the Oracle database,
SAP, and PeopleSoft.
Software Performance Issues
 Java applications:
 The Java language greatly facilitates multithreaded applications.
 The Java Virtual Machine is a multithreaded process that provides scheduling and memory management for Java applications.
 Java applications that can benefit directly from multicore resources
include application servers such as Sun’s Java Application Server, BEA’s
Weblogic, IBM’s Websphere etc.
 All applications that use a Java 2 Platform, Enterprise Edition (J2EE
platform) application server can immediately benefit from multicore
technology.
Software Performance Issues
 Multi-instance applications:
 Even if an individual application does not scale to take advantage of a
large number of threads, it is still possible to gain advantage from
multicore architecture by running multiple instances of the
application in parallel.
 If multiple application instances require some degree of isolation,
then virtualization technology can be used to provide each of them
with its own separate and secure environment.
Multicore Organization
 The main variables in a multicore organization are as follows:
 The number of core processors on the chip
 The number of levels of cache memory
 The amount of cache memory that is shared
Multicore Organization
General Organizations of Multicore Systems:
dedicated L1 cache; dedicated L2 cache; shared L2 cache; shared L3 cache
Multicore Organization
 Figure shows an organization found in some of the earlier multicore computer chips that is still seen in embedded chips.
 In this organization, the only on-chip
cache is L1 cache, each core having
its own dedicated L1 cache.
 Almost invariably, the L1 cache is
divided into instruction and data
caches.
 An example of this organization is the
ARM11 MPCore.
Dedicated L1 cache
Multicore Organization
 This organization is also one in which there is no on-chip cache sharing.
 Here, there is enough area available on the chip to allow for a dedicated L2 cache for each core.
 An example of this organization is the
AMD Opteron.
Dedicated L2 cache
Multicore Organization
 Figure shows allocation of chip space to
memory, but with the use of a shared L2
cache.
 The Intel Core Duo has this organization.
 As the amount of cache memory available on the chip continues to grow, performance considerations dictate splitting off a separate, shared L3 cache, with dedicated L1 and L2 caches for each core processor.
 The Intel Core i7 is an example of this
organization.
Shared L2 cache
Shared L3 cache
Multicore Organization
 Advantages of shared L2/L3 cache on the chip:
1. Constructive interference can reduce overall miss rates.
If a thread on one core accesses a main memory location, this brings the
frame containing the referenced location into the shared cache.
If a thread on another core soon thereafter accesses the same memory
block, the memory locations will already be available in the shared on-
chip cache.
2. Data shared by multiple cores is not replicated at the shared cache
level.
Multicore Organization
 Advantages of shared L2 cache on the chip:
3. With proper frame replacement algorithms, the amount of shared cache allocated to each core is dynamic, so that threads that have less locality can employ more cache.
4. Interprocessor communication is easy to implement, via shared
memory locations.
5. The use of a shared L2 cache confines the cache coherency
problem to the L1 cache level, which may provide some additional
performance advantage.
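Advantage 1 above, constructive interference, can be made concrete with a toy model. This is an assumed illustrative sketch, not from the slides: once a thread on one core brings a block into the shared cache, a later access to the same block from another core hits.

```python
class SharedCache:
    """Toy shared cache: tracks which block addresses are resident."""
    def __init__(self):
        self.blocks = set()   # block addresses currently cached
        self.hits = 0
        self.misses = 0

    def access(self, block: int) -> bool:
        """Return True on hit; on miss, fetch the block into the cache."""
        if block in self.blocks:
            self.hits += 1
            return True
        self.misses += 1
        self.blocks.add(block)
        return False

cache = SharedCache()
print(cache.access(0x40))  # core 0: False (miss, block fetched from memory)
print(cache.access(0x40))  # core 1: True (hit -- constructive interference)
```

With per-core private caches, the second access would have been a second miss (or a coherence transaction), which is the miss-rate reduction the slide describes.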
Multicore Organization
 An advantage of having only dedicated L2 caches on the chip is that each core enjoys more rapid access to its private L2 cache.
 As both the amount of available cache memory and the number of cores grow, the use of a shared L3 cache combined with either a shared L2 cache or dedicated per-core L2 caches provides better performance than simply a massive shared L2 cache.
Multicore Organization
 Another organizational design decision in a multicore system is
whether the individual cores will be superscalar or will implement
simultaneous multithreading (SMT).
 For example, the Intel Core Duo uses superscalar cores, whereas the
Intel Core i7 uses SMT cores.
 SMT scales up the number of hardware-level threads that the multicore system supports.
 As software is developed to fully exploit parallel resources, an SMT
approach appears to be more attractive than a superscalar approach.
Intel X86 Multicore Organization
Intel X86 Multicore Organization
 Intel has introduced a number of multicore products in recent years.
 Examples:
1. The Intel Core Duo and
2. The Intel Core i7.
Intel X86: Intel Core Duo
 The general structure of the Intel
Core Duo is shown in Figure.
 As in most multicore systems, each core has its own dedicated L1 cache.
 In this case, each core has a 32-KB
instruction cache and a 32-KB
data cache.
Intel Core Duo Block Diagram
Advanced Programmable Interrupt Controller
(APIC)
Intel X86: Intel Core Duo
 Each core has an independent thermal control unit. Thermal
management is a fundamental capability, especially for laptop and mobile
systems.
 The Core Duo thermal control unit is designed to manage chip heat
dissipation to maximize processor performance within thermal
constraints.
Intel X86: Intel Core Duo
 The thermal management unit monitors digital sensors for high-
accuracy die temperature measurements.
 The maximum temperature for each thermal zone is reported
separately via dedicated registers that can be polled by software.
 If the temperature in a core exceeds a threshold, the thermal control
unit reduces the clock rate for that core to reduce heat generation.
Intel X86: Intel Core Duo
 The Advanced Programmable Interrupt Controller (APIC) is the second key element.
 The APIC performs a number of functions, including the following:
1. The APIC can provide interprocessor interrupts, which allow any processor to interrupt any other processor or set of processors.
In the case of the Core Duo, a thread in one core can generate an
interrupt, which is accepted by the local APIC, routed to the APIC of the
other core, and communicated as an interrupt to the other core.
2. The APIC accepts I/O interrupts and routes these to the appropriate
core.
3. Each APIC includes a timer, which can be set by the OS to generate
an interrupt to the local core.
Intel X86: Intel Core Duo
 The power management logic is responsible for reducing power
consumption when possible.
 The power management logic monitors thermal conditions and CPU
activity and adjusts voltage levels and power consumption
appropriately.
 It has an advanced power-gating capability that allows for an ultra
fine-grained logic control that turns on individual processor logic
subsystems only if and when they are needed.
Intel X86: Intel Core Duo
 The Core Duo chip includes a shared 2-MB L2 cache.
 The cache logic allows for a dynamic allocation of cache space
based on current core needs, so that one core can be assigned up to
100% of the L2 cache.
 The L2 cache includes logic to support the MESI (Modified, Exclusive, Shared, Invalid) cache coherence protocol for the attached L1 caches.
Intel X86: Intel Core Duo
 A cache line gets the M state when a processor writes to it; if the line is not in the E or M state prior to the write, the cache sends a Read-For-Ownership (RFO) request.
 The Intel Core Duo extends this protocol to take into account the
case when there are multiple Core Duo chips organized as a
symmetric multiprocessor (SMP) system.
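The write transition just described can be sketched as a tiny state function. This is a simplified illustration of the MESI rule quoted above, not the actual Core Duo logic: writing a line that is not already in the E or M state requires an RFO to gain exclusive ownership first.

```python
# MESI states: M(odified), E(xclusive), S(hared), I(nvalid)
def write_line(state: str) -> tuple:
    """Return (new_state, rfo_issued) when a core writes to a cached line."""
    if state in ("E", "M"):
        return "M", False   # already exclusively owned: write silently, state -> M
    # S or I: other caches may hold copies, so an RFO must invalidate them first
    return "M", True

print(write_line("E"))  # ('M', False)
print(write_line("S"))  # ('M', True) -- RFO sent to invalidate sharers
```

The point of the Core Duo extension discussed next is deciding whether such an RFO must go onto the external bus or can be resolved on-die.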
Intel X86: Intel Core Duo
 The L2 cache controller allows the system to distinguish between a situation in which data are shared only by the two local cores and a situation in which the data are shared by one or more caches on the die as well as by an agent on the external bus (which can be another processor).
 When a core issues an RFO and the line is shared only by the other cache within the local die, the RFO can be resolved internally, without an external bus transaction.
 The bus interface connects to the external bus, known as the Front
Side Bus, which connects to main memory, I/O controllers, and other
processor chips.
Intel X86: Intel Core i7
 The general structure of
the Intel Core i7 is shown
in Figure.
 Each core has its own
dedicated L2 cache and
the four cores share an
8-MB L3 cache.
Intel Core i7 Block Diagram
Intel X86: Intel Core i7
 Table shows the cache access latency, in terms of clock cycles for two
Intel multicore systems running at the same clock frequency.
 The Core 2 Quad has a shared L2 cache, similar to the Core Duo.
 The Core i7 improves on L2 cache performance with the use of the
dedicated L2 caches, and provides a relatively high-speed access to the
L3 cache.
Intel X86: Intel Core i7
 The Core i7 chip supports two forms of external communication to other chips.
 The DDR3 memory controller brings the memory controller for the DDR main memory onto the chip.
 The QuickPath Interconnect (QPI) is a cache-coherent, point-to-
point link based electrical interconnect specification for Intel processors
and chipsets.
 It enables high-speed communications among connected processor chips. The QPI link operates at 6.4 GT/s (gigatransfers per second).
ARM11 MPCore
ARM11 MPCore
 The ARM11 MPCore is a multicore product based on the ARM11
processor family.
 The ARM11 MPCore can be configured with up to four processors,
each with its own L1 instruction and data caches, per chip.
ARM11 MPCore
ARM11 MPCore Processor Block Diagram
 Distributed interrupt controller (DIC)
 Timer
 Watchdog
 CPU interface
 CPU
 Vector floating-point (VFP) unit
 L1 cache
 Snoop control unit (SCU)
ARM11 MPCore
 The key elements of the system are as follows:
 Distributed interrupt controller (DIC): Handles interrupt detection
and interrupt prioritization. The DIC distributes interrupts to individual
processors.
 Timer: Each CPU has its own private timer that can generate
interrupts.
 Watchdog: Issues warning alerts in the event of software failures. If
the watchdog is enabled, it is set to a predetermined value and counts
down to 0. If the watchdog value reaches zero, an alert is issued.
 CPU interface: Handles interrupt acknowledgement, interrupt
masking, and interrupt completion acknowledgement.
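The watchdog behavior described above can be sketched in software. This is a minimal assumed illustration, not the MPCore hardware: the counter is loaded with a preset value and counts down; healthy software periodically reloads it, and an alert fires only if it ever reaches zero.

```python
class Watchdog:
    """Toy countdown watchdog modeling the behavior described in the slides."""
    def __init__(self, preset: int):
        self.preset = preset
        self.count = preset
        self.alert = False

    def kick(self):
        """Healthy software reloads the counter before it expires."""
        self.count = self.preset

    def tick(self):
        """Called once per timer period; raises an alert on expiry."""
        if self.count > 0:
            self.count -= 1
            if self.count == 0:
                self.alert = True

wd = Watchdog(preset=3)
for _ in range(3):   # software has hung: no kick() calls arrive
    wd.tick()
print(wd.alert)      # True -- software failure detected
```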
ARM11 MPCore
 The key elements of the system are as follows:
 CPU: A single ARM11 processor. Individual CPUs are referred to as
MP11 CPUs.
 Vector floating-point (VFP) unit: A coprocessor that implements
floating point operations in hardware.
 L1 cache: Each CPU has its own dedicated L1 data cache and L1
instruction cache.
 Snoop control unit (SCU): Responsible for maintaining coherency
among L1 data caches.
ARM11 MPCore
ARM11 MPCore Configurable Options
ARM11 MPCore
 Interrupt Handling:
 The Distributed Interrupt Controller (DIC) collates interrupts from a large
number of sources.
 It provides:
 Masking of interrupts
 Prioritization of the interrupts
 Distribution of the interrupts to the target MP11 CPUs
 Tracking the status of interrupts
 Generation of interrupts by software
ARM11 MPCore
 Interrupt Handling:
 The DIC enables the number of interrupts supported in the system to
be independent of the MP11 CPU design.
 The DIC is memory mapped; that is, control registers for the DIC are
defined relative to a main memory base address.
 The DIC is designed to satisfy two functional requirements:
 Provide a means of routing an interrupt request to a single CPU or
multiple CPUs, as required.
 Provide a means of interprocessor communication so that a thread
on one CPU can cause activity by a thread on another CPU.
ARM11 MPCore
 Interrupt Handling:
 The DIC can route an interrupt to one or more CPUs in the following
three ways:
 An interrupt can be directed to a specific processor only.
 An interrupt can be directed to a defined group of processors.
 An interrupt can be directed to all processors.
ARM11 MPCore
 Interrupt Handling:
 From the point of view of an MP11 CPU, an interrupt can be
 Inactive: An Inactive interrupt is one that is nonasserted.
 Pending: A Pending interrupt is one that has been asserted, and for
which processing has not started on that CPU.
 Active: An Active interrupt is one that has been started on that CPU, but
processing is not complete.
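The three per-CPU interrupt states above form a small state machine. This is an assumed sketch of the lifecycle as described, not the actual DIC implementation: an interrupt moves Inactive → Pending when asserted, Pending → Active when the CPU acknowledges it, and Active → Inactive when handling completes.

```python
INACTIVE, PENDING, ACTIVE = "Inactive", "Pending", "Active"

# Legal transitions for one interrupt as seen by one MP11 CPU
TRANSITIONS = {
    (INACTIVE, "assert"): PENDING,  # interrupt asserted by a source
    (PENDING, "ack"): ACTIVE,       # CPU acknowledges; handling starts
    (ACTIVE, "eoi"): INACTIVE,      # end-of-interrupt completes handling
}

def step(state: str, event: str) -> str:
    """Advance the interrupt state; events illegal in a state are ignored."""
    return TRANSITIONS.get((state, event), state)

s = INACTIVE
for event in ["assert", "ack", "eoi"]:
    s = step(s, event)
    print(s)   # Pending, then Active, then Inactive
```

The DIC slide later in the deck describes exactly this cycle: the distributor forwards the highest-priority Pending interrupt, the acknowledgement makes it Active, and the EOI returns it to Inactive.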
ARM11 MPCore
 Interrupt Handling:
 Interrupts come from the following sources:
 Interprocessor interrupts (IPIs): Each CPU has private software-triggered interrupts (ID0–ID15). The priority of an IPI depends on the receiving CPU, not the sending CPU.
 Private timer and/or watchdog interrupts: These use interrupt IDs 29
and 30.
 Legacy FIQ line: In legacy IRQ mode, the legacy FIQ pin bypasses the Interrupt Distributor logic and directly drives interrupt requests into the CPU.
 Hardware interrupts: Hardware interrupts are triggered by
programmable events on associated interrupt input lines. Hardware
interrupts start at ID32.
ARM11 MPCore
Block diagram of the DIC
ARM11 MPCore
 The DIC is configurable to support between 0 and 255 hardware
interrupt inputs.
 The DIC maintains a list of interrupts, showing their priority and
status.
 The Interrupt Distributor transmits to each CPU Interface the
highest Pending interrupt for that interface. It receives back the
interrupt acknowledgement, and can then change the status of the
corresponding interrupt.
 The CPU Interface also transmits End of Interrupt Information (EOI),
which enables the Interrupt Distributor to update the status of this
interrupt from Active to Inactive.
ARM11 MPCore
 Cache Coherency:
 The MPCore’s Snoop Control Unit (SCU) is designed to resolve most
of the traditional bottlenecks related to access to shared data.
 The SCU introduces three types of optimization:
 Direct data intervention,
 Duplicated tag RAMs, and
 Migratory lines.
ARM11 MPCore
 Cache Coherency:
 Direct data intervention (DDI): Enables copying clean data from one
CPU L1 data cache to another CPU L1 data cache without accessing
external memory.
 This reduces read after read activity from the Level 1 cache to the
Level 2 cache.
 Thus, a local L1 cache miss is resolved in a remote L1 cache rather
than from access to the shared L2 cache.
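Direct data intervention can be sketched as a lookup order. This is an assumed illustration of the idea described above, not the SCU's actual logic: a local L1 miss on a clean line is satisfied by a cache-to-cache copy from a remote L1 instead of going to the L2 cache or external memory.

```python
def read(addr, local_l1: dict, remote_l1s: list, l2: dict):
    """Return (value, source) for a load, preferring on-chip copies."""
    if addr in local_l1:
        return local_l1[addr], "local L1"
    for remote in remote_l1s:              # SCU snoop: check other cores' L1s
        if addr in remote:
            local_l1[addr] = remote[addr]  # clean cache-to-cache copy (DDI)
            return local_l1[addr], "remote L1"
    local_l1[addr] = l2[addr]              # fall back to the shared L2 cache
    return local_l1[addr], "L2"

l2 = {0x100: 7}
cpu0_l1, cpu1_l1 = {0x100: 7}, {}
print(read(0x100, cpu1_l1, [cpu0_l1], l2))  # (7, 'remote L1')
```

Resolving the miss at the "remote L1" step is what reduces the read traffic from the L1 caches to the L2 cache.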
ARM11 MPCore
 Cache Coherency:
 The main memory location of each line within a cache is identified by a tag for that line.
 The tags can be implemented as a separate block of RAM whose length equals the number of lines in the cache.
 The SCU's duplicated tag RAMs are duplicate copies of the L1 tag RAMs.
 Coherency commands are sent only to CPUs that must update their
coherent data cache.
 This reduces the power consumption.
 If the tag data is available locally, SCU limits cache manipulations to
processors that have cache lines in common.
This presentation is published only for educational purposes.
shindesir.pvp@gmail.com
Image Processing BasicsImage Processing Basics
Image Processing BasicsA B Shinde
 
Blooms Taxonomy in Engineering Education
Blooms Taxonomy in Engineering EducationBlooms Taxonomy in Engineering Education
Blooms Taxonomy in Engineering EducationA B Shinde
 
ISE 7.1i Software
ISE 7.1i SoftwareISE 7.1i Software
ISE 7.1i SoftwareA B Shinde
 
VHDL Coding Syntax
VHDL Coding SyntaxVHDL Coding Syntax
VHDL Coding SyntaxA B Shinde
 
VLSI Testing Techniques
VLSI Testing TechniquesVLSI Testing Techniques
VLSI Testing TechniquesA B Shinde
 
Selecting Engineering Project
Selecting Engineering ProjectSelecting Engineering Project
Selecting Engineering ProjectA B Shinde
 
Interview Techniques
Interview TechniquesInterview Techniques
Interview TechniquesA B Shinde
 
Semiconductors
SemiconductorsSemiconductors
SemiconductorsA B Shinde
 
Diode Applications & Transistor Basics
Diode Applications & Transistor BasicsDiode Applications & Transistor Basics
Diode Applications & Transistor BasicsA B Shinde
 

Mehr von A B Shinde (20)

Communication System Basics
Communication System BasicsCommunication System Basics
Communication System Basics
 
MOSFETs: Single Stage IC Amplifier
MOSFETs: Single Stage IC AmplifierMOSFETs: Single Stage IC Amplifier
MOSFETs: Single Stage IC Amplifier
 
MOSFETs
MOSFETsMOSFETs
MOSFETs
 
Color Image Processing: Basics
Color Image Processing: BasicsColor Image Processing: Basics
Color Image Processing: Basics
 
Edge Detection and Segmentation
Edge Detection and SegmentationEdge Detection and Segmentation
Edge Detection and Segmentation
 
Image Processing: Spatial filters
Image Processing: Spatial filtersImage Processing: Spatial filters
Image Processing: Spatial filters
 
Image Enhancement in Spatial Domain
Image Enhancement in Spatial DomainImage Enhancement in Spatial Domain
Image Enhancement in Spatial Domain
 
Resume Format
Resume FormatResume Format
Resume Format
 
Digital Image Fundamentals
Digital Image FundamentalsDigital Image Fundamentals
Digital Image Fundamentals
 
Resume Writing
Resume WritingResume Writing
Resume Writing
 
Image Processing Basics
Image Processing BasicsImage Processing Basics
Image Processing Basics
 
Blooms Taxonomy in Engineering Education
Blooms Taxonomy in Engineering EducationBlooms Taxonomy in Engineering Education
Blooms Taxonomy in Engineering Education
 
ISE 7.1i Software
ISE 7.1i SoftwareISE 7.1i Software
ISE 7.1i Software
 
VHDL Coding Syntax
VHDL Coding SyntaxVHDL Coding Syntax
VHDL Coding Syntax
 
VHDL Programs
VHDL ProgramsVHDL Programs
VHDL Programs
 
VLSI Testing Techniques
VLSI Testing TechniquesVLSI Testing Techniques
VLSI Testing Techniques
 
Selecting Engineering Project
Selecting Engineering ProjectSelecting Engineering Project
Selecting Engineering Project
 
Interview Techniques
Interview TechniquesInterview Techniques
Interview Techniques
 
Semiconductors
SemiconductorsSemiconductors
Semiconductors
 
Diode Applications & Transistor Basics
Diode Applications & Transistor BasicsDiode Applications & Transistor Basics
Diode Applications & Transistor Basics
 

Kürzlich hochgeladen

Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01KreezheaRecto
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringmulugeta48
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756dollysharma2066
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfJiananWang21
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdfSuman Jyoti
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfRagavanV2
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...SUHANI PANDEY
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 

Kürzlich hochgeladen (20)

Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 

Multicore Computers

  • 1. Multicore Computers Mr. A. B. Shinde Electronics Engineering
  • 2. Contents…  Hardware performance issues,  Software performance issues,  Multicore organization,  Intel x86 multicore organization,  ARM11 MPCore 2
  • 3. Introduction  A multicore computer, combines two or more processors on a single computer chip.  The use of more complex single-processor chips has reached a limit due to hardware performance issues, (limits in instruction-level parallelism & power limitations).  The multicore architecture poses challenges to software developers to exploit the capability for multithreading across multiple cores.  The main variables in a multicore organization are the number of processors on the chip, the number of levels of cache memory, and the extent to which cache memory is shared. 3
  • 4. Introduction  A multicore computer, also known as a chip multiprocessor, combines two or more processors (called cores) on a single piece of silicon (called a die).  Each core consists of all of the components of an independent processor, such as registers, ALU, pipeline hardware, and control unit, plus L1 instruction and data caches.  In addition to the multiple cores, contemporary multicore chips also include L2 cache and, in some cases, L3 cache. 4
  • 6. Hardware Performance Issues  Microprocessor systems have experienced a steady, exponential increase in execution performance for decades.  This increase is due to refinements in the organization of the processor on the chip, and the increase in the clock frequency. 6 Intel Microprocessor Performance
  • 7. Hardware Performance Issues  Increase in Parallelism:  The organizational changes in processor design have primarily been focused on increasing instruction-level parallelism, so that more work could be done in each clock cycle. 7 Superscalar
  • 8. Hardware Performance Issues  Increase in Parallelism:  The organizational changes in processor design have primarily been focused on increasing instruction-level parallelism, so that more work could be done in each clock cycle. 8 Simultaneous multithreading
  • 9. Hardware Performance Issues  Increase in Parallelism:  The organizational changes in processor design have primarily been focused on increasing instruction-level parallelism, so that more work could be done in each clock cycle. 9 Multicore
  • 10. Hardware Performance Issues  Increase in Parallelism:  Pipelining: Individual instructions are executed through a pipeline of stages so that while one instruction is executing in one stage of the pipeline, another instruction is executing in another stage of the pipeline.  Superscalar: Multiple pipelines are constructed by replicating execution resources. This enables parallel execution of instructions in parallel pipelines, so long as hazards are avoided.  Simultaneous multithreading (SMT): Register banks are replicated so that multiple threads can share the use of pipeline resources. 10
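The cycle counts implied by the pipelining and superscalar descriptions above can be sketched with an idealized timing model (an illustrative assumption: no hazards or stalls, so every issue slot is filled):

```python
import math

def pipeline_cycles(n_instr, stages, width=1):
    """Idealized cycle count for n_instr instructions on a
    'stages'-deep pipeline issuing 'width' instructions per cycle
    (width > 1 models a superscalar design). Assumes no hazards."""
    # The first result appears once the pipeline fills ('stages' cycles);
    # after that, 'width' instructions complete every cycle.
    return stages + math.ceil(n_instr / width) - 1

# 100 instructions: 5-stage scalar pipeline vs a 2-wide superscalar one
print(pipeline_cycles(100, 5))            # 104
print(pipeline_cycles(100, 5, width=2))   # 54
```

The near-halving of cycle count for the 2-wide case only holds while hazards can be avoided, which is exactly the limitation the following slides discuss.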
  • 11. Hardware Performance Issues  Increase in Parallelism:  In the case of pipelining, simple three-stage pipelines were replaced by pipelines with five stages, and then many more stages, with some implementations having over a dozen stages.  There is a practical limit to how far this trend can be taken, because with more stages, there is the need for more logic (hardware), more interconnections and more control signals. 11
  • 12. Hardware Performance Issues  Increase in Parallelism:  With a superscalar organization, a performance increase can be achieved by increasing the number of parallel pipelines.  There are limitations as the number of pipelines increases: more logic is required to manage hazards and to stage instruction resources.  A single thread of execution reaches the point where hazards and resource dependencies prevent the full use of the multiple pipelines available.  Ultimately, the complexity of managing multiple threads over a set of pipelines limits the number of threads and the number of pipelines that can be effectively utilized. 12
  • 13. Hardware Performance Issues  Increase in Parallelism:  The increase in complexity to deal with all of the logical issues related to very long pipelines, multiple superscalar pipelines, and multiple SMT register banks means that an increasing amount of the chip area is occupied by coordination and signal transfer logic.  This increases the difficulty of designing, fabricating, and debugging the chips.  Power issues are another major challenge. 13
  • 14. Hardware Performance Issues  Power Consumption:  To maintain higher performance, the number of transistors per chip and the clock frequency have continued to rise. Unfortunately, power requirements have grown exponentially as chip density and clock frequency have risen.  One way to control power density is to use more of the chip area for cache memory.  Memory transistors are smaller and have a power density an order of magnitude lower than that of logic. 14
  • 15. Hardware Performance Issues  Power Consumption: 15 Figure shows where the power consumption trend is leading. Assuming about 50–60% of the chip area is devoted to memory, the chip will support cache memory of about 100 MB and leave over 1 billion transistors available for logic.
  • 16. Hardware Performance Issues  Power Consumption:  In recent decades, Pollack's rule has been observed to hold: performance increase is roughly proportional to the square root of the increase in complexity.  If you double the logic in a processor core, it delivers only about 40% more performance.  The use of multiple cores has the potential to provide near-linear performance improvement with the increase in the number of cores. 16
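Pollack's rule can be made concrete with a small calculation (a sketch; the roughly-40% figure on the slide follows from √2 ≈ 1.41):

```python
import math

def pollack_speedup(complexity_ratio):
    # Pollack's rule: performance scales as sqrt(complexity)
    return math.sqrt(complexity_ratio)

# Doubling the logic in one core yields only ~41% more performance...
print(round(pollack_speedup(2.0), 2))   # 1.41
# ...while spending the same transistor budget on a second core can,
# ideally, approach 2x (near-linear scaling with core count).
```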
  • 17. Hardware Performance Issues  Power Consumption:  Power considerations provide another motive for moving toward a multicore organization.  The chip has such a huge amount of cache memory, it becomes unlikely that any one thread of execution can effectively use all that memory.  In SMT, a number of relatively independent threads or processes has a greater opportunity to take full advantage of the cache memory. 17
  • 19. Software Performance Issues  A detailed examination of the software performance issues related to multicore organization is a huge task. 19
  • 20. Software Performance Issues  Software on Multicore:  The performance benefits of a multicore organization depend on the ability to effectively exploit the parallel resources.  Consider a single application running on a multicore system. 20 Amdahl's law assumes a program in which a fraction (1 - f) of the execution time involves code that is inherently serial and a fraction f involves code that is infinitely parallelizable with no scheduling overhead.
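With the fractions defined above, Amdahl's law gives speedup = 1 / ((1 - f) + f/n) on n cores; a minimal sketch:

```python
def amdahl_speedup(f, n):
    """Speedup on n cores when fraction f of execution time
    is infinitely parallelizable and (1 - f) is serial."""
    return 1.0 / ((1.0 - f) + f / n)

# Even with 90% parallel code, 8 cores give well under 8x speedup:
print(round(amdahl_speedup(0.9, 8), 2))   # 4.71
```

The serial fraction dominates quickly: as n grows, the speedup is bounded above by 1 / (1 - f) no matter how many cores are added.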
  • 21. Software Performance Issues  A number of classes of applications benefit directly from the ability to scale throughput with the number of cores.  Multithreaded native applications: Multithreaded applications are characterized by having a small number of highly threaded processes.  Examples of threaded applications include Lotus Domino or Siebel CRM (Customer Relationship Management).  Multiprocess applications: Multiprocess applications are characterized by the presence of many single-threaded processes.  Examples of multiprocess applications include the Oracle database, SAP, and PeopleSoft. 21
  • 22. Software Performance Issues  Java applications:  The Java language greatly facilitates multithreaded applications.  The Java Virtual Machine is a multithreaded process that provides scheduling and memory management for Java applications.  Java applications that can benefit directly from multicore resources include application servers such as Sun’s Java Application Server, BEA’s Weblogic, IBM’s Websphere, etc.  All applications that use a Java 2 Platform, Enterprise Edition (J2EE platform) application server can immediately benefit from multicore technology. 22
  • 23. Software Performance Issues  Multiinstance applications:  Even if an individual application does not scale to take advantage of a large number of threads, it is still possible to gain advantage from multicore architecture by running multiple instances of the application in parallel.  If multiple application instances require some degree of isolation, then virtualization technology can be used to provide each of them with its own separate and secure environment. 23
  • 24. Multicore Organization  The main variables in a multicore organization are as follows:  The number of core processors on the chip  The number of levels of cache memory  The amount of cache memory that is shared 24
  • 25. Multicore Organization 25 Dedicated L1 cache Dedicated L2 cache Shared L2 cache Shared L3 cache General Organizations of Multicore systems
  • 26. Multicore Organization  Figure shows an organization found in some of the earlier multicore computer chips and is still seen in embedded chips.  In this organization, the only on-chip cache is L1 cache, each core having its own dedicated L1 cache.  Almost invariably, the L1 cache is divided into instruction and data caches.  An example of this organization is the ARM11 MPCore. 26 Dedicated L1 cache
  • 27. Multicore Organization  The organization is also one in which there is no on-chip cache sharing.  In this, there is enough area available on the chip to allow for L2 cache.  An example of this organization is the AMD Opteron. 27 Dedicated L2 cache
  • 28. Multicore Organization  Figure shows allocation of chip space to memory, but with the use of a shared L2 cache.  The Intel Core Duo has this organization.  As the amount of cache memory available on the chip continues to grow, performance considerations dictate splitting off a separate, shared L3 cache, with dedicated L1 and L2 caches for each core processor.  The Intel Core i7 is an example of this organization. 28 Shared L2 cache Shared L3 cache
  • 29. Multicore Organization  Advantages of shared L2/L3 cache on the chip: 1. Constructive interference can reduce overall miss rates. If a thread on one core accesses a main memory location, this brings the frame containing the referenced location into the shared cache. If a thread on another core soon thereafter accesses the same memory block, the memory locations will already be available in the shared on-chip cache. 2. Data shared by multiple cores is not replicated at the shared cache level. 29
  • 30. Multicore Organization  Advantages of shared L2 cache on the chip: 3. With proper frame replacement algorithms, the amount of shared cache allocated to each core is dynamic, so that threads that have less locality can employ more cache. 4. Interprocessor communication is easy to implement, via shared memory locations. 5. The use of a shared L2 cache confines the cache coherency problem to the L1 cache level, which may provide some additional performance advantage. 30
  • 31. Multicore Organization  An advantage of having only dedicated L2 caches on the chip is that each core enjoys more rapid access to its private L2 cache.  As both the amount of memory available and the number of cores grow, the use of a shared L3 cache combined with either a shared L2 cache or dedicated per-core L2 caches provides better performance than simply a massive shared L2 cache. 31
  • 32. Multicore Organization  Another organizational design decision in a multicore system is whether the individual cores will be superscalar or will implement simultaneous multithreading (SMT).  For example, the Intel Core Duo uses superscalar cores, whereas the Intel Core i7 uses SMT cores.  SMT scales up the number of hardware level threads that the multicore system supports.  As software is developed to fully exploit parallel resources, an SMT approach appears to be more attractive than a superscalar approach. 32
  • 33. Intel X86 Multicore Organization 33
  • 34. Intel X86 Multicore Organization  Intel has introduced a number of multicore products in recent years.  Examples: 1. The Intel Core Duo and 2. The Intel Core i7. 34
  • 35. Intel X86: Intel Core Duo  The general structure of the Intel Core Duo is shown in Figure.  As in many multicore systems, each core has its own dedicated L1 cache.  In this case, each core has a 32-KB instruction cache and a 32-KB data cache. 35 Intel Core Duo Block Diagram Advanced Programmable Interrupt Controller (APIC)
  • 36. Intel X86: Intel Core Duo  Each core has an independent thermal control unit. Thermal management is a fundamental capability, especially for laptop and mobile systems.  The Core Duo thermal control unit is designed to manage chip heat dissipation to maximize processor performance within thermal constraints. 36
  • 37. Intel X86: Intel Core Duo  The thermal management unit monitors digital sensors for high-accuracy die temperature measurements.  The maximum temperature for each thermal zone is reported separately via dedicated registers that can be polled by software.  If the temperature in a core exceeds a threshold, the thermal control unit reduces the clock rate for that core to reduce heat generation. 37
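The throttling behavior described above can be sketched as follows; the threshold, step, and floor values here are purely hypothetical illustrations, not Core Duo specifications:

```python
def throttle_clock(die_temp_c, clock_mhz,
                   threshold_c=85.0, step_mhz=200):
    """Illustrative per-core throttling rule (hypothetical numbers):
    if the die temperature exceeds the threshold, reduce that core's
    clock to cut heat generation; otherwise leave it unchanged."""
    if die_temp_c > threshold_c:
        return max(clock_mhz - step_mhz, 800)  # don't drop below a floor
    return clock_mhz

print(throttle_clock(90.0, 2000))  # 1800 -> core is throttled
print(throttle_clock(70.0, 2000))  # 2000 -> within thermal budget
```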
  • 38. Intel X86: Intel Core Duo  The Advanced Programmable Interrupt Controller (APIC) is the second key element.  The APIC performs a number of functions, including the following: 1. The APIC can provide interprocessor interrupts, which allow any processor to interrupt any other processor or set of processors. In the case of the Core Duo, a thread in one core can generate an interrupt, which is accepted by the local APIC, routed to the APIC of the other core, and communicated as an interrupt to the other core. 2. The APIC accepts I/O interrupts and routes these to the appropriate core. 3. Each APIC includes a timer, which can be set by the OS to generate an interrupt to the local core. 38
  • 39. Intel X86: Intel Core Duo  The power management logic is responsible for reducing power consumption when possible.  The power management logic monitors thermal conditions and CPU activity and adjusts voltage levels and power consumption appropriately.  It has an advanced power-gating capability that allows for an ultra fine-grained logic control that turns on individual processor logic subsystems only if and when they are needed. 39
  • 40. Intel X86: Intel Core Duo  The Core Duo chip includes a shared 2-MB L2 cache.  The cache logic allows for a dynamic allocation of cache space based on current core needs, so that one core can be assigned up to 100% of the L2 cache.  The L2 cache includes logic to support the MESI (modified/exclusive/shared/invalid) protocol for the attached L1 caches. 40
  • 41. Intel X86: Intel Core Duo  A cache line gets the M state when a processor writes to it; if the line is not in the E or M state prior to the write, the cache sends a Read-For-Ownership (RFO) request.  The Intel Core Duo extends this protocol to take into account the case when there are multiple Core Duo chips organized as a symmetric multiprocessor (SMP) system. 41
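The write-path rule just described can be modeled as a toy function (a sketch of this single rule only, not the full MESI transition table):

```python
# A write moves a line to M; if the line was not already in E or M,
# a Read-For-Ownership (RFO) must be issued first so that other
# copies of the line are invalidated before the write proceeds.
def write_line(state):
    """Return (new_state, rfo_issued) for a processor write,
    given the line's current MESI state: 'M', 'E', 'S', or 'I'."""
    if state in ("E", "M"):
        return "M", False   # silent upgrade, no bus transaction needed
    # state is S (shared) or I (invalid): ownership must be acquired
    return "M", True        # RFO invalidates copies in other caches

print(write_line("E"))  # ('M', False)
print(write_line("S"))  # ('M', True)
```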
  • 42. Intel X86: Intel Core Duo  The L2 cache controller allows the system to distinguish between a situation in which data are shared by the two local cores and a situation in which the data are shared by one or more caches on the die as well as by an agent on the external bus (which can be another processor).  The core issues an RFO only if the line is shared solely by the other cache within the local die.  The bus interface connects to the external bus, known as the Front Side Bus, which connects to main memory, I/O controllers, and other processor chips. 42
  • 43. Intel X86: Intel Core i7  The general structure of the Intel Core i7 is shown in Figure.  Each core has its own dedicated L2 cache and the four cores share an 8-MB L3 cache. 43 Intel Core i7 Block Diagram
  • 44. Intel X86: Intel Core i7  Table shows the cache access latency, in terms of clock cycles for two Intel multicore systems running at the same clock frequency.  The Core 2 Quad has a shared L2 cache, similar to the Core Duo.  The Core i7 improves on L2 cache performance with the use of the dedicated L2 caches, and provides a relatively high-speed access to the L3 cache. 44
  • 45. Intel X86: Intel Core i7  The Core i7 chip supports two forms of external communications to other chips.  The DDR3 memory controller brings the memory controller for the DDR main memory onto the chip.  The QuickPath Interconnect (QPI) is a cache-coherent, point-to-point link based electrical interconnect specification for Intel processors and chipsets.  It enables high-speed communications among connected processor chips. The QPI link operates at 6.4 GT/s (transfers per second). 45
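A back-of-the-envelope bandwidth estimate from the 6.4 GT/s figure, assuming the link carries 16 data bits (2 bytes) per direction per transfer (an assumption about the link width; the slide states only the transfer rate):

```python
# QPI bandwidth estimate under the stated assumptions
transfers_per_sec = 6.4e9     # 6.4 GT/s, from the slide
bytes_per_transfer = 2        # assumption: 16-bit data payload per direction
one_way = transfers_per_sec * bytes_per_transfer / 1e9
print(one_way)       # GB/s per direction
print(one_way * 2)   # GB/s for the bidirectional link
```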
  • 47. ARM11 MPCore  The ARM11 MPCore is a multicore product based on the ARM11 processor family.  The ARM11 MPCore can be configured with up to four processors per chip, each with its own L1 instruction and data caches. 47
  • 48. ARM11 MPCore 48 ARM11 MPCore Processor Block Diagram  Distributed interrupt controller (DIC)  Timer  Watchdog  CPU interface  CPU interface  CPU  Vector floating-point (VFP) unit  L1 cache  Snoop control unit (SCU)
  • 49. ARM11 MPCore  The key elements of the system are as follows:  Distributed interrupt controller (DIC): Handles interrupt detection and interrupt prioritization. The DIC distributes interrupts to individual processors.  Timer: Each CPU has its own private timer that can generate interrupts.  Watchdog: Issues warning alerts in the event of software failures. If the watchdog is enabled, it is set to a predetermined value and counts down to 0. If the watchdog value reaches zero, an alert is issued.  CPU interface: Handles interrupt acknowledgement, interrupt masking, and interrupt completion acknowledgement. 49
  • 50. ARM11 MPCore  The key elements of the system are as follows:  CPU: A single ARM11 processor. Individual CPUs are referred to as MP11 CPUs.  Vector floating-point (VFP) unit: A coprocessor that implements floating point operations in hardware.  L1 cache: Each CPU has its own dedicated L1 data cache and L1 instruction cache.  Snoop control unit (SCU): Responsible for maintaining coherency among L1 data caches. 50
ARM11 MPCore
ARM11 MPCore Configurable Options
51
ARM11 MPCore
 Interrupt Handling:
 The Distributed Interrupt Controller (DIC) collates interrupts from a large number of sources.
 It provides:
 Masking of interrupts
 Prioritization of the interrupts
 Distribution of the interrupts to the target MP11 CPUs
 Tracking of the status of interrupts
 Generation of interrupts by software
52
ARM11 MPCore
 Interrupt Handling:
 The DIC enables the number of interrupts supported in the system to be independent of the MP11 CPU design.
 The DIC is memory mapped; that is, control registers for the DIC are defined relative to a main memory base address.
 The DIC is designed to satisfy two functional requirements:
 Provide a means of routing an interrupt request to a single CPU or multiple CPUs, as required.
 Provide a means of interprocessor communication so that a thread on one CPU can cause activity by a thread on another CPU.
53
ARM11 MPCore
 Interrupt Handling:
 The DIC can route an interrupt to one or more CPUs in the following three ways:
 An interrupt can be directed to a specific processor only.
 An interrupt can be directed to a defined group of processors.
 An interrupt can be directed to all processors.
54
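The three routing modes map naturally onto a CPU target bitmask, one bit per MP11 CPU. The sketch below is illustrative; the encoding and register layout are not the real DIC programming model.

```python
# The three DIC routing modes expressed as a CPU target bitmask
# (hypothetical encoding, one bit per CPU, up to four MP11 CPUs).
NUM_CPUS = 4
ALL_CPUS = (1 << NUM_CPUS) - 1   # 0b1111

def route_to_one(cpu):           # a specific processor only
    return 1 << cpu

def route_to_group(cpus):        # a defined group of processors
    mask = 0
    for c in cpus:
        mask |= 1 << c
    return mask

def route_to_all():              # all processors
    return ALL_CPUS

print(bin(route_to_one(2)))         # 0b100
print(bin(route_to_group([0, 3])))  # 0b1001
print(bin(route_to_all()))          # 0b1111
```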
ARM11 MPCore
 Interrupt Handling:
 From the point of view of an MP11 CPU, an interrupt can be in one of three states:
 Inactive: An Inactive interrupt is one that is nonasserted.
 Pending: A Pending interrupt is one that has been asserted, and for which processing has not started on that CPU.
 Active: An Active interrupt is one whose processing has been started on that CPU, but is not complete.
55
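These three states form a simple per-CPU lifecycle: Inactive → Pending on assertion, Pending → Active on acknowledgement, Active → Inactive on completion. A minimal state machine (the state names are from the slide; the transition triggers are illustrative):

```python
# Per-CPU interrupt lifecycle: Inactive -> Pending -> Active -> Inactive.
INACTIVE, PENDING, ACTIVE = "Inactive", "Pending", "Active"

class IrqState:
    def __init__(self):
        self.state = INACTIVE

    def assert_irq(self):        # interrupt asserted by a source
        if self.state == INACTIVE:
            self.state = PENDING

    def acknowledge(self):       # CPU starts processing
        if self.state == PENDING:
            self.state = ACTIVE

    def end_of_interrupt(self):  # CPU signals completion
        if self.state == ACTIVE:
            self.state = INACTIVE

irq = IrqState()
irq.assert_irq();       print(irq.state)  # Pending
irq.acknowledge();      print(irq.state)  # Active
irq.end_of_interrupt(); print(irq.state)  # Inactive
```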
ARM11 MPCore
 Interrupt Handling:
 Interrupts come from the following sources:
 Interprocessor interrupts (IPIs): Each CPU has private interrupts (ID0–ID15) that can be triggered by software. The priority of an IPI depends on the receiving CPU, not the sending CPU.
 Private timer and/or watchdog interrupts: These use interrupt IDs 29 and 30.
 Legacy FIQ line: In legacy IRQ mode, the legacy FIQ pin bypasses the Interrupt Distributor logic and directly drives interrupt requests into the CPU.
 Hardware interrupts: Hardware interrupts are triggered by programmable events on associated interrupt input lines. Hardware interrupts start at ID32.
56
ARM11 MPCore
 The DIC is configurable to support between 0 and 255 hardware interrupt inputs.
 The DIC maintains a list of interrupts, showing their priority and status.
 The Interrupt Distributor transmits to each CPU Interface the highest-priority Pending interrupt for that interface. It receives back the interrupt acknowledgement, and can then change the status of the corresponding interrupt.
 The CPU Interface also transmits end-of-interrupt (EOI) information, which enables the Interrupt Distributor to update the status of this interrupt from Active to Inactive.
58
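The distributor's per-CPU selection can be sketched as picking the most urgent entry from the pending list, then retiring it through acknowledge and EOI. This follows the common convention that a lower priority number means higher urgency; the IDs and values below are made up for illustration.

```python
# Sketch of the Interrupt Distributor's per-CPU-interface selection.
# Convention assumed here: lower priority value = more urgent.
pending = {33: 0xA0, 35: 0x40, 40: 0x80}   # interrupt ID -> priority

def highest_pending(pending):
    """Return the pending interrupt ID the distributor would present."""
    if not pending:
        return None
    return min(pending, key=lambda irq_id: pending[irq_id])

active = set()

# CPU interface acknowledges: Pending -> Active
irq = highest_pending(pending)
active.add(irq)
del pending[irq]
print(irq)  # 35 (priority 0x40 is the most urgent)

# CPU interface sends EOI: Active -> Inactive
active.discard(irq)
print(active)  # set()
```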
ARM11 MPCore
 Cache Coherency:
 The MPCore’s Snoop Control Unit (SCU) is designed to resolve most of the traditional bottlenecks related to access to shared data.
 The SCU introduces three types of optimization:
 Direct data intervention,
 Duplicated tag RAMs, and
 Migratory lines.
59
ARM11 MPCore
 Cache Coherency:
 Direct data intervention (DDI): Enables copying clean data from one CPU L1 data cache to another CPU L1 data cache without accessing external memory.
 This reduces read-after-read activity from the Level 1 cache to the Level 2 cache.
 Thus, a local L1 cache miss is resolved in a remote L1 cache rather than by an access to the shared L2 cache.
60
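The lookup order DDI implies can be sketched as: local L1 first, then the other CPUs' L1 caches, and only then the shared L2. The dict-based cache model below is purely illustrative.

```python
# Sketch of direct data intervention (DDI): a local L1 miss is satisfied
# from another CPU's L1 when a clean copy exists, avoiding the shared L2.
# Caches are modeled as simple address -> data dicts (illustrative only).
l1_caches = [
    {0x100: "A"},   # CPU0's L1 data cache
    {0x200: "B"},   # CPU1
    {},             # CPU2
    {},             # CPU3
]
l2_cache = {0x100: "A", 0x200: "B", 0x300: "C"}

def read(cpu, addr):
    if addr in l1_caches[cpu]:
        return l1_caches[cpu][addr], "local L1"
    # SCU: try the other L1s before the shared L2 (direct data intervention)
    for other, cache in enumerate(l1_caches):
        if other != cpu and addr in cache:
            l1_caches[cpu][addr] = cache[addr]   # copy the clean line
            return cache[addr], f"remote L1 (CPU{other})"
    return l2_cache[addr], "shared L2"

print(read(2, 0x100))  # ('A', 'remote L1 (CPU0)')
print(read(2, 0x300))  # ('C', 'shared L2')
```

Note this sketch only covers clean lines, matching the slide; dirty data needs the full coherency protocol.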
ARM11 MPCore
 Cache Coherency:
 The main memory location of each line within a cache is identified by a tag for that line.
 The tags can be implemented as a separate block of RAM of the same length as the number of lines in the cache.
 In the SCU, duplicated tag RAMs are duplicated versions of the L1 tag RAMs.
 Coherency commands are sent only to CPUs that must update their coherent data cache.
 This reduces the power consumption.
 Because the tag data is available locally, the SCU limits cache manipulations to processors that have cache lines in common.
61
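In effect the duplicated tag RAMs act as a snoop filter: the SCU consults its local copies of each CPU's tags and forwards a coherency command only to CPUs that actually hold the line. A minimal sketch, with tag RAMs modeled as sets of addresses (illustrative only):

```python
# Sketch of the SCU's duplicated-tag filtering: a coherency command for an
# address is sent only to CPUs whose (duplicated) L1 tags contain that line.
duplicated_tags = [
    {0x100, 0x140},   # SCU-held copy of CPU0's L1 tag RAM
    {0x100},          # CPU1
    {0x180},          # CPU2
    set(),            # CPU3
]

def coherency_targets(addr, source_cpu):
    """CPUs that must be told to update/invalidate their cached line."""
    return [cpu for cpu, tags in enumerate(duplicated_tags)
            if cpu != source_cpu and addr in tags]

# CPU2 writes 0x100: only CPU0 and CPU1 are snooped; CPU3 stays idle,
# which is where the power saving comes from.
print(coherency_targets(0x100, source_cpu=2))  # [0, 1]
```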
This presentation is published only for educational purposes.
shindesir.pvp@gmail.com
62