www.studentyogi.com www.studentyogi.com


               Seminar report
                     on




       ITANIUM: The 64-bit microprocessor from Intel








                                ITANIUM




History:




In 1994, Hewlett-Packard and Intel Corporation agreed to jointly design EPIC
(Explicitly Parallel Instruction Computing), a post-RISC and post-IA-32
technology. Using EPIC concepts, HP and Intel® then jointly defined the 64-bit
Itanium Processor Family (IPF) architecture, the basis of Intel's future
high-performance microprocessor family for a broad range of technical and
commercial applications at 1.0GHz. The starting place for the design team was
a comprehensive understanding of the capabilities of the 0.18µm bulk
technology transistors and interconnects, with a view toward exploiting these
capabilities to the fullest. The Itanium processor began shipping in end-user
pilot systems in late 2000. Intel intends to follow Itanium with additional
processors in the Itanium family: McKinley, Madison and Deerfield in late
2002.




Need For Itanium:
Internet commerce and large database applications are dealing with ever-
increasing quantities of data, and demands placed on both server and
workstation resources are increasing correspondingly. One demand is for more
memory than the 4 GB provided by today’s 32-bit computer architectures.
Itanium's ability to address a flat 64-bit memory address space in the millions
of gigabytes has been the focus of attention. Beyond very large memory
(VLM) support, however, other traits matter as well, including a new Explicitly
Parallel Instruction Computing (EPIC) design philosophy that handles parallel
processing differently than previous architectures, speculation, predication,
large register files, a register stack and an advanced branch architecture.
IA-64 also provides an enhanced system architecture supporting fast interrupt
response and a flexible, large virtual address mode. The 64-bit addressing
enabled by the Intel Itanium architecture will help overcome the scalability
barriers and awkward, maintenance-intensive partitioning schemes of
current directory services on 32-bit platforms. Intel has also been assiduous
in providing backward compatibility with 32-bit binaries (IA-32) from the x86
families. The Itanium is a complex, bleeding-edge, forward-looking
processor family that holds promise for huge gains in processing power.
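The gap between the 4 GB limit of 32-bit addressing and a flat 64-bit address
space can be checked with a line of arithmetic. The sketch below is only an
illustration; the function name is hypothetical, and it counts the
architectural address range, not the memory any particular system can hold.

```python
# Compare how many gigabytes a flat address of a given width can reach.
GIB = 2**30  # bytes in one (binary) gigabyte


def addressable_gigabytes(address_bits: int) -> int:
    """Gigabytes reachable with a flat, byte-granular address of this width."""
    return 2**address_bits // GIB


if __name__ == "__main__":
    print(addressable_gigabytes(32))  # 4
    print(addressable_gigabytes(64))  # 17179869184
```

A 64-bit address therefore spans about 17 billion gigabytes, which is well
within the "millions of gigabytes" quoted above.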

The How’s & Wows of Itanium:

Itanium defines a new architecture that provides a unique combination of
innovative features, which overcome the performance limitations of traditional
architectures. The IA-64 architecture is based on innovative techniques such as
explicit parallelism, predication and speculation, resulting in superior
instruction level parallelism (ILP) and increased instructions per cycle (IPC)
to address the current and future requirements of demanding Internet, high-end
server, and workstation applications. In addition, the IA-64 architecture
provides headroom and scalability for continued future growth.







We first look at the features that make this possible, the bricks and mortar
of the Itanium, and then at the architecture that emerges from them.

    EPIC:




EPIC stands for Explicitly Parallel Instruction Computing, a new design
philosophy going beyond the RISC and CISC processors that are available
today. EPIC technology enables greater instruction level parallelism than
previous processor architectures, supporting higher levels of performance in
targeted application segments. The Itanium architecture is based on EPIC
technology, which rests on a unique combination of innovative features such
as predication, speculation and explicit parallelism, enabling world-class
performance for the high-end enterprise class of computing.



The EPIC architecture uses long instruction words that, in addition to the
basic instruction, contain information on how to run the instruction in
parallel. EPIC instructions are put together by the compiler into a group of
three called a "bundle," and the bundled instructions are sent to the CPU
together. Bundles, or their parts, are in turn put together with other
instructions into an "instruction group." The IA-64 utilizes 128-bit bundles,
organized as a template plus three IA-64 instructions. The template contains
the following information, provided by the compiler to the processor:

       1. Which instructions can be executed in parallel, and which
          instructions must be executed serially.

       2. Information relating to parallelism with respect to its neighboring
          bundles.

       3. The mapping of instruction slots to execution unit types.
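
The layout of a bundle can be sketched in a few lines. The field widths used
here (a 5-bit template in the low bits followed by three 41-bit instruction
slots, 5 + 3 x 41 = 128) come from the published IA-64 encoding rather than
from this report, and the function names are hypothetical.

```python
# Pack and unpack a 128-bit bundle: 5-bit template + three 41-bit slots.
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOTS_PER_BUNDLE = 3
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template: int, slots: list[int]) -> int:
    """Combine a template and three instruction slots into one integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != SLOTS_PER_BUNDLE:
        raise ValueError("a bundle holds exactly three instructions")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each instruction must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle: int) -> tuple[int, list[int]]:
    """Split a 128-bit bundle back into its template and three slots."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
        for i in range(SLOTS_PER_BUNDLE)
    ]
    return template, slots
```

Because the template travels with the instructions, the processor can read
the compiler's parallelism information without analysing the three slots
itself.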




Instructions in an instruction group do not depend on each other's data, so
they can run together without getting in each other's way. There is no
limit to the size of an instruction group, and an instruction group can begin or
end in the middle of a bundle. The instructions are actually bundled and
grouped together when software is compiled. This simplifies the process of
running multiple instructions at once on an Itanium CPU, allowing it to make
greater use of multiple execution units without having to rely on complex
on-die logic to determine what operations can run in parallel. The Itanium will
still use on-die logic to improve upon instruction level parallelism, but EPIC
instructions, at the minimum, provide a parallel blueprint for the Itanium
processor. For this reason, compiler technology and programming
algorithms will have a massive impact on Itanium performance. The compiler
adds branch hints, register stack and rotation, data and control speculation, and
memory hints into EPIC instructions. Thus, EPIC makes processing much faster.





EPIC Pipelining: The parallel execution core of EPIC can have up to 10
pipelined stages. The first-generation Itanium processor is able to issue up to
six EPIC instructions in parallel every clock cycle. The next-generation
Itanium might reach 20 instructions per cycle, though not consistently; the
potential is there, and proper coding and compiling should yield efficient usage
of the CPU.




    Predication:

Predication is a compiling technique used in the Itanium that optimizes or
removes branching code by reworking it so that much of the code runs in
parallel. When software and compilers rework branches into parallel
code with fewer or no branches, and that code runs on a "wide"
processor that can process it in parallel, as the Itanium is intended
to do, the number of cycles it takes to complete a task drops.

What predication does is minimize the time it takes to run an if-then-else
construct by using processor width to run both the 'then' and the 'else' paths
in parallel. Once the 'if' condition is determined, the result of the incorrect
path is discarded. By removing branches and making code more parallel,
predication reduces the number of cycles it takes to complete a task while
making better use of a wide processor. There are also fewer branch mispredicts.
A branch mispredict forces the pipeline to be flushed, a very cycle-expensive
procedure, so by lessening the number of branches, predication can greatly
reduce wasted processor time.
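
The idea of reworking a branch into predicated code can be modelled in
software. The sketch below is only an analogy, since Python itself does not
execute predicated instructions; the function names are hypothetical. Both
arms are computed unconditionally and the predicate selects which result
survives.

```python
# Model of if-conversion: a branch is replaced by a predicate plus both arms.

def branchy_abs_diff(a: int, b: int) -> int:
    """Conventional version: control flow decides which arm runs."""
    if a > b:
        return a - b
    else:
        return b - a


def predicated_abs_diff(a: int, b: int) -> int:
    """Predicated version: both arms run, the predicate keeps one result."""
    p = int(a > b)          # the compare sets the predicate (1 or 0)
    then_result = a - b     # arm guarded by p
    else_result = b - a     # arm guarded by the complement of p
    # The result whose predicate is false is discarded (multiplied by 0).
    return p * then_result + (1 - p) * else_result
```

The predicated version does a little more arithmetic, but it contains no
branch that could be mispredicted, which is the trade the text describes.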

    Speculation:

The latency of memory is a big performance bottleneck in today's systems. The
IA-64 architecture employs a technique called speculation to initiate loads from
memory earlier in the instruction stream, even before a branch. Data
speculation, then, is calling for and caching data that may be needed, or that
may be changed before it is needed, so that if the data is needed and has not
changed, the CPU does not have to take a latency hit from calling
for the data. The processor, with the help of compiled instructions, looks ahead,
anticipates what information it may need, and then brings it into cache or into
the processor. This helps hide memory latency. Speculation thus increases
instruction level parallelism and reduces the impact of memory latency,
resulting in handsome performance gains.
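
A toy model may help show how data can be loaded early and still be safe to
use. It is loosely patterned on the IA-64 idea of an early load that is
checked later; the class, its methods and the register names are all
hypothetical, and real hardware tracks speculative loads very differently.

```python
# Toy model of data speculation: load early, verify before use.

class SpeculativeMemory:
    """Memory plus a table of addresses that were loaded speculatively."""

    def __init__(self, contents: dict[int, int]):
        self.mem = dict(contents)
        self.tracked: dict[str, int] = {}  # register name -> loaded address
        self.slow_loads = 0                # counts trips to "slow" memory

    def _load(self, addr: int) -> int:
        self.slow_loads += 1
        return self.mem[addr]

    def advanced_load(self, reg: str, addr: int) -> int:
        """Load ahead of time and remember which address was read."""
        self.tracked[reg] = addr
        return self._load(addr)

    def store(self, addr: int, value: int) -> None:
        """A store to a tracked address invalidates the early load."""
        self.mem[addr] = value
        for reg in [r for r, a in self.tracked.items() if a == addr]:
            del self.tracked[reg]

    def check_load(self, reg: str, addr: int, early_value: int) -> int:
        """Reuse the early value if still valid, otherwise reload."""
        if self.tracked.get(reg) == addr:
            return early_value
        return self._load(addr)
```

When no intervening store touches the address, the check costs nothing and
the load latency has already been paid; only when the data changed is the
load repeated.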





Control speculation is a feature that runs deep within IA-64. A load issued
ahead of time might fail. Each data value is therefore associated with a bit
that tells whether an error was generated when the data was loaded. This
error bit ("NaT", for "Not a Thing") follows the data through arithmetical
operations, moves, and conditionals until the data is checked by a check
instruction. This means that with IA-64, the compiler can aggressively load
data well ahead of time (speculation) without paying unnecessary exception
penalties. A final benefit of NaTs is that they provide for structured error
handling as is found in C and C++. Because the compiler can schedule the
"check" instruction whenever it wants, it is possible to safely defer errors
until the best possible time to recover from them. The architectural support
for structured error handling in IA-64 is an important feature promoting
system reliability and performance.
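
The deferred-error behaviour of the NaT bit can be imitated with a value that
carries a flag. This is a minimal sketch under simplifying assumptions: the
`Reg` type and the helper functions are hypothetical, and a missing
dictionary key stands in for a faulting memory access.

```python
# Sketch of NaT propagation: errors ride along with the data until checked.
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Reg:
    value: int = 0
    nat: bool = False  # True means "this value came from a failed load"


def speculative_load(memory: dict[int, int], addr: int) -> Reg:
    """Load without raising: a bad address yields a NaT-tagged register."""
    if addr in memory:
        return Reg(memory[addr])
    return Reg(0, nat=True)


def add(a: Reg, b: Reg) -> Reg:
    """Arithmetic propagates the NaT bit from either operand."""
    return Reg(a.value + b.value, a.nat or b.nat)


def check(reg: Reg, recover: Callable[[], int]) -> int:
    """The check point: run recovery only if the speculation failed."""
    return recover() if reg.nat else reg.value
```

The load and the addition can be hoisted as early as the compiler likes; the
cost of a failure is paid only at `check`, and only if the value is actually
needed.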




    Cache:

       When a processor is waiting for data or instructions, time is wasted.
The longer it takes for data and instructions to get to the CPU, the worse it gets.
When data and instructions are in cache, the processor can grab them much
more quickly than when it has to go to slow main memory. Not only is cache
latency much lower than DRAM latency, the bandwidth is much higher.

There are some tricky programming techniques in use to keep often-used data
and instructions in cache, and they are not the kind of techniques you
learn in a high school BASIC course. Still, the easiest way to keep data and
instructions in cache is to have a lot of cache to keep them in. Intel knew that
when they designed the Itanium.






The Itanium has three levels of cache. L1 and L2 are on-die while L3 is on the
cartridge. According to Intel, the L3 cache provides 2MB or 4MB of four-way
set-associative cache on two or four 1MB chips. IDC reports that the L2
cache is 96KB in size, and that the L1 cache, which does not deal with
floating-point data, has a 16KB integer data cache and a 16KB instruction cache.

The 294.8 million transistors of the (4MB) level three cache run at the full
processor speed, giving 12.8GBps of bandwidth at 800MHz. With
2MB or 4MB of L3 cache on the Itanium, the chances of the required data and
instructions being in cache are quite good, so bus traffic can be reduced and
performance increases. With six pipelines hungry for instructions and data, the
Itanium needs all the cache it can get. Caching is made effective through data
speculation and cache hints. Data speculation has been explained above. Cache
hints are two-bit markers for memory loads, set by the compiler, that help the
CPU find data in cache. This improves the speed of retrieving data from cache.
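
The 12.8GBps figure follows directly from the clock speed once a transfer
width is assumed. The sketch below works backwards from the two numbers in the
text; the 16-byte (128-bit) width it arrives at is an inference from those
numbers, not a figure stated in this report, and the function names are
hypothetical.

```python
# Relate cache bandwidth, clock speed and bytes moved per clock cycle.

def bytes_per_cycle(bandwidth_bytes_per_s: int, clock_hz: int) -> int:
    """How many bytes must move each cycle to sustain the bandwidth."""
    return bandwidth_bytes_per_s // clock_hz


def bandwidth(bytes_each_cycle: int, clock_hz: int) -> int:
    """Peak bandwidth in bytes per second for a given width and clock."""
    return bytes_each_cycle * clock_hz


if __name__ == "__main__":
    width = bytes_per_cycle(12_800_000_000, 800_000_000)
    print(width)                          # 16
    print(bandwidth(width, 800_000_000))  # 12800000000
```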


    The Overall Architecture:


       The IA-64 architecture is based on innovative techniques such as
explicit parallelism, predication and speculation, resulting in superior
instruction level parallelism (ILP) and increased instructions per cycle (IPC)
to address the current and future requirements of demanding Internet,
high-end server, and workstation applications. In addition, the IA-64
architecture provides headroom and scalability for continued future growth.
The Itanium was not designed for small systems; it is intended for workstations
and servers with 1 to 4000 processors. The first-generation Itanium processor
is able to issue up to six EPIC instructions in parallel every clock cycle.





The six-issue (two-bundle) scheduler disperses instructions into nine
functional slots: two integer slots, two floating-point slots, two memory
slots, and three branch slots. Proper compiler design
should be able to handle most situations without overloading any type of
functional slot. In addition to pure clock speed boosts, future Itanium CPUs
are sure to have more functional units (FMACs, ALUs).
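
What "overloading a type of functional slot" means can be shown with a small
check. This is a deliberately simplified sketch: it only counts instruction
types against the slot numbers given above, whereas the real dispersal rules
also depend on templates and instruction order. The type letters and the
function name are assumptions made for the example.

```python
# Simplified dispersal check: do six instructions fit the nine slots?
from collections import Counter

# Slot counts taken from the text: 2 integer, 2 FP, 2 memory, 3 branch.
SLOTS = {"I": 2, "F": 2, "M": 2, "B": 3}
MAX_ISSUE = 6  # two bundles of three instructions per cycle


def can_issue_together(instruction_types: list[str]) -> bool:
    """True if no slot type is oversubscribed and at most six are issued."""
    if len(instruction_types) > MAX_ISSUE:
        return False
    counts = Counter(instruction_types)
    return all(n <= SLOTS.get(kind, 0) for kind, n in counts.items())


if __name__ == "__main__":
    print(can_issue_together(["M", "M", "I", "I", "F", "B"]))  # True
    print(can_issue_together(["I", "I", "I"]))                 # False
```

The second call fails because three integer instructions compete for two
integer slots, which is the situation a good compiler schedules around.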

The Itanium contains four pipelined FMAC (floating-point multiply-accumulate)
units. The primary two are each capable of processing two single-precision,
two double-precision, or two double-extended-precision floating-point
operations per clock. That yields up to 3.2GFLOPS of highly precise
floating-point processing at 800MHz. There are an additional two FMACs tuned
for 3D applications. They are each capable of processing up to two
single-precision floating-point operations per clock. That yields another
3.2GFLOPS of single-precision processing power. Altogether, the Itanium has a
theoretical maximum of 6.4GFLOPS of single-precision floating-point processing
power. There are four pipelined ALUs (Arithmetic Logic Units) in the original
Itanium. Each can process one integer calculation per cycle. They can also
process MMX-type instructions.
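
The peak figures above are simple products of unit count, operations per
clock and clock speed. The helper below is a hypothetical illustration that
reproduces the arithmetic for the 800MHz part.

```python
# Peak floating-point throughput = units x ops per unit per clock x clock.

def peak_gflops(units: int, ops_per_unit_per_clock: int, clock_mhz: int) -> float:
    """Theoretical peak in GFLOPS for a group of identical FP units."""
    return units * ops_per_unit_per_clock * clock_mhz / 1000


if __name__ == "__main__":
    primary = peak_gflops(2, 2, 800)     # the two full-precision FMACs
    additional = peak_gflops(2, 2, 800)  # the two single-precision FMACs
    print(primary)               # 3.2
    print(primary + additional)  # 6.4
```

These are theoretical ceilings: they assume every unit completes its maximum
number of operations on every cycle.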




The Itanium comes with 128 floating-point and 128 integer registers. When
processing up to 20 operations in a single clock, the registers give plenty of
room for data inside the processor. The registers also have the ability to rotate.
Rotating registers allow the processor to perform an operation on multiple
software-accessible registers in turn.

Efficient parallel execution is achieved for integer operations with the six
1-cycle integer and six 2-cycle multimedia units, which accomplish full
symmetric bypassing with each other and the L1D cache, in combination with a
20-ported, 128 x 65b register file. Virtually any instruction can be
predicated off with a prior compare instruction. The integer units and register
file are 4mm x 1.9mm, including hardware support for IA-32 code. The dual 82b
FMAC units have a 4-cycle latency and are fully bypassed with each other. In
combination with the 14-ported, 128 x 82b register file and other
miscellaneous FP support, these units are 9mm x 2.2mm.
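
Register rotation, mentioned above, is easiest to see in a small model. In the
sketch below a rotation renames the registers so that a value written to
logical register 0 is read as logical register 1 on the next loop iteration.
The class, the four-register size and the two-stage loop are hypothetical
simplifications chosen for the example.

```python
# Toy rotating register file used to overlap two stages of a loop.

class RotatingRegisters:
    def __init__(self, size: int):
        self.phys = [None] * size  # physical storage
        self.base = 0              # rotation offset applied to every access

    def _index(self, logical: int) -> int:
        return (logical + self.base) % len(self.phys)

    def write(self, logical: int, value) -> None:
        self.phys[self._index(logical)] = value

    def read(self, logical: int):
        return self.phys[self._index(logical)]

    def rotate(self) -> None:
        """After rotating, what was register n is now seen as register n+1."""
        self.base = (self.base - 1) % len(self.phys)


def pipelined_double(values: list[int]) -> list[int]:
    """Stage 1 'loads' item i while stage 2 doubles item i-1."""
    regs = RotatingRegisters(4)
    out = []
    for i in range(len(values) + 1):
        if i < len(values):
            regs.write(0, values[i])      # stage 1: this iteration's item
        if i >= 1:
            out.append(regs.read(1) * 2)  # stage 2: previous iteration's item
        regs.rotate()
    return out
```

Both stages always use the same register numbers, yet each iteration works on
fresh data, so the loop body does not need to be unrolled to overlap its
stages.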

There are several Itanium features designed to help with hardware scalability: a
full-CPU-speed Level 3 bus, a large L3 cache, deferred-transaction support and
flexible page sizes. The full-CPU-speed Level 3 bus provides quick
communication between CPUs. The large L3 cache reduces inter-CPU bus
traffic by keeping data close to the CPU that needs it. Deferred-transaction
support can stop one CPU from getting in the way of another.
Flexible page sizes, from 4KB to 256MB, give the Itanium family the
flexibility to access small amounts of memory in small chunks and massive
amounts of memory in massive chunks without the overhead of smaller page sizes.

The pre-validated, 4-port 16KB L1D cache [1] is tightly coupled to the integer
units to achieve the half-cycle load. As a result, the less latency-sensitive
FPU directly interfaces to the L2D cache [1] with four 82b load ports (6-cycle
latency) and two 82b store ports. The 3MB, 12-cycle latency L3 cache [1] is
implemented with 135 separate "subarrays" that enable high density and the
ability to conform to the irregular shape of the processor core with flexible
subarray placement. Each level of on-chip cache has matched bandwidths at
32GB/s across the hierarchy.
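
The overhead of small pages on large memories can be put into numbers. The
sketch below counts how many pages are needed to map a region at the two ends
of the 4KB to 256MB page-size range quoted above; the 64GB region is just an
example size and the function name is hypothetical.

```python
# Count the pages (and hence mappings) needed to cover a memory region.
KB, MB, GB = 2**10, 2**20, 2**30


def pages_needed(region_bytes: int, page_bytes: int) -> int:
    """Number of pages required to cover the region (rounded up)."""
    return -(-region_bytes // page_bytes)


if __name__ == "__main__":
    print(pages_needed(64 * GB, 4 * KB))    # 16777216
    print(pages_needed(64 * GB, 256 * MB))  # 256
```

Mapping the same region with the largest page size needs tens of thousands of
times fewer translations, which is the overhead that large pages avoid.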




The front-end instruction fetch pipe stages are decoupled from the back-end
stages via an 8-bundle queue. The same pre-validated, single-ended cache
technology [1] that enables the 1/2-cycle latency L1D cache is leveraged to
improve instruction fetch and branch restore. Each set of 6 instructions (2
bundles) stored in the 16KB L1 instruction cache is accompanied by branch
target address and branch prediction information. The data is read out of the
cache in the first half-cycle. In the next half-cycle, the prediction
information is examined, and if the branch to which the stored target
corresponds is predicted taken, this address is input to the instruction
pointer mux for the next instruction fetch.

The Itanium has extensive error handling capabilities. It features ECC and
parity error checking on most processor caches and busses. The processor also
has the capability to kill an application or thread that has experienced a
machine error without having to reboot.

A major link in the supply chain that feeds the Itanium is the system bus. The
Itanium uses a 2.1GBps multi-drop system bus to keep itself well fed with data
and instructions.

The first generation of Itanium systems, using the 460GX chipset, is
expandable up to 64GB of memory. Generations beyond that will be able to
take more memory. Higher-end Itanium systems designed by the likes of SGI,
IBM and HP should eventually be able to take far more than 64GB.






Specifications:

Here are the Itanium specifications to summarize things.




    Physical Characteristics:

      •   25.4M transistors
      •   0.18 micron CMOS process
      •   6 metal layers
      •   C4 (flip-chip) assembly technology
      •   1012-pad organic land grid array
      •   733MHz and 800MHz initial release clock speeds

    Instruction Dispersal:

      •   2 bundle dispersal windows
      •   3 instructions per bundle
      •   9 function unit slots
      •   2 integer slots
      •   2 floating point slots
      •   2 memory slots
      •   3 branch slots
      •   Maximum of 6 instructions issued each cycle




      Floating Point Units:

      •   2 extended and double precision FMACs (floating-point
          multiply-accumulate units)
      •   4 double or single precision operations per clock maximum
      •   3.2 GFLOPS of peak double precision floating point performance at
          800MHz
      •   2 additional single precision FMACs
      •   4 single precision operations per clock maximum
      •   6.4 GFLOPS of peak single precision floating point performance total
          at 800MHz




    Integer and Branch Units:

      •   4 single cycle integer ALUs
      •   4 MMX units
      •   3 branch units

Compatibility:

        The Itanium is fully x86 compatible in hardware. Applications and
        operating systems can run without any changes. A decoder internal to
        the CPU decodes x86 instructions into EPIC instructions, then
        dynamically schedules them to run with increased parallelism. While the
        Itanium is compatible with x86 software, it is not expected to run it
        quickly. Compatibility is being included to ease the transition from x86
        code to EPIC code.

        Most of the compilers used to create programs have
        already been written for the IA-64 architecture, and others are on the
        way. So writing programs for the Itanium is not a permanent headache,
        though it may seem troublesome for the time being, but so did the
        3½" floppy diskette at first.







    [Figure: block diagram showing IA-32 Compatibility Fetch and Decode, IA-32
    Dynamic Scheduler, and IA-32 Retirement & Exceptions blocks, together with
    a Shared I-cache and a Shared Execution core.]

    Fig. Seamless Architecture allows Full Itanium Performance on IA-32 System
    Functions.


Applications:
        The new IA-64 architecture finds applications in various fields.
        This processor has tremendous potential in terms of speed and
        performance. The inherently scalable nature of the architecture makes it
        very compelling for the high-end server and workstation market
        segments. The Itanium was not designed for small systems; it is
        intended for workstations and servers with 1 to 4000 processors.

        The biggest, toughest computing challenges in the world are tackled,
        and very often solved, through high performance computing (HPC).
        Research areas as diverse and life-essential as meteorological
        modeling, automotive crash test simulations, human genome mapping,
        and nuclear blast modeling are all part of HPC. Solutions built on open
        standards-based Intel® platforms provide supercomputing capabilities at
        significant cost savings for cutting-edge scientific, research, industry,
        and enterprise HPC applications.






     We now look at some of the applications for which its innovative
     architecture makes the Itanium the best bet.

    Business Intelligence
     Itanium's 64-bit VLM support directly benefits multi-terabyte data
     warehousing and data mining, while other Itanium architecture
     techniques, such as predication, improve parallel transaction code
     execution even in the face of unpredictable control flows. Itanium's
     floating-point performance is also significant for software performing
     complex numerical analyses of large data sets, as is the ability to
     explicitly specify code execution order in well understood algorithms.

    Technical and Scientific
     Performance is a primary driver in electronic design automation (EDA),
     mechanical design automation (MDA), digital content creation (DCC),
     financial services, and scientific application purchases. Itanium's
     floating-point features, large memory addressability for large data sets,
     and increased parallelism for complex processes combine to deliver new
     performance and scalability benefits to a market that is already highly
     receptive to the price/performance delivered by Intel architectures.




    Security
     The performance of encryption and decryption operations limits the
     scope of security system deployment because encrypting all network
     traffic from a client or a server requires 10 to 20 times the processor
     resources of unencrypted traffic. At a minimum, Itanium security
     performance is likely to benefit those e-Business applications that make
     some use of security protocols but are not dedicated to security
     operations.


    Directory Services




       The IA-64 architecture and its first microprocessor implementation, the
       Intel® Itanium™ processor, provide capabilities that enhance the
       performance and scalability of directory services. The 64-bit addressing
       enabled by the Intel Itanium architecture will help overcome the
       scalability barriers and awkward, maintenance-intensive partitioning
       schemes of current directory services on 32-bit platforms.




The Itanium is Intel's first entry into the 64-bit enterprise computing market.
Primary competition is coming from the likes of Sun, IBM, HP, Compaq, and
SGI, all of whom have their own 64-bit processor solutions. In addition to the
promise of performance, Intel offers the ability to standardize the 64-bit
market onto one processor family and instruction set. IBM, HP, Compaq, and
SGI will all offer Itanium solutions alongside their own. Sun may remain as the
main holdout, and we expect a large and long battle between Sun and Intel.
AMD may also enter the market at a later date. For now, AMD is beginning to
lay the groundwork for its move to the 64-bit arena with its announcement
of x86-64, a 64-bit extension to x86.








Conclusion:




The Itanium is the first processor with the Itanium architecture, developed,
manufactured, and marketed by Intel®. In 1994, Hewlett-Packard and Intel
Corporation agreed to jointly design EPIC (Explicitly Parallel Instruction
Computing), a post-RISC and post-IA-32 technology. Using EPIC concepts, HP and
Intel® then jointly defined the 64-bit Itanium Processor Family (IPF)
architecture, the basis of Intel's future high-performance microprocessor
family for a broad range of technical and commercial applications at 1.0GHz.
The easiest way to keep data and instructions in cache is to have a lot of
cache to keep them in, and Intel® knew that when they designed the Itanium.

The Itanium is a complex, bleeding-edge, forward-looking processor family
that holds promise for huge gains in processing power. The processor uses the
entirely new EPIC architecture, which has the potential to deliver large
improvements in processor parallelism. It is all about speed, and the Itanium
has the paper pedigree to deliver it. If Intel can deliver, expect to see blood in
the enterprise server water.








GLOSSARY:




                                                        om
Architecture—Implementation of an instruction set for a processing method
(for example, PA-RISC, IA-32, MIPS, UltraSPARC).




                                                     i.c
Branch—A branch in the program that precedes a decision about what path to




                                     og
take (for example, IF… THEN… ELSE).


CISC (Complex Instruction Set Computing)—Processing method with a
                                  nty
complex set of machine instructions of variable length. Up to now has been the
opposite of RISC.
                          de

Compiler—A software tool that converts the instructions of a higher-level
programming language that are the language of the microprocessor.
                 stu



EPIC (Explicit Parallel Instruction Computing)—Processing method,
jointly developed by Hewlett-Packard and Intel, that enhances and replaces
         w.




processing according to the CISC and RISC procedures.


Explicit parallelism—The ability of the compiler to directly inform the
ww




processor of the independent nature of operations.


IA-32 (Intel 32-bit architecture)—The instruction set that forms the basis for
the broad spectrum of Intel processors for notebooks, PCs, workstations, and
servers. Executed in the Intel Pentium® processor, for example.





IA-64 or Itanium (Intel 64-bit architecture)—The Intel 64-bit architecture
that implements EPIC concepts. It provides full IA-32 and PA-RISC
compatibility. Now known as the IPF (Itanium Processor Family) architecture.


Implicit parallelism—Found in conventional microprocessor architectures,
this requires the compiler to create sequential machine code, leaving the
processor to find at run time which instructions can execute in parallel.




ISA (Instruction Set Architecture)—The operating instructions that tell a
chip how to perform software functions and direct operations within the
microprocessor. HP and Intel jointly developed a new 64-bit ISA,
known as the IPF architecture, which integrates technical concepts of the EPIC
technology.

IPF (Itanium Processor Family) architecture—Processor architecture that
implements the EPIC processing principle.



Itanium—First processor with the Itanium architecture. Developed,
manufactured, and marketed by Intel.




Latency, latency period, memory latency—The time the processor waits for
the completion of loading instructions to get data from memory.




Merced™ processor—Early code name of the first processor from Intel in the
Itanium family.


Mispredict—A wrong decision regarding which path to take.


Parallelism—The ability to execute multiple instructions at the same time.
This is the opposite of sequential processing, one instruction after the other.


Predication—A technical concept that contributes to increasing overall
performance by the removal of branches and associated mispredicts.




Processor—Semiconductor chip whose components process machine
instructions of a specific architecture.




RISC (Reduced Instruction Set Computing)—Processing method with a
reduced set of machine instructions of the same length (for example, 32 bits).


Speculation—A method to initiate a request for information, even before it is
definitely known that the information will be needed.
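
As an illustration of the Predication entry above, the following is a minimal
sketch in Python (not IA-64 code) of the idea behind removing a branch: both
arms of an IF… THEN… ELSE are computed, and a predicate selects which result is
kept. The function names are hypothetical and chosen only for this example; on
an Itanium system the compiler would emit predicated machine instructions
rather than anything resembling this source.

```python
def abs_diff_branch(a, b):
    """Conventional form: a conditional branch chooses one path."""
    if a > b:
        return a - b
    else:
        return b - a


def abs_diff_predicated(a, b):
    """Branch-free form: both arms are evaluated, a predicate picks one."""
    p = a > b                 # predicate, True or False
    then_result = a - b       # arm that applies when p is true
    else_result = b - a       # arm that applies when p is false
    # A bool behaves as 0 or 1 in arithmetic, so exactly one term survives.
    return then_result * p + else_result * (not p)


if __name__ == "__main__":
    # Both forms should agree on every input pair.
    for a, b in [(7, 3), (3, 7), (5, 5)]:
        assert abs_diff_branch(a, b) == abs_diff_predicated(a, b)
        print(a, b, abs_diff_predicated(a, b))
```

The second function does more arithmetic than the first, which mirrors the
trade-off described in the glossary: extra work on a wide processor is accepted
in exchange for having no branch to mispredict.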












www.studentyogi.com www.studentyogi.com
p

Weitere ähnliche Inhalte

Ähnlich wie 64 Bit Multi Processor Paper Presentation

Co question bank LAKSHMAIAH
Co question bank LAKSHMAIAH Co question bank LAKSHMAIAH
Co question bank LAKSHMAIAH veena babu
 
Design and Implementation of Quintuple Processor Architecture Using FPGA
Design and Implementation of Quintuple Processor Architecture Using FPGADesign and Implementation of Quintuple Processor Architecture Using FPGA
Design and Implementation of Quintuple Processor Architecture Using FPGAIJERA Editor
 
Intel Microprocessors- a Top down Approach
Intel Microprocessors- a Top down ApproachIntel Microprocessors- a Top down Approach
Intel Microprocessors- a Top down ApproachEditor IJCATR
 
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesPerformance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesDr. Fabio Baruffa
 
AI in Health Care using IBM Systems/OpenPOWER systems
AI in Health Care using IBM Systems/OpenPOWER systemsAI in Health Care using IBM Systems/OpenPOWER systems
AI in Health Care using IBM Systems/OpenPOWER systemsGanesan Narayanasamy
 
AI in Healh Care using IBM POWER systems
AI in Healh Care using IBM POWER systems AI in Healh Care using IBM POWER systems
AI in Healh Care using IBM POWER systems Ganesan Narayanasamy
 
Design of a low power processor for Embedded system applications
Design of a low power processor for Embedded system applicationsDesign of a low power processor for Embedded system applications
Design of a low power processor for Embedded system applicationsROHIT89352
 
A New Direction for Computer Architecture Research
A New Direction for Computer Architecture ResearchA New Direction for Computer Architecture Research
A New Direction for Computer Architecture Researchdbpublications
 
CO&AL-lecture-04 about the procedures in c language (1).pptx
CO&AL-lecture-04 about the procedures in c language (1).pptxCO&AL-lecture-04 about the procedures in c language (1).pptx
CO&AL-lecture-04 about the procedures in c language (1).pptxgagarwazir7
 
CSC 457 - Advanced Microprocessor Architecture Lecture Notes - 31.08.2021.ppt
CSC 457 - Advanced Microprocessor Architecture Lecture Notes - 31.08.2021.pptCSC 457 - Advanced Microprocessor Architecture Lecture Notes - 31.08.2021.ppt
CSC 457 - Advanced Microprocessor Architecture Lecture Notes - 31.08.2021.pptEricSifuna1
 
A PIC compatible RISC CPU core Implementation for FPGA based Configurable SOC...
A PIC compatible RISC CPU core Implementation for FPGA based Configurable SOC...A PIC compatible RISC CPU core Implementation for FPGA based Configurable SOC...
A PIC compatible RISC CPU core Implementation for FPGA based Configurable SOC...IDES Editor
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) ijceronline
 
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AIArm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AIinside-BigData.com
 
Possibility of hpc application on cloud infrastructure by container cluster
Possibility of hpc application on cloud infrastructure by container clusterPossibility of hpc application on cloud infrastructure by container cluster
Possibility of hpc application on cloud infrastructure by container clusterKyunam Cho
 
CC LECTURE NOTES (1).pdf
CC LECTURE NOTES (1).pdfCC LECTURE NOTES (1).pdf
CC LECTURE NOTES (1).pdfHasanAfwaaz1
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
Martin P. Bates - Interfacing PIC Microcontrollers_ Embedded Design by Intera...
Martin P. Bates - Interfacing PIC Microcontrollers_ Embedded Design by Intera...Martin P. Bates - Interfacing PIC Microcontrollers_ Embedded Design by Intera...
Martin P. Bates - Interfacing PIC Microcontrollers_ Embedded Design by Intera...ssusercf2bc0
 

Ähnlich wie 64 Bit Multi Processor Paper Presentation (20)

QSpiders - Basic intel architecture
QSpiders - Basic intel architectureQSpiders - Basic intel architecture
QSpiders - Basic intel architecture
 
Ef35745749
Ef35745749Ef35745749
Ef35745749
 
Co question bank LAKSHMAIAH
Co question bank LAKSHMAIAH Co question bank LAKSHMAIAH
Co question bank LAKSHMAIAH
 
Design and Implementation of Quintuple Processor Architecture Using FPGA
Design and Implementation of Quintuple Processor Architecture Using FPGADesign and Implementation of Quintuple Processor Architecture Using FPGA
Design and Implementation of Quintuple Processor Architecture Using FPGA
 
Intel Microprocessors- a Top down Approach
Intel Microprocessors- a Top down ApproachIntel Microprocessors- a Top down Approach
Intel Microprocessors- a Top down Approach
 
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesPerformance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
 
AI in Health Care using IBM Systems/OpenPOWER systems
AI in Health Care using IBM Systems/OpenPOWER systemsAI in Health Care using IBM Systems/OpenPOWER systems
AI in Health Care using IBM Systems/OpenPOWER systems
 
AI in Healh Care using IBM POWER systems
AI in Healh Care using IBM POWER systems AI in Healh Care using IBM POWER systems
AI in Healh Care using IBM POWER systems
 
Design of a low power processor for Embedded system applications
Design of a low power processor for Embedded system applicationsDesign of a low power processor for Embedded system applications
Design of a low power processor for Embedded system applications
 
A New Direction for Computer Architecture Research
A New Direction for Computer Architecture ResearchA New Direction for Computer Architecture Research
A New Direction for Computer Architecture Research
 
CO&AL-lecture-04 about the procedures in c language (1).pptx
CO&AL-lecture-04 about the procedures in c language (1).pptxCO&AL-lecture-04 about the procedures in c language (1).pptx
CO&AL-lecture-04 about the procedures in c language (1).pptx
 
CSC 457 - Advanced Microprocessor Architecture Lecture Notes - 31.08.2021.ppt
CSC 457 - Advanced Microprocessor Architecture Lecture Notes - 31.08.2021.pptCSC 457 - Advanced Microprocessor Architecture Lecture Notes - 31.08.2021.ppt
CSC 457 - Advanced Microprocessor Architecture Lecture Notes - 31.08.2021.ppt
 
A PIC compatible RISC CPU core Implementation for FPGA based Configurable SOC...
A PIC compatible RISC CPU core Implementation for FPGA based Configurable SOC...A PIC compatible RISC CPU core Implementation for FPGA based Configurable SOC...
A PIC compatible RISC CPU core Implementation for FPGA based Configurable SOC...
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AIArm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
 
Isometric Making Essay
Isometric Making EssayIsometric Making Essay
Isometric Making Essay
 
Possibility of hpc application on cloud infrastructure by container cluster
Possibility of hpc application on cloud infrastructure by container clusterPossibility of hpc application on cloud infrastructure by container cluster
Possibility of hpc application on cloud infrastructure by container cluster
 
CC LECTURE NOTES (1).pdf
CC LECTURE NOTES (1).pdfCC LECTURE NOTES (1).pdf
CC LECTURE NOTES (1).pdf
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
Martin P. Bates - Interfacing PIC Microcontrollers_ Embedded Design by Intera...
Martin P. Bates - Interfacing PIC Microcontrollers_ Embedded Design by Intera...Martin P. Bates - Interfacing PIC Microcontrollers_ Embedded Design by Intera...
Martin P. Bates - Interfacing PIC Microcontrollers_ Embedded Design by Intera...
 

Mehr von guestac67362

5 I N T R O D U C T I O N T O A E R O S P A C E T R A N S P O R T A T I O...
5  I N T R O D U C T I O N  T O  A E R O S P A C E  T R A N S P O R T A T I O...5  I N T R O D U C T I O N  T O  A E R O S P A C E  T R A N S P O R T A T I O...
5 I N T R O D U C T I O N T O A E R O S P A C E T R A N S P O R T A T I O...guestac67362
 
4 G Paper Presentation
4 G  Paper  Presentation4 G  Paper  Presentation
4 G Paper Presentationguestac67362
 
Bluetooth Technology Paper Presentation
Bluetooth Technology Paper PresentationBluetooth Technology Paper Presentation
Bluetooth Technology Paper Presentationguestac67362
 
Ce052391 Environmental Studies Set1
Ce052391 Environmental Studies Set1Ce052391 Environmental Studies Set1
Ce052391 Environmental Studies Set1guestac67362
 
Bluetooth Technology In Wireless Communications
Bluetooth Technology In Wireless CommunicationsBluetooth Technology In Wireless Communications
Bluetooth Technology In Wireless Communicationsguestac67362
 
Bio Chip Paper Presentation
Bio Chip Paper PresentationBio Chip Paper Presentation
Bio Chip Paper Presentationguestac67362
 
Bluetooth Paper Presentation
Bluetooth Paper PresentationBluetooth Paper Presentation
Bluetooth Paper Presentationguestac67362
 
Bio Metrics Paper Presentation
Bio Metrics Paper PresentationBio Metrics Paper Presentation
Bio Metrics Paper Presentationguestac67362
 
Bio Medical Instrumentation
Bio Medical InstrumentationBio Medical Instrumentation
Bio Medical Instrumentationguestac67362
 
Bluetooth Abstract Paper Presentation
Bluetooth Abstract Paper PresentationBluetooth Abstract Paper Presentation
Bluetooth Abstract Paper Presentationguestac67362
 
Basic Electronics Jntu Btech 2008
Basic Electronics Jntu Btech 2008Basic Electronics Jntu Btech 2008
Basic Electronics Jntu Btech 2008guestac67362
 
Basic electronic devices and circuits
Basic electronic devices and circuitsBasic electronic devices and circuits
Basic electronic devices and circuitsguestac67362
 
Automatic Speed Control System Paper Presentation
Automatic Speed Control System Paper PresentationAutomatic Speed Control System Paper Presentation
Automatic Speed Control System Paper Presentationguestac67362
 
Artificial Intelligence Techniques In Power Systems Paper Presentation
Artificial Intelligence Techniques In Power Systems Paper PresentationArtificial Intelligence Techniques In Power Systems Paper Presentation
Artificial Intelligence Techniques In Power Systems Paper Presentationguestac67362
 
Artificial Neural Networks
Artificial Neural NetworksArtificial Neural Networks
Artificial Neural Networksguestac67362
 
Automata And Compiler Design
Automata And Compiler DesignAutomata And Compiler Design
Automata And Compiler Designguestac67362
 
Auto Configuring Artificial Neural Paper Presentation
Auto Configuring Artificial Neural Paper PresentationAuto Configuring Artificial Neural Paper Presentation
Auto Configuring Artificial Neural Paper Presentationguestac67362
 
Artificial Neural Network Paper Presentation
Artificial Neural Network Paper PresentationArtificial Neural Network Paper Presentation
Artificial Neural Network Paper Presentationguestac67362
 
A Paper Presentation On Artificial Intelligence And Global Risk Paper Present...
A Paper Presentation On Artificial Intelligence And Global Risk Paper Present...A Paper Presentation On Artificial Intelligence And Global Risk Paper Present...
A Paper Presentation On Artificial Intelligence And Global Risk Paper Present...guestac67362
 

Mehr von guestac67362 (20)

5 I N T R O D U C T I O N T O A E R O S P A C E T R A N S P O R T A T I O...
5  I N T R O D U C T I O N  T O  A E R O S P A C E  T R A N S P O R T A T I O...5  I N T R O D U C T I O N  T O  A E R O S P A C E  T R A N S P O R T A T I O...
5 I N T R O D U C T I O N T O A E R O S P A C E T R A N S P O R T A T I O...
 
4 G Paper Presentation
4 G  Paper  Presentation4 G  Paper  Presentation
4 G Paper Presentation
 
Bluetooth Technology Paper Presentation
Bluetooth Technology Paper PresentationBluetooth Technology Paper Presentation
Bluetooth Technology Paper Presentation
 
Ce052391 Environmental Studies Set1
Ce052391 Environmental Studies Set1Ce052391 Environmental Studies Set1
Ce052391 Environmental Studies Set1
 
Bluetooth Technology In Wireless Communications
Bluetooth Technology In Wireless CommunicationsBluetooth Technology In Wireless Communications
Bluetooth Technology In Wireless Communications
 
Bio Chip Paper Presentation
Bio Chip Paper PresentationBio Chip Paper Presentation
Bio Chip Paper Presentation
 
Bluetooth Paper Presentation
Bluetooth Paper PresentationBluetooth Paper Presentation
Bluetooth Paper Presentation
 
Bio Metrics Paper Presentation
Bio Metrics Paper PresentationBio Metrics Paper Presentation
Bio Metrics Paper Presentation
 
Bio Medical Instrumentation
Bio Medical InstrumentationBio Medical Instrumentation
Bio Medical Instrumentation
 
Bluetooth Abstract Paper Presentation
Bluetooth Abstract Paper PresentationBluetooth Abstract Paper Presentation
Bluetooth Abstract Paper Presentation
 
Basic Electronics Jntu Btech 2008
Basic Electronics Jntu Btech 2008Basic Electronics Jntu Btech 2008
Basic Electronics Jntu Btech 2008
 
Basic electronic devices and circuits
Basic electronic devices and circuitsBasic electronic devices and circuits
Basic electronic devices and circuits
 
Awp
AwpAwp
Awp
 
Automatic Speed Control System Paper Presentation
Automatic Speed Control System Paper PresentationAutomatic Speed Control System Paper Presentation
Automatic Speed Control System Paper Presentation
 
Artificial Intelligence Techniques In Power Systems Paper Presentation
Artificial Intelligence Techniques In Power Systems Paper PresentationArtificial Intelligence Techniques In Power Systems Paper Presentation
Artificial Intelligence Techniques In Power Systems Paper Presentation
 
Artificial Neural Networks
Artificial Neural NetworksArtificial Neural Networks
Artificial Neural Networks
 
Automata And Compiler Design
Automata And Compiler DesignAutomata And Compiler Design
Automata And Compiler Design
 
Auto Configuring Artificial Neural Paper Presentation
Auto Configuring Artificial Neural Paper PresentationAuto Configuring Artificial Neural Paper Presentation
Auto Configuring Artificial Neural Paper Presentation
 
Artificial Neural Network Paper Presentation
Artificial Neural Network Paper PresentationArtificial Neural Network Paper Presentation
Artificial Neural Network Paper Presentation
 
A Paper Presentation On Artificial Intelligence And Global Risk Paper Present...
A Paper Presentation On Artificial Intelligence And Global Risk Paper Present...A Paper Presentation On Artificial Intelligence And Global Risk Paper Present...
A Paper Presentation On Artificial Intelligence And Global Risk Paper Present...
 

64 Bit Multi Processor Paper Presentation

  • 1. www.studentyogi.com www.studentyogi.com Seminar report on co om ITANIUM:The 64- ITANIUM:The 64-bit microprocessor from m intel gi. .c oogi ntyy eent t t dd ssuu w. . w ww ww www.studentyogi.com www.studentyogi.com p
  • 2. www.studentyogi.com www.studentyogi.com ITANIUM om i.c History: og In 1994, Hewlett-Packard and Intel Corporation agreed to jointly design EPIC (Explicitly Parallel Instruction Computing), a post-RISC and post-IA-32 technology. Using EPIC concepts, HP and Intel® then jointly defined nty Itanium’s 64-bit Itanium Processor Family (IPF) architecture, the basis of Intel’s future, high-performance microprocessor family a broad range of technical and commercial applications at 1.0GHz. The starting place for the de team was a comprehensive understanding of the capabilities of the .18um bulk technology transistors and interconnects with a view toward exploiting these capabilities to the fullest. The Itanium processor began shipping in end-user stu pilot systems in late 2000. Intel intends to follow Itanium with additional processors in the Itanium family: McKinley, Madison and Deerfield in late 2002. w. ww Need For Itanium: Internet commerce and large database applications are dealing with ever- increasing quantities of data, and demands placed on both server and workstation resources are increasing correspondingly. One demand is for more memory than the 4 GB provided by today’s 32-bit computer architectures. Itanium’s ability to address a flat 64-bit memory address space in the millions www.studentyogi.com www.studentyogi.com p
  • 3. www.studentyogi.com www.studentyogi.com of gigabytes has been the focus of attention. Beyond very large memory (VLM) support, however, other traits, including a new Explicitly Parallel Instruction Computing (EPIC) design philosophy that will handle parallel om processing differently than previous architectures, speculation, predication, large register files, a register stack and advanced branch architecture. IA-64 also provides an enhanced system architecture supporting fast interrupt i.c response and a flexible, large virtual address mode. The 64-bit addressing enabled by the Intel Itanium architecture will help overcome the scalability og barriers and awkward, maintenance-intensive partitioning directory schemes of current directory services on 32-bit platforms. , Intel has been assiduous in providing backward compatibility with 32-bit binaries (IA-32), from the x86 nty families. The Itanium has a complex, bleeding edge, forward looking processor family that holds promise for huge gains in processing power. de The How’s & Wows of Itanium: stu Itanium defined a new architecture that provides a unique combination of innovative features, which overcomes the performance limitations of traditional w. architectures. The IA-64 architecture is based on innovative techniques/features such as Explicit parallelism, Parallelism, and Predication and Speculation, ww resulting in superior Instruction Level Parallelism (ILP) and increased instructions per cycle (IPC) to address the current and future requirements of these demanding Internet, high end server, and workstation applications. In addition, the IA-64 architecture provides headroom and scalability for continued future growth. www.studentyogi.com www.studentyogi.com p
  • 4. www.studentyogi.com www.studentyogi.com We take a look at all the features that make it possible, that are the bricks and mortar of the Itanium; and then we shall see the architecture that emerges. om i.c EPIC: og EPIC stands for Explicitly Parallel Instruction Computing–a new design philosophy going beyond the RISC and CISC processors that are available nty today. EPIC technology enables greater instruction level parallelism than previous processor architectures, supporting higher levels of performance in targeted application segments. The Itanium architecture is based on EPIC technology. EPIC is based on a unique combination of innovative features such de as predication, speculation and explicit parallelism enabling world-class performance for the high-end enterprise class of computing stu The EPIC architecture uses complex instruction wording that, in addition to the basic instruction, contains information on how to run the instruction in parallel. EPIC instructions are put together by the compiler into a threesome called a w. "bundle." Bundles instructions are sent to the CPU together other instructions. The "bundles," or their parts, are put together in an "instruction group" with ww other instructions. The IA-64 utilizes 128 bit bundles, organized as a template plus three IA-64 instructions. The template contains information provided by the compiler to the processor. The template contains the following information: 1. Which instructions can be executed in parallel, and which instructions must be executed serially. www.studentyogi.com www.studentyogi.com p
  • 5. www.studentyogi.com www.studentyogi.com 2. Information relating to parallelism in respect to it's neighboring bundles. om 3. Maps instruction slots to execution types i.c Instructions in a bundle do not affect each other with the data they are working on, so they can run together without getting in each other's way. There is no limit to the size of an instruction group, and an instruction group can begin or og end in the middle of a bundle. The instructions are actually bundled and grouped together when software is compiled. This simplifies the process of running multiple instructions at once on an Itanium CPU, allowing it to make nty greater use of multiple execution units without having to rely on complex on- die logic to determine what operations can run in parallel. The Itanium will still use on-die logic to improve upon instruction level parallelism, but EPIC de instructions, at the minimum, provide a parallel blueprint for the Itanium processor. Of course, for this reason, compiler technology and programming stu algorithms will have a massive impact on Itanium performance. The compiler adds branch hints, register stack and rotation, data and control speculation, and memory hints into EPIC instructions. Thus, EPIC makes processing faster, w. much faster. ww www.studentyogi.com www.studentyogi.com p
  • 6. www.studentyogi.com www.studentyogi.com om i.c og nty de EPIC Pipelining: The parallel execution core of EPIC can have upto 10 stu pipelined stages. The first-generation Itanium processor is able to issue up to six EPIC instructions in parallel every clock cycle. The next generation Itanium might yield 20 instructions per cycle, though not consistently, but the w. potential is there and proper coding and compiling should yield efficient usage of the CPU. ww Predication: Predication is a compiling technique used in the Itanium that optimizes or removes branching code by working it so that much of the code runs in www.studentyogi.com www.studentyogi.com p
  • 7. www.studentyogi.com www.studentyogi.com parallel. By designing software and compilers to rework branches into parallel code with fewer or no branches, and by running this code on a "wide" processor that can process this code in parallel, which the Itanium is intended om to be able to do, the number of cycles it takes to complete a task drops. What predication does is minimize the time it takes to run if-then-else situation and uses processor width to run both the 'then' and 'else' in parallel. When the i.c 'if' branch is determined, the incorrect branch's result is discarded. By removing branches and making code more parallel, predication reduces the number of og cycles it takes to complete a task while making better use of a wide processor. Also, there are less branch mispredicts. Branch mispredicts necessitate that the pipeline be flushed, a very cycle expensive procedure, so by lessening the nty number of branches, predication can greatly reduce wasted processor time. de Speculation: The latency of memory is a big performance bottleneck in today's systems. IA- stu 64 architecture employs a technique called as speculation to initiate loads from memory earlier in the instruction stream—even before a branch. Data speculation thus, is caching and calling for data that may be needed or may be w. changed before it is needed, so that, in the case that the data is needed and it has not changed, the CPU does not have to take a latency impact from calling ww for the data. The processor, with the help of compiled instructions, looks ahead, anticipates what information it may need, and then brings it to cache or into the processor. This helps hide memory latency. Thus speculation increases instruction level parallelism and reduces the impact of memory latency resulting in handsome performance gains. www.studentyogi.com www.studentyogi.com p
  • 8. www.studentyogi.com www.studentyogi.com Control speculation is a feature that runs deep within IA-64. Data while being loaded might be erroneous. So, here data values are associated with a bit which tells whether an error was generated when the data was loaded or not. This om error ("Nat") bit follows the data through arithmetical operations, moves, and conditionals until the data is checked through a check instruction. This means that with IA-64, the compiler can aggressively load data well ahead of time i.c (speculation) with out paying unnecessary exception penalties. A final benefit of "Nats" is that it provides for structured error handling as is found in C and C++. Because the compiler can schedule the "check" instruction whenever it og wants, it is possible to safely defer errors until the best possible time to recover from them. The architectural support for structured error handling in IA-64 is nty an important feature promoting system reliability and performance Cache: de When a processor is waiting for data or instructions, time is wasting. stu The longer it takes for data and instructions to get to the CPU, the worse it gets. When data and instructions are in cache, the processor can grab them much quicker than when having to go to slow main memory. Not only is cache w. latency much lower than DRAM latency, the bandwidth is much higher. There are some trick programming techniques in use out there to keep often- ww used data and instructions in cache and they are not the kind of techniques you learn in your high school BASIC course. Still, the easiest way to keep data and instructions in cache is to have a lot of cache to keep them in. Intel knew that when they designed the Itanium. www.studentyogi.com www.studentyogi.com p
  • 9. www.studentyogi.com www.studentyogi.com The Itanium has three levels of cache. L1 and L2 are on-die while L3 is on cartridge. According to Intel, the L3 cache weighs in at 2MB or 4MB of four- way set associative cache on two or four 1MB chips. IDC reports that the L2 om cache size is 96k in size, and the L1 cache, which does not deal with floating point data, has a 16KB integer data and a 16KB instruction cache. The 294.8 million transistors of (4MB) level three cache runs at the full i.c processor speed, giving 12.8GBps of memory bandwidth at 800MHz. With 2MB or 4MB of L3 cache on the Itanium, the chances of the required data and og instructions being in cache are quite good, bus traffic can be reduced, and performance increases. With six pipelines hungry for instructions and data, the Itanium needs all the cache it can get. Caching is made effective through data nty speculation and cache hints. Data speculation has been explained above. Cache hints are two-bit markers for memory loads set by the compiler that help the CPU find data in cache. This improves the speed of retrieving data from cache. de stu The Overall Architecture: The IA-64 architecture is based on innovative techniques/features such w. as Explicit parallelism, and Predication and Speculation, resulting in superior Instruction Level Parallelism (ILP) and increased instructions per cycle (IPC) ww to address the current and future requirements of these demanding Internet, high end server, and workstation applications. In addition, the IA-64 architecture provides headroom and scalability for continued future growth. The Itanium was not designed for small systems, it is intended for 1 to 4000 processor workstations and servers. The first-generation Itanium processor is able to issue up to six EPIC instructions in parallel every clock cycle. www.studentyogi.com www.studentyogi.com p
  • 10. The six-issue (two-bundle) scheduler disperses instructions into nine functional slots: two integer slots, two floating-point slots, two memory slots, and three branch slots. Proper compiler design should be able to handle most situations without overloading any type of functional slot. In addition to pure clock speed boosts, future Itanium CPUs are sure to have more functional units (FMACs, ALUs).

The Itanium contains four pipelined FMAC units (Floating-point Multiply Add Calculators). The primary two are each capable of processing two single-precision, two double-precision, or two double-extended-precision floating-point operations per clock. At 800MHz, that yields up to 3.2GFLOPS of highly precise floating-point processing. There are an additional two FMACs tuned for 3D applications. They are each capable of processing up to two single-precision floating-point operations per clock. That yields another 3.2GFLOPS of single-precision processing power. Altogether, the Itanium has a theoretical maximum of 6.4GFLOPS of single-precision floating-point processing power. There are four pipelined ALUs (Arithmetic Logic Units) in the original Itanium. Each can process one integer calculation per cycle. They can also process MMX-type instructions.

The Itanium comes with 128 floating-point and 128 integer registers. When processing up to 20 operations in a single clock, the registers give plenty of room for data inside the processor. The registers also have the ability to rotate. Rotating registers allow the processor to perform an operation on multiple software-accessible registers in turn. Efficient parallel execution is achieved for integer operations with the six 1-cycle integer and six 2-cycle multimedia units, which accomplish full symmetric bypassing with each other and the L1D cache, in combination with a 20-ported, 128 x 65b register file.
Virtually any
  • 11. instruction can be predicated off with a prior compare instruction. The integer units and register file are 4mm x 1.9mm, including hardware support for IA-32 code. The dual 82b FMAC units have a 4-cycle latency and are fully bypassed with each other. In combination with the 14-ported, 128 x 82b register file and other miscellaneous FP support, these units are 9mm x 2.2mm.

There are several Itanium features designed to help with hardware scalability: a full-CPU-speed Level 3 cache bus, a large L3 cache, deferred-transaction support and flexible page sizes. The full-CPU-speed Level 3 bus provides quick communication between CPUs. The large L3 cache reduces inter-CPU bus traffic by keeping data close to the CPU that needs it. Deferred-transaction support can stop one CPU from getting in the way of another. Flexible page sizes, from 4KB to 256MB, give the Itanium family the flexibility to access small amounts of memory in small chunks and massive amounts of memory in massive chunks without the overhead of smaller page sizes.

The pre-validated, 4-port 16KB L1D cache [1] is tightly coupled to the integer units to achieve the half-cycle load. As a result, the less latency-sensitive FPU directly interfaces to the L2D cache [1] with four 82b load ports (6-cycle latency) and two 82b store ports. The 3MB, 12-cycle-latency L3 cache [1] is implemented with 135 separate "subarrays" that enable high density and the ability to conform to the irregular shape of the processor core with flexible subarray placement. Each level of on-chip cache has matched bandwidths at 32GB/s across the hierarchy.

The front-end instruction fetch pipe stages are decoupled from the back-end stages via an 8-bundle queue. The same pre-validated, single-ended cache technology [1] that enables the 1/2-cycle-latency L1D cache is leveraged to improve instruction fetch and branch restore.
Each set of 6 instructions (2 bundles) stored in the 16KB L1 cache is accompanied by branch target address
  • 12. and branch prediction information. The data is read out of the cache in the first half-cycle. In the next half-cycle, the prediction information is examined, and if the branch to which the stored target corresponds is predicted taken, this address is input to the instruction pointer mux for the next instruction fetch.

The Itanium has extensive error handling capabilities. It features ECC and parity error checking on most processor caches and busses. The processor can also kill an application or thread that has experienced a machine error without having to reboot.

A major link in the food delivery system for the Itanium is the system bus. The Itanium uses a 2.1GBps multi-drop system bus to keep itself well fed with data and instructions.

The first generation of Itanium systems, using the 460GX chipset, is expandable to 64GB of memory. Generations beyond that are able to take more memory. Higher-end Itanium systems designed by the likes of SGI, IBM and HP should eventually be able to take far more than 64GB.
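The earlier remark that virtually any instruction can be predicated off with a prior compare instruction can be illustrated with a small Python model. This is a sketch of the idea only: `compare` and `predicated` are hypothetical helper names, not IA-64 mnemonics, and real hardware nullifies the predicated-off instruction rather than skipping a function call.

```python
def compare(a, b):
    """Model of a compare that writes a predicate pair (p_true, p_false)."""
    taken = a > b
    return taken, not taken


def predicated(pred, op, old):
    """Commit the result of op() only when the guarding predicate is set."""
    return op() if pred else old


def max_branching(a, b):
    """Conventional version: the processor must predict this branch."""
    if a > b:
        return a
    return b


def max_predicated(a, b):
    """If-converted version: both arms are issued and no branch remains."""
    p1, p2 = compare(a, b)
    r = 0
    r = predicated(p1, lambda: a, r)   # (p1) r = a
    r = predicated(p2, lambda: b, r)   # (p2) r = b
    return r


pairs = [(1, 2), (2, 1), (3, 3), (-5, 0)]
results = [max_predicated(a, b) for a, b in pairs]
```

Because the two guarded assignments are independent of each other, a compiler for an EPIC machine could place them in the same instruction group, which is how predication removes both the branch and its possible mispredict.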
  • 13. Specifications:
Here are the Itanium specifications to summarize things.

Physical Characteristics:
• 25.4M transistors
• .18 micron CMOS process
• 6 metal layers
• C4 (flip-chip) assembly technology
• 1012-pad organic land grid array
• 733MHz and 800MHz initial release clock speeds

Instruction Dispersal:
• 2 bundle dispersal windows
• 3 instructions per bundle
• 9 function unit slots
• 2 integer slots
• 2 floating point slots
• 2 memory slots
• 3 branch slots
• Maximum of 6 instructions issued each cycle

Floating Point Units:
• 2 extended and double precision FMACs (Floating-point Multiply Add Calculators)
  • 14.
• 4 double or single precision operations per clock maximum
• 3.2 GFLOPS of peak double precision floating point performance at 800MHz
• 2 additional single precision FMACs
• 4 single precision operations per clock maximum
• 6.4 GFLOPS of peak single precision floating point performance total at 800MHz

Integer and Branch Units:
• 4 single cycle integer ALUs
• 4 MMX units
• 3 branch units

Compatibility:
The Itanium is fully x86 compatible in hardware. Applications and operating systems can run without any changes. A decoder internal to the CPU decodes x86 instructions into EPIC instructions, then dynamically schedules them to run with increased parallelism. While the Itanium is compatible with x86 software, it is not expected to run it quickly. Compatibility is included to ease the transition from x86 code to EPIC code.

Most of the compilers used to create programs have already been written for the IA-64 architecture, and others are on the way. So writing programs for the Itanium is not a permanent headache, though it may seem troublesome for the time being; the 3½-inch floppy diskette seemed the same at first.
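The peak figures in the specification lists above can be cross-checked from the per-clock numbers. The short Python sketch below uses only values taken from those lists; the function and variable names are illustrative.

```python
CLOCK_MHZ = 800

# Operations per clock, taken from the Floating Point Units list.
PRIMARY_FMAC_OPS = 4   # 2 extended/double precision FMACs
EXTRA_FMAC_OPS = 4     # 2 additional single precision FMACs


def peak_gflops(ops_per_clock, clock_mhz):
    """Peak rate in GFLOPS: operations per clock times clocks per second."""
    return ops_per_clock * clock_mhz / 1000


double_peak = peak_gflops(PRIMARY_FMAC_OPS, CLOCK_MHZ)
single_peak = peak_gflops(PRIMARY_FMAC_OPS + EXTRA_FMAC_OPS, CLOCK_MHZ)

# Function unit slots, taken from the Instruction Dispersal list.
slots = {"integer": 2, "floating point": 2, "memory": 2, "branch": 3}
total_slots = sum(slots.values())

# Two bundle windows of three instructions each bound the issue rate.
max_issue_per_cycle = 2 * 3
```

The results agree with the listed 3.2 GFLOPS double precision peak, 6.4 GFLOPS single precision peak, 9 function unit slots, and maximum of 6 instructions issued per cycle.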
  • 15. [Figure: block diagram of the IA-32 compatibility path. Recoverable labels: Compatibility, IA-32 Fetch and Decode, IA-32 Dynamic Scheduler, Retirement & Exceptions, Shared I-cache, Shared Execution Core.]
Fig. Seamless Architecture allows Full Itanium Performance on IA-32 System Functions.

Applications:
The new IA-64 architecture finds its applications in various fields. This processor has tremendous potential in terms of speed and performance. The inherently scalable nature of the architecture makes it very compelling for the high-end server and workstation market segments. The Itanium was not designed for small systems; it is intended for 1- to 4000-processor workstations and servers.

The biggest, toughest computing challenges in the world are tackled, and very often solved, through high performance computing (HPC). Diverse and life-essential research areas such as meteorological modeling, automotive crash test simulations, human genome mapping, and nuclear blast modeling are all part of HPC. Solutions built on open standards-based Intel® platforms provide supercomputing capabilities at significant cost savings for cutting-edge scientific, research, industry, and enterprise HPC applications.
  • 16. We look at some of the applications for which its innovative architecture makes it the best bet.

Business Intelligence
Itanium’s 64-bit very large memory (VLM) support directly benefits multi-terabyte data warehousing and data mining, while other Itanium architecture techniques, such as predication, improve parallel transaction code execution even in the face of unpredictable control flows. Itanium’s floating-point performance is also significant for software performing complex numerical analyses of large data sets, as is the ability to explicitly specify code execution order in well understood algorithms.

Technical and Scientific
Performance is a primary driver in electronic design automation (EDA), mechanical design automation (MDA), digital content creation (DCC), financial services, and scientific application purchases. Itanium’s floating-point features, large memory addressability for large data sets, and increased parallelism for complex processes combine to deliver new performance and scalability benefits to a market that is already highly receptive to the price/performance delivered by Intel architectures.

Security
The performance of encryption and decryption operations limits the scope of security system deployment because encrypting all network traffic from a client or a server requires 10 to 20 times the processor resources of unencrypted traffic. At a minimum, Itanium security performance is likely to benefit those e-Business applications that make
  • 17. some use of security protocols but are not dedicated to security operations.

Directory Services
The IA-64 architecture and its first microprocessor implementation, the Intel® Itanium™ processor, provide capabilities that enhance the performance and scalability of directory services. The 64-bit addressing enabled by the Intel Itanium architecture will help overcome the scalability barriers and the awkward, maintenance-intensive directory partitioning schemes of current directory services on 32-bit platforms.

The Itanium is Intel's first bid into the 64-bit enterprise computing market. Primary competition is coming from the likes of Sun, IBM, HP, Compaq, and SGI, all of whom have their own 64-bit processor solutions. In addition to the promise of performance, Intel offers the ability to standardize the 64-bit market onto one processor family and instruction set. IBM, HP, Compaq, and SGI will all offer Itanium solutions alongside their own. Sun may remain the main holdout, and we expect a large and long battle between Sun and Intel. AMD may also enter the market at a later date. For now, AMD is beginning to lay the groundwork for its move to the 64-bit arena with its announcement of x86-64, a 64-bit extension to x86.
  • 18. Conclusion:
The Itanium is the first processor with the Itanium architecture, developed, manufactured, and marketed by Intel®. In 1994, Hewlett-Packard and Intel Corporation agreed to jointly design EPIC (Explicitly Parallel Instruction Computing), a post-RISC and post-IA-32 technology. Using EPIC concepts, HP and Intel® then jointly defined the 64-bit Itanium Processor Family (IPF) architecture, the basis of Intel’s future high-performance microprocessor family for a broad range of technical and commercial applications at 1.0GHz. The easiest way to keep data and instructions in cache is to have a lot of cache to keep them in. Intel® knew that when they designed the Itanium.

The Itanium is a complex, bleeding-edge, forward-looking processor family that holds promise for huge gains in processing power. The processor uses the entirely new EPIC architecture, which has the potential to deliver large improvements in processor parallelism. It is all about speed, and the Itanium has the paper pedigree to deliver it. If Intel can deliver, expect to see blood in the enterprise server water.
  • 19. GLOSSARY:

Architecture: Implementation of an instruction set for a processing method (for example, PA-RISC, IA-32, MIPS, UltraSPARC).

Branch: A point in the program at which a decision is made about what path to take (for example, IF… THEN… ELSE).

CISC (Complex Instruction Set Computing): Processing method with a complex set of machine instructions of variable length. Up to now it has been regarded as the opposite of RISC.

Compiler: A software tool that converts the instructions of a higher-level programming language into the language of the microprocessor.

EPIC (Explicitly Parallel Instruction Computing): Processing method, jointly developed by Hewlett-Packard and Intel, that enhances and replaces processing according to the CISC and RISC procedures.

Explicit parallelism: The ability of the compiler to directly inform the processor of the independent nature of operations.

IA-32 (Intel 32-bit architecture): The instruction set that forms the basis for the broad spectrum of Intel processors for notebooks, PCs, workstations, and servers. Executed by the Intel Pentium® processor, for example.
  • 20. IA-64 or Itanium (Intel 64-bit architecture): The Intel 64-bit architecture that implements EPIC concepts. It provides full IA-32 and PA-RISC compatibility. Now known as the IPF (Itanium Processor Family) architecture.

Implicit parallelism: Found in conventional microprocessor architectures, this requires the compiler to create sequential machine code that can interact with the processor.

ISA (Instruction Set Architecture): The operating instructions that tell a chip how to perform software functions and direct operations within the microprocessor. HP and Intel jointly developed a new 64-bit ISA, known as the IPF architecture, which integrates technical concepts of the EPIC technology.

IPF (Itanium Processor Family) architecture: Processor architecture that implements the EPIC processing principle.

Itanium: First processor with the Itanium architecture. Developed, manufactured, and marketed by Intel.

Latency, latency period, memory latency: The time the processor waits for load instructions to complete when getting data from memory.

Merced™ processor: Early code name of the first processor from Intel in the Itanium family.

Mispredict: A wrong prediction regarding which path a branch will take.
  • 21. Parallelism: The ability to execute multiple instructions at the same time. This is the opposite of sequential processing, one instruction after the other.

Predication: A technical concept that contributes to increasing overall performance by the removal of branches and associated mispredicts.

Processor: Semiconductor chip whose components process machine instructions of a specific architecture.

RISC (Reduced Instruction Set Computing): Processing method with a reduced set of machine instructions of the same length (for example, 32 bits).

Speculation: A method to initiate a request for information, even before it is definitely known that the information will be needed.
  • 22. References-
1. http://www.intel.com/ebusiness/itanium/
2. IEEE Spectrum- July 2001, May 2000 issues.
3. Computer System Architecture- M. Morris Mano. PHI.
4. Computer Organization and Architecture- William Stallings. PHI.
5. www.mit.edu - MIT’s website.
6. www.cpus.hp.com - host of information about CPUs.
7. www.cs.nmsu.com
8. http://www.geek.com/procspec/features/itanium/