checking dependencies between instructions to determine which instructions can be grouped together for parallel execution;
assigning instructions to the functional units on the hardware;
determining when instructions are initiated (in a VLIW, which instructions are placed together into a single word).

Vliw

  1. VLSI DESIGN GROUP – METS SCHOOL OF ENGINEERING, MALA
     Superscalar and VLIW Architectures
     VLSI ARCHITECTURES
  2. Outline
     • Types of architectures
     • Superscalar
     • Differences between CISC, RISC and VLIW
     • VLIW (very long instruction word)
  3. Parallel processing
     Processing instructions in parallel requires three major tasks:
     1. checking dependencies between instructions to determine which instructions can be grouped together for parallel execution;
     2. assigning instructions to the functional units on the hardware;
     3. determining when instructions are initiated (in a VLIW, which instructions are placed together into a single word).
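The first task, dependency checking, can be sketched as follows. This is a minimal illustration and not taken from the slides: the `Instr` record, the register names and the greedy in-order grouping policy are all assumptions, and the check is deliberately conservative (it treats read-after-write, write-after-read and write-after-write conflicts alike).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical three-address instruction: one destination, some sources."""
    name: str
    dest: str
    srcs: tuple = ()


def depends(earlier: Instr, later: Instr) -> bool:
    """True if `later` conflicts with `earlier` and cannot share its group."""
    raw = earlier.dest in later.srcs   # read after write
    war = later.dest in earlier.srcs   # write after read
    waw = later.dest == earlier.dest   # write after write
    return raw or war or waw


def group_in_order(program, width):
    """Greedily pack consecutive independent instructions, at most `width` per group."""
    groups, current = [], []
    for ins in program:
        if len(current) == width or any(depends(e, ins) for e in current):
            groups.append(current)
            current = []
        current.append(ins)
    if current:
        groups.append(current)
    return groups


program = [
    Instr("i1", "r1", ("r2", "r3")),
    Instr("i2", "r4", ("r5",)),
    Instr("i3", "r6", ("r1", "r4")),   # reads r1 and r4, so it depends on i1 and i2
    Instr("i4", "r7", ("r8",)),
]
groups = [[i.name for i in g] for g in group_in_order(program, width=3)]
print(groups)
```

In a superscalar processor this check is done by hardware at run time; in a VLIW it is done once by the compiler.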
  4. Major categories
     • VLIW – Very Long Instruction Word
     • EPIC – Explicitly Parallel Instruction Computing
  5. Superscalar Processors
     • Superscalar processors are designed to exploit more instruction-level parallelism in user programs.
     • Only independent instructions can be executed in parallel without causing a wait state.
     • The amount of instruction-level parallelism varies widely depending on the type of code being executed.
  6. Pipelining in Superscalar Processors
     • To fully utilise a superscalar processor of degree m, m instructions must be executable in parallel. This condition may not hold in every clock cycle; when it does not, some of the pipelines stall in a wait state.
     • In a superscalar processor, a simple operation should have a latency of only one cycle, as in the base scalar processor.
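The effect of the degree m on utilisation can be illustrated with a toy issue model. It is only a sketch under stated assumptions: instructions issue in program order, every instruction has a one-cycle latency, and `deps` (a hypothetical input format) lists for each instruction the earlier instructions whose results it needs.

```python
def issue_cycles(deps, m):
    """Return the issue cycle of each instruction for an in-order machine of degree m.

    deps[i] is the set of indices of earlier instructions that instruction i needs.
    """
    cycle_of = []
    cycle, used = 0, 0
    for needs in deps:
        # Earliest cycle in which all operands are available (one-cycle latency).
        ready = max((cycle_of[j] + 1 for j in needs), default=0)
        if used == m or ready > cycle:
            cycle = max(cycle + 1, ready)
            used = 0
        cycle_of.append(cycle)
        used += 1
    return cycle_of


def utilisation(deps, m):
    """Fraction of issue slots that hold an instruction."""
    cycles = issue_cycles(deps, m)
    total_cycles = cycles[-1] + 1
    return len(deps) / (total_cycles * m)


# Six instructions; instruction 2 needs 0 and 1, instruction 3 needs 2.
deps = [set(), set(), {0, 1}, {2}, set(), set()]
print(issue_cycles(deps, 2), utilisation(deps, 2))
print(issue_cycles(deps, 1), utilisation(deps, 1))
```

With m = 2 the dependency chain leaves a quarter of the issue slots empty, while the scalar machine (m = 1) keeps its single pipeline full but needs more cycles.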
  8. Superscalar Execution
  9. Superscalar Implementation
     • Simultaneously fetch multiple instructions
     • Logic to determine true dependencies involving register values
     • Mechanisms to communicate these values
     • Mechanisms to initiate multiple instructions in parallel
     • Resources for parallel execution of multiple instructions
     • Mechanisms for committing process state in correct order
  10. VLIW History
      • The term was coined by J. A. Fisher (Yale) in 1983
        – ELI-512 (prototype)
        – Trace (commercial)
        – Origin lies in horizontal microcode optimization
      • Another pioneering work by B. Ramakrishna Rau in 1982
        – Polycyclic (prototype)
        – Cydra-5 (commercial)
      • Recent developments
        – TriMedia – Philips
        – TMS320C6x – Texas Instruments
  11. The VLIW Architecture
      • A typical VLIW (very long instruction word) machine has instruction words hundreds of bits in length.
      • Multiple functional units are used concurrently in a VLIW processor.
      • All functional units share the use of a common large register file.
  12. Why are superscalar processors commercially more popular than VLIW processors?
      • Binary code compatibility among scalar and superscalar processors of the same family
      • The same compiler works for all processors (scalar and superscalar) of the same family
      • Assembly programming of VLIWs is tedious
      • Code density in VLIWs is very poor – instruction encoding schemes
      • Area
      • Performance
  13. Data path: A simple VLIW architecture
      [Figure: three functional units (FU) sharing one register file]
      • Scalability? Access time, area and power consumption increase sharply with the number of register ports.
  14. Data path: Clustered VLIW architecture (distributed register file)
      [Figure: three clusters, each with two FUs and its own register file, connected by an interconnection network]
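A rough back-of-the-envelope comparison of the two data paths above shows why the register file limits scalability. The numbers are assumptions for illustration only: two read ports and one write port per FU, 96 registers in total, and a first-order model in which register-file area grows with the square of the port count. The ports needed by the interconnection network are ignored.

```python
def ports(fus, reads_per_fu=2, writes_per_fu=1):
    """Register-file ports needed so that every FU can start an operation each cycle."""
    return fus * (reads_per_fu + writes_per_fu)


def relative_area(registers, n_ports):
    """Assumed first-order model: area proportional to registers * ports^2."""
    return registers * n_ports ** 2


# One shared register file serving six FUs.
monolithic = relative_area(96, ports(6))

# Three clusters, each with two FUs and a third of the registers.
clustered = 3 * relative_area(32, ports(2))

print(ports(6), ports(2))
print(monolithic, clustered, monolithic / clustered)
```

Under these assumptions the clustered organisation needs about one ninth of the register-file area, at the price of explicit copies between clusters.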
  15. Coarse-grain FUs with a VLIW core
      [Figure labels: MULT, RAM, ALU, coarse-grain FU; Reg1/Reg2 registers; multiplexer network; microcode; IR; program counter logic]
      Embedded (co)processors as FUs in a VLIW architecture
  16. Application-specific FUs
      Functional units (FU) are described by:
      • functionality
      • number of inputs
      • number of outputs
      • latency
      • initiation interval
      • I/O time shape
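The FU parameters listed above can be captured in a small descriptor. The sketch below is an assumption about how such a record might look, not a format defined in the slides; the latency and initiation-interval values are hypothetical, and the I/O time shape is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class FunctionalUnit:
    functionality: str
    n_inputs: int
    n_outputs: int
    latency: int               # cycles from start until the result is available
    initiation_interval: int   # minimum cycles between two successive starts

    def schedule(self, n_ops, first_start=0):
        """(start, finish) cycles for n_ops independent back-to-back operations."""
        return [
            (first_start + k * self.initiation_interval,
             first_start + k * self.initiation_interval + self.latency)
            for k in range(n_ops)
        ]


# A pipelined multiplier accepts a new operation every cycle.
mult = FunctionalUnit("multiply", 2, 1, latency=3, initiation_interval=1)
# A non-pipelined divider is busy for its whole latency.
div = FunctionalUnit("divide", 2, 1, latency=8, initiation_interval=8)

print(mult.schedule(3))
print(div.schedule(3))
```

A compiler for a VLIW needs exactly these two numbers per unit: the latency tells it when a result may be consumed, the initiation interval when the unit may be used again.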
  17. Comparison: CISC, RISC, VLIW
  19. Advantages of VLIW
      The compiler prepares fixed packets of multiple operations that give the full "plan of execution":
      – dependencies are determined by the compiler and used to schedule according to function unit latencies
      – function units are assigned by the compiler and correspond to the position within the instruction packet ("slotting")
      – the compiler produces fully scheduled, hazard-free code => hardware doesn't have to "rediscover" dependencies or schedule
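The compile-time scheduling and slotting described above can be sketched with a very small list scheduler. Everything concrete here is an assumption made for illustration: a three-slot word with one ALU, one multiplier and one memory unit, the latencies in `LATENCY`, fully pipelined units, and operations that only have true (read-after-write) dependencies named explicitly.

```python
LATENCY = {"alu": 1, "mul": 2, "mem": 3}   # hypothetical unit latencies in cycles
SLOTS = ("alu", "mul", "mem")              # position in the word selects the unit


def schedule(ops):
    """Pack operations into VLIW words.

    ops: list of (name, kind, deps) in program order; deps name earlier operations.
    Returns a list of words; words[c][s] is the operation in slot s at cycle c,
    or "nop" when the slot is empty.
    """
    start, kind_of, words = {}, {}, []
    for name, kind, deps in ops:
        # Earliest cycle at which all operands have been produced.
        cycle = max((start[d] + LATENCY[kind_of[d]] for d in deps), default=0)
        slot = SLOTS.index(kind)
        # Skip cycles in which the required slot is already taken.
        while cycle < len(words) and words[cycle][slot] != "nop":
            cycle += 1
        while len(words) <= cycle:
            words.append(["nop"] * len(SLOTS))
        words[cycle][slot] = name
        start[name], kind_of[name] = cycle, kind
    return words


ops = [
    ("ld_a", "mem", []),
    ("ld_b", "mem", []),
    ("prod", "mul", ["ld_a", "ld_b"]),
    ("sum", "alu", ["prod"]),
    ("inc", "alu", []),
]
words = schedule(ops)
for cycle, word in enumerate(words):
    print(cycle, word)
```

The hardware simply issues one word per cycle; the empty cycles and `nop` slots in the output are the waiting time that the compiler made explicit.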
  20. Disadvantages of VLIW
      Compatibility across implementations is a major problem
      – VLIW code won't run properly with a different number of function units or different latencies
      – unscheduled events (e.g., cache miss) stall the entire processor
      Code density is another problem
      – low slot utilization (mostly nops)
      – reduce nops by compression ("flexible VLIW", "variable-length VLIW")
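One way to picture nop compression is to store, for each word, a bit mask of occupied slots followed by only the real operations. The mask scheme below is an illustrative assumption, not the encoding of any particular "flexible" or "variable-length" VLIW.

```python
def utilization(words):
    """Fraction of slots that hold a real operation."""
    slots = sum(len(w) for w in words)
    used = sum(op != "nop" for w in words for op in w)
    return used / slots


def compress(words):
    """Each word becomes (mask, ops); bit s of mask is set when slot s is occupied."""
    packed = []
    for w in words:
        mask = sum(1 << s for s, op in enumerate(w) if op != "nop")
        packed.append((mask, [op for op in w if op != "nop"]))
    return packed


def decompress(packed, width):
    """Rebuild full-width words, re-inserting a nop for every cleared mask bit."""
    words = []
    for mask, ops in packed:
        it = iter(ops)
        words.append([next(it) if (mask >> s) & 1 else "nop" for s in range(width)])
    return words


words = [
    ["add", "nop", "ld"],
    ["nop", "nop", "nop"],
    ["nop", "mul", "nop"],
]
packed = compress(words)
stored_ops = sum(len(ops) for _, ops in packed)

print(utilization(words))
print(packed)
print(stored_ops, "operations stored instead of", sum(len(w) for w in words), "slots")
```

The decoder expands each mask back into a full-width word before issue, so the execution model stays the same while the stored program shrinks.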
  23. References
      1. K. Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability, 1993.
      2. M. Smotherman, "Understanding EPIC Architectures and Implementations" (pdf), http://www.cs.clemson.edu/~mark/464/acmse_epic.pdf
      3. M. Smotherman, lecture notes, http://www.cs.clemson.edu/~mark/464/hp3e4.html
      4. Philips Semiconductors, An Introduction To Very-Long Instruction Word (VLIW) Computer Architecture, http://www.semiconductors.philips.com/acrobat_download/other/vliw-wp.pdf
      5. Texas Instruments, Tutorial on TMS320C6000 VelociTI Advanced VLIW Architecture, http://www.acm.org/sigs/sigmicro/existing/micro31/pdf/m31_seshan.pdf
  24. Thanks
