ECE 321
 Computer Architecture
             Chapter 1
Computer Abstractions and Technology
Course Overview
[Figure: Booth multiplier datapath (Multiplicand and Multiplier inputs, 32=>34 sign extension, 34x2 MUX, 34-bit ALU with Sub/Add control, Booth encoder, HI and LO registers, Result[HI] and Result[LO]) illustrating the course topics Computer Arithmetic and Single/multicycle Datapaths]
Course Overview [contd…]
[Figure: four overlapped IFetchDcd | Exec | Mem | WB instruction sequences illustrating the course topics Pipelining, Performance, and Memory Systems]
What You Will Learn
• How programs are translated into machine language
  – And how the hardware executes them
• The hardware/software interface
• What determines program performance
  – And how it can be improved
• How hardware designers improve
  performance
• What parallel processing is
What’s In It For Me?
• In-depth understanding of the inner workings of
  modern computers, their evolution, and the trade-offs
  present at the hardware/software boundary.
  – Insight into fast/slow operations that are easy/hard to
    implement in hardware


• Experience with the design process in the
  context of a large, complex (hardware) design.
  – Functional Spec --> Control & Datapath --> Physical
    implementation
  – Modern CAD tools
Computer Architecture - Definition
• Computer Architecture = ISA + MO (Machine Organization)

• Instruction Set Architecture
  – What the executable can “see” as underlying hardware
  – Logical View



• Machine Organization
  – How does the hardware implement the ISA?
  – Physical View
Computer Architecture – Changing Definition
 • 1950s to 1960s: Computer Architecture Course:
     – Computer Arithmetic

 • 1970s to mid 1980s: Computer Architecture Course:
     – Instruction Set Design, especially ISAs appropriate for compilers

 • 1990s: Computer Architecture Course:
     – Design of CPU, memory system, I/O system, Multiprocessors,
       Networks
 • 2000s: Computer Architecture Course:
     – Non-von Neumann architectures, Reconfiguration


 • DNA Computing, Quantum Computing ????
Some Examples …
° Digital Alpha     (v1, v3)      1992-97
° HP PA-RISC        (v1.1, v2.0) 1986-96
° Sun SPARC         (v8, v9)      1987-95
° SGI MIPS (MIPS I, II, III, IV, V) 1986-96
° IA-16/32 (8086, 286, 386, 486,  1978-1999
      Pentium, MMX, SSE, …)
° IA-64 (Itanium)                 1996-now
° AMD64/EM64T                     2002-now
° IBM POWER (PowerPC,…)           1990-now
° Many dead processor architectures live on in
  microcontrollers
Generations of Computers
• Vacuum tube - 1946-1957
• Transistor - 1958-1964
• Small scale integration - 1965 on
   – Up to 100 devices on a chip
• Medium scale integration - to 1971
   – 100-3,000 devices on a chip
• Large scale integration - 1971-1977
   – 3,000 - 100,000 devices on a chip
• Very large scale integration - 1978 to date
   – 100,000 - 100,000,000 devices on a chip
• Ultra large scale integration
   – Over 100,000,000 devices on a chip
The MIPS R3000 ISA (Summary)
• Instruction Categories
   – Load/Store
   – Computational
   – Jump and Branch
   – Floating Point
       • coprocessor
   – Memory Management
   – Special
• Registers: R0 - R31, PC, HI, LO
3 Instruction Formats: all 32 bits wide
   R-format:  OP | rs | rt | rd | sa | funct
   I-format:  OP | rs | rt | immediate
   J-format:  OP | jump target
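The three formats can be pulled apart with shifts and masks. The Python sketch below assumes the standard MIPS field widths (R-format 6/5/5/5/5/6 bits, I-format 6/5/5/16, J-format 6/26) and sign-extends the 16-bit immediate; `field`, `decode_r`, `decode_i` and `decode_j` are hypothetical helper names, not part of any MIPS toolchain.

```python
def field(word: int, lo: int, width: int) -> int:
    """Return `width` bits of a 32-bit instruction word, starting at bit `lo`."""
    return (word >> lo) & ((1 << width) - 1)


def decode_r(word: int) -> dict:
    """R-format: OP | rs | rt | rd | sa | funct (6/5/5/5/5/6 bits)."""
    return {
        "op": field(word, 26, 6),
        "rs": field(word, 21, 5),
        "rt": field(word, 16, 5),
        "rd": field(word, 11, 5),
        "sa": field(word, 6, 5),
        "funct": field(word, 0, 6),
    }


def decode_i(word: int) -> dict:
    """I-format: OP | rs | rt | immediate (6/5/5/16 bits), immediate sign-extended."""
    imm = field(word, 0, 16)
    if imm & 0x8000:              # top bit set: negative in two's complement
        imm -= 1 << 16
    return {
        "op": field(word, 26, 6),
        "rs": field(word, 21, 5),
        "rt": field(word, 16, 5),
        "immediate": imm,
    }


def decode_j(word: int) -> dict:
    """J-format: OP | jump target (6/26 bits)."""
    return {"op": field(word, 26, 6), "target": field(word, 0, 26)}


if __name__ == "__main__":
    print(decode_r(0x02324020))   # add $8, $17, $18
    print(decode_i(0x8C4F0000))   # lw $15, 0($2)
```

The example words encode `add $8, $17, $18` (opcode 0, funct 0x20) and `lw $15, 0($2)` (opcode 0x23), the same kind of load used in the Levels of Representation example.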
“What” is Computer Architecture ?
Levels of abstraction, top to bottom:
   Application
   Operating System
   Compiler | Firmware
   Instruction Set Architecture
   Instr. Set Proc. | I/O system      (ECE 321)
   Datapath & Control
   Digital Design
   Circuit Design
   Layout




• Coordination of many levels of abstraction
• Under a rapidly changing set of forces
• Design, Measurement, and Evaluation
Impact of Changing ISA
• Early 1990’s Apple switched instruction set
  architecture of the Macintosh
  – From Motorola 68000-based machines
  – To PowerPC architecture

• Intel 80x86 Family: many implementations
  of same architecture
  – A program written in 1978 for the 8086 can still run
    on the latest Pentium chip
Factors Affecting ISA
[Figure: forces acting on Computer Architecture: Technology, Programming Languages, Applications, Operating Systems, History, Cleverness]
ISA: Critical Interface

[Figure: the instruction set as the critical interface between software (above) and hardware (below)]




           Examples: 80x86 50,000,000 vs. MIPS 5500,000 ???
The Big Picture

[Figure: Processor (Control + Datapath), Memory, Input, Output]

Since 1946, all computers have had five components: control, datapath, memory, input, and output.
Example Organization
• TI SuperSPARC™ TMS390Z50 in Sun SPARCstation 20
[Figure: block diagram. MBus Module: SuperSPARC (Floating-point Unit, Integer Unit, Inst Cache, Ref MMU, Data Cache, Store Buffer, Bus Interface), L2 $, CC. MBus: DRAM Controller; L64852 MBus control M-S Adapter bridging to the SBus. SBus: SBus DMA, SCSI, Ethernet, SBus Cards. STDIO: serial, kbd, mouse, audio, RTC, Floppy]
Moore’s Law
• Increased density of components on chip
• Gordon Moore - cofounder of Intel
• Number of transistors on a chip will double every 18-24
  months
• Since the 1970s, development has slowed a little
   – Number of transistors doubles every 24 months
• Cost of a chip has remained almost unchanged
• Higher packing density means shorter electrical paths,
  giving higher performance
• Smaller size gives increased flexibility
• Reduced power and cooling requirements
• Fewer interconnections increase reliability
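Moore's observation is a doubling rule, so a count grows as N(t) = N0 · 2^(t/T) for a doubling period T. A minimal Python sketch, assuming a hypothetical starting count of 1,000 transistors, that projects the count forward and converts a doubling period into an equivalent yearly growth rate:

```python
def projected_count(initial: float, years: float, doubling_months: float = 24.0) -> float:
    """Transistor count after `years`, doubling every `doubling_months`."""
    return initial * 2 ** (years * 12 / doubling_months)


def annual_growth(doubling_months: float) -> float:
    """Equivalent yearly growth rate (0.41 means +41% per year)."""
    return 2 ** (12 / doubling_months) - 1


if __name__ == "__main__":
    start = 1_000  # hypothetical starting transistor count
    for months in (18, 24):
        print(f"doubling every {months} months: "
              f"{projected_count(start, 10, months):,.0f} after 10 years, "
              f"~{annual_growth(months):.0%} per year")
```

Doubling every 24 months corresponds to roughly 41% growth per year; doubling every 18 months, to roughly 59%.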
Technology Trends
• Processor
   – logic capacity: about 30% per year
   – clock rate:     about 20% per year
• Memory
   – DRAM capacity: about 60% per year (4x every 3 years)
   – Memory speed: about 10% per year
   – Cost per bit: improves about 25% per year
• Disk
   – capacity: about 60% per year
   – Total use of data: 100% per 9 months!
• Network Bandwidth
   – Bandwidth increasing more than 100% per year!
Technology Trends
DRAM chip capacity

   Year   Size
   1980   64 Kb
   1983   256 Kb
   1986   1 Mb
   1989   4 Mb
   1992   16 Mb
   1996   64 Mb
   1999   256 Mb
   2002   1 Gb

[Figure: Microprocessor Logic Density: transistors per chip (log scale, 1,000 to 100,000,000) vs. year (1965-2005) for i4004, i8086, i80286, i80386, i80486, Pentium, SU MIPS, R3010, R4400, R10000 (families: i80x86, M68K, MIPS, Alpha)]

°   In ~1985 the single-chip processor (32-bit) and the single-board computer emerged

°   In ~2002 multiple processor cores on a single chip appeared (IBM POWER4)
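The DRAM table can be checked against the "about 60% per year (4x every 3 years)" trend with a compound-annual-growth-rate calculation. This sketch assumes binary units (1 Mb = 1,024 Kb, 1 Gb = 1,048,576 Kb) and treats each table row as the year that capacity appeared:

```python
# DRAM chip capacity from the table, in Kbit.
DRAM_KBIT = {
    1980: 64, 1983: 256, 1986: 1_024, 1989: 4_096,
    1992: 16_384, 1996: 65_536, 1999: 262_144, 2002: 1_048_576,
}


def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    years = sorted(DRAM_KBIT)
    for y0, y1 in zip(years, years[1:]):
        rate = cagr(DRAM_KBIT[y0], DRAM_KBIT[y1], y1 - y0)
        print(f"{y0}-{y1}: {rate:.0%} per year")
    overall = cagr(DRAM_KBIT[years[0]], DRAM_KBIT[years[-1]], years[-1] - years[0])
    print(f"{years[0]}-{years[-1]} overall: {overall:.0%} per year")
```

Each 4x step over three years works out to about 59% per year; the 1992-1996 step (4x over four years) is slower at about 41%, and 1980-2002 overall comes to roughly 55% per year.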
Technology Trends




Smaller feature sizes – higher speed, density
Technology Trends




Number of transistors doubles every 18 months
(amended to 24 months)
Levels of Representation
High Level Language Program:
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
        (Compiler)
Assembly Language Program:
    lw    $15, 0($2)
    lw    $16, 4($2)
    sw    $16, 0($2)
    sw    $15, 4($2)
        (Assembler)
Machine Language Program:
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
        (Machine Interpretation)
Control Signal Specification:
    ALUOP[0:3] <= InstReg[9:11] & MASK
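The compiled swap can be checked with a tiny Python model of `lw`/`sw`. It assumes, as the slide implies, that register $2 holds the byte address of v[k] and that words are 4 bytes; memory is modelled as a dictionary from byte address to word, and the `execute_loads_stores` helper and tuple instruction format are hypothetical.

```python
def execute_loads_stores(program, regs, mem):
    """Run (op, rt, offset, base) tuples meaning `op rt, offset(base)`."""
    for op, rt, offset, base in program:
        addr = regs[base] + offset          # effective address = base register + offset
        if op == "lw":
            regs[rt] = mem[addr]            # load word from memory into rt
        elif op == "sw":
            mem[addr] = regs[rt]            # store word from rt into memory
        else:
            raise ValueError(f"unsupported op {op!r}")


# The four instructions from the slide: lw $15,0($2); lw $16,4($2); sw $16,0($2); sw $15,4($2)
SWAP = [("lw", 15, 0, 2), ("lw", 16, 4, 2), ("sw", 16, 0, 2), ("sw", 15, 4, 2)]

if __name__ == "__main__":
    base_v, k = 0x1000, 1                  # hypothetical array address and index
    v = [10, 20, 30, 40]
    mem = {base_v + 4 * i: x for i, x in enumerate(v)}
    regs = [0] * 32
    regs[2] = base_v + 4 * k               # $2 = address of v[k]
    execute_loads_stores(SWAP, regs, mem)
    print([mem[base_v + 4 * i] for i in range(len(v))])
```

Running it swaps v[1] and v[2], exactly what the three high-level statements do with `temp`.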
Execution Cycle
• Instruction Fetch: obtain instruction from program storage
• Instruction Decode: determine required actions and instruction size
• Operand Fetch: locate and obtain operand data
• Execute: compute result value or status
• Result Store: deposit results in storage for later use
• Next Instruction: determine successor instruction
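The steps of the execution cycle map directly onto an interpreter loop. The sketch below uses a hypothetical instruction set (`add`, `addi`, `bne`, plus `halt`) encoded as Python tuples, with register r0 hard-wired to zero in the MIPS style; it illustrates the cycle and does not model any real processor.

```python
def run(program, num_regs=8):
    """Interpret tuple instructions until `halt`; returns the register file."""
    regs = [0] * num_regs
    pc = 0
    while True:
        instr = program[pc]                  # Instruction Fetch
        op, *fields = instr                  # Instruction Decode
        next_pc = pc + 1                     # default successor
        if op == "halt":
            return regs
        if op == "add":                      # add rd, rs, rt
            rd, rs, rt = fields
            a, b = regs[rs], regs[rt]        # Operand Fetch
            result = a + b                   # Execute
            regs[rd] = result                # Result Store
        elif op == "addi":                   # addi rd, rs, imm
            rd, rs, imm = fields
            regs[rd] = regs[rs] + imm
        elif op == "bne":                    # bne rs, rt, target
            rs, rt, target = fields
            if regs[rs] != regs[rt]:
                next_pc = target
        else:
            raise ValueError(f"unknown opcode {op!r}")
        regs[0] = 0                          # keep r0 hard-wired to zero
        pc = next_pc                         # Next Instruction


# Sum 1..5 into r1.
SUM_1_TO_5 = [
    ("addi", 1, 0, 0),   # r1 = 0 (sum)
    ("addi", 2, 0, 1),   # r2 = 1 (i)
    ("addi", 3, 0, 6),   # r3 = 6 (loop bound)
    ("add", 1, 1, 2),    # r1 = r1 + r2
    ("addi", 2, 2, 1),   # r2 = r2 + 1
    ("bne", 2, 3, 3),    # if r2 != r3, go to instruction 3
    ("halt",),
]

if __name__ == "__main__":
    print(run(SUM_1_TO_5)[1])
```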
The Role of Performance
Understanding Performance
• Algorithm
  – Determines number of operations executed
• Programming language, compiler, architecture
  – Determine number of machine instructions executed
    per operation
• Processor and memory system
  – Determine how fast instructions are executed
• I/O system (including OS)
  – Determines how fast I/O operations are executed
Example of Performance Measure
Performance Metrics
• Response Time
  – Delay between start and end time of a task


• Throughput
  – Number of tasks completed per unit time


• New: Power/Energy
  – Energy per task, power
CPU Clocking
• Operation of digital hardware governed by a
  constant-rate clock
[Figure: clock waveform; within each clock period, data transfer and computation take place, then state is updated]

        Clock period: duration of a clock cycle
              e.g., 250 ps = 0.25 ns = 250 × 10^-12 s
        Clock frequency (rate): cycles per second
              e.g., 4.0 GHz = 4000 MHz = 4.0 × 10^9 Hz
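Because clock period and clock rate are reciprocals, the two examples describe the same clock: a 250 ps period is a 4.0 GHz rate. A minimal Python sketch of the conversion (the function names are illustrative):

```python
def period_s(frequency_hz: float) -> float:
    """Clock period in seconds for a given clock rate in hertz."""
    return 1.0 / frequency_hz


def frequency_hz(period_seconds: float) -> float:
    """Clock rate in hertz for a given clock period in seconds."""
    return 1.0 / period_seconds


if __name__ == "__main__":
    print(f"{period_s(4.0e9) * 1e12:.0f} ps")          # period of a 4.0 GHz clock
    print(f"{frequency_hz(250e-12) / 1e9:.1f} GHz")    # rate of a 250 ps clock
```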
Examples
    (Throughput/Performance)

• Replace the processor with a faster
  version?
  – 3.8 GHz instead of 3.2 GHz


• Add an additional processor to a system?
  – Core Duo instead of P4
Measuring Performance
• Wall clock time vs. Total execution time

• CPU Time
  – User Time
  – System Time


  Try using the time command on a UNIX system
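A rough Python analogue of the UNIX `time` command, using only the standard library: `time.perf_counter()` for elapsed (wall clock) time, `time.process_time()` for the process's CPU time (user + system), and `os.times()` for the user/system split. The `measure` helper and the sample workload are illustrative assumptions.

```python
import os
import time


def measure(task):
    """Run `task()` and report wall-clock, CPU, user and system seconds."""
    wall0, cpu0, t0 = time.perf_counter(), time.process_time(), os.times()
    task()
    wall1, cpu1, t1 = time.perf_counter(), time.process_time(), os.times()
    return {
        "wall": wall1 - wall0,          # elapsed time, includes waiting
        "cpu": cpu1 - cpu0,             # user + system time of this process
        "user": t1.user - t0.user,
        "system": t1.system - t0.system,
    }


def workload():
    sum(i * i for i in range(1_000_000))   # CPU-bound part
    time.sleep(0.3)                        # waiting: counts as wall time only


if __name__ == "__main__":
    for name, seconds in measure(workload).items():
        print(f"{name:>6}: {seconds:.3f} s")
```

The sleep shows up in `wall` but not in `cpu`, which is why response time and CPU time can differ sharply.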
Relating the Metrics
• Performance = 1/Execution Time

• CPU Execution Time = CPU clock cycles
  for program x Clock cycle time

• CPU clock cycles = Instructions for a
  program x Average clock cycles per
  Instruction
Performance Summary
The BIG Picture

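$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

(IC = instruction count, CPI = clock cycles per instruction, Tc = clock cycle time)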


• Performance depends on
  – Algorithm: affects IC, possibly CPI
  – Programming language: affects IC, CPI
  – Compiler: affects IC, CPI
  – Instruction set architecture: affects IC, CPI, Tc
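The formula can be evaluated directly. The sketch below compares two hypothetical compilers for the same program on a 4.0 GHz machine (all numbers are illustrative assumptions) and uses Performance = 1/Execution Time to express the result as a speedup:

```python
def cpu_time(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU time = IC x CPI x Tc, in seconds."""
    return instruction_count * cpi * cycle_time_s


def speedup(old_time_s: float, new_time_s: float) -> float:
    """Performance = 1 / time, so the new version is old/new times faster."""
    return old_time_s / new_time_s


if __name__ == "__main__":
    tc = 0.25e-9                     # 4.0 GHz clock (hypothetical)
    a = cpu_time(1.0e9, 2.0, tc)     # hypothetical compiler A
    b = cpu_time(1.2e9, 1.5, tc)     # hypothetical compiler B: more instructions, lower CPI
    print(f"A: {a:.3f} s, B: {b:.3f} s, B is {speedup(a, b):.2f}x faster")
```

Here B executes more instructions yet finishes first, because instruction count alone does not determine performance; all three factors matter.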
SPEC CPU Benchmark
• Programs used to measure performance
  – Supposedly typical of actual workload
• Standard Performance Evaluation Corp (SPEC)
  – Develops benchmarks for CPU, I/O, Web, …
• SPEC CPU2006
  – Elapsed time to execute a selection of programs
     • Negligible I/O, so focuses on CPU performance
  – Normalize relative to reference machine
  – Summarize as geometric mean of performance ratios
     • CINT2006 (integer) and CFP2006 (floating-point)

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
CINT2006 for Opteron X4 2356
Name         Description                    IC (×10^9)    CPI  Tc (ns)  Exec time (s)  Ref time (s)  SPECratio
perl         Interpreted string processing       2,118   0.75     0.40            637         9,777       15.3
bzip2        Block-sorting compression           2,389   0.85     0.40            817         9,650       11.8
gcc          GNU C Compiler                      1,050   1.72     0.40            724         8,050       11.1
mcf          Combinatorial optimization            336  10.00     0.40          1,345         9,120        6.8
go           Go game (AI)                        1,658   1.09     0.40            721        10,490       14.6
hmmer        Search gene sequence                2,783   0.80     0.40            890         9,330       10.5
sjeng        Chess game (AI)                     2,176   0.96     0.40            837        12,100       14.5
libquantum   Quantum computer simulation         1,623   1.61     0.40          1,047        20,720       19.8
h264avc      Video compression                   3,102   0.80     0.40            993        22,130       22.3
omnetpp      Discrete event simulation             587   2.94     0.40            690         6,250        9.1
astar        Games/path finding                  1,082   1.79     0.40            773         7,020        9.1
xalancbmk    XML parsing                         1,058   2.70     0.40          1,143         6,900        6.0
Geometric mean                                                                                             11.7

High cache miss rates account for the high CPIs of some programs (e.g., mcf).
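The summary statistic can be reproduced from the SPECratio column of the CINT2006 table. A minimal Python sketch, computing the geometric mean through logarithms (the standard library also offers `statistics.geometric_mean` in Python 3.8+); `spec_ratio` and `geometric_mean` are illustrative helper names:

```python
import math

# SPECratio column of the CINT2006 table (Opteron X4 2356).
SPEC_RATIOS = {
    "perl": 15.3, "bzip2": 11.8, "gcc": 11.1, "mcf": 6.8,
    "go": 14.6, "hmmer": 10.5, "sjeng": 14.5, "libquantum": 19.8,
    "h264avc": 22.3, "omnetpp": 9.1, "astar": 9.1, "xalancbmk": 6.0,
}


def spec_ratio(ref_time_s: float, exec_time_s: float) -> float:
    """SPECratio = reference time / measured execution time."""
    return ref_time_s / exec_time_s


def geometric_mean(values) -> float:
    """n-th root of the product, computed via logs to avoid overflow."""
    values = list(values)
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # perl row: IC 2,118e9, CPI 0.75, Tc 0.40 ns, exec time 637 s, ref time 9,777 s
    est = 2_118e9 * 0.75 * 0.40e-9
    print(f"perl estimated time {est:.0f} s, SPECratio {spec_ratio(9_777, 637):.1f}")
    print(f"geometric mean {geometric_mean(SPEC_RATIOS.values()):.1f}")
```

The IC × CPI × Tc estimate for perl (about 635 s) is within rounding of the measured 637 s, and the geometric mean reproduces the table's 11.7.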
Amdahl’s Law
• Pitfall: Expecting the improvement of one aspect
  of a machine to increase overall performance by
  an amount proportional to the size of
  improvement
Amdahl’s Law [contd…]
• A program runs in 100 seconds on a machine.
• Multiply operations are responsible for 80 seconds of this time.
• How much do I have to improve the speed of multiplication if I want
  my program to run 5 times faster?

• Execution time after improvement =
  (execution time affected by improvement / amount of improvement) + execution
  time unaffected
   Execution time after improvement = (80 seconds / n) + (100 - 80) seconds

   Running 5 times faster means finishing in 100 / 5 = 20 seconds =>
   20 seconds = 80/n seconds + 20 seconds

   0 = 80 / n, which no finite n satisfies: even infinitely fast
   multiplication leaves the 20 unaffected seconds, so a 5x speedup is impossible.
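The same reasoning can be written as a small Python sketch of Amdahl's Law. For contrast with the slide's 5x target, it also asks for a 4x speedup, an illustrative target not on the slide; the function names are hypothetical.

```python
def time_after(total_s: float, affected_s: float, n: float) -> float:
    """Amdahl's Law: affected time shrinks by n, the rest is unchanged."""
    return affected_s / n + (total_s - affected_s)


def required_improvement(total_s: float, affected_s: float, target_speedup: float):
    """Factor n needed for the target overall speedup, or None if impossible."""
    target_s = total_s / target_speedup
    left_for_affected = target_s - (total_s - affected_s)
    if left_for_affected <= 0:   # unaffected time alone already meets or exceeds the target
        return None
    return affected_s / left_for_affected


if __name__ == "__main__":
    print(required_improvement(100, 80, 5))    # None: 5x overall is impossible
    print(required_improvement(100, 80, 4))    # 16.0: multiply must become 16x faster
    print(time_after(100, 80, float("inf")))   # 20.0: best case, 20 s remain
```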
Amdahl’s Law [contd…]
• Opportunity for improvement is affected by
  how much time the event consumes
• Make the common case fast
• Very high speedup requires making nearly
  every case fast
• Focus on overall performance, not one
  aspect
Summary
• Computer Architecture = Instruction Set Architecture + Machine
  Organization
• All computers consist of five components
   – Processor: (1) datapath and (2) control
   – (3) Memory
   – (4) Input devices and (5) Output devices
• Not all “memory” is created equal
   – Cache: fast (expensive) memory is placed closer to the
      processor
   – Main memory: less expensive memory, so we can have more
• Interfaces are where the problems are - between functional units
  and between the computer and the outside world
• Need to design against constraints of performance, power, area and
  cost
Summary
• Performance is in the “eye of the beholder”
Seconds/Program =
  (Instructions/Program) x (Clock cycles/Instruction) x (Seconds/Clock cycle)


• Amdahl’s Law “Make the Common Case
  Fast”
Homework
• Chapter 1
• 1.3, 1.4, 1.10, 1.15, 1.16 (first 4 parts of
  each question)
• Due Next Tuesday

Weitere ähnliche Inhalte

Mehr von ececourse

Machine Problem 2
Machine Problem 2Machine Problem 2
Machine Problem 2ececourse
 
Machine Problem 1
Machine Problem 1Machine Problem 1
Machine Problem 1ececourse
 
Chapter 2 Hw
Chapter 2 HwChapter 2 Hw
Chapter 2 Hwececourse
 
Chapter 2 Part2 C
Chapter 2 Part2 CChapter 2 Part2 C
Chapter 2 Part2 Cececourse
 
C:\Fakepath\Chapter 2 Part2 B
C:\Fakepath\Chapter 2 Part2 BC:\Fakepath\Chapter 2 Part2 B
C:\Fakepath\Chapter 2 Part2 Bececourse
 
Chapter 2 Part2 A
Chapter 2 Part2 AChapter 2 Part2 A
Chapter 2 Part2 Aececourse
 

Mehr von ececourse (10)

Chapter 5 a
Chapter 5 aChapter 5 a
Chapter 5 a
 
Chapter 4
Chapter 4Chapter 4
Chapter 4
 
Auxiliary
AuxiliaryAuxiliary
Auxiliary
 
Mem Tb
Mem TbMem Tb
Mem Tb
 
Machine Problem 2
Machine Problem 2Machine Problem 2
Machine Problem 2
 
Machine Problem 1
Machine Problem 1Machine Problem 1
Machine Problem 1
 
Chapter 2 Hw
Chapter 2 HwChapter 2 Hw
Chapter 2 Hw
 
Chapter 2 Part2 C
Chapter 2 Part2 CChapter 2 Part2 C
Chapter 2 Part2 C
 
C:\Fakepath\Chapter 2 Part2 B
C:\Fakepath\Chapter 2 Part2 BC:\Fakepath\Chapter 2 Part2 B
C:\Fakepath\Chapter 2 Part2 B
 
Chapter 2 Part2 A
Chapter 2 Part2 AChapter 2 Part2 A
Chapter 2 Part2 A
 

Kürzlich hochgeladen

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 

Kürzlich hochgeladen (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 

Chapter1

  • 1. ECE 321 Computer Architecture Chapter 1 Computer Abstractions and Technology
  • 2. Course Overview Input Input Multiplicand Multiplier 32 Multiplicand Register LoadMp Computer Arithmetic Arithmetic 32=>34 signEx 32 <<1 34 34 32=>34 1 0 signEx 34x2 MUX Multi x2/x1 34 34 34-bit ALU Sub/Add Control Logic 34 [0]" 32 2 32 ShiftAll "LO ENC[2] LO[1] Encoder 2 2 2 bits Booth HI register LO register Extra ENC[1] Prev (16x2 bits) (16x2 bits) ENC[0] 2 LoadLO ClearHI LoadHI LO[1:0] 32 32 Result[HI] Result[LO] Single/multicycle Datapaths Datapaths
  • 3. Course Overview [contd…] IFetchDcd Exec Mem WB IFetchDcd Exec Mem WB IFetchDcd Exec Mem WB Performance IFetchDcd Exec Mem WB Pipelining Memory Memory Systems
  • 4. What You Will Learn • How programs are translated into the machine language – And how the hardware executes them • The hardware/software interface • What determines program performance – And how it can be improved • How hardware designers improve performance • What is parallel processing
  • 5. What’s In It For Me ? • In-depth understanding of the inner-workings of modern computers, their evolution, and trade- offs present at the hardware/software boundary. – Insight into fast/slow operations that are easy/hard to implementation hardware • Experience with the design process in the context of a large complex (hardware) design. – Functional Spec --> Control & Datapath --> Physical implementation – Modern CAD tools
  • 6. Computer Architecture - Definition • Computer Architecture = ISA + MO • Instruction Set Architecture – What the executable can “see” as underlying hardware – Logical View • Machine Organization – How does the hardware implement the ISA? – Physical View
  • 7. Computer Architecture – Changing Definition • 1950s to 1960s: Computer Architecture Course: – Computer Arithmetic • 1970s to mid 1980s: Computer Architecture Course: – Instruction Set Design, especially ISA appropriate for compilers • 1990s: Computer Architecture Course: – Design of CPU, memory system, I/O system, Multiprocessors, Networks • 2000s: Computer Architecture Course: – Non-von Neumann architectures, Reconfiguration • DNA Computing, Quantum Computing?
  • 8. Some Examples … ° Digital Alpha (v1, v3) 1992-97 ° HP PA-RISC (v1.1, v2.0) 1986-96 ° Sun SPARC (v8, v9) 1987-95 ° SGI MIPS (MIPS I, II, III, IV, V) 1986-96 ° IA-16/32 (8086, 286, 386, 486, Pentium, MMX, SSE, …) 1978-1999 ° IA-64 (Itanium) 1996-now ° AMD64/EM64T 2002-now ° IBM POWER (PowerPC, …) 1990-now ° Many dead processor architectures live on in microcontrollers
  • 9. Generations of Computer • Vacuum tube - 1946-1957 • Transistor - 1958-1964 • Small scale integration - 1965 on – Up to 100 devices on a chip • Medium scale integration - to 1971 – 100-3,000 devices on a chip • Large scale integration - 1971-1977 – 3,000 - 100,000 devices on a chip • Very large scale integration - 1978 to date – 100,000 - 100,000,000 devices on a chip • Ultra large scale integration – Over 100,000,000 devices on a chip
  • 10. The MIPS R3000 ISA (Summary) • Instruction Categories – Load/Store – Computational – Jump and Branch – Floating Point (coprocessor) – Memory Management – Special • Registers: R0 - R31, PC, HI, LO • 3 Instruction Formats, all 32 bits wide: – OP rs rt rd sa funct – OP rs rt immediate – OP jump target
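The three formats can be made concrete by packing fields into a 32-bit word. The Python sketch below assumes the standard MIPS-I field widths (6/5/5/5/5/6 bits for OP rs rt rd sa funct, 6/5/5/16 for the immediate format, 6/26 for the jump format) and the standard codes for `add` (OP 0, funct 0x20) and `lw` (OP 0x23); none of these numbers appear on the slide, and the helpers `encode`/`decode` are illustrative names only.

```python
# Field layouts, most-significant field first (assumed standard MIPS-I widths).
R_FORMAT = [("op", 6), ("rs", 5), ("rt", 5), ("rd", 5), ("sa", 5), ("funct", 6)]
I_FORMAT = [("op", 6), ("rs", 5), ("rt", 5), ("immediate", 16)]
J_FORMAT = [("op", 6), ("target", 26)]

for layout in (R_FORMAT, I_FORMAT, J_FORMAT):
    assert sum(width for _, width in layout) == 32  # every format is 32 bits wide


def encode(layout, **fields):
    """Pack named fields into one 32-bit instruction word."""
    word = 0
    for name, width in layout:
        # Keep only the low `width` bits, so a negative immediate becomes two's complement.
        value = fields.get(name, 0) & ((1 << width) - 1)
        word = (word << width) | value
    return word


def decode(layout, word):
    """Split a 32-bit instruction word back into its named fields."""
    fields = {}
    shift = 32
    for name, width in layout:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


# add $8, $17, $18  (R-format: OP 0, funct 0x20 selects "add")
add_word = encode(R_FORMAT, op=0, rs=17, rt=18, rd=8, sa=0, funct=0x20)
# lw $15, 0($2)     (I-format: OP 0x23 is "lw"; rs is the base register, rt the destination)
lw_word = encode(I_FORMAT, op=0x23, rs=2, rt=15, immediate=0)

print(f"{add_word:#010x}")  # 0x02324020
print(f"{lw_word:#010x}")   # 0x8c4f0000
```

Decoding simply reverses the shifts and masks, which is roughly what the instruction-decode hardware does with fixed bit positions.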
  • 11. “What” is Computer Architecture? Figure (labelled ECE 321), layers from top to bottom: Application, Operating System, Compiler, Firmware, Instruction Set Architecture, Instr. Set Proc., I/O system, Datapath & Control, Digital Design, Circuit Design, Layout • Coordination of many levels of abstraction • Under a rapidly changing set of forces • Design, Measurement, and Evaluation
  • 12. Impact of Changing ISA • In the early 1990s, Apple switched the instruction set architecture of the Macintosh – From Motorola 68000-based machines – To the PowerPC architecture • Intel 80x86 family: many implementations of the same architecture – A program written in 1978 for the 8086 can run on the latest Pentium chip
  • 13. Factors Affecting ISA • Technology, Programming Languages, Applications, Operating Systems, History, Cleverness (figure: all of these acting on Computer Architecture)
  • 14. ISA: Critical Interface (figure: software above the instruction set, hardware below) • Examples: 80x86 50,000,000 vs. MIPS 5500,000 ???
  • 15. The Big Picture • Processor (Control and Datapath), Memory, Input, Output • Since 1946, all computers have had these 5 components!
  • 16. Example Organization • TI SuperSPARC™ TMS390Z50 in Sun SPARCstation 20 (block diagram: MBus module with the SuperSPARC integer unit, floating-point unit, instruction and data caches, MMU, store buffer and bus interface; L2 cache controller (CC) and DRAM; L64852 MBus control / M-S adapter to the SBus; SBus devices: DMA, SCSI, Ethernet, SBus cards; STDIO: serial, keyboard, mouse, audio, RTC, floppy)
  • 17. Moore’s Law • Increased density of components on chip • Gordon Moore, cofounder of Intel • Number of transistors on a chip doubles every 18 to 24 months • Since the 1970s development has slowed a little – Number of transistors doubles every 24 months • Cost of a chip has remained almost unchanged • Higher packing density means shorter electrical paths, giving higher performance • Smaller size gives increased flexibility • Reduced power and cooling requirements • Fewer interconnections increase reliability
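To see what the two doubling periods mean in numbers, the sketch below projects a transistor count forward and converts a doubling period into an equivalent yearly growth rate. The starting count of 1,000,000 transistors and the 10-year horizon are hypothetical inputs chosen for illustration, not data from the slide.

```python
def projected_transistors(start_count, years, doubling_months):
    """Transistor count after `years` if it doubles every `doubling_months` months."""
    doublings = years * 12 / doubling_months
    return start_count * 2 ** doublings


def annual_growth_rate(doubling_months):
    """Equivalent compound growth per year for a given doubling period."""
    return 2 ** (12 / doubling_months) - 1


# Hypothetical starting point: a 1,000,000-transistor chip, projected 10 years ahead.
for months in (18, 24):
    count = projected_transistors(1_000_000, 10, months)
    print(f"doubling every {months} months: {count:,.0f} transistors after 10 years, "
          f"{annual_growth_rate(months):.0%} growth per year")
```

A 24-month doubling period is equivalent to about 41% growth per year (the square root of 2, minus 1) and gives a 32x increase over 10 years; an 18-month period corresponds to about 59% per year.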
  • 18. Technology Trends • Processor – logic capacity: about 30% per year – clock rate: about 20% per year • Memory – DRAM capacity: about 60% per year (4x every 3 years) – Memory speed: about 10% per year – Cost per bit: improves about 25% per year • Disk – capacity: about 60% per year – Total use of data: 100% per 9 months! • Network Bandwidth – Bandwidth increasing more than 100% per year!
  • 19. Technology Trends
    Figure: microprocessor logic density (transistors per chip, log scale from 1,000 to 100,000,000) versus year, 1965 to 2005: i4004, i8086, i80286, i80386, i80486, Pentium, M68K, MIPS, SU MIPS, R3010, R4400, R10000, Alpha
    DRAM chip capacity:
      Year   Size
      1980   64 Kb
      1983   256 Kb
      1986   1 Mb
      1989   4 Mb
      1992   16 Mb
      1996   64 Mb
      1999   256 Mb
      2002   1 Gb
    ° In ~1985 the single-chip processor (32-bit) and the single-board computer emerged
    ° In ~2002 started having multiple processor cores on a chip (IBM POWER4)
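The DRAM capacities on this slide can be checked against the "about 60% per year (4x every 3 years)" rule of thumb from the previous slide by computing the compound annual growth between the first and last entries. This is a rough sketch: it uses only the two endpoints and treats 1 Mb as 1,024 Kb and 1 Gb as 1,048,576 Kb.

```python
import math

# DRAM chip capacity (in Kb) by year, from the table on this slide.
DRAM_KBITS = {
    1980: 64, 1983: 256, 1986: 1_024, 1989: 4_096,
    1992: 16_384, 1996: 65_536, 1999: 262_144, 2002: 1_048_576,
}


def compound_annual_growth(first, last, years):
    """Constant yearly growth rate that turns `first` into `last` over `years` years."""
    return (last / first) ** (1 / years) - 1


start, end = min(DRAM_KBITS), max(DRAM_KBITS)
rate = compound_annual_growth(DRAM_KBITS[start], DRAM_KBITS[end], end - start)
doubling_years = math.log(2) / math.log(1 + rate)
print(f"{start}-{end}: {rate:.0%} per year, doubling every {doubling_years:.1f} years")

# The slide's rule of thumb: 60% per year compounded over 3 years.
print(f"1.6 ** 3 = {1.6 ** 3:.2f}")
```

The endpoints imply roughly 55% per year (a doubling about every 1.6 years), a little under the 60% figure, while 1.6^3 ≈ 4.1 confirms that "60% per year" and "4x every 3 years" are the same claim.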
  • 20. Technology Trends Smaller feature sizes – higher speed, density
  • 21. Technology Trends Number of transistors doubles every 18 months (amended to 24 months)
  • 22. Levels of Representation
    High Level Language Program:
      temp = v[k];
      v[k] = v[k+1];
      v[k+1] = temp;
    ↓ Compiler
    Assembly Language Program:
      lw $15, 0($2)
      lw $16, 4($2)
      sw $16, 0($2)
      sw $15, 4($2)
    ↓ Assembler
    Machine Language Program:
      0000 1001 1100 0110 1010 1111 0101 1000
      1010 1111 0101 1000 0000 1001 1100 0110
      1100 0110 1010 1111 0101 1000 0000 1001
      0101 1000 0000 1001 1100 0110 1010 1111
    ↓ Machine Interpretation
    Control Signal Specification:
      ALUOP[0:3] <= InstReg[9:11] & MASK
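The assembly sequence on this slide can be followed step by step with a tiny interpreter for just the two instructions it uses. The sketch assumes MIPS base-plus-offset addressing with 4-byte words; the array base address (1000) and index (k = 3) are hypothetical values chosen so that `$2` holds the address of `v[k]`. It models registers and memory only, not the bit-level encoding.

```python
def run(program, regs, mem):
    """Interpret a tiny subset of MIPS: lw/sw with base + offset addressing."""
    for op, rt, offset, base in program:
        addr = regs[base] + offset          # effective address = base register + offset
        assert addr % 4 == 0, "MIPS words must be 4-byte aligned"
        if op == "lw":
            regs[rt] = mem[addr]            # load word from memory into register rt
        elif op == "sw":
            mem[addr] = regs[rt]            # store word from register rt into memory
        else:
            raise ValueError(f"unsupported op {op}")


# Hypothetical setup: array v starts at byte address 1000 and k = 3,
# so $2 holds the address of v[k] and v[k+1] is 4 bytes later.
v = [10, 20, 30, 40, 50]
mem = {1000 + 4 * i: x for i, x in enumerate(v)}
regs = {2: 1000 + 4 * 3, 15: 0, 16: 0}

swap = [
    ("lw", 15, 0, 2),   # lw $15, 0($2)   temp = v[k]
    ("lw", 16, 4, 2),   # lw $16, 4($2)   $16 = v[k+1]
    ("sw", 16, 0, 2),   # sw $16, 0($2)   v[k] = v[k+1]
    ("sw", 15, 4, 2),   # sw $15, 4($2)   v[k+1] = temp
]
run(swap, regs, mem)
print([mem[1000 + 4 * i] for i in range(len(v))])  # [10, 20, 30, 50, 40]
```

The offsets 0 and 4 reflect that consecutive array elements are one 4-byte word apart, which is why `v[k+1]` is addressed as `4($2)`.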
  • 23. Execution Cycle – Instruction Fetch: Obtain instruction from program storage – Instruction Decode: Determine required actions and instruction size – Operand Fetch: Locate and obtain operand data – Execute: Compute result value or status – Result Store: Deposit results in storage for later use – Next Instruction: Determine successor instruction
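The six steps can be traced in a toy interpreter. The machine below is hypothetical (memory-to-memory `add`/`sub`, a `beqz` branch, `jmp` and `halt`, with instructions stored as Python tuples rather than encoded bits); the comments mark where each step of the cycle happens. The example program multiplies 6 by 7 through repeated addition.

```python
def run(program, mem):
    """Toy instruction cycle for a hypothetical memory-to-memory machine.

    Instruction forms: ("add" | "sub", dest, src1, src2), ("beqz", src, target),
    ("jmp", target) and ("halt",). Operands name cells in the `mem` dictionary.
    """
    pc = 0
    while True:
        instr = program[pc]              # Instruction fetch: read from program storage
        opcode, *fields = instr          # Instruction decode: action and operand fields
        next_pc = pc + 1                 # Next instruction: sequential successor by default
        if opcode == "halt":
            return mem
        if opcode in ("add", "sub"):
            dest, src1, src2 = fields
            x, y = mem[src1], mem[src2]  # Operand fetch
            result = x + y if opcode == "add" else x - y  # Execute
            mem[dest] = result           # Result store
        elif opcode == "beqz":
            src, target = fields
            if mem[src] == 0:            # Operand fetch and execute (test for zero)
                next_pc = target         # Next instruction: branch taken
        elif opcode == "jmp":
            (target,) = fields
            next_pc = target
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
        pc = next_pc


# Multiply 6 * 7 by repeated addition.
program = [
    ("beqz", "count", 4),              # 0: if count == 0, go to halt
    ("add", "acc", "acc", "a"),        # 1: acc = acc + a
    ("sub", "count", "count", "one"),  # 2: count = count - 1
    ("jmp", 0),                        # 3: back to the test
    ("halt",),                         # 4: stop
]
mem = run(program, {"acc": 0, "a": 6, "count": 7, "one": 1})
print(mem["acc"])  # 42
```

Here the "instruction size" step is trivial because each list entry is one instruction; a real decoder with variable-length encodings would have to compute `next_pc` from the decoded length.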
  • 24. The Role of Performance
  • 25. Understanding Performance • Algorithm – Determines number of operations executed • Programming language, compiler, architecture – Determine number of machine instructions executed per operation • Processor and memory system – Determine how fast instructions are executed • I/O system (including OS) – Determines how fast I/O operations are executed
  • 27. Performance Metrics • Response Time – Delay between start and end time of a task • Throughput – Number of tasks completed per unit time • New: Power/Energy – Energy per task, power
  • 28. CPU Clocking • Operation of digital hardware governed by a constant-rate clock (figure: clock waveform; within each clock period, data transfer and computation, then update state) • Clock period: duration of a clock cycle – e.g., 250 ps = 0.25 ns = 250 x 10^-12 s • Clock frequency (rate): cycles per second – e.g., 4.0 GHz = 4000 MHz = 4.0 x 10^9 Hz
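Because clock period and clock rate are reciprocals, the two examples on this slide describe the same clock. A minimal Python check (the function names are illustrative):

```python
def period_seconds(rate_hz):
    """Clock period (s) is the reciprocal of clock rate (Hz)."""
    return 1.0 / rate_hz


def rate_hz(period_s):
    """Clock rate (Hz) is the reciprocal of clock period (s)."""
    return 1.0 / period_s


p = period_seconds(4.0e9)
print(f"4.0 GHz -> {p * 1e12:.0f} ps = {p * 1e9:.2f} ns")
print(f"250 ps -> {rate_hz(250e-12) / 1e6:.0f} MHz")
```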
  • 29. Examples (Throughput/Performance) • Replace the processor with a faster version? – 3.8 GHz instead of 3.2 GHz • Add an additional processor to a system? – Core Duo instead of P4
  • 30. Measuring Performance • Wall-clock time vs. total execution time • CPU time – User time – System time • Try using the time command on a UNIX system
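Python's standard library can report the same three quantities the UNIX `time` command prints (real, user, sys): `time.perf_counter()` for wall-clock time and `os.times()` for user and system CPU time. The sketch is illustrative; the two workloads are hypothetical and the exact numbers vary by machine, but a sleeping task should show wall-clock time well above its CPU time.

```python
import os
import time


def measure(task):
    """Run task() and return wall-clock, user CPU and system CPU seconds."""
    cpu0, wall0 = os.times(), time.perf_counter()
    task()
    cpu1, wall1 = os.times(), time.perf_counter()
    return {
        "wall": wall1 - wall0,                # elapsed time, including waiting
        "user": cpu1.user - cpu0.user,        # CPU time in this program's own code
        "system": cpu1.system - cpu0.system,  # CPU time in the OS on its behalf
    }


def busy():
    """CPU-bound work: CPU time should be close to wall-clock time."""
    sum(i * i for i in range(2_000_000))


def idle():
    """Waiting: wall-clock time grows while CPU time stays near zero."""
    time.sleep(0.2)


for name, task in (("busy", busy), ("idle", idle)):
    r = measure(task)
    print(f"{name}: wall {r['wall']:.3f} s, user {r['user']:.3f} s, sys {r['system']:.3f} s")
```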
  • 31. Relating the Metrics • Performance = 1/Execution Time • CPU Execution Time = CPU clock cycles for program x Clock cycle time • CPU clock cycles = Instructions for a program x Average clock cycles per Instruction
  • 32. Performance Summary: The BIG Picture • CPU Time = (Instructions / Program) x (Clock cycles / Instruction) x (Seconds / Clock cycle) • Performance depends on – Algorithm: affects IC, possibly CPI – Programming language: affects IC, CPI – Compiler: affects IC, CPI – Instruction set architecture: affects IC, CPI, Tc
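The CPU time equation can be applied to the 3.8 GHz vs. 3.2 GHz question from slide 29. This sketch assumes a hypothetical program of 10^9 instructions with an average CPI of 2.0, and assumes the faster processor leaves IC and CPI unchanged; in practice memory stalls can raise the CPI at a higher clock rate, so the real speedup may be smaller.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count x CPI x clock cycle time (= 1 / clock rate)."""
    return instruction_count * cpi / clock_rate_hz


def speedup(old_time, new_time):
    """Performance = 1 / execution time, so speedup = old time / new time."""
    return old_time / new_time


# Hypothetical workload: 10^9 instructions, average CPI 2.0, unchanged by the swap.
IC, CPI = 1_000_000_000, 2.0
t_32 = cpu_time(IC, CPI, 3.2e9)
t_38 = cpu_time(IC, CPI, 3.8e9)
print(f"3.2 GHz: {t_32:.4f} s  3.8 GHz: {t_38:.4f} s  speedup: {speedup(t_32, t_38):.3f}")
```

Under those assumptions the speedup is exactly the clock-rate ratio 3.8 / 3.2 ≈ 1.19, because IC and CPI cancel.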
  • 33. SPEC CPU Benchmark • Programs used to measure performance – Supposedly typical of actual workload • Standard Performance Evaluation Corp (SPEC) – Develops benchmarks for CPU, I/O, Web, … • SPEC CPU2006 – Elapsed time to execute a selection of programs • Negligible I/O, so focuses on CPU performance – Normalize relative to reference machine – Summarize as geometric mean of performance ratios: Geometric mean = n-th root of (Execution time ratio_1 x Execution time ratio_2 x … x Execution time ratio_n) • CINT2006 (integer) and CFP2006 (floating-point)
  • 34. CINT2006 for Opteron X4 2356
      Name        Description                    IC (x10^9)  CPI    Tc (ns)  Exec time (s)  Ref time (s)  SPECratio
      perl        Interpreted string processing  2,118       0.75   0.40     637            9,777         15.3
      bzip2       Block-sorting compression      2,389       0.85   0.40     817            9,650         11.8
      gcc         GNU C Compiler                 1,050       1.72   0.40     724            8,050         11.1
      mcf         Combinatorial optimization     336         10.00  0.40     1,345          9,120         6.8
      go          Go game (AI)                   1,658       1.09   0.40     721            10,490        14.6
      hmmer       Search gene sequence           2,783       0.80   0.40     890            9,330         10.5
      sjeng       Chess game (AI)                2,176       0.96   0.40     837            12,100        14.5
      libquantum  Quantum computer simulation    1,623       1.61   0.40     1,047          20,720        19.8
      h264avc     Video compression              3,102       0.80   0.40     993            22,130        22.3
      omnetpp     Discrete event simulation      587         2.94   0.40     690            6,250         9.1
      astar       Games/path finding             1,082       1.79   0.40     773            7,020         9.1
      xalancbmk   XML parsing                    1,058       2.70   0.40     1,143          6,900         6.0
      Geometric mean: 11.7
      (Slide callout: High cache miss rates)
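The table's columns are linked by the equations on the preceding slides: predicted execution time = IC x CPI x Tc, SPECratio = reference time / execution time, and the overall score is the geometric mean of the ratios. The sketch below recomputes both from the rows. For gcc and sjeng it uses Tc = 0.40 ns and execution times of 724 s and 837 s, a reconstruction of entries that appear garbled in the extracted text, chosen because they match those rows' SPECratios (8,050 / 724 ≈ 11.1, 12,100 / 837 ≈ 14.5) and the IC x CPI x Tc product. `statistics.geometric_mean` needs Python 3.8 or later.

```python
from statistics import geometric_mean

# (name, IC in billions, CPI, Tc in ns, measured exec time in s, reference time in s)
CINT2006 = [
    ("perl", 2118, 0.75, 0.40, 637, 9777),
    ("bzip2", 2389, 0.85, 0.40, 817, 9650),
    ("gcc", 1050, 1.72, 0.40, 724, 8050),       # reconstructed Tc and exec time
    ("mcf", 336, 10.00, 0.40, 1345, 9120),
    ("go", 1658, 1.09, 0.40, 721, 10490),
    ("hmmer", 2783, 0.80, 0.40, 890, 9330),
    ("sjeng", 2176, 0.96, 0.40, 837, 12100),    # reconstructed Tc and exec time
    ("libquantum", 1623, 1.61, 0.40, 1047, 20720),
    ("h264avc", 3102, 0.80, 0.40, 993, 22130),
    ("omnetpp", 587, 2.94, 0.40, 690, 6250),
    ("astar", 1082, 1.79, 0.40, 773, 7020),
    ("xalancbmk", 1058, 2.70, 0.40, 1143, 6900),
]

ratios = []
for name, ic_billions, cpi, tc_ns, exec_s, ref_s in CINT2006:
    # (IC x 10^9) x CPI x (Tc x 10^-9 s): the powers of ten cancel, leaving seconds.
    predicted_s = ic_billions * cpi * tc_ns
    ratio = ref_s / exec_s                      # SPECratio = reference time / measured time
    ratios.append(ratio)
    print(f"{name:<11} predicted {predicted_s:7.1f} s  measured {exec_s:5d} s  SPECratio {ratio:5.1f}")

# Summarize with the geometric mean of the ratios.
print(f"geometric mean SPECratio: {geometric_mean(ratios):.1f}")
```

Each predicted time lands within 1% of the measured one, and the geometric mean of the twelve ratios rounds to the 11.7 shown on the slide.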
  • 35. Amdahl’s Law • Pitfall: Expecting the improvement of one aspect of a machine to increase overall performance by an amount proportional to the size of improvement
  • 36. Amdahl’s Law [contd…] • A program runs in 100 seconds on a machine • Multiply operations are responsible for 80 seconds of this time • How much do I have to improve the speed of multiplication if I want my program to run 5 times faster? • Execution time after improvement = (Execution time affected by improvement / Amount of improvement) + Execution time unaffected • Here: Execution time after improvement = (80 seconds / n) + (100 - 80) seconds • Running 5 times faster means finishing in 100 / 5 = 20 seconds, so 20 seconds = 80/n seconds + 20 seconds => 80/n = 0 • No finite n satisfies this: even infinitely fast multiplication leaves the 20 seconds of non-multiply time, so a 5x speedup is impossible
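The same calculation can be written as a small function that solves Amdahl's Law for the required improvement factor n. This is a sketch with illustrative function names; it returns `math.inf` when no finite improvement can reach the target, which is exactly the 5x case on this slide.

```python
import math


def time_after(total, affected, improvement):
    """Amdahl's Law: the affected time shrinks by `improvement`, the rest is unchanged."""
    return affected / improvement + (total - affected)


def required_improvement(total, affected, target_speedup):
    """Improvement factor n needed on the affected part to reach `target_speedup`.

    Returns math.inf when no finite n can reach the target.
    """
    target_time = total / target_speedup
    unaffected = total - affected
    if target_time <= unaffected:
        return math.inf          # even n -> infinity still leaves `unaffected` seconds
    return affected / (target_time - unaffected)


print(required_improvement(100, 80, 5))   # inf: 20 s of non-multiply time remains
print(required_improvement(100, 80, 4))   # 16.0
print(time_after(100, 80, 16))            # 25.0
```

Relaxing the goal to 4x (25 seconds) gives n = 80 / (25 - 20) = 16, and 5x is the ceiling for this program since 100 / 20 = 5.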
  • 37. Amdahl’s Law [contd…] • Opportunity for improvement is affected by how much time the event consumes • Make the common case fast • Very high speedup requires making nearly every case fast • Focus on overall performance, not one aspect
  • 38. Summary • Computer Architecture = Instruction Set Architecture + Machine Organization • All computers consist of five components – Processor: (1) datapath and (2) control – (3) Memory – (4) Input devices and (5) Output devices • Not all “memory” is created equal – Cache: fast (expensive) memory is placed closer to the processor – Main memory: less expensive memory, so we can have more of it • Interfaces are where the problems are: between functional units and between the computer and the outside world • Need to design against constraints of performance, power, area and cost
  • 39. Summary • Performance is in the “eye of the beholder” • Seconds/Program = (Instructions/Program) x (Clock cycles/Instruction) x (Seconds/Clock cycle) • Amdahl’s Law: “Make the Common Case Fast”
  • 40. Homework • Chapter 1 • 1.3, 1.4, 1.10, 1.15, 1.16 (first 4 parts of each question) • Due Next Tuesday