CSL718 : Architecture of
High Performance Systems
         Introduction
      12th January, 2009
Some basic questions
• What is high performance?
   – Rate of computation
   – Time to compute
• Who needs high performance systems?
   – Weather prediction, complex design, scientific computation etc.
   – Everyone needs it.
• How do you achieve high performance?
   – Technology
   – Circuit / logic design
   – Architecture
• How to analyse or evaluate performance?
   – Theoretical models
   – Simulation
   – Experimentation
Anshul Kumar, CSE IITD
Execution Time and Clock Period
     Instruction execution time = Tinst = CPI * Δt

     [Figure: instruction stages IF, D, RF, EX/AG, M, WB, with Δt marking one clock period]

     Program exec time = Tprog = N * Tinst
                               = N * CPI * Δt
          N :    Number of instructions
          CPI :  Cycles per instruction (average)
          Δt :   Clock cycle time
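The relation Tprog = N * CPI * Δt can be checked with a few lines of Python. This is a minimal sketch: the function name and the workload figures (instruction count, CPI, clock period) are hypothetical values chosen for illustration, not numbers from the lecture.

```python
def program_exec_time(n_instructions, cpi, clock_period_s):
    """Return Tprog = N * CPI * Δt in seconds."""
    return n_instructions * cpi * clock_period_s


# Hypothetical workload: 1e9 instructions, average CPI of 1.5,
# and a 2 ns clock period (a 500 MHz clock).
t_prog = program_exec_time(1_000_000_000, 1.5, 2e-9)
print(f"Tprog = {t_prog:.2f} s")
```

Halving any one of the three factors halves Tprog, which is why the following slides look at what influences each factor.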
What influences clock period?

Tprog = N * CPI * Δt
  Technology -    Δt ⇓
  Software -      N ⇓
  Architecture -  N * CPI * Δt ⇓
       Instruction set architecture (ISA)
              trade-off: N vs CPI * Δt
       Micro architecture (μA)
              trade-off: CPI vs Δt
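The two trade-offs can be made concrete by evaluating Tprog = N * CPI * Δt for competing designs. The sketch below uses invented numbers throughout; the design names and every value of N, CPI and Δt are assumptions for illustration only.

```python
def exec_time(n, cpi, dt):
    """Tprog = N * CPI * Δt (seconds)."""
    return n * cpi * dt


# ISA trade-off (N vs CPI * Δt), hypothetical numbers:
# simple instructions need more of them but each takes fewer cycles.
simple_isa = exec_time(n=1.0e9, cpi=1.2, dt=1.0e-9)
complex_isa = exec_time(n=0.7e9, cpi=2.0, dt=1.0e-9)

# Microarchitecture trade-off (CPI vs Δt), hypothetical numbers:
# a deeper pipeline shortens the clock period but raises CPI through stalls.
shallow_pipe = exec_time(n=1.0e9, cpi=1.2, dt=1.0e-9)
deep_pipe = exec_time(n=1.0e9, cpi=1.5, dt=0.7e-9)

for name, t in [("simple ISA", simple_isa), ("complex ISA", complex_isa),
                ("shallow pipeline", shallow_pipe), ("deep pipeline", deep_pipe)]:
    print(f"{name:17s} {t:.3f} s")
```

With these particular numbers the simple ISA and the deeper pipeline come out ahead, but different values reverse either result; the point is only that the product, not any single factor, decides.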
Relative performance per unit cost

Year           Technology           Perf/cost
1951           Vacuum tube                  1
1965           Transistor                 35
1975           Integrated circuit         900
1995           VLSI                 2,400,000




Increase in workstation performance
[Figure: Performance (0 to 1200) vs. Year (1987 to 1997). Machines plotted: SUN-4/260, MIPS M/120, MIPS M2000, IBM RS6000, HP 9000/750, DEC AXP/500, IBM POWER 100, DEC Alpha 4/266, DEC Alpha 5/300, DEC Alpha 5/500, DEC Alpha 21264/600]
Growth in DRAM Capacity

[Figure: Kbit capacity (log scale, 10 to 100,000) vs. Year of introduction (1976 to 1996). Generations plotted: 16K, 64K, 256K, 1M, 4M, 16M, 64M]
CPU-Memory Performance Gap
• Semiconductor (random access)
   – Registers (CPU speed)
   – SRAM
   – DRAM
   – FLASH
• Magnetic (slow)
   – FDD
   – HDD
• Optical (random + sequential, very slow)
   – CD
   – DVD
Memory Hierarchy Principle
[Figure: CPU on top of a stack of memory levels; each access to a level is a hit or a miss]

                 Speed      Size       Cost / bit
Nearest CPU      Fastest    Smallest   Highest
Farthest         Slowest    Biggest    Lowest

Temporal Locality
   – References repeated in time
Spatial Locality
   – References repeated in space
   – Special case: Sequential Locality
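The hit/miss behaviour of a two-level hierarchy is commonly summarised as an average access time: hit time plus miss rate times miss penalty. The slide does not state this formula, so treat the sketch below as a standard textbook relation added for illustration; the cycle counts and miss rates are hypothetical.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average access time = hit time + miss rate * miss penalty.

    All times are in the same unit (here: clock cycles).
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical numbers: 1-cycle hit in the fast level, 100-cycle penalty
# for going to the slow level.
good_locality = average_access_time(hit_time=1, miss_rate=0.05, miss_penalty=100)
poor_locality = average_access_time(hit_time=1, miss_rate=0.50, miss_penalty=100)
print(good_locality, poor_locality)
```

Temporal and spatial locality matter because they keep the miss rate low, so the average stays close to the speed of the fastest level.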
Parallelism : Flynn’s Classification
Architecture Categories
• SISD
• SIMD
• MISD
• MIMD
SISD



[Figure: one control unit (C), one processor (P) and memory (M), linked by an instruction stream (IS) and a data stream (DS)]
SIMD

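[Figure: one control unit (C) driving several processors (P) with a single instruction stream (IS); each processor has its own data stream (DS) to memory (M)]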
MISD

[Figure: several control units (C) and processors (P) connected to memory (M), with instruction streams (IS) and data streams (DS) labelled]
MIMD

[Figure: several control units (C) and processors (P) connected to memory (M), with instruction streams (IS) and data streams (DS) labelled]
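Flynn's four categories follow directly from counting instruction streams and data streams. A minimal sketch of that rule, where the function name and its arguments are hypothetical:

```python
def flynn_class(instruction_streams, data_streams):
    """Classify a machine by its number of instruction and data streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine needs at least one stream of each kind")
    i = "S" if instruction_streams == 1 else "M"   # Single / Multiple instruction
    d = "S" if data_streams == 1 else "M"          # Single / Multiple data
    return f"{i}I{d}D"


print(flynn_class(1, 1))    # a conventional uniprocessor
print(flynn_class(1, 64))   # one control unit driving 64 processing elements
print(flynn_class(4, 4))    # four independent processors
```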
Feng’s Classification
[Figure: machines plotted by word length (x-axis: 1, 16, 32, 64) against bit slice length (y-axis: 1, 16, 64, 256, 16K)]

Machine      Word length   Bit slice length
MPP          1             16K
STARAN       1             256
PEPE         32            288
Illiac IV    64            64
C.mmP        16            16
PDP11        16            1
IBM370       32            1
CRAY-1       64            1
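In Feng's scheme a machine is usually characterised by the product of word length and bit slice length, its maximum degree of parallelism. The slide shows only the chart, so the product rule is added here as background. The coordinates below are read off the chart and should be treated as approximate; PEPE's 288 is taken from the Händler notation for PEPE later in the deck.

```python
# (word length, bit slice length) per machine, read off the chart.
MACHINES = {
    "MPP": (1, 16384),
    "STARAN": (1, 256),
    "PEPE": (32, 288),
    "Illiac IV": (64, 64),
    "C.mmP": (16, 16),
    "PDP11": (16, 1),
    "IBM370": (32, 1),
    "CRAY-1": (64, 1),
}


def max_parallelism(word_length, bit_slice_length):
    """Maximum number of bits processed together: word length x bit slice length."""
    return word_length * bit_slice_length


# Print machines from most to least parallel under this measure.
for name, (n, m) in sorted(MACHINES.items(),
                           key=lambda item: -max_parallelism(*item[1])):
    print(f"{name:10s} ({n:2d}, {m:5d}) -> {max_parallelism(n, m)}")
```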
Händler’s Classification
  < K x K' , D x D' , W x W' >
   control    data     word
        dash (K', D', W') → degree of pipelining
TI - ASC       <1, 4, 64 x 8>
CDC 6600       <1, 1 x 10, 60> x <10, 1, 12> (I/O)
C.mmP          <16,1,16> + <1x16,1,16> + <1,16,16>
PEPE           <1 x 3, 288, 32>
Cray-1         <1, 12 x 8, 64 x (1 ~ 14)>
Modern Classification
Parallel architectures
• Data-parallel architectures
• Function-parallel architectures
Data Parallel Architectures
• SIMD Processors
   – Multiple processing elements driven by a single
     instruction stream
• Vector Processors
   – Uni-processors with vector instructions
• Associative Processors
   – SIMD like processors with associative memory
• Systolic Arrays
   – Application specific VLSI structures


Function Parallel Architectures
Function-parallel architectures
• Instr level Parallel Arch (ILPs)
   – Pipelined processors
   – VLIWs
   – Superscalar processors
• Thread level Parallel Arch
• Process level Parallel Arch (MIMDs)
   – Distributed Memory MIMD
   – Shared Memory MIMD
Pipelining

  Simple multicycle design:
  • resource sharing across cycles
  • not all instructions take the same number of cycles

  [Figure: instruction stages IF, D, RF, EX/AG, M, WB]

  • higher throughput with pipelining
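The throughput gain can be estimated by counting cycles. The sketch below assumes an idealised case that the slide does not spell out: every instruction passes through all six stages shown (IF, D, RF, EX/AG, M, WB), each stage takes one cycle, and there are no stalls. The function names are hypothetical.

```python
def cycles_multicycle(n_instr, n_stages):
    """Each instruction finishes all stages before the next one starts."""
    return n_instr * n_stages


def cycles_pipelined(n_instr, n_stages):
    """The first instruction fills the pipe; after that one completes per cycle."""
    return n_stages + (n_instr - 1)


n_stages = 6       # IF, D, RF, EX/AG, M, WB
n_instr = 1000     # hypothetical instruction count
speedup = cycles_multicycle(n_instr, n_stages) / cycles_pipelined(n_instr, n_stages)
print(f"speedup = {speedup:.2f}")
```

For long instruction sequences the speedup approaches the number of stages; the hazards listed on the next slide are what keep real pipelines below that bound.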
Limits of Pipelining
• Structural hazards
   – Resource conflicts - two instructions require the
     same resource in the same cycle
• Data hazards
   – Data dependencies - one instruction needs data
     which is yet to be produced by another
     instruction
• Control Hazards
   – Decision about next instruction needs more
     cycles
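A data hazard of the kind described above (one instruction needs a value that an earlier instruction has not yet produced) can be found by scanning register names. This is a rough sketch, not a model of any particular pipeline: the `Instr` record, the register names and the `window` parameter (how many following instructions are still close enough to conflict) are all hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]        # register written, or None
    srcs: Tuple[str, ...]      # registers read


def raw_hazards(program: List[Instr], window: int = 2):
    """Return (producer index, consumer index, register) for each
    read-after-write dependence within `window` instructions."""
    found = []
    for i, producer in enumerate(program):
        if producer.dest is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            consumer = program[j]
            if producer.dest in consumer.srcs:
                found.append((i, j, producer.dest))
            if consumer.dest == producer.dest:
                break  # register overwritten; later reads see the new value
    return found


program = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("sub", "r4", ("r1", "r5")),   # reads r1 produced by the add
    Instr("beq", None, ("r4", "r0")),   # reads r4 produced by the sub
    Instr("or", "r6", ("r7", "r8")),
]
hazards = raw_hazards(program)
print(hazards)
```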
ILP in VLIW processors
[Figure: cache/memory feeds a fetch unit, which sends a single multi-operation instruction to several functional units (FU) that share one register file]
ILP in Superscalar processors
[Figure: cache/memory feeds a fetch unit with a sequential stream of instructions; a decode and issue unit sends multiple instructions to several functional units (FU) that share one register file. Legend: instruction/control paths, data paths; FU = Functional Unit]
Superscalar and VLIW processors
Issues in ILP Architectures
[Figure: several functional units (FU) sharing one register file]

• Scalability with increase in number of register ports
• ILP detection – special compilers / special hardware
• Code compatibility
• Code density, Instruction encoding
• Maintaining consistency
ILP and Multithreading
[Figure from Hennessy and Patterson comparing ILP, Coarse MT, Fine MT and SMT]
Why Process level Parallel Architectures?
• Data-parallel architectures
• Function-parallel architectures
   – Instruction level PAs
   – Thread level PAs
   – Process level PAs (MIMDs)
      · Distributed Memory MIMD
      · Shared Memory MIMD

Built using general purpose processors
Issues from user’s perspective
• Specification / Program design
   – explicit parallelism or
   – implicit parallelism + parallelizing compiler
• Partitioning / mapping to processors
• Scheduling / mapping to time instants
   – static or dynamic
• Communication and Synchronization


Parallel programming models


• Concurrent control flow
   – Concurrent tasks/processes/threads/objects
   – With shared variables or message passing
• Functional or logic program
• Vector/array operations

Relationship between programming model and architecture?
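The difference between shared variables and message passing shows up even in a small parallel sum. The sketch below uses Python threads for both styles, so it only imitates the programming models; it says nothing about the underlying architecture. All function names are hypothetical, and the queue stands in for a message channel.

```python
import queue
import threading


def sum_shared_variable(chunks):
    """Workers add into one shared total, protected by a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:                 # synchronisation on the shared variable
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def sum_message_passing(chunks):
    """Workers share nothing; each sends its partial sum as a message."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))    # communication replaces shared state

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in chunks)


chunks = [range(0, 250), range(250, 500), range(500, 750), range(750, 1000)]
print(sum_shared_variable(chunks), sum_message_passing(chunks))
```

The first style maps naturally onto shared memory machines and the second onto distributed memory machines, although either model can be implemented on either kind of hardware.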
Issues from architect’s perspective
• Coherence problem in shared memory with
  caches
• Efficient interconnection networks




Shared Memory Multiprocessor
[Figure: two clusters, each with processors (P) and memory modules (M) on a local interconnection network, joined by a global interconnection network that also connects further memory modules (M)]
Cache Coherence Problem
Multiple copies of data may exist
⇒ Problem of cache coherence
Options for coherence protocols
• What action is taken?
   – Invalidate or Update
• Which processors/caches communicate?
   – Snoopy (broadcast) or directory based
• Status of each block?
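The invalidate-or-update choice can be illustrated with a toy snoopy bus. This is a deliberately simplified sketch: the `Cache` and `Bus` classes are hypothetical, memory is written through on every store, and no per-block state is kept, so it is not a real protocol such as MSI or MESI.

```python
class Cache:
    def __init__(self, name):
        self.name = name
        self.lines = {}            # address -> value; absent means not cached


class Bus:
    """Broadcast medium: every write is snooped by all other caches."""

    def __init__(self, policy):
        if policy not in ("invalidate", "update"):
            raise ValueError("policy must be 'invalidate' or 'update'")
        self.policy = policy
        self.memory = {}
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def read(self, cache, addr):
        if addr not in cache.lines:                 # miss: fetch from memory
            cache.lines[addr] = self.memory.get(addr, 0)
        return cache.lines[addr]

    def write(self, cache, addr, value):
        cache.lines[addr] = value
        self.memory[addr] = value                   # write-through for simplicity
        for other in self.caches:
            if other is cache or addr not in other.lines:
                continue
            if self.policy == "invalidate":
                del other.lines[addr]               # stale copy is dropped
            else:
                other.lines[addr] = value           # stale copy is refreshed


def demo(policy):
    bus = Bus(policy)
    c0, c1 = Cache("P0"), Cache("P1")
    bus.attach(c0)
    bus.attach(c1)
    bus.read(c0, 0x10)
    bus.read(c1, 0x10)             # both caches now hold a copy
    bus.write(c0, 0x10, 42)
    still_cached = 0x10 in c1.lines
    return still_cached, bus.read(c1, 0x10)


print(demo("invalidate"), demo("update"))
```

Under either policy P1 ends up reading the new value; the difference is whether it keeps its copy (update) or has to miss and refetch (invalidate).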
Interconnection Networks
• Architectural Variations:
   – Topology
   – Direct or Indirect (through switches)
   – Static (fixed connections) or Dynamic (connections
     established as required)
   – Routing type (store and forward / wormhole)
• Efficiency:
   – Delay
   – Bandwidth
   – Cost

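The effect of routing type on delay can be estimated with a simple contention-free model. The formulas below are a common textbook approximation, not something stated on the slide: store and forward retransmits the whole packet on every link, while wormhole routing pipelines small flits so only the header pays the per-hop cost. Packet size, flit size, bandwidth and link count are hypothetical.

```python
def store_and_forward_latency(packet_bytes, bandwidth_bps, links):
    """Whole packet is received before being forwarded on the next link."""
    return links * (packet_bytes / bandwidth_bps)


def wormhole_latency(packet_bytes, flit_bytes, bandwidth_bps, links):
    """Packet streams behind its header flit; each extra link adds one flit time."""
    return packet_bytes / bandwidth_bps + (links - 1) * (flit_bytes / bandwidth_bps)


packet = 1024        # bytes (hypothetical)
flit = 8             # bytes (hypothetical)
bandwidth = 1e9      # bytes per second per link (hypothetical)
links = 5            # links traversed from source to destination

sf = store_and_forward_latency(packet, bandwidth, links)
wh = wormhole_latency(packet, flit, bandwidth, links)
print(f"store and forward: {sf * 1e6:.3f} us, wormhole: {wh * 1e6:.3f} us")
```

In this model store and forward delay grows in proportion to distance, whereas wormhole delay is dominated by packet length and is nearly independent of distance.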
Quest for Performance
1946 ENIAC ($0.5 M, 18K VTs, 150 kW)
add/sub         5000 per sec
mult             385 per sec
div               40 per sec
sqrt               3 per sec


1962 Atlas (Pipelined, Int + FPU)
          200K FLOPs
1962 Burroughs D825 (4 CPUs 16 Mem)
1964 CDC 6600 (first supercomputer)
          multiple FUs, dynamic scheduling
1972 ILLIAC-IV (64 PEs, 4 MFLOPs each)
Fastest Supercomputer
                  (ref www.top500.org)

• IBM’s Blue Gene/L at Lawrence Livermore Lab
  topped the list in June 2006 with 280.6 teraflops
• Japan’s Earth Simulator, introduced in 2002, was
  fastest with 35.8 teraflops till Blue Gene took over in
  2004.
• Japan’s proposal (2005) to build a supercomputer 73
  times faster than the current best. Target: 10
  petaflops, budget $800 - $900 million, date 2011.
• Tata Sons’ EKA entered the 4th spot in 2007 with 132.8
  teraflops
• Energy efficiency (max 488 Mflops/watt) also listed
  in June 2008
June 2008 list
Rank  Site / Computer
 1    DOE/NNSA/LANL, United States
      Roadrunner - BladeCenter QS22/LS21 Cluster, PowerXCell 8i 3.2 GHz /
      Opteron DC 1.8 GHz, Voltaire Infiniband, IBM (1026 teraflops)
 2    DOE/NNSA/LLNL, United States
      BlueGene/L - eServer Blue Gene Solution, IBM
 3    Argonne National Laboratory, United States
      Blue Gene/P Solution, IBM
 4    Texas Advanced Computing Center/Univ. of Texas, United States
      Ranger - SunBlade x6420, Opteron Quad 2 GHz, Infiniband, Sun Microsystems
 5    DOE/Oak Ridge National Laboratory, United States
      Jaguar - Cray XT4 QuadCore 2.1 GHz, Cray Inc.
 6    Forschungszentrum Juelich (FZJ)
      JUGENE - Blue Gene/P Solution, IBM
 7    New Mexico Computing Applications Center (NMCAC), United States
      Encanto - SGI Altix ICE 8200, Xeon quad core 3.0 GHz, SGI
 8    Computational Research Laboratories, TATA SONS, India
      EKA - Cluster Platform 3000 BL460c, Xeon 53xx 3 GHz, Infiniband, HP
      (133 teraflops)
Blue Gene Supercomputer
• 32 x 32 x 64 3D torus (65,536 nodes)
• Global reduction tree - max/sum in a
  few μs
• Fast synch across entire machine within
  a few μs
• 1,024 Gbps links to a global parallel file
  system
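The 32 x 32 x 64 torus figures can be verified with a short calculation. In the sketch below the (x, y, z) coordinate scheme and the function names are hypothetical, and distances assume minimal routing along each axis with wraparound links.

```python
import math

DIMS = (32, 32, 64)   # torus dimensions from the slide


def node_count(dims=DIMS):
    return math.prod(dims)


def neighbours(coord, dims=DIMS):
    """The six nearest neighbours of a node, using wraparound on each axis."""
    result = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size
            result.append(tuple(n))
    return result


def hop_distance(a, b, dims=DIMS):
    """Minimal hop count: on each axis take the shorter way round the ring."""
    return sum(min(abs(x - y), size - abs(x - y))
               for x, y, size in zip(a, b, dims))


print(node_count())                              # total nodes
print(neighbours((0, 0, 0)))                     # wraparound neighbours of a corner
print(hop_distance((0, 0, 0), (31, 0, 0)))       # one hop via the wraparound link
print(hop_distance((0, 0, 0), (16, 16, 32)))     # farthest node from the origin
```

The wraparound links halve the worst-case distance on each axis compared with a plain mesh of the same size.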
Embedded vs GP Computing
•   Fixed functionality
•   Part of a larger system
•   Interact with environment
•   Real-time requirements
•   Power constraints
•   Environmental constraints

• Performance cannot be increased simply by
  increasing clock frequency
Cradle CT 3616 Architecture




IBM Cell Architecture
•   Clock speed: > 4 GHz
•   Peak performance (single precision): > 256 GFlops
•   Peak performance (double precision): > 26 GFlops
•   SPU registers: 128 x 128b
•   Local storage size per SPU: 256KB
•   Area: 221 mm²
•   Technology: 90nm SOI
•   Total number of transistors: 234M
Books
1.    D.A. Patterson, J.L. Hennessy, "Computer Architecture : A
      Quantitative Approach", Morgan Kaufmann Publishers, 2006.
2.    D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer
      Architectures : A Design Space Approach", Addison Wesley,
      1997.
3.    M.J. Flynn, "Computer Architecture : Pipelined and Parallel
      Processor Design", Narosa Publishing House / Jones and
      Bartlett, 1996.
4.    K. Hwang, "Advanced Computer Architecture : Parallelism,
      Scalability, Programmability", McGraw Hill, 1993.
5.    H.G. Cragon, "Memory Systems and Pipelined Processors",
      Narosa Publishing House / Jones and Bartlett, 1998.
6.    D.E. Culler, J.P. Singh and Anoop Gupta, "Parallel Computer
      Architecture, A Hardware/Software Approach", Harcourt Asia
      / Morgan Kaufmann Publishers, 2000.

What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 

KĂźrzlich hochgeladen (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 

Lec Jan12 2009

  • 1. CSL718 : Architecture of High Performance Systems Introduction 12th January, 2009
  • 2. Some basic questions • What is high performance? – Rate of computation – Time to compute • Who needs high performance systems? – Weather prediction, complex design, scientific computation etc. – Everyone needs it. • How do you achieve high performance? – Technology – Circuit / logic design – Architecture • How to analyse or evaluate performance? – Theoretical models – Simulation – Experimentation slide 2 Anshul Kumar, CSE IITD
  • 3. Execution Time and Clock Period. Instruction execution time: Tinst = CPI * Δt. [Figure: stages IF, D, RF, EX/AG, M, WB, with Δt marking one clock period.] Program execution time: Tprog = N * Tinst = N * CPI * Δt, where N is the number of instructions, CPI is the average number of cycles per instruction, and Δt is the clock cycle time.
  • 4. What influences clock period? Tprog = N * CPI * Δt • Technology: Δt ⇓ • Software: N ⇓ • Architecture: N * CPI * Δt ⇓ – Instruction set architecture (ISA): N vs CPI * Δt trade-off – Micro architecture (μA): CPI vs Δt trade-off
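The relation Tprog = N * CPI * Δt and the CPI vs Δt trade-off can be made concrete with a small calculation. The sketch below is illustrative only: the function name and all numbers (one million instructions, two hypothetical microarchitectures) are assumptions, not figures from the lecture.

```python
def program_exec_time(n_instructions, cpi, clock_period_s):
    """Return Tprog = N * CPI * Δt in seconds."""
    return n_instructions * cpi * clock_period_s

# Hypothetical workload: the same program (same N) on two microarchitectures.
N = 1_000_000

# Design A (assumed): slower clock, fewer cycles per instruction.
t_a = program_exec_time(N, cpi=1.5, clock_period_s=2.0e-9)

# Design B (assumed): deeper pipeline, so a shorter Δt but a higher average CPI.
t_b = program_exec_time(N, cpi=2.0, clock_period_s=1.25e-9)

speedup = t_a / t_b
print(f"A: {t_a * 1e3:.2f} ms, B: {t_b * 1e3:.2f} ms, speedup of B over A: {speedup:.2f}")
```

With these assumed numbers the shorter clock period more than pays for the higher CPI; with a larger CPI penalty the comparison would go the other way, which is the trade-off the slide refers to.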
  • 5. Relative performance per unit cost
      Year   Technology           Perf/cost
      1951   Vacuum tube          1
      1965   Transistor           35
      1975   Integrated circuit   900
      1995   VLSI                 2,400,000
  • 6. Increase in workstation performance [Figure: performance (0 to 1200) vs. year, 1987 to 1997. Machines plotted: SUN-4/260, MIPS M/120, MIPS M2000, IBM RS6000, HP 9000/750, DEC AXP/500, IBM POWER 100, DEC Alpha 4/266, DEC Alpha 5/300, DEC Alpha 5/500, DEC Alpha 21264/600]
  • 7. Growth in DRAM Capacity [Figure: Kbit capacity (log scale, 10 to 100,000) vs. year of introduction, 1976 to 1996. Generations plotted: 16K, 64K, 256K, 1M, 4M, 16M, 64M]
  • 8. CPU-Memory Performance Gap • Semiconductor (CPU speed, random access) – Registers – SRAM – DRAM – FLASH • Magnetic (slow, random + sequential access) – FDD – HDD • Optical (very slow, random + sequential access) – CD – DVD
  • 9. Memory Hierarchy [Figure: the CPU accesses a stack of memories, with a hit or a miss at each level. Top level: fastest speed, smallest size, highest cost / bit. Bottom level: slowest speed, biggest size, lowest cost / bit.] Principle: • Temporal Locality – References repeated in time • Spatial Locality – References repeated in space – Special case: Sequential Locality
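To see why locality makes a small fast memory effective, the following sketch simulates a direct-mapped cache and measures the hit rate for three access patterns. The cache organization (64 lines of 8 words), the function name and the address traces are all assumptions chosen for illustration; the slide does not specify any particular cache.

```python
import random

def hit_rate(addresses, num_lines=64, block_size=8):
    """Simulate a direct-mapped cache and return the fraction of accesses that hit."""
    tags = [None] * num_lines          # block number currently held by each cache line
    hits = 0
    for addr in addresses:
        block = addr // block_size     # memory block containing this word
        line = block % num_lines       # direct-mapped placement
        if tags[line] == block:
            hits += 1
        else:
            tags[line] = block         # miss: bring the block in from the slower level
    return hits / len(addresses)

# Spatial (sequential) locality: each block misses once, then hits 7 times.
sequential = list(range(4096))
# Temporal locality: the same 256 words are referenced 16 times over.
loop = list(range(256)) * 16
# Little locality: addresses scattered over a 1M-word space.
random.seed(0)
scattered = [random.randrange(1 << 20) for _ in range(4096)]

for name, trace in [("sequential", sequential), ("loop", loop), ("scattered", scattered)]:
    print(f"{name:10s} hit rate = {hit_rate(trace):.3f}")
```

The sequential trace hits 7 times out of 8 and the loop trace misses only on its first pass, while the scattered trace almost never hits.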
  • 10. Parallelism: Flynn’s Classification. Architecture categories: SISD, SIMD, MISD, MIMD
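  • 11. SISD [Figure: block diagram of memory (M), control unit (C) and processor (P), connected by an instruction stream (IS) and a data stream (DS)]
  • 12. SIMD [Figure: one control unit (C) sends a single instruction stream (IS) to several processors (P), each with its own data stream (DS) to memory (M)]
  • 13. MISD [Figure: several control unit (C) and processor (P) pairs, each with its own instruction stream (IS), connected to memory (M) by data streams (DS)]
  • 14. MIMD [Figure: several control unit (C) and processor (P) pairs, each with its own instruction stream (IS) and data stream (DS), connected to memory (M)]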
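Flynn's four categories follow directly from counting instruction streams and data streams. A minimal sketch of that rule is given below; the function name and the example stream counts are hypothetical.

```python
def flynn_category(instruction_streams, data_streams):
    """Classify a machine by its number of instruction and data streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine needs at least one stream of each kind")
    i = "S" if instruction_streams == 1 else "M"   # Single or Multiple instruction
    d = "S" if data_streams == 1 else "M"          # Single or Multiple data
    return f"{i}I{d}D"

# Hypothetical machines, described only by their stream counts.
print(flynn_category(1, 1))    # uniprocessor
print(flynn_category(1, 64))   # array of 64 processing elements under one control unit
print(flynn_category(4, 4))    # four independent processors
```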
  • 15. Feng’s Classification [Figure: bit slice length (1 to 16K) vs. word length (1 to 64). Machines plotted: MPP, PEPE, STARAN, Illiac IV, C.mmP, PDP11, IBM370, CRAY-1]
  • 16. Händler’s Classification: < K x K’, D x D’, W x W’ >, where K = control, D = data, W = word, and the dashed term gives the degree of pipelining
      TI-ASC     <1, 4, 64 x 8>
      CDC 6600   <1, 1 x 10, 60> x <10, 1, 12> (I/O)
      C.mmP      <16, 1, 16> + <1 x 16, 1, 16> + <1, 16, 16>
      PEPE       <1 x 3, 288, 32>
      Cray-1     <1, 12 x 8, 64 x (1 ~ 14)>
  • 17. Modern Classification. Parallel architectures: • Data-parallel architectures • Function-parallel architectures
  • 18. Data Parallel Architectures • SIMD Processors – Multiple processing elements driven by a single instruction stream • Vector Processors – Uni-processors with vector instructions • Associative Processors – SIMD-like processors with associative memory • Systolic Arrays – Application-specific VLSI structures
  • 19. Function Parallel Architectures • Instruction level parallel architectures (ILPs) – Pipelined processors – VLIWs – Superscalar processors • Thread level parallel architectures • Process level parallel architectures (MIMDs) – Shared Memory MIMD – Distributed Memory MIMD
  • 20. Pipelining. Simple multicycle design: • Resource sharing across cycles • All instructions may not take the same number of cycles. [Figure: stages IF, D, RF, EX/AG, M, WB] • Faster throughput with pipelining
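The throughput gain from pipelining can be estimated by counting cycles. The sketch below assumes an ideal six-stage pipeline (IF, D, RF, EX/AG, M, WB) with no hazards, one instruction entering per cycle, and an unpipelined design in which every instruction uses all six stages; the function names and the instruction count are hypothetical.

```python
STAGES = ["IF", "D", "RF", "EX/AG", "M", "WB"]

def cycles_unpipelined(n_instructions, n_stages):
    """Each instruction runs to completion before the next one starts."""
    return n_instructions * n_stages

def cycles_pipelined(n_instructions, n_stages):
    """The first instruction fills the pipe; then one instruction completes per cycle."""
    return n_stages + (n_instructions - 1)

n = 100
slow = cycles_unpipelined(n, len(STAGES))
fast = cycles_pipelined(n, len(STAGES))
print(f"{slow} cycles unpipelined, {fast} cycles pipelined, speedup {slow / fast:.2f}")
```

As the instruction count grows, the ideal speedup approaches the number of stages; hazards keep real pipelines below that bound.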
  • 21. Limits of Pipelining • Structural hazards – Resource conflicts: two instructions require the same resource in the same cycle • Data hazards – Data dependencies: one instruction needs data which is yet to be produced by another instruction • Control hazards – Decision about the next instruction needs more cycles
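A data hazard of the kind described above (an instruction reading a register that an earlier, still in-flight instruction writes) can be found by a simple scan. This is a simplified sketch: the `Instr` record, the register names and the two-instruction window are assumptions, and the scan ignores cases where an intervening instruction overwrites the same register.

```python
from collections import namedtuple

# Hypothetical instruction record: operation, destination register, source registers.
Instr = namedtuple("Instr", "op dest srcs")

def raw_hazards(program, window=2):
    """Return (producer, consumer, register) triples for read-after-write
    dependencies between instructions at most `window` positions apart."""
    found = []
    for i, producer in enumerate(program):
        if producer.dest is None:
            continue                      # e.g. a store writes no register
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if producer.dest in program[j].srcs:
                found.append((i, j, producer.dest))
    return found

program = [
    Instr("load",  "r1", ("r2",)),
    Instr("add",   "r3", ("r1", "r4")),   # needs r1 from the load
    Instr("sub",   "r5", ("r6", "r7")),
    Instr("store", None, ("r3", "r5")),   # needs r3 from add and r5 from sub
]

for producer, consumer, reg in raw_hazards(program):
    print(f"instruction {consumer} reads {reg} written by instruction {producer}")
```

Whether each dependency actually stalls the pipeline depends on the pipeline depth and on forwarding, which this sketch does not model.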
  • 22. ILP in VLIW processors [Figure: cache/memory feeds a fetch unit, which issues a single multi-operation instruction to several functional units (FU) sharing a register file]
  • 23. ILP in Superscalar processors [Figure: cache/memory feeds a fetch unit with a sequential stream of instructions; a decode and issue unit issues multiple instructions to functional units (FU) sharing a register file. Legend: instruction/control paths, data paths, FU = Functional Unit]
  • 24. Superscalar and VLIW processors
  • 25. Issues in ILP Architectures [Figure: functional units (FU) sharing a register file] • Scalability with increase in number of register ports • ILP detection – special compilers / special hardware • Code compatibility • Code density, instruction encoding • Maintaining consistency
  • 26. ILP and Multithreading [Figure, from Hennessy and Patterson: comparison of ILP, Coarse MT, Fine MT and SMT]
  • 27. Why Process level Parallel Architectures? • Data-parallel architectures • Function-parallel architectures – Instruction level PAs – Thread level PAs – Process level PAs (MIMDs): built using general purpose processors; Shared Memory MIMD or Distributed Memory MIMD
  • 28. Issues from user’s perspective • Specification / Program design – explicit parallelism or – implicit parallelism + parallelizing compiler • Partitioning / mapping to processors • Scheduling / mapping to time instants – static or dynamic • Communication and Synchronization
  • 29. Parallel programming models • Concurrent control flow • Functional or logic program • Vector/array operations. Concurrent tasks/processes/threads/objects, with shared variables or message passing. Relationship between programming model and architecture?
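The two communication styles named above, shared variables and message passing, can be contrasted on the same small task. The sketch below uses Python threads purely as an illustration; the function names and the partitioning of the work into four chunks are assumptions.

```python
import queue
import threading

def sum_shared(chunks):
    """Shared-variable style: workers update one total, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:                       # synchronization on the shared variable
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

def sum_message_passing(chunks):
    """Message-passing style: workers send partial sums; nothing else is shared."""
    results = queue.Queue()

    def worker(chunk):
        results.put(sum(chunk))          # communication by sending a message

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results.get() for _ in chunks)

# Partitioning: the range 0..999 split into four chunks, one per worker.
chunks = [range(0, 250), range(250, 500), range(500, 750), range(750, 1000)]
print(sum_shared(chunks), sum_message_passing(chunks))
```

The first version maps naturally onto a shared memory machine and the second onto a distributed memory machine, although either model can be implemented on either architecture.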
  • 30. Issues from architect’s perspective • Coherence problem in shared memory with caches • Efficient interconnection networks
  • 31. Shared Memory Multiprocessor [Figure: shared memory multiprocessor organizations, with processors (P) and memory modules (M) connected through interconnection networks, including a global interconnection network]
  • 32. Cache Coherence Problem. Multiple copies of data may exist ⇒ problem of cache coherence. Options for coherence protocols: • What action is taken? – Invalidate or Update • Which processors/caches communicate? – Snoopy (broadcast) or directory based • Status of each block?
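One combination of the options above, a snoopy protocol that invalidates other copies on a write, can be sketched for a single memory block. This is a deliberately simplified model with three assumed block states (Invalid, Shared, Modified); the class and method names are hypothetical and it is not a description of any specific machine's protocol.

```python
class SnoopyBus:
    """One memory block cached by several processors, kept coherent by write-invalidate."""

    def __init__(self, num_caches, initial_value=0):
        self.memory = initial_value
        self.state = ["I"] * num_caches    # per-cache status: Invalid, Shared or Modified
        self.data = [None] * num_caches

    def _flush_owner(self):
        # A cache holding the block Modified writes it back and drops to Shared.
        for cache, status in enumerate(self.state):
            if status == "M":
                self.memory = self.data[cache]
                self.state[cache] = "S"

    def read(self, cache):
        if self.state[cache] == "I":       # read miss, seen by all caches on the bus
            self._flush_owner()
            self.data[cache] = self.memory
            self.state[cache] = "S"
        return self.data[cache]

    def write(self, cache, value):
        if self.state[cache] != "M":       # broadcast an invalidate to every other cache
            self._flush_owner()
            for other in range(len(self.state)):
                if other != cache:
                    self.state[other] = "I"
                    self.data[other] = None
        self.data[cache] = value
        self.state[cache] = "M"

bus = SnoopyBus(num_caches=2)
bus.read(0)
bus.read(1)                  # both caches now hold the block in state S
bus.write(0, 42)             # cache 1's copy is invalidated
print(bus.state, bus.read(1))
```

Without the invalidation step, cache 1 would keep returning its stale copy after cache 0's write, which is the coherence problem the slide describes.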
  • 33. Interconnection Networks • Architectural Variations: – Topology – Direct or Indirect (through switches) – Static (fixed connections) or Dynamic (connections established as required) – Routing type (store and forward / wormhole) • Efficiency: – Delay – Bandwidth – Cost
  • 34. Quest for Performance • 1946 ENIAC ($0.5 M, 18K VTs, 150 kW): add/sub 5000 per sec, mult 385 per sec, div 40 per sec, sqrt 3 per sec • 1962 Atlas (pipelined, Int + FPU): 200K FLOPS • 1962 Burroughs D825 (4 CPUs, 16 Mem) • 1964 CDC 6600 (first supercomputer): multiple FUs, dynamic scheduling • 1972 ILLIAC-IV (64 PEs, 4 MFLOPS each)
  • 35. Fastest Supercomputer (ref www.top500.org) • IBM’s Blue Gene/L at Lawrence Livermore Lab topped the list in June 2006 with 280.6 teraflops • Japan’s Earth Simulator, introduced in 2002, was the fastest with 35.8 teraflops until Blue Gene took over in 2004 • Japan’s proposal (2005) to build a supercomputer 73 times faster than the then-current best. Target: 10 petaflops, budget $800 - $900 million, date 2011 • Tata Sons’ EKA entered the 4th spot in 2007 with 132.8 teraflops • Energy efficiency (max 488 Mflops/watt) also listed in June 2008
  • 36. June 2008 list (Rank: Site; Computer)
      1: DOE/NNSA/LANL, United States; Roadrunner - BladeCenter QS22/LS21 Cluster, PowerXCell 8i 3.2 GHz / Opteron DC 1.8 GHz, Voltaire Infiniband, IBM (1026 teraflops)
      2: DOE/NNSA/LLNL, United States; BlueGene/L - eServer Blue Gene Solution, IBM
      3: Argonne National Laboratory, United States; Blue Gene/P Solution, IBM
      4: Texas Advanced Computing Center/Univ. of Texas, United States; Ranger - SunBlade x6420, Opteron Quad 2 GHz, Infiniband, Sun Microsystems
      5: DOE/Oak Ridge National Laboratory, United States; Jaguar - Cray XT4 QuadCore 2.1 GHz, Cray Inc.
      6: Forschungszentrum Juelich (FZJ); JUGENE - Blue Gene/P Solution, IBM
      7: New Mexico Computing Applications Center (NMCAC), United States; Encanto - SGI Altix ICE 8200, Xeon quad core 3.0 GHz, SGI
      8: Computational Research Laboratories, TATA SONS, India; EKA - Cluster Platform 3000 BL460c, Xeon 53xx 3 GHz, Infiniband, HP (133 teraflops)
  • 37. Blue Gene Supercomputer • 32 x 32 x 64 3D torus (65,536 nodes) • Global reduction tree - max/sum in a few μs • Fast synch across entire machine within a few μs • 1,024 Gbps links to a global parallel file system
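The node count quoted for the 32 x 32 x 64 torus, and the effect of the wraparound links on distance, can be checked with a few lines of arithmetic. The sketch assumes the usual definition of a torus (each node linked to its two neighbours in every dimension, with wraparound at the edges); the function names are hypothetical.

```python
from math import prod

def torus_nodes(dims):
    """Total number of nodes in a torus with the given size per dimension."""
    return prod(dims)

def torus_diameter(dims):
    """Largest hop count between two nodes: at most half-way round each ring."""
    return sum(size // 2 for size in dims)

def torus_neighbors(node, dims):
    """Coordinates of the directly connected nodes, with wraparound."""
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % size
            neighbors.append(tuple(coord))
    return neighbors

dims = (32, 32, 64)
print(torus_nodes(dims), "nodes, diameter", torus_diameter(dims), "hops")
print(torus_neighbors((0, 0, 0), dims))
```

Each node has six neighbours, and under this assumption no two nodes are more than 64 hops apart; a mesh of the same size without wraparound would have a diameter of 31 + 31 + 63 = 125 hops.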
  • 38. Blue Gene Supercomputer contd.
  • 39. Embedded vs GP Computing • Fixed functionality • Part of a larger system • Interact with environment • Real-time requirements • Power constraints • Environmental constraints • Performance cannot be increased simply by increasing clock frequency
  • 40. Cradle CT 3616 Architecture
  • 41. IBM Cell Architecture • Clock speed: > 4 GHz • Peak performance (single precision): > 256 GFlops • Peak performance (double precision): > 26 GFlops • SPU registers: 128 x 128b • Local storage size per SPU: 256 KB • Area: 221 mm² • Technology: 90 nm SOI • Total number of transistors: 234M
  • 42. Books
      1. D.A. Patterson, J.L. Hennessy, "Computer Architecture: A Quantitative Approach", Morgan Kaufmann Publishers, 2006.
      2. D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures: A Design Space Approach", Addison Wesley, 1997.
      3. M.J. Flynn, "Computer Architecture: Pipelined and Parallel Processor Design", Narosa Publishing House / Jones and Bartlett, 1996.
      4. K. Hwang, "Advanced Computer Architecture: Parallelism, Scalability, Programmability", McGraw Hill, 1993.
      5. H.G. Cragon, "Memory Systems and Pipelined Processors", Narosa Publishing House / Jones and Bartlett, 1998.
      6. D.E. Culler, J.P. Singh and Anoop Gupta, "Parallel Computer Architecture: A Hardware/Software Approach", Harcourt Asia / Morgan Kaufmann Publishers, 2000.