SlideShare ist ein Scribd-Unternehmen logo
1 von 35
Systolic Arrays & Their
     Applications
Overview
 What it is
 N-body problem
 Matrix multiplication
 Cannon’s method
 Other applications
 Conclusion
Introduction – “Systolic”
What Is a Systolic Array?
A systolic array is an arrangement of processors in an array
where data flows synchronously across the array between
neighbors, usually with different data flowing in different directions

Each processor at each step takes in data from one or more
neighbors (e.g. North and West), processes it and, in the next
step, outputs results in the opposite direction (South and East).


H. T. Kung and Charles Leiserson were the first to publish
a paper on systolic arrays in 1978, and coined the name.
What Is a Systolic Array?
 A specialized form of parallel computing.
 Multiple processors connected by short
  wires.
 Unlike many forms of parallelism which
  lose speed through their connection.
 Cells(processors), compute data and store it
  independently of each other.
Systolic Unit(cell)
 Each unit is an independent processor.
 Every processor has some registers and an
  ALU.
 The cells share information with their
  neighbors, after performing the needed
  operations on the data.
Some simple examples of systolic array models.
N-body Problem
 Conventional method: N2
 Systolic method: N
 We will need one processing element for
  each body, lets call them planets.
B2
                          B3

B1
                                          B4

       B6

                        B5



Six planets in orbit with different masses, and different forces
acting on each other, so are systolic array will look like this.
Each processor will hold the newtonian force formula:
Fij = kmimj/d2ij , k being the gravitational constant.

Then load the array with values.




  B1     B2     B3     B4     B5    B6



 Now that the array has the mass and the coordinates of
 Each body we can begin are computation.
B6, B5, B4, B3, B2, B1




B1m   B2m     B3m     B4m       B5m       B6m
B1c   B2c     B3c     B4c       B5c       B6c




            Fij = kmimj/d2ij
B6, B5, B4, B3, B2




B1m   B2m     B3m     B4m          B5m
                                            B6*B1
B1c   B2c     B3c     B4c          B5c




            Fij = kmimj/d2ij
B6, B5, B4, B3




B1m   B2m     B3m     B4m
                               B5*B1    B6*B2
B1c   B2c     B3c     B4c




            Fij = kmimj/d2ij
B6, B5, B4




B1m   B2m     B3m
                      B4*B1    B5*B2   B6*B3
B1c   B2c     B3c




            Fij = kmimj/d2ij
B6, B5




B1m   B2m
             B3*B1    B4*B2    B5*B3    B6*B4
B1c   B2c




            Fij = kmimj/d2ij
B6




B1m
      B2*B1    B3*B2    B4*B3    B5*B4   B6*B5
B1c




              Fij = kmimj/d2ij
Done   Done     Done    Done     Done   Done




              Fij = kmimj/d2ij
Matrix Multiplication

a11 a12 a13         b11 b12 b13        c11 c12 c13
a21 a22 a23
a31 a32 a33     *   b21 b22 b23
                    b31 b32 b33
                                  =    c21 c22 c23
                                       c31 c32 c33


Conventional Method: N3

For I = 1 to N
   For J = 1 to N
      For K = 1 to N
          C[I,J] = C[I,J] + A[J,K] * B[K,J];
Systolic Method
This will run in O(n) time!

To run in N time we need N x N processing units, in this case
we need 9.

                  P1     P2     P3

                  P4     P5     P6

                  P7     P8     P9
We need to modify the input data, like so:

                             a13 a12 a11
Flip columns 1 & 3           a23 a22 a21
                             a33 a32 a31


                             b31 b32 b33
 Flip rows 1 & 3             b21 b22 b23
                             b11 b12 b13



 and finally stagger the data sets for input.
b33
                              b32   b23
                       b31    b22   b13
                       b21    b12
                       b11

         a13 a12 a11   P1     P2     P3

     a23 a22 a21       P4     P5     P6

a33 a32 a31            P7     P8     P9


At every tick of the global system clock data is passed to each
processor from two different directions, then it is multiplied
and the result is saved in a register.
3 4 2               3 4 2                23 36 28
2 5 3
3 2 5
            *       2 5 3
                    3 2 5
                                 =       25 39 34
                                         28 32 37

Lets try this using a systolic array.          5
                                          2    3
                                     3    5    2
                                     2    4
                                     3

                     2   4   3    P1     P2   P3

                3    5   2        P4     P5   P6

            5   2    3            P7     P8   P9
Clock tick: 1
                                            5
                                    2       3
                           3        5       2
                           2        4

             2    4    3*3

         3   5    2

    5    2   3




    P1       P2       P3           P4       P5       P6       P7       P8       P9

9        0        0            0        0        0        0        0        0
Clock tick: 2
                                          5
                                 2        3
                        3        5        2

               2    4*2         3*4

           3   5    2*3

      5    2   3




 P1       P2       P3           P4        P5       P6       P7       P8       P9

17    12       0            6         0        0        0        0        0
Clock tick: 3


                                    5
                            2       3

                 2*3    4*5     3*2

            3    5*2    2*4

        5   2    3*3




 P1    P2       P3      P4          P5       P6       P7       P8       P9

23    32    6          16       8        0        9        0        0
Clock tick: 4



                               5

                        2*2   4*3

                 3*3    5*5   2*2

            5    2*2    3*4




 P1    P2       P3      P4     P5       P6    P7      P8         P9

23    36    18         25     33    4        13     12       0
Clock tick: 5




                              2*5

                        3*2   5*3

                 5*3    5*2   3*2




 P1    P2    P3         P4     P5    P6    P7      P8         P9

23    36    28         25     39    19    28     22       6
Clock tick: 6




                         3*5

                   5*2   2*3




 P1    P2    P3    P4     P5    P6    P7      P8      P9

23    36    28    25     39    34    28     32       12
Clock tick: 7




                        5*5




 P1    P2    P3    P4    P5    P6    P7      P8      P9

23    36    28    25    39    34    28     32       37
Same answer! In 2n + 1 time, can we do better?
           The answer is yes, there is an optimization.


                    23        36    28

                    25        39    34

                    28        32    37




 P1    P2        P3       P4        P5    P6     P7         P8    P9

23    36       28        25        39    34    28      32        37
Cannon’s Technique
 No processor is idle.
 Instead of feeding the data, it is cycled.
 Data is staggered, but slightly different than
  before.
 Not including the loading which can be
  done in parallel for a time of 1, this
  technique is effectively N.
Other Applications
 Signal processing
 Image processing
 Solutions of differential equations
 Graph algorithms
 Biological sequence comparison
 Other computationally intensive tasks
Samba: Systolic Accelerator
      for Molecular Biological
           Applications
This systolic array contains 128 processors shared into 32 full custom
VLSI chips. One chip houses 4 processors, and one processor performs
10 millions matrix cells per second.
Why Systolic?
 Extremely fast.
 Easily scalable architecture.
 Can do many tasks single processor
  machines cannot attain.
 Turns some exponential problems into
  linear or polynomial time.
Why Not Systolic?
 Expensive.
 Not needed on most applications, they are a
  highly specialized processor type.
 Difficult to implement and build.

Weitere ähnliche Inhalte

Mehr von Gichelle Amon (20)

Kerberos
KerberosKerberos
Kerberos
 
Network security
Network securityNetwork security
Network security
 
Os module 2 d
Os module 2 dOs module 2 d
Os module 2 d
 
Os module 2 c
Os module 2 cOs module 2 c
Os module 2 c
 
Image segmentation ppt
Image segmentation pptImage segmentation ppt
Image segmentation ppt
 
Lec3 final
Lec3 finalLec3 final
Lec3 final
 
Lec 3
Lec 3Lec 3
Lec 3
 
Lec 4
Lec 4Lec 4
Lec 4
 
Lec1 final
Lec1 finalLec1 final
Lec1 final
 
Module 3 law of contracts
Module 3  law of contractsModule 3  law of contracts
Module 3 law of contracts
 
Transport triggered architecture
Transport triggered architectureTransport triggered architecture
Transport triggered architecture
 
Time triggered arch.
Time triggered arch.Time triggered arch.
Time triggered arch.
 
Subnetting
SubnettingSubnetting
Subnetting
 
Os module 2 c
Os module 2 cOs module 2 c
Os module 2 c
 
Os module 2 ba
Os module 2 baOs module 2 ba
Os module 2 ba
 
Lec5
Lec5Lec5
Lec5
 
Delivery
DeliveryDelivery
Delivery
 
Addressing
AddressingAddressing
Addressing
 
6 spatial filtering p2
6 spatial filtering p26 spatial filtering p2
6 spatial filtering p2
 
5 spatial filtering p1
5 spatial filtering p15 spatial filtering p1
5 spatial filtering p1
 

Kürzlich hochgeladen

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 

Kürzlich hochgeladen (20)

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 

Lec2 final

  • 1. Systolic Arrays & Their Applications
  • 2. Overview  What it is  N-body problem  Matrix multiplication  Cannon’s method  Other applications  Conclusion
  • 4. What Is a Systolic Array? A systolic array is an arrangement of processors in an array where data flows synchronously across the array between neighbors, usually with different data flowing in different directions Each processor at each step takes in data from one or more neighbors (e.g. North and West), processes it and, in the next step, outputs results in the opposite direction (South and East). H. T. Kung and Charles Leiserson were the first to publish a paper on systolic arrays in 1978, and coined the name.
  • 5. What Is a Systolic Array?  A specialized form of parallel computing.  Multiple processors connected by short wires.  Unlike many forms of parallelism which lose speed through their connection.  Cells(processors), compute data and store it independently of each other.
  • 6. Systolic Unit(cell)  Each unit is an independent processor.  Every processor has some registers and an ALU.  The cells share information with their neighbors, after performing the needed operations on the data.
  • 7. Some simple examples of systolic array models.
  • 8. N-body Problem  Conventional method: N2  Systolic method: N  We will need one processing element for each body, lets call them planets.
  • 9. B2 B3 B1 B4 B6 B5 Six planets in orbit with different masses, and different forces acting on each other, so are systolic array will look like this.
  • 10. Each processor will hold the newtonian force formula: Fij = kmimj/d2ij , k being the gravitational constant. Then load the array with values. B1 B2 B3 B4 B5 B6 Now that the array has the mass and the coordinates of Each body we can begin are computation.
  • 11. B6, B5, B4, B3, B2, B1 B1m B2m B3m B4m B5m B6m B1c B2c B3c B4c B5c B6c Fij = kmimj/d2ij
  • 12. B6, B5, B4, B3, B2 B1m B2m B3m B4m B5m B6*B1 B1c B2c B3c B4c B5c Fij = kmimj/d2ij
  • 13. B6, B5, B4, B3 B1m B2m B3m B4m B5*B1 B6*B2 B1c B2c B3c B4c Fij = kmimj/d2ij
  • 14. B6, B5, B4 B1m B2m B3m B4*B1 B5*B2 B6*B3 B1c B2c B3c Fij = kmimj/d2ij
  • 15. B6, B5 B1m B2m B3*B1 B4*B2 B5*B3 B6*B4 B1c B2c Fij = kmimj/d2ij
  • 16. B6 B1m B2*B1 B3*B2 B4*B3 B5*B4 B6*B5 B1c Fij = kmimj/d2ij
  • 17. Done Done Done Done Done Done Fij = kmimj/d2ij
  • 18. Matrix Multiplication a11 a12 a13 b11 b12 b13 c11 c12 c13 a21 a22 a23 a31 a32 a33 * b21 b22 b23 b31 b32 b33 = c21 c22 c23 c31 c32 c33 Conventional Method: N3 For I = 1 to N For J = 1 to N For K = 1 to N C[I,J] = C[I,J] + A[J,K] * B[K,J];
  • 19. Systolic Method This will run in O(n) time! To run in N time we need N x N processing units, in this case we need 9. P1 P2 P3 P4 P5 P6 P7 P8 P9
  • 20. We need to modify the input data, like so: a13 a12 a11 Flip columns 1 & 3 a23 a22 a21 a33 a32 a31 b31 b32 b33 Flip rows 1 & 3 b21 b22 b23 b11 b12 b13 and finally stagger the data sets for input.
  • 21. b33 b32 b23 b31 b22 b13 b21 b12 b11 a13 a12 a11 P1 P2 P3 a23 a22 a21 P4 P5 P6 a33 a32 a31 P7 P8 P9 At every tick of the global system clock data is passed to each processor from two different directions, then it is multiplied and the result is saved in a register.
  • 22. 3 4 2 3 4 2 23 36 28 2 5 3 3 2 5 * 2 5 3 3 2 5 = 25 39 34 28 32 37 Lets try this using a systolic array. 5 2 3 3 5 2 2 4 3 2 4 3 P1 P2 P3 3 5 2 P4 P5 P6 5 2 3 P7 P8 P9
  • 23. Clock tick: 1 5 2 3 3 5 2 2 4 2 4 3*3 3 5 2 5 2 3 P1 P2 P3 P4 P5 P6 P7 P8 P9 9 0 0 0 0 0 0 0 0
  • 24. Clock tick: 2 5 2 3 3 5 2 2 4*2 3*4 3 5 2*3 5 2 3 P1 P2 P3 P4 P5 P6 P7 P8 P9 17 12 0 6 0 0 0 0 0
  • 25. Clock tick: 3 5 2 3 2*3 4*5 3*2 3 5*2 2*4 5 2 3*3 P1 P2 P3 P4 P5 P6 P7 P8 P9 23 32 6 16 8 0 9 0 0
  • 26. Clock tick: 4 5 2*2 4*3 3*3 5*5 2*2 5 2*2 3*4 P1 P2 P3 P4 P5 P6 P7 P8 P9 23 36 18 25 33 4 13 12 0
  • 27. Clock tick: 5 2*5 3*2 5*3 5*3 5*2 3*2 P1 P2 P3 P4 P5 P6 P7 P8 P9 23 36 28 25 39 19 28 22 6
  • 28. Clock tick: 6 3*5 5*2 2*3 P1 P2 P3 P4 P5 P6 P7 P8 P9 23 36 28 25 39 34 28 32 12
  • 29. Clock tick: 7 5*5 P1 P2 P3 P4 P5 P6 P7 P8 P9 23 36 28 25 39 34 28 32 37
  • 30. Same answer! In 2n + 1 time, can we do better? The answer is yes, there is an optimization. 23 36 28 25 39 34 28 32 37 P1 P2 P3 P4 P5 P6 P7 P8 P9 23 36 28 25 39 34 28 32 37
  • 31. Cannon’s Technique  No processor is idle.  Instead of feeding the data, it is cycled.  Data is staggered, but slightly different than before.  Not including the loading which can be done in parallel for a time of 1, this technique is effectively N.
  • 32. Other Applications  Signal processing  Image processing  Solutions of differential equations  Graph algorithms  Biological sequence comparison  Other computationally intensive tasks
  • 33. Samba: Systolic Accelerator for Molecular Biological Applications This systolic array contains 128 processors shared into 32 full custom VLSI chips. One chip houses 4 processors, and one processor performs 10 millions matrix cells per second.
  • 34. Why Systolic?  Extremely fast.  Easily scalable architecture.  Can do many tasks single processor machines cannot attain.  Turns some exponential problems into linear or polynomial time.
  • 35. Why Not Systolic?  Expensive.  Not needed on most applications, they are a highly specialized processor type.  Difficult to implement and build.

Hinweis der Redaktion

  1. I thought it was interesting enough to point out that the term Systolic actually refers to Systole which essentially means the “rhythmic beating of the heart.” This is appropriate because of the way a systolic array fans in values and relies on them being in a specific place at a specific time so that they can be “pumped” through the system.