InfiniBand? Problems? Do you care?




Christian Kniep / Jan Wender
science + computing ag
IT services for sophisticated computer environments
Tübingen | München | Berlin | Düsseldorf
Agenda

   This is an interactive session!
   ▪      Who is on the podium?
   ▪      Living Histogram?
   ▪      Getting some statistics
          ▪      Living Histogram
   ▪      Existing Monitoring Solutions
   ▪      Discussion
          ▪      Quick and Dirty Analysis
          ▪      Conclusions




Page 2

BoF InfiniBand | 2012-06-19                 © 2012 science +   computing ag
On the podium




science + computing at a glance

    Founding Year             1989
    Locations                 Tübingen, München, Berlin, Düsseldorf
    Employees                 270
    Shareholder               Bull S.A. (100%)
    Revenue 10/11             27 million euros
    Partners                  Daikin Industries, Japan
                              NICE srl, Italy
                              Exa Corporation, USA
                              Platform Computing, Canada



Living Histogram?




Brian L. Joiner, International Statistical Review / Revue Internationale de Statistique, Vol. 43, No. 3 (Dec. 1975), pp. 339-340.


Living Histogram

      Size of Fabric
  ▪         <10
  ▪         <50
  ▪         <500
  ▪         >500
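
The show of hands on this slide amounts to sorting each attendee's fabric into one of four size classes. A minimal sketch of that tally, assuming sizes are node counts (the slide does not name the unit) and that a fabric of exactly 500 lands in the last class (the slide leaves that boundary open); `bucket` and `tally` are hypothetical names, and the sample answers are made up:

```python
from collections import Counter

# Size classes from the slide. Upper bounds are exclusive; the unit
# (nodes) and the placement of exactly 500 are assumptions.
BUCKETS = [(10, "<10"), (50, "<50"), (500, "<500")]
LAST_BUCKET = ">500"


def bucket(size):
    """Return the slide's size class for one fabric."""
    for upper, label in BUCKETS:
        if size < upper:
            return label
    return LAST_BUCKET


def tally(sizes):
    """Count how many fabrics fall into each class, in slide order."""
    counts = Counter(bucket(s) for s in sizes)
    return [(label, counts[label]) for _, label in BUCKETS] + [
        (LAST_BUCKET, counts[LAST_BUCKET])
    ]


if __name__ == "__main__":
    # Made-up audience answers, purely for illustration.
    for label, n in tally([4, 36, 48, 120, 700, 2000]):
        print(f"{label:>5} {'#' * n}")
```

Printing one `#` per answer gives a text version of the histogram the audience forms in the room.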




Living Histogram

      Switch Structure
  ▪         Switch size
            ▪   Single switch
                (mlx4036, qlogic12300)
            ▪   Modular switch
                (mlx5600, qlogic12800)
  ▪         Number of switches
            ▪   few
            ▪   many




Living Histogram

      Focus
  ▪         Stability
            ➡     maintenance cost
  ▪         High performance
            ➡     extremely optimized




Living Histogram

      Type of Use
  ▪         Cluster Purpose
            ▪   Single Purpose Cluster
            ▪   Multi Purpose Cluster
  ▪         Usage
            ▪   One Job at a time
            ▪   Multiple Jobs




Living Histogram

      Kind/Amount of Problems
  ▪      Impact
         ▪      minor
         ▪      major
  ▪      Amount
         ▪      few
         ▪      many




Living Histogram

      Problem solving
  ▪      Iterative
           ➡      reseat / reboot
  ▪      Analytic
           ➡      dig into the problem
           ➡      try to wipe it out




Monitoring Solutions

stable (but not useful to admins?)
▪       infiniband-diags
        ▪      ibcheckerrors
        ▪      ibdiagpath
▪       plugin to non-IB systems
        ▪      nagios
        ▪      collectl
▪       hardware vendor suites
        ▪      Unified Fabric Manager (Mellanox)
        ▪      InfiniBand Fabric Suites (QLogic)

unstable (individually carved)
▪       wrapper of infiniband-diags
▪       INAM (Ohio State University)
▪       QNIB
▪       .....

not listed stuff
▪       ...
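
The "unstable" column mentions home-grown wrappers around infiniband-diags. A minimal sketch of what such a wrapper might look like, assuming `perfquery` (a tool from the infiniband-diags package that the slide itself does not name) is installed and prints counters as `Name:....value` lines. The function names, the list of watched counters and the sample text are illustrative; the sample is hand-written, not captured from a real fabric or taken from the talk:

```python
import re
import subprocess

# Counters treated as error indicators here; the selection is an assumption.
ERROR_COUNTERS = ("SymbolErrorCounter", "LinkDownedCounter", "PortRcvErrors")

# Matches lines of the form "SymbolErrorCounter:..............3".
COUNTER_LINE = re.compile(r"^(\w+):\.+(\d+)$")


def parse_counters(text):
    """Turn 'Name:....value' lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line.strip())
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def nonzero_errors(counters, watched=ERROR_COUNTERS):
    """Return only the watched counters that are above zero."""
    return {name: counters[name] for name in watched if counters.get(name, 0) > 0}


def query_port(lid, port):
    """Run perfquery for one port; needs infiniband-diags and fabric access."""
    result = subprocess.run(
        ["perfquery", str(lid), str(port)],
        capture_output=True, text=True, check=True,
    )
    return parse_counters(result.stdout)


# Hand-written sample in the assumed output shape (not a captured log).
SAMPLE = """\
# Port counters: Lid 21 port 3
PortSelect:......................3
SymbolErrorCounter:..............12
LinkDownedCounter:...............0
PortRcvErrors:...................1
"""

if __name__ == "__main__":
    print(nonzero_errors(parse_counters(SAMPLE)))
```

Keeping the parsing separate from the `subprocess` call means the wrapper can be tested on saved output without access to a fabric.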




Modular Switches

switchguid=0xac1(ac1)         #   Spine 1
Switch      36 "S-ac1"        #   "A1" enhanced port 0 lid 11 lmc 0
[1]         "S-bc1"[1]        #   "B1" lid 21 4xQDR
[2]         "S-bc2"[1]        #   "B2" lid 22 4xQDR
[3]         "S-bc3"[1]        #   "B3" lid 23 4xQDR

switchguid=0xac2(ac2)         #   Spine 2
Switch      36 "S-ac2"        #   "A2" enhanced port 0 lid 12 lmc 0
[1]         "S-bc1"[2]        #   "B1" lid 21 4xQDR
[2]         "S-bc2"[2]        #   "B2" lid 22 4xQDR
[3]         "S-bc3"[2]        #   "B3" lid 23 4xQDR

switchguid=0xbc1(bc1)         #   Line 1
Switch      36 "S-bc1"        #   "B1" enhanced port 0 lid 21 lmc 0
[1]         "S-ac1"[1]        #   "A1" lid 11 4xQDR
[2]         "S-ac2"[1]        #   "A2" lid 12 4xQDR
[3]         "H-1"[1](f1)      #   "Host1" lid 101 4xQDR

switchguid=0xbc2(bc2)         #   Line 2
Switch      36 "S-bc2"        #   "B2" enhanced port 0 lid 22 lmc 0
[1]         "S-ac1"[2]        #   "A1" lid 11 4xQDR
[2]         "S-ac2"[2]        #   "A2" lid 12 4xQDR
[3]         "H-2"[1](f2)      #   "Host2" lid 102 4xQDR

switchguid=0xbc3(bc3)         #   Line 3
Switch      36 "S-bc3"        #   "B3" enhanced port 0 lid 23 lmc 0
[1]         "S-ac1"[3]        #   "A1" lid 11 4xQDR
[2]         "S-ac2"[3]        #   "A2" lid 12 4xQDR
[3]         "H-3"[1](f3)      #   "Host3" lid 103 4xQDR
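
The listing above shows two spine entries and three line entries under the heading of modular switches. A minimal sketch of how such a listing could be parsed and the entries split into spines and line cards, assuming the `!` marks in the extracted slide are tab residue and that any switch with a host attached is a line card (a heuristic of this sketch, not something the slide states). All function names are hypothetical:

```python
import re

# Shortened copy of the slide's listing (tab residue removed, comments dropped).
TOPOLOGY = """\
switchguid=0xac1(ac1)
Switch 36 "S-ac1"
[1] "S-bc1"[1]
[2] "S-bc2"[1]

switchguid=0xbc1(bc1)
Switch 36 "S-bc1"
[1] "S-ac1"[1]
[3] "H-1"[1](f1)

switchguid=0xbc2(bc2)
Switch 36 "S-bc2"
[1] "S-ac1"[2]
[3] "H-2"[1](f2)
"""

SWITCH_LINE = re.compile(r'^Switch\s+\d+\s+"([^"]+)"')
PORT_LINE = re.compile(r'^\[(\d+)\]\s+"([^"]+)"\[(\d+)\]')


def parse_topology(text):
    """Map each switch name to {local_port: (peer_name, peer_port)}."""
    switches = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        switch = SWITCH_LINE.match(line)
        if switch:
            current = switches.setdefault(switch.group(1), {})
            continue
        port = PORT_LINE.match(line)
        if port and current is not None:
            current[int(port.group(1))] = (port.group(2), int(port.group(3)))
    return switches


def classify(switches):
    """Split switches into spines (no host attached) and line cards."""
    lines, spines = [], []
    for name, ports in switches.items():
        has_host = any(peer.startswith("H-") for peer, _ in ports.values())
        (lines if has_host else spines).append(name)
    return sorted(spines), sorted(lines)


if __name__ == "__main__":
    spines, lines = classify(parse_topology(TOPOLOGY))
    print("spines:", spines)
    print("line cards:", lines)
```

A monitoring tool could use a grouping like this to report a modular chassis as one unit instead of as several unrelated switches.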


[Diagram, built up next to the listing above. First view, labels from top to bottom: Chassis1; Spine1, Spine2; Line1, Line2, Line3; Host1, Host2, Host3. Second view: Chassis1; Host1, Host2, Host3.]

Discussion - Quick Analysis

Fabric size
▪      small -> easy as pie?
▪      big -> critical mass for real analysis?
Switch structure
▪      what is your routing algorithm?
Focus
▪      80:20 rule?
                  performance
                  maintenance
Type of use
▪      willing/forced to share
Problem type / amount
▪      runs smoothly enough
Problem solving
▪      learning curve starts steep
Discussion - Conclusions

Monitoring
▪      what approach?

Do we scare you?
▪      not intending to spread Fear, Uncertainty and Doubt

Our conclusions

Your conclusions




Thank you for your attention and participation!



science + computing ag
www.science-computing.de

Phone: +49 (0)7071 9457 - 0
E-Mail: info@science-computing.de

ISC 12 BoF: InfiniBand? Problems? Do you care?

  • 1. InfiniBand? Problems? Do you care? Christian Kniep / Jan Wender science + computing ag IT services for sophisticated computer environments TĂŒbingen | MĂŒnchen | Berlin | DĂŒsseldorf
  • 2. Agenda This is an interactive session! â–Ș Who is on the podium? â–Ș Living Histogram? â–Ș Getting some statistics â–Ș Living Histogram â–Ș Existing Monitoring Solutions â–Ș Discussion â–Ș Quick and Dirty Analysis â–Ș Conclusions Page 2 BoF InfiniBand | 2012-06-19 © 2012 science + computing ag
  • 3. On the podium Page 3 BoF InfiniBand | 2012-06-19 © 2012 science + computing ag
  • 4. science + computing at a glance Founding Year 1989 Locations TĂŒbingen MĂŒnchen Berlin DĂŒsseldorf Employees 270 Shareholder Bull S.A. (100%) Revenue 10/11 27 Mio. Euro Partners Daikin Industries, Japan NICE srl, Italien Exa Corporation, USA Platform Computing, Kanada Page 4 BoF InfiniBand | 2012-06-19 © 2012 science + computing ag
  • 5. Living Histogram? Brian L. Joiner, International Statistical Review / Revue Internationale de Statistique, Vol. 43, No. 3. (Dec.,1975), pp. 339-340. Page 5 BoF InfiniBand | 2012-06-19 © 2012 science + computing ag
  • 6. Living Histogram Size of Fabric â–Ș <10 â–Ș <50 â–Ș <500 â–Ș >500 Page 6 BoF InfiniBand | 2012-06-19 © 2012 science + computing ag
  • 7. Living Histogram Switch Structure â–Ș Switch size â–Ș singular switch (mlx4036, qlogic12300) â–Ș Modular switch (mlx5600, qlogic12800) â–Ș Amount â–Ș few â–Ș many Page 7 BoF InfiniBand | 2012-06-19 © 2012 science + computing ag
  • 8. Living Histogram Focus â–Ș Stability ➡ maintenance cost â–Ș High-Perfomance ➡ extremly optimized Page 8 BoF InfiniBand | 2012-06-19 © 2012 science + computing ag
  • 9. Living Histogram Type of Use â–Ș Cluster Purpose â–Ș Single Purpose Cluster â–Ș Multi Purpose Cluster â–Ș Usage â–Ș One Job at a time â–Ș Multiple Jobs Page 9 BoF InfiniBand | 2012-06-19 © 2012 science + computing ag
  • 10. Living Histogram Kind/Amount of Problems â–Ș Impact â–Ș minor â–Ș major â–Ș Amount â–Ș few â–Ș many Page 10 BoF InfiniBand | 2012-06-19 © 2012 science + computing ag
  • 11. Living Histogram Problem solving â–Ș Iterative ➡ reseat / reboot â–Ș Analytic ➡ dig into the problem ➡ try to wipe it out Page 11 BoF InfiniBand | 2012-06-19 © 2012 science + computing ag
  • 12. Monitoring Solutions. Stable (but not useful to admins?): infiniband-diags (ibcheckerrors, ibdiagpath); plugin to non-IB systems (nagios, collectl); hardware vendor suites (Unified Fabric Manager (Mellanox), InfiniBand Fabric Suites (QLogic)). Unstable (individually carved): wrapper of infiniband-diags (INAM (Ohio State University), QNIB, .....). Not listed stuff: ...
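The "wrapper of infiniband-diags" idea on the monitoring slide can be sketched roughly as follows. This is a minimal illustration, not the approach of INAM or QNIB: it assumes counter output in the dotted `Name:.....value` layout that `perfquery` prints, and the sample text, the threshold values, and the function names `parse_counters` and `exceeded` are all hypothetical.

```python
import re

# Hand-written sample in a perfquery-like "Name:....value" layout (not captured output).
SAMPLE = """\
# Port counters: Lid 21 port 1
PortSelect:......................1
SymbolErrorCounter:..............12
LinkErrorRecoveryCounter:........0
LinkDownedCounter:...............3
PortRcvErrors:...................0
"""

# A counter line is a name, a colon, a run of dots, and a decimal value.
COUNTER_RE = re.compile(r"^(?P<name>\w+):\.+(?P<value>\d+)$")

# Hypothetical alert thresholds; sensible values depend on the fabric.
THRESHOLDS = {"SymbolErrorCounter": 10, "LinkDownedCounter": 0}


def parse_counters(text):
    """Return {counter_name: int_value} for every line matching COUNTER_RE."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line.strip())
        if match:
            counters[match.group("name")] = int(match.group("value"))
    return counters


def exceeded(counters, thresholds):
    """Return the counters whose value is strictly above its threshold."""
    return {
        name: value
        for name, value in counters.items()
        if name in thresholds and value > thresholds[name]
    }


if __name__ == "__main__":
    for name, value in exceeded(parse_counters(SAMPLE), THRESHOLDS).items():
        print(f"{name} = {value} (threshold {THRESHOLDS[name]})")
```

In a real wrapper the text would come from running the diagnostic tool per port; that step is left out here because the exact invocation depends on the installation.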
  • 14. Modular Switches
      switchguid=0xac1(ac1)    # Spine 1
      Switch 36 "S-ac1"        # "A1" enhanced port 0 lid 11 lmc 0
      [1] "S-bc1"[1]           # "B1" lid 21 4xQDR
      [2] "S-bc2"[1]           # "B2" lid 22 4xQDR
      [3] "S-bc3"[1]           # "B3" lid 23 4xQDR

      switchguid=0xac2(ac2)    # Spine 2
      Switch 36 "S-ac2"        # "A2" enhanced port 0 lid 12 lmc 0
      [1] "S-bc1"[2]           # "B1" lid 21 4xQDR
      [2] "S-bc2"[2]           # "B2" lid 22 4xQDR
      [3] "S-bc3"[2]           # "B3" lid 23 4xQDR

      switchguid=0xbc1(bc1)    # Line 1
      Switch 36 "S-bc1"        # "B1" enhanced port 0 lid 21 lmc 0
      [1] "S-ac1"[1]           # "A1" lid 11 4xQDR
      [2] "S-ac2"[1]           # "A2" lid 12 4xQDR
      [3] "H-1"[1](f1)         # "Host1" lid 101 4xQDR

      switchguid=0xbc2(bc2)    # Line 2
      Switch 36 "S-bc2"        # "B2" enhanced port 0 lid 22 lmc 0
      [1] "S-ac1"[2]           # "A1" lid 11 4xQDR
      [2] "S-ac2"[2]           # "A2" lid 12 4xQDR
      [3] "H-2"[1](f2)         # "Host2" lid 102 4xQDR

      switchguid=0xbc3(bc3)    # Line 3
      Switch 36 "S-bc3"        # "B3" enhanced port 0 lid 23 lmc 0
      [1] "S-ac1"[3]           # "A1" lid 11 4xQDR
      [2] "S-ac2"[3]           # "A2" lid 12 4xQDR
      [3] "H-3"[1](f3)         # "Host3" lid 103 4xQDR
  • 15. Modular Switches: same topology listing as slide 14, overlaid with a diagram whose labels read Chassis1; Spine1, Spine2; Line1, Line2, Line3; Host1, Host2, Host3.
  • 16. Modular Switches: same topology listing as slide 14, with the diagram labels Chassis1; Spine1, Spine2; Line1, Line2, Line3; Host1, Host2, Host3, plus a second diagram labelled Chassis1 with Host1, Host2, Host3.
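The topology listing on the Modular Switches slides shows a chassis appearing as several separate switch entries. As a rough sketch of how such a listing could be read programmatically, the Python below parses the slide's sample and labels each switch as spine or line. The heuristic (a switch with at least one `H-` peer is a line switch, otherwise a spine) and the names `parse_links` and `classify` are assumptions made for illustration; they are not taken from the talk and would not hold for every fabric.

```python
import re

# Topology sample transcribed from the slide (ibnetdiscover-style layout).
TOPOLOGY = """\
switchguid=0xac1(ac1)  # Spine 1
Switch 36 "S-ac1"      # "A1" enhanced port 0 lid 11 lmc 0
[1] "S-bc1"[1]         # "B1" lid 21 4xQDR
[2] "S-bc2"[1]         # "B2" lid 22 4xQDR
[3] "S-bc3"[1]         # "B3" lid 23 4xQDR

switchguid=0xac2(ac2)  # Spine 2
Switch 36 "S-ac2"      # "A2" enhanced port 0 lid 12 lmc 0
[1] "S-bc1"[2]         # "B1" lid 21 4xQDR
[2] "S-bc2"[2]         # "B2" lid 22 4xQDR
[3] "S-bc3"[2]         # "B3" lid 23 4xQDR

switchguid=0xbc1(bc1)  # Line 1
Switch 36 "S-bc1"      # "B1" enhanced port 0 lid 21 lmc 0
[1] "S-ac1"[1]         # "A1" lid 11 4xQDR
[2] "S-ac2"[1]         # "A2" lid 12 4xQDR
[3] "H-1"[1](f1)       # "Host1" lid 101 4xQDR

switchguid=0xbc2(bc2)  # Line 2
Switch 36 "S-bc2"      # "B2" enhanced port 0 lid 22 lmc 0
[1] "S-ac1"[2]         # "A1" lid 11 4xQDR
[2] "S-ac2"[2]         # "A2" lid 12 4xQDR
[3] "H-2"[1](f2)       # "Host2" lid 102 4xQDR

switchguid=0xbc3(bc3)  # Line 3
Switch 36 "S-bc3"      # "B3" enhanced port 0 lid 23 lmc 0
[1] "S-ac1"[3]         # "A1" lid 11 4xQDR
[2] "S-ac2"[3]         # "A2" lid 12 4xQDR
[3] "H-3"[1](f3)       # "Host3" lid 103 4xQDR
"""

# 'Switch <ports> "<name>"' opens a switch block.
SWITCH_RE = re.compile(r'^Switch\s+\d+\s+"(?P<name>[^"]+)"')
# '[<port>] "<peer>"[<peer port>]' is one link of the current switch.
LINK_RE = re.compile(r'^\[(?P<port>\d+)\]\s+"(?P<peer>[^"]+)"\[(?P<peer_port>\d+)\]')


def parse_links(text):
    """Return {switch_name: [(local_port, peer_name, peer_port), ...]}."""
    links = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        switch = SWITCH_RE.match(line)
        if switch:
            current = switch.group("name")
            links.setdefault(current, [])
            continue
        link = LINK_RE.match(line)
        if link and current is not None:
            links[current].append(
                (int(link.group("port")), link.group("peer"), int(link.group("peer_port")))
            )
    return links


def classify(links):
    """Assumed heuristic: a switch with a host ("H-") peer is a line switch."""
    return {
        name: "line" if any(peer.startswith("H-") for _, peer, _ in ports) else "spine"
        for name, ports in links.items()
    }


if __name__ == "__main__":
    for name, role in classify(parse_links(TOPOLOGY)).items():
        print(name, role)
```

Grouping the classified switches into one chassis would need extra information, such as vendor chassis data, that this listing does not contain.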
  • 22. Discussion - Quick Analysis. Fabric size: small -> easy as pie? big -> critical mass for real analysis? Type of use: willing/forced to share. Switch structure: what is your routing algorithm? Problem type / amount: runs smoothly enough. Focus: 80:20 rule? (performance, maintenance). Problem solving: learning curve starts steep.
  • 25. Discussion - Quick Analysis: same points as the preceding slide, plus a chart under Focus (80:20 rule?) with the labels performance and maintenance and an axis from 0 to 100.
  • 29. Discussion - Conclusions. Monitoring: what approach? Do we scare you? We do not intend to spread Fear, Uncertainty and Doubt. Our conclusions. Your conclusions.
  • 34. Thank you for your attention and participation! science + computing ag, www.science-computing.de, Phone: +49 (0)7071 9457 - 0, Email: info@science-computing.de