
Introduction to HPC

A terribly dated introduction to HPC that I taught circa 2004.


  1. High Performance and Research Computing (for life scientists seeking a brief introduction to high performance and research computing). The BioTeam, © 2004, http://bioteam.net, cdwan@bioteam.net
  2. BioTeam™ Inc. • Objective & vendor-neutral informatics and ‘bio-IT’ consulting • Composed of scientists who learned to bridge the gap between life science informatics and high performance IT • “iNquiry” bioinformatics cluster solution • Staff: Michael Athanas, Bill Van Etten, Chris Dagdigian, Stan Gloss, Chris Dwan • http://bioteam.net
  3. Day Two Rough Outline • History of Computing • How Computers Work • Architectures for High Performance Computing • Parallel Computing • Cluster Computing • Building Your Own HPC Environment
  4. I Can’t Teach Unix Today, Sorry • Also can’t teach programming today. • Really can’t teach an editor, meaningfully. • The Unix command line and a basic facility with programming are required to take advantage of HPC resources.
  5. What I Want You To Remember • It is important to map your solution to the computer architecture. • Automatic tools for performance tuning will only get you so far. • Good programs (and bad ones) can be written in any language. • High Throughput vs. High Performance
  6. Language and Word Choice • Technical language is very specific. • Words overlap between disciplines. Please ask questions.
  7. Computing - Pre-History • 1588 - English defeat the Spanish Armada (in part) using pre-computed firing tables • 1820s - Charles Babbage invents a mechanical calculator • 1940s - Bletchley Park, England breaks German codes
  8. Computing - History • 1947: Transistor invented • 1950s: First electronic, stored-program computers; FORTRAN programming language • 1965: Moore’s Law stated • 1968: Knuth publishes volume 1 of TAOCP • 1972: C programming language • 1976: Cray-1 supercomputer • 1984: DNS introduced to the Internet • 1989: WWW invented • 1991: First version of Linux • 1993: First Beowulf cluster • 1994: Java programming language
  9. Von Neumann Model (1946) • Processor(s) connected to a single memory that contains both program and data
  10. Alan Turing (1912-1954) • Any computing language of a certain “power” can solve any “computable” problem: – Store values in memory – Add one to values – Conditional execution based on a value in memory • Proof using the “Turing machine” • Some problems may be more easily stated or understood in one language or another.
  11. 1965: Moore’s Law
  12. (image-only slide, no text)
  13. Donald Knuth • Professor Emeritus, Stanford University • The Art of Computer Programming: 1968: Volume One - Fundamental Algorithms; 1969: Volume Two - Seminumerical Algorithms; 1973: Volume Three - Sorting and Searching • Literate Programming: “The main idea is to regard a program as a communication to human beings rather than as a set of instructions to a computer”
  14. Cray Supercomputers • 1976: Cray 1 (and its descendants: XMP, YMP, C90, J90, T90) • 1985: Cray 2 • 1993: Cray 3 (one machine delivered) • … • Present: X1, XT3, XD1, SX-6
  15. Clusters • 1993: Beowulf - custom interconnects (switched Ethernet too expensive) • 2000: Commercial cluster sales (Linux Networx) • 2003: 7 of the top 10 supercomputers are clusters; 40% of the top 500 supercomputers are clusters • 2004: Apple “workgroup cluster”
  16. “Big Mac” • Virginia Tech (2003): 3rd fastest supercomputer in the world, $5.4 million, ordered from Apple’s web sales page. “Virginia Tech has aspirations … This is one of those spires that one can build that will be visible from afar.” - Hassan Aref, Dean of Engineering, Virginia Tech
  17. (image-only slide, no text)
  18. (image-only slide, no text)
  19. 2004 - Apple Server Products • Xserve – Dual G5, up to 8 GB RAM • XRAID – 5.6 TB storage per unit; XSAN to combine up to 64 TB • Apple Workgroup Cluster – packaged with iNquiry
  20. 2004: Cray X1 • Scales to 4,096 CPUs • 4 CPUs per node • Scales to 32 TB RAM, globally addressable • 34.1 GB/sec per-CPU memory bandwidth
  21. 2004: Orion MultiSystems • 10 to 96 CPUs in a desk-side box • 1.2 GHz chips, but lots of them • Pre-packaged cluster • Powered by a single, standard 15 A wall socket
  22. Other Observations, 2004 • Major computer manufacturers find their profits in selling sub-$500 consumer electronics and ink. • Style (see-through cases, round cables, etc.) is the determining characteristic in workstation purchases.
  23. How Computers Work
  24. Context Switching • At any one time, only one process is actually executing on a given CPU. • Switching between processes takes time, and is driven by interrupts. • The outgoing process’s state (its registers and allocated memory) must be captured and saved so that it can be resumed later. (Slide diagram: the job currently on the CPU versus the other jobs waiting.)
  25. OS and Interrupts • The OS switches between processes from time to time • It also performs “housekeeping” tasks • Interrupts force OS context switches: – I/O – Power fault – Disk is ready to send data – …
  26. Memory • CPU / Registers – Physically part of the CPU – Immediately accessible by the machine code – ~128 registers on modern chips • Cache – 1 - 2 MB of very fast memory, also built into the chip • RAM – 1 - 8 GB (cough cough)
  27. Memory Timings (2004) • CPU / Registers – 10^-9 seconds per instruction • Cache – low latency • Memory – Latency: ~10^2 cycles (~300 cycles); Streaming: 0.8 GB/sec (~1 byte/cycle) • Disk – Latency: 10^-3 seconds (10^6 cycles); Streaming: 100 MB/sec (10^-1 bytes/cycle) • Tape – Seconds to minutes
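
A rough way to see the streaming numbers above for yourself is to time one pass over an array that is far larger than cache. The sketch below is not from the deck; the 64 MB buffer size and the use of gettimeofday are arbitrary choices, and the printed figure is only a ballpark.

      /* Time one streaming pass over an array much larger than cache.
         Buffer size (64 MB) and timer (gettimeofday) are arbitrary choices. */
      #include <stdio.h>
      #include <stdlib.h>
      #include <sys/time.h>

      int main(void) {
          size_t n = 8 * 1024 * 1024;                  /* 8M doubles = 64 MB */
          double *a = malloc(n * sizeof *a);
          if (!a) { perror("malloc"); return 1; }
          for (size_t i = 0; i < n; i++) a[i] = 1.0;   /* touch every page first */

          struct timeval t0, t1;
          gettimeofday(&t0, NULL);
          double sum = 0.0;
          for (size_t i = 0; i < n; i++) sum += a[i];  /* one streaming read pass */
          gettimeofday(&t1, NULL);

          double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
          printf("sum = %g, about %.0f MB/s\n", sum, n * sizeof *a / secs / 1e6);
          free(a);
          return 0;
      }

Compile with something like gcc -O2 and compare the printed MB/s against the table above.
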
  28. Memory to the Program • One large address space: 0 to 2^32 (or 2^64, or some other value), relative to the program • As memory is used, the “stack” grows • The program’s “memory footprint” is the amount of memory allocated. Larger footprints can step out of cache and even out of RAM. • “Segmentation violation” means “you tried to access memory that’s not yours”
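
As a concrete illustration of that last bullet, the toy program below (not part of the slides; the failing address is arbitrary) writes to memory the process never allocated, and the operating system answers with a segmentation fault.

      /* Illustration only: deliberately touch memory that is not ours. */
      #include <stdio.h>

      int main(void) {
          int *p = (int *)0xDEADBEEF;   /* an address this process never allocated */
          printf("about to write...\n");
          *p = 42;                      /* OS delivers SIGSEGV: "not your memory" */
          printf("never reached\n");
          return 0;
      }
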
  29. 32 vs. 64 Bits • Number of bits in a single memory “word” • Affects: – Precision of calculations – Maximum memory address space – Compatibility of files – Memory data bandwidth – Marketing
  30. Potential Limits • Largest integer: 2^32 or 2^64 • Largest file: 2 GB – not usually a problem anymore, but it crops up at really annoying times • Smallest / largest floating point number • Number of files on disk (inodes)
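
The integer ceilings above are easy to see directly. A small sketch using the C99 fixed-width types; the 2^31 - 1 value is the same limit behind the classic 2 GB file-size problem.

      #include <stdio.h>
      #include <stdint.h>
      #include <inttypes.h>

      int main(void) {
          int32_t  i32 = INT32_MAX;    /* 2^31 - 1 = 2147483647, the old "2 GB" ceiling */
          uint32_t u32 = UINT32_MAX;   /* 2^32 - 1 = 4294967295 */
          uint64_t u64 = UINT64_MAX;   /* 2^64 - 1 */

          printf("int32  max: %" PRId32 "\n", i32);
          printf("uint32 max: %" PRIu32 "  (adding 1 wraps to %" PRIu32 ")\n",
                 u32, (uint32_t)(u32 + 1u));
          printf("uint64 max: %" PRIu64 "\n", u64);
          return 0;
      }
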
  31. System Monitoring Examples • ps • top • iostat
  32. Abstraction: Convenience vs. Time • Layers, from bottom to top: Hardware (voltages, clocks, transistors), Microcode, Assembly Language, Operating System, User Interface
  33. Compiled vs. Interpreted • Scripts: – interpreted one line at a time – sh, csh, bash, tcsh, Perl, Tcl, Ruby, … – Much faster development (to a point) – Can be slow – Program is relatively generic / portable (cough cough) • Compiled languages: – Code is translated into assembly by a “compiler” – FORTRAN, Pascal, C, C++, … – Can optimize for the specific architecture at compile time – Intellectual property is hidden in the compiled code
  34. Performance Measurement • Wall clock time: how long you waited • User time: CPU time spent executing your program’s own code • System time: CPU time the operating system spent working on behalf of your job
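
These are the same three numbers the Unix time command reports. Below is a minimal sketch of measuring them from inside a C program, using gettimeofday for wall clock time and getrusage for user and system CPU time; the busy loop is just a stand-in workload.

      /* Print wall clock, user CPU, and system CPU time for a toy workload. */
      #include <stdio.h>
      #include <sys/time.h>
      #include <sys/resource.h>

      int main(void) {
          struct timeval w0, w1;
          gettimeofday(&w0, NULL);

          volatile double x = 0.0;                       /* stand-in workload */
          for (long i = 0; i < 100000000L; i++) x += i * 0.5;

          gettimeofday(&w1, NULL);
          struct rusage ru;
          getrusage(RUSAGE_SELF, &ru);

          double wall = (w1.tv_sec - w0.tv_sec) + (w1.tv_usec - w0.tv_usec) / 1e6;
          double user = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
          double sys  = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
          printf("wall %.2f s   user %.2f s   system %.2f s\n", wall, user, sys);
          return 0;
      }
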
  35. Example Program • Script in Perl • Program in C • Compile with optimization • Remove I/O • Example memory allocation “bug”
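
The slide does not show the bug itself, so the sketch below is one hypothetical version of a memory allocation bug: a buffer allocated on every pass through a loop and never freed, so the footprint grows until the program falls out of cache and eventually out of RAM.

      /* Hypothetical memory allocation "bug": a buffer leaked on every pass. */
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      int main(void) {
          for (long i = 0; i < 100000L; i++) {
              char *buf = malloc(4096);       /* allocated on every iteration ... */
              if (!buf) { perror("malloc"); return 1; }
              memset(buf, 0, 4096);           /* pretend to do some work with it  */
              /* ... but never freed: adding free(buf); here fixes the leak */
          }
          puts("done, roughly 400 MB heavier than necessary");
          return 0;
      }
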
  36. High Performance Computing
  37. What is “Super” Computing? • Faster than what most other people are doing • $10^6+ investment • Custom / innovative design • It depends. See http://www.top500.org
  38. Just Make the System Faster • Increase memory bandwidth • Increase memory size • Decrease clock time • Clearly limited
  39. Superscalar Architectures • More than one ALU • More than one CPU • Instructions can happen in parallel • Most modern CPUs are superscalar to some degree
  40. Pipelining • Break each instruction into a series of steps • Build a pipeline of the steps (as much as possible) • Example: Y = X + Y; Z = Y - Z; – Cycle 1: load instruction (add) – Cycle 2: load data (X, Y); load instruction (sub) – Cycle 3: calculate (+); load data (Y, Z); load next instruction (…) – Cycle 4: store Y; stall – Cycle 5: calculate (-) – Cycle 6: store Z
  41. Branch Prediction • if (x == 0) { a++; } else { b--; } • Load instruction (“if”), load data (“x”) … then which instruction do we load next? – Always assume yes – Branch prediction – Do both (superscalar processing pipeline) – Profile code and insert hints for runtime
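
One concrete form of the last option, profiling and inserting hints, is GCC’s __builtin_expect. The macros below are a common idiom rather than anything from the deck, and they are compiler-specific.

      /* GCC-specific branch hints via __builtin_expect (a common idiom). */
      #define likely(cond)   __builtin_expect(!!(cond), 1)
      #define unlikely(cond) __builtin_expect(!!(cond), 0)

      void update(int x, long *a, long *b) {
          if (unlikely(x == 0)) {   /* hint to the compiler: x is almost never zero */
              (*a)++;
          } else {
              (*b)--;
          }
      }
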
  42. Vector Processing • SIMD - Single Instruction, Multiple Data • Vector A + Vector B = Vector C
  43. Vector Processing • Cray 1: 64 x 64-bit registers • Could be done in software as well • Scalar style: read the next instruction and decode it; get this number; get that number; add them; put the result here • Vector style: get the 10 numbers here, add them to the numbers there, and put the results here
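
The Vector A + Vector B = Vector C operation is just this loop in C. A vectorizing compiler (for example gcc with -O3, or a vendor compiler on a Cray) may turn it into SIMD instructions; whether it does depends on the compiler and flags.

      /* Vector A + Vector B = Vector C, written as a plain C loop. */
      void vec_add(const double *a, const double *b, double *c, int n) {
          for (int i = 0; i < n; i++)
              c[i] = a[i] + b[i];   /* same instruction applied across many elements */
      }
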
  44. Parallel Computing is Not New
  45. “Would you rather have 4 strong oxen, or 1024 chickens?” - Seymour Cray (Cray Research Inc.) • “Quantity has a quality all its own.” - Russian saying
  46. Amdahl’s Law • Gene Amdahl (architect of the IBM 360) • Parallel time = Ws + (Wp / N) + C(N), where: – N = number of processors – Wp = parallel fraction of the work – Ws = serial fraction of the work – C(N) = cost of setting up a job over N machines – Assumption: Wp + Ws = W • Amdahl measured the parallel fraction for several IBM codes of the day and found it to be approximately 1/2, meaning the maximum speedup on those codes would be a factor of 2.
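
The formula is easy to play with. The sketch below normalizes the total work W to 1 and, as a simplifying assumption, sets the setup cost C(N) to zero; with Wp = 1/2 it reproduces the factor-of-two ceiling mentioned on the slide.

      /* Amdahl's Law with total work W normalized to 1 and C(N) assumed zero. */
      #include <stdio.h>

      double amdahl_speedup(double wp, int n) {
          double ws = 1.0 - wp;                 /* Ws + Wp = W = 1 */
          double parallel_time = ws + wp / n;   /* slide's formula, minus the setup cost */
          return 1.0 / parallel_time;
      }

      int main(void) {
          for (int n = 1; n <= 1024; n *= 4)    /* Wp = 1/2, as Amdahl measured */
              printf("N = %4d   speedup = %.2f\n", n, amdahl_speedup(0.5, n));
          return 0;
      }
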
  47. Embarrassingly Parallel • Large numbers of nearly identical jobs • Identical analysis on large pools of input data • Easily subdivided, mostly independent tasks (large code builds, raytracing, rendering, bioinformatic sequence analysis) • The user writes serial code and executes it in batches • Nearly always a speedup
  48. Traditional Parallelism • A single process in which several parallel elements (threads) must communicate to solve a single problem • Parallel programming is difficult, and far from automatic • Users must explicitly use parallel programming tools • Speedup is not guaranteed
  49. Parallel vs. Serial Code • Entirely parallelizable: X[0] = 7 + 5; X[1] = 2 + 3; X[2] = 4 + 5; X[3] = 6 + 8 • Loop dependencies: X[0] = 0; X[1] = X[0] + 1; X[2] = X[1] + 2; X[3] = X[2] + 3, which can be reduced to the closed form X[n] = n(n + 1)/2
  50. Loop Unrolling • Simple, obvious things can be done automatically (and already are) • If the contents of the loop are invariant with the iterator, we can safely transform the loop. For example, for (n = 0; n < 10; n++) { a[n]++; } is equivalent to for (n = 1; n <= 10; n++) { a[n-1] = a[n-1] + 1; }
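
For comparison, actual unrolling replicates the loop body so the loop overhead (the test and the increment) is paid less often. A sketch, unrolling by a factor of two; modern compilers do this automatically at higher optimization levels.

      /* The slide's loop, and the same loop unrolled by a factor of two. */
      void original(int *a) {
          for (int n = 0; n < 10; n++)
              a[n]++;
      }

      void unrolled_by_two(int *a) {
          for (int n = 0; n < 10; n += 2) {   /* two elements per trip, half the loop overhead */
              a[n]++;
              a[n + 1]++;
          }
      }
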
  51. Parallel Processing Architectures • Cycle stealing / network of workstations • Single system image – Log in and use the machine – Parallelism can be hidden • Message passing vs. shared memory architectures • Portal architecture (cluster or compute farm) – Log in and submit jobs to be run in parallel – Parallelism is explicit – Can use message passing
  52. Network of Workstations / Cycle Stealing • (Slide diagram: workstations on a public network, with my job running on one of them)
  53. Network of Workstations / Cycle Stealing • (Slide diagram: the same workstations, now with my jobs spread across several of them)
  54. Labs of Workstations • Offer to improve the lab machines by installing hardware you need • Do not make the users suffer • Accept that this is a part-time resource (the return is much less than the number of CPUs) • Unless the owner of the lab buys into the distributed computing idea, there will be trouble
  55. Cycle Stealing • Variation in hosts • Data motion • Need small granularity in your problem • Condor (U. Wisconsin) • * @ Home • United Devices
  56. Shared Memory Multiprocessor • SGI Origin, others • Limited scalability • Remarkably expensive
  57. NUMA • Non-Uniform Memory Access
  58. Message Passing • Start up multiple instances of the same program • Each instance figures out which one it is • Instances can send messages to one another • Requires a parallel programmer
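
Below is a minimal message-passing sketch in C using MPI (one of the libraries named later in the deck, alongside PVM). mpirun starts several copies of the same program, each asks for its rank, and rank 0 collects a message from everyone else; the rank*rank payload is just a placeholder for real work.

      /* Minimal MPI sketch: N copies start, each learns its rank, rank 0 listens. */
      #include <stdio.h>
      #include <mpi.h>

      int main(int argc, char **argv) {
          int rank, size;
          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* "which instance am I?" */
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          if (rank == 0) {
              for (int src = 1; src < size; src++) {
                  int value;
                  MPI_Recv(&value, 1, MPI_INT, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                  printf("rank 0 received %d from rank %d\n", value, src);
              }
          } else {
              int value = rank * rank;            /* placeholder for real work */
              MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
          }

          MPI_Finalize();
          return 0;
      }
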
  59. Supercomputing • Really exploiting and tuning code for a particular supercomputer requires a lot of hard work.
  60. Cluster Computing • A cost-effective way to achieve large speedups • Throughput rather than high performance • It’s all about power and money
  61. (image-only slide, no text)
  62. Portal Architecture • (Slide diagram: compute nodes on a private network behind a head node on the public network; the cluster partitions the jobs of user 1)
  63. Data Motion • (Same diagram) • Data I/O is a huge bottleneck for many types of computation
  64. Portal Architecture • (Same diagram, now partitioning the jobs of user 1 and user 2)
  65. Distributed Resource Managers (DRMs) • Maintain a queue of jobs • Schedule jobs onto compute nodes • Several options, mostly identical: – Sun GridEngine (SGE) – Portable Batch System (PBS) – Load Sharing Facility (LSF - Platform Computing)
  66. Job Scheduling and Priority • First In, First Out (FIFO) • Fairshare – Try to maintain a goal level of usage on the cluster – Going above that level lowers your priority – Not using the system for a while raises your priority • Job priority is a social / political issue.
  67. Job Scheduling • Sadly, even though users and managers understand share-tree scheduling when the method is explained to them, they tend to forget these details when they notice their jobs pending in the wait list. Users who have been told to expect a 50% entitlement to cluster resources get frustrated when they launch their jobs and don't get to take over half of the cluster instantly. Explaining to them that the 50% entitlement is a goal that the scheduler works to meet "as averaged over time..." falls upon deaf ears. Heavy users get upset to learn that their current entitlement is being "penalized" because their past usage greatly exceeded their allotted share. Cluster admins then spend far too much time attempting to "prove" to the user community that they are not getting shortchanged.
  68. Stages of Cluster Use 1. I just need to get this one set of data processed. 2. This is a task that I will perform frequently. 3. I am the bane of my local administrator: I have my own little cluster, plus a bunch of workstations in my lab, and I wish I had administrative access to the big cluster. 4. I have a pipeline of data which will always be subject to the same analysis, and I run all my jobs on some large (set of) central resource(s).
  69. Example SGE Usage
  70. Parallel Programming
  71. Why is the Program Slow? • Who cares? • Something about the way it was run • Something about the system on which it was run • Something about the program itself
  72. Solution Strategies • High Performance • High Throughput
  73. “Premature optimization is the root of all evil in computer programming.” - Donald Knuth
  74. Photoshop Example • Steve and Phil’s Photoshop demo: 8 minutes • On 8 Xserves: 1 minute? • No!
  75. High Throughput • 8 Photoshop demos on 8 Xserves: 8 minutes? • Yes! • With some effort
  76. High Performance • Not 8x the work in 1x the time, but 1x the work in 1/8 the time • Partition the problem? • Limited by: – the application – data parallelism
  77. High Performance • Sharpen, blur, diffuse, rotate, etc. • Divide the task by step? • No! – Steps are order dependent – Can’t merge results
  78. Divide by Step • (Slide diagram: a Photoshop sequence split across the LAN by a distributed resource manager into sharpen, diffuse, rotate, blur, etc.) • Divide steps • Perform steps • Merge results
  79. High Performance • Divide by image? • 1/8th of the image on each of 8 Xserves: 1 minute? • Plausible, but… – new work – duplicated work
  80. Divide by Image • (Slide diagram: the image split into eighths, distributed across the cluster by the resource manager) • Divide image • Perform steps • Merge results
  81. High Performance • Divide the task by layer? • 1 of 8 layers on each of 8 Xserves: 1 minute? • Probably yes! – If each layer computes in the same time
  82. Divide by Layer • (Slide diagram: layers 1 through 8 distributed across the cluster by the resource manager) • Render layers • Merge results
  83. High Performance • Same work in less time • Application specific • Data specific • Use specific • Unless?
  84. Tightly Coupled • (Slide diagram: nodes joined by a high-speed, low-latency switching fabric and a private Ethernet network, attached to the “public” local area network) • The application partitions the work • Pass messages • Share memory • PVM, MPI
  85. Building Your Own HPC Environment
  86. Do It Yourself (DIY) • It is possible to build a high performance computing environment in your basement. • Home users can now run Linux without having to be computer experts. • Labs and corporations can run cluster computers without having to employ full-time computer support staff.
  87. DIY: Physical Considerations • Power – Resting draw vs. loaded draw – Two-phase vs. three-phase • Cooling – Air cooling the “Big Mac” would have required 60 MPH winds • Space • Physical security • Noise
  88. DIY: Administration • ~1 FTE to maintain and provide access to a large cluster • Also plan on some portion of a software developer to assist with research (distinct from system administration)
  89. Cluster Size Thresholds • 1-16 nodes – Scrape by with poor practices – Make a grad student do it • 32 nodes – Physical management issues loom large – Split out fileserver functions • 128 nodes – The network gets interesting – Power / cooling become major issues • 1,024 nodes – Call the newspaper
  90. Should I Build My Own Cluster? • Install takes time: – Expert: ~1 day to configure – Novice: weeks to never • Systems support is ongoing: – Well managed: ~1 FTE per 1,000 compute nodes • Upside: no need to share, and custom configurations in real time
  91. Infx Cluster Checklist • User model • Applications • Compilers • Physical environment • Know your bottlenecks • Network & topology • Storage • Maintenance • Administration & monitoring • DRM • Common user environment and file system
  92. User Model & Use Cases • Single user, a few users, or many users • Groups of users • Are some more equal than others? • Batch/bulk vs. singleton jobs • High throughput or high performance
  93. Application Requirements • Many short-running processes • Few long-running processes • Average/maximum RAM requirement • CPU- and/or I/O-bound • Single- or multi-threaded • MPI/PVM/Linda parallel-aware • Will it run under *nix?
  94. Compilers • GNU tools are great, but consider commercial compiler options if you are: – a performance freak – writing SMP or parallel apps – writing serious scientific programs in C(++)/Fortran
  95. Physical/Environmental Constraints • Available power • Available cooling • Density (blades / 1U / 2U / tower) • DIY staging space • Raised floor or ceiling drops • Height & width surprises • Fire code • Organizational standards
  96. This could be you… (image-only slide)
  97. Know Your Bottlenecks • I/O-bound – Sequence analysis is limited by the speed of your disks and fileserver • CPU-bound – Chemical and protein structure modeling are generally CPU-bound • RAM-bound – Some apps (e.g. Gaussian) are bound by the speed of memory access • Network-bound – Network I/O and interprocess communication
  98. Network & Interconnects • Bandwidth for I/O and IPC • Parallel networks? • High-speed interconnect(s)? – Enough PCI slots? • Network topology effects: – Scaling and growth – Wire management – Access to external networks
  99. High-Speed Interconnects • For when low-latency message passing is critical – Massively parallel applications • Not generally needed in BioClusters (yet) • Can add 50% or more to the cost of each server • No magic; must be planned for – Applications, APIs, code, compilers, PCI slots, cable management & rack space • Commercial products – Myrinet (www.myricom.com) – Dolphin SCI (www.dolphinics.com)
  100. Storage • Most BioClusters are I/O-bound • Not an area to pinch pennies • NAS vs. SAN vs. local disk • Pick the right RAID levels • Heterogeneous storage • Plan for data staging and caching
  101. SAN vs. NAS vs. Local Disk • SAN is generally inappropriate • NAS or a hybrid NAS/SAN is best – multiple clients with concurrent read/write access to the same file system or volume • Local disk – Diskless nodes are inappropriate – Expensive SCSI disks are unnecessary – Large, cheap IDE drives allow for clever data caching – The best way to avoid thrashing a fileserver
  102. RAID Levels • RAID 0 (stripe) – fast / risky • RAID 1 (mirror) – great, if you can afford double the disk • RAID 5 – can lose any individual disk and still keep the data • Striped banks of RAID 5 – a scalable, stable solution
  103. Stage Data to Local Disk • A $250K NAS server can be brought to its knees by a few large BLAST searches • Active clusters will tax even the fastest arrays; this is mostly unavoidable • Plan for data staging – Move data from the fileserver to cheap local disk on cluster compute nodes – No magic; users and software developers need to do this explicitly in their workflows and pipelines
  104. Maintenance Philosophy • Compute nodes must be anonymous, interchangeable, and disposable • Three possible states: – Running / online – Faulted / re-imaging – Failed / offline & marked for replacement • Administration must be scalable, automated, and remote • The three “R”s of a cluster: reboot, reimage, replace
  105. Monitoring and Reporting • Many high-quality free tools – Ganglia, Big Brother, RRDTool, MRTG, sar, ntop, etc. • Commercial tools too • Log files are the poor man’s trending tool – System, daemon, DRM
  106. OS Installation and Updates • SystemImager - www.systemimager.org – “SystemImager is software that automates UNIX installs, software distribution, and production deployment.” – Unattended disk partitioning and UNIX installation – Incremental updates of active systems – Based on open standards and tools: rsync, SSH, DHCP, TFTP – Totally free, open source – Merging with the IBM LUI project • Also: Apple NetBoot
  107. Common DRM Suites • Open source and/or freely licensed – OpenPBS – Sun GridEngine • Commercially available – PBS Pro – Platform LSF
  108. DRM: My $.02 • At this time Platform LSF is still technically the best choice for serious production BioClusters – Lowest administrative/operational burden – Fault tolerance features are unmatched • 2nd choice: GridEngine, if you can support it internally • Go with your local expertise
  109. A Hybrid Approach • Systems group – Specifies supported configurations – Maintains a central machine room – Configures the scheduler to give owners priority on their own nodes • Researchers – Estimate their computational load – Include a line item for the required number of nodes
  110. What Can I Do Today? • CS: – Take biology coursework – Accept that biology is really, really complex and difficult • Bio: – Take CS coursework – Accept that computer engineering / software development is tricky • Administrators: – Decide to build a “spire, which will be visible from afar” • All: – Attend journal clubs, symposia, etc. – Get a bigger monitor
  111. The Future • All scientists will write computer programs • “Computational biology” will sound just as redundant as “computational physics” • Most labs will have a small cluster and some local expertise, plus collaborations with supercomputing centers • Grid / web services technology will enable cool things.
