Guide to Building your Linux High-performance Cluster

Edmund Ochieng

March 2, 2012
Abstract

   In a modern day where computer simulation forms a critical part of
research, high-performance clusters have become a necessity in almost every
educational or research institution.

   This paper aims to give you the instructions you need to set up your own
cluster. So if you are looking forward to setting up a cluster, this is the
guide for you.

   This guide is prepared with climate simulation in mind. However, besides
the software required for climate simulation, the steps required to set up
the cluster remain more or less the same.

   The setup aims to grant you the ability to run modelling, simulation and
visualisation applications across multiple processors - probably more than
you can have in a single server unit.
Contents

I   Master node Configuration

1 Network configuration
  1.1 Internal interface configuration
  1.2 External interface configuration

2 MAC address acquisition
  2.1 System Documentation / Manuals
  2.2 Network Traffic Monitoring
  2.3 TFTP Configuration

3 DHCP configuration

4 Local Repository

5 EPEL Repository

6 NFS configuration

7 SSH Key Generation Script

II  Software and Compiler installation and configuration

8 Torque configuration

9 Maui configuration

10 Compiler Installation
   10.1 GCC Compilers
   10.2 Intel Compilers

11 OpenMPI installation
   11.1 OpenMPI Compiled with GCC Compilers
   11.2 OpenMPI Compiled with Intel Compilers

12 Environment Modules installation

13 C3 Tools installation

14 Password Syncing

15 NetCDF, HDF5 and GrADs installation

16 NCL and NCO installation

17 R Statistical package installation

III Computing Node Installation

18 Node OS installation

19 Name resolution
Part I
Master node Configuration
1     Network configuration
1.1     Internal interface configuration
Set the network interface through which the DHCP service will listen for IP
address requests to be static and to start on system boot up. This should
appear similar to the configuration below.



    1. With a text editor of your choice, edit your master node network config-
       uration for the network interface to be used to communicate with other
       nodes in your cluster.

      [root@master ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
      # Broadcom Corporation NetXtreme BCM5715 Gigabit Ethernet
      DEVICE=eth0
      #BOOTPROTO=dhcp
      BOOTPROTO=static
      HWADDR=00:16:36:E7:8B:A3
      IPADDR=192.168.10.1
      NETMASK=255.255.255.0
      ONBOOT=yes
      DHCP_HOSTNAME=master.cluster

    2. Once the changes have been made, you can save the file and start the
       interface as shown below.
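
       A quick way to do this, assuming the standard CentOS network
       initscripts, is the ifup helper:

       [root@master ~]# ifup eth0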
    3. Finally, invoke the ifconfig command to confirm the settings are
       active, as illustrated below.

      [root@master ~]# ifconfig eth0
      eth0 Link encap:Ethernet HWaddr 00:16:36:E7:8B:A3
           inet addr:192.168.10.1 Bcast:192.168.10.127 Mask:255.255.255.0
           UP BROADCAST MULTICAST MTU:1500 Metric:1
           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
           Interrupt:74 Memory:fdfc0000-fdfd0000

1.2     External interface configuration
The eth1 interface shall be connected to the organizational network and will
acquire its network configuration via DHCP. So to have the interface working,
all that needs to be done is to set the ONBOOT option in
/etc/sysconfig/network-scripts/ifcfg-eth1 and connect a cable to the
interface.
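
   A minimal ifcfg-eth1 along those lines might look as follows (a sketch;
set HWADDR to your interface's actual hardware address if you wish to pin it):

      [root@master ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
      DEVICE=eth1
      BOOTPROTO=dhcp
      ONBOOT=yes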


2     MAC address acquisition
The MAC address acquisition step is important as it allows the master node to
uniquely identify the nodes that make up the cluster and, as a result, give
them customized configurations.

   Each network interface has a unique MAC address, which can be obtained
either from the system manuals/documentation or by listening to the network
traffic on the master node interface on which the DHCP daemon shall be
listening.

2.1    System Documentation / Manuals
This could either be printed on the hardware, as is the case on Sun servers
and a couple of HP servers I've seen, or found in the booklets provided
alongside the server. However, this could at times be deceiving. If that is
the case, you could always listen on the network to obtain the desired MAC
address.

2.2    Network Traffic Monitoring
Using the tcpdump command, we can acquire the hardware interfaces' MAC
addresses. For easy identification, only one node should be turned on at any
given time during the MAC address collection process.

  From the tcpdump output below, we can identify the network interface MAC
address of the first node as 00:1b:24:3d:f1:a3 since the column just before
the second "greater than" symbol is 0.0.0.0.68 - which basically means it has
no IP address yet and expects a response on UDP port 68.
[root@master ~]# tcpdump -i eth0 -nn -qtep port bootpc and port bootps \
 and ip broadcast
tcpdump: verbose output suppressed, use -v or -vv for full protocol
decode listening on eth0, link-type EN10MB (Ethernet), capture size
 96 bytes
00:1b:24:3d:f1:a3 > ff:ff:ff:ff:ff:ff, IPv4, length 590: 0.0.0.0.68 >
255.255.255.255.67: UDP, length 548
00:16:36:e7:8b:a3 > ff:ff:ff:ff:ff:ff, IPv4, length 342: 192.168.10.1
.67 > 255.255.255.255.68: UDP, length 300

   Repeat the above process for all nodes to which you would like to issue static
IP addresses.

2.3    TFTP Configuration
The TFTP service is essential for a PXE server to work, as it provides a
netinstall kernel and a ramdisk to the clients when they attempt to do a
network boot.

   By default, tftp, which runs under xinetd, is disabled. You can have it
enabled by opening the configuration file and changing the value of the
option "disable" from yes to no. Your completed configuration file should be
similar to the one shown below.

  1. Enable tftp which is part of the xinetd stack

      [root@master ~]# vi /etc/xinetd.d/tftp
      [root@master ~]# cat /etc/xinetd.d/tftp
      # default: off
      service tftp
      {
              socket_type              = dgram
              protocol                 = udp
              wait                     = yes
              user                     = root
              server                   = /usr/sbin/in.tftpd
              server_args              = -s /tftpboot
              disable                  = no
              per_source               = 11
              cps                      = 100 2
              flags                    = IPv4
      }

  2. Once done, restart the xinetd service so that tftp is started alongside
     its other services.

   [root@master ~]# service xinetd restart
   Stopping xinetd:                                               [   OK   ]
   Starting xinetd:                                               [   OK   ]

  3. Check that a tftpboot directory has been created at the root of the
     filesystem, as shown below.

   [root@master ~]# file /tftpboot/
   /tftpboot/: directory

4. Create a directory tree into which the pxe files shall be placed.

   [root@master ~]# mkdir -p /tftpboot/pxe/pxelinux.cfg

5. Copy the netboot kernel image and an initial ramdisk.

   [root@master ~]# ls /distro/centos/images/pxeboot/
   initrd.img README TRANS.TBL vmlinuz
      [root@master ~]# cp /distro/centos/images/pxeboot/{vmlinuz,initrd.img} \
      /tftpboot/pxe/

  6. Locate the pxelinux.0 file and copy it to the /tftpboot/pxe directory,
     from where it should be accessible via the tftp daemon.

   [root@master ~]# locate pxelinux.0
   /usr/lib/syslinux/pxelinux.0
   [root@master ~]# cp -av /usr/lib/syslinux/pxelinux.0 /tftpboot/pxe/
   ‘/usr/lib/syslinux/pxelinux.0’ -> ‘/tftpboot/pxe/pxelinux.0’

     NOTE: Keenly note the location of the pxelinux.0 file, as its relative
     path (i.e. from the tftp root directory, /tftpboot) will be used in the
     DHCP daemon configuration section.
7. Create a default boot configuration file for machines that may not have a
   specific boot file in the pxelinux.cfg directory.




[root@master ~]# vi /tftpboot/pxe/pxelinux.cfg/default
      [root@master ~]# cat /tftpboot/pxe/pxelinux.cfg/default
      # /tftpboot/pxe/pxelinux.cfg/default

      prompt 1

      timeout 100

      default local

      label local
       LOCALBOOT 0

      label install
       kernel vmlinuz
        append initrd=initrd.img network ip=dhcp lang=en_US keymap=us \
        ksdevice=eth0 ks=http://192.168.10.1/ks/node-ks.cfg \
        loadramdisk=1 prompt_ramdisk=0 ramdisksize=16384 vga=normal \
        selinux=0

    8. Get the hexadecimal equivalent of the node's IP address, which is used
       to create a per-client PXE configuration.

      [root@master pxelinux.cfg]# gethostip node01
      node01 192.168.10.2 C0A80A02
      [root@master pxelinux.cfg]# cp default C0A80A02

    9. Copy the default file to a file named with the hex equivalent obtained
       above. Open the file and change the line "default local" to "default
       install"; a one-liner for this follows the copy below. This should
       commence installation on rebooting node01. The same should be done for
       all other nodes.

      [root@master ~]# cp /tftpboot/pxe/pxelinux.cfg/default \
      /tftpboot/pxe/pxelinux.cfg/C0A80A02
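
      The change can also be made non-interactively (a sketch using sed):

      [root@master ~]# sed -i 's/^default local/default install/' \
      /tftpboot/pxe/pxelinux.cfg/C0A80A02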


3     DHCP configuration
To issue static IP addresses via the DHCP daemon, the network interface
hardware (or MAC) addresses collected in the MAC address acquisition section
will be necessary.

  DHCP daemon configuration for the cluster should be carried out as outlined
in the steps below.

    1. Enter the name of the interface on which the DHCP daemon will be
       listening.

      [root@master ~]# cat /etc/sysconfig/dhcpd
      # Command line options here
      DHCPDARGS="eth0"

    2. Create your DHCP configuration file from the sample file in the
       location below.


  [root@master ~]# cp /usr/share/doc/dhcp-3.0.5/dhcpd.conf.sample \
  /etc/dhcpd.conf
  cp: overwrite ‘/etc/dhcpd.conf’? y

  3. You could edit your configuration to look more or less like the one
     below, issuing addresses to the desired hosts using their MAC addresses
     as illustrated.
  [root@master ~]# cat /etc/dhcpd.conf
  ddns-update-style interim;
  ignore client-updates;
  allow booting;
  allow bootp;

  subnet 192.168.10.0 netmask 255.255.255.0 {

  # --- default gateway
  #       option routers                       192.168.0.1;
          option subnet-mask                   255.255.255.0;

  #        option nis-domain                   "domain.org";
           option domain-name                  "cluster";
           option domain-name-servers          192.168.10.1;

           option time-offset                  10800; # EAT
  #        option ntp-servers                  192.168.1.1;
  #        option netbios-name-servers         192.168.1.1;

  #        range dynamic-bootp 192.168.10.4 192.168.10.20;
           default-lease-time 21600;
           max-lease-time 43200;
           filename "pxe/pxelinux.0";
           next-server 192.168.10.1;

           # we want the nameserver to appear at a fixed address
           host node01 {
                   hardware ethernet 00:1b:24:3d:f1:a3;
                   fixed-address 192.168.10.2;
                   option host-name "node01";
           }

           host node02 {
                   hardware ethernet 00:1b:24:3e:05:d1;
                   fixed-address 192.168.10.3;
                   option host-name "node02";
           }

           host node03 {
                   hardware ethernet 00:1b:24:3e:04:f6;
                   fixed-address 192.168.10.4;
                   option host-name "node03";
           }
  }

4. Finally, save the configuration file and start the server.


[root@master ~]# service dhcpd start
      Starting dhcpd:                                                [   OK   ]

    5. Should the starting of the DHCP daemon fail, you could look at the
       logs in /var/log/messages and identify any DHCP daemon related errors.
       This could be done in a text editor, but for better troubleshooting,
       I'd proceed as below.

      [root@master ~]# tail -f /var/log/messages


4     Local Repository
A local repository is crucial in cases of poor Internet connectivity.

    1. Create a directory on the system and copy all the contents of the
       installation disk into it.

      [root@master ~]# mkdir -p /distro/centos
      [root@master ~]# cp -ar /media/CentOS_5.6_Final/*        /distro/centos

    2. Create a new repository file that points to the location created above.

      [root@master ~]# cat /etc/yum.repos.d/CentOS-Local.repo
      [Local]
      name=CentOS- - Local
      baseurl=file:///distro/centos
      gpgcheck=0
      enabled=1

    3. Clear the cache and any other repository information saved locally

      [root@master ~]# yum clean all

    4. Make a cache of the new available repositories.

      [root@master ~]# yum makecache
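
   To confirm the new repository is active, you could list the enabled
repositories; the Local repository created above should appear in the output.

[root@master ~]# yum repolist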


5     EPEL Repository
The addition of the EPEL (Extra Packages for Enterprise Linux) repository is
crucial as it facilitates the installation of some of the software needed in
the cluster whose installation from source is not quite a simple process.
These are such as:

    1. R - R Statistical package http://www.r-project.org/
    2. NCO - NetCDF Operator http://nco.sourceforge.net/
    3. CDO - Climate Data Operators
    4. NCL - NCAR Command Language
       http://www.ncl.ucar.edu/Applications/rcm.shtml


5. GrADS - Grid Analysis and Display System http://www.iges.org/
This is done as illustrated below:
[root@master ~]# rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm
Retrieving http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-re
lease-5-4.noarch.rpm
warning: /var/tmp/rpm-xfer.Ln8ILG: Header V3 DSA signature: NOKEY, key
 ID 217521f6
Preparing...         ########################################### [100%]
   1:epel-release   ########################################### [100%]


6     NFS configuration
We shall export some of the master node's filesystems to reduce the need for
repetitive configuration.

    1. Populate the /etc/exports configuration file with the directories you’d wish
       to have exported via nfs.

      [root@master ~]# vi /etc/exports
      /distro         *(ro,root_squash)
      /home           *(rw,root_squash)
      /distro/centos          *(ro,root_squash)
      /distro/ks              *(ro,root_squash)
      /opt            *(ro,root_squash)
      /usr/local              *(ro,root_squash)
      /scratch        *(rw,root_squash)

    2. Start the NFS daemon, which should start successfully if your
       configuration is correct.

      [root@master   ~]# service nfs start
      Starting NFS   services:                                      [   OK   ]
      Starting NFS   quotas:                                        [   OK   ]
      Starting NFS   daemon:                                        [   OK   ]
      Starting NFS   mountd:                                        [   OK   ]

    3. Make the NFS daemon start automatically on system boot, then re-export
       the configured filesystems.

      [root@master ~]# chkconfig nfs on

      [root@master ~]# exportfs -vra
      exporting *:/distro/centos
      exporting *:/distro/ks
      exporting *:/usr/local
      exporting *:/scratch
      exporting *:/distro
      exporting *:/home
      exporting *:/opt
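
  The export list can then be verified from the master or any client with
showmount, for example:

      [root@master ~]# showmount -e 192.168.10.1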




                                       12
7        SSH Key Generation Script
To allow jobs to be successfully submitted to the cluster, passwordless ssh
login should be possible for all users on the cluster. The script below will
create a key pair and append the public key to the authorized_keys file in
the .ssh/ directory in each user's home directory.

  This shall be automated by placing the script in the system-wide
/etc/profile.d directory.

[root@master modulefiles]# cat /etc/profile.d/passwordless-ssh.sh


                 Listing 1: /etc/profile.d/passwordless-ssh.sh
#!/bin/bash
#
# /etc/profile.d/passwordless-ssh.sh
#

if [ ! -d "${HOME}"/.ssh/ -o ! -f "${HOME}"/.ssh/id_dsa.pub ]
then
  echo -ne "Generating ssh keys:\t"
  ssh-keygen -t dsa -N "" -f "${HOME}"/.ssh/id_dsa
  if [ "$?" -eq 0 ]; then
    echo -e "[\033[32;1m done \033[0m]";
    cat "${HOME}"/.ssh/id_dsa.pub >> "${HOME}"/.ssh/authorized_keys
    chmod -R u+rwX,go= "${HOME}"/.ssh/
  else
    echo -e "[\033[35;1m failed \033[0m]"
  fi
fi




Part II
Software and Compiler installation and configuration
8     Torque configuration
    1. Untar the source and execute the configure script with the options
       shown below.

       [root@master src]# tar xvfz torque-2.4.14.tar.gz
       [root@master src]# cd torque-2.4.14
       [root@master torque-2.4.14]# mkdir build
       [root@master torque-2.4.14]# cd build
       [root@master build]# ../configure --help
       [root@master build]# ../configure --prefix=/opt/torque \
       --enable-server --enable-mom --enable-clients --disable-gui \
       --with-rcp=scp

    2. Compile the code to create binary files by executing "make", followed
       by "make install" to install the binaries.

       [root@master build]# make
       [root@master build]# make install

    3. Add the path for the sbin directory to the root user’s .bashrc file.

       [root@master torque-2.4.14]# echo "export PATH=/opt/torque/sbin:$PATH" \
       >> /root/.bashrc
       [root@master torque-2.4.14]# tail -n 1 ~/.bashrc
       export PATH=/opt/torque/sbin:$PATH

    4. Copy the pbs_mom script from the contrib/init.d directory of the
       installation source to /opt/torque/pbs_mom.init. Open the file in an
       editor of your choice and amend any erroneous paths.

       [root@master torque-2.4.14]# cp contrib/init.d/pbs_mom \
       /opt/torque/pbs_mom.init
       [root@master torque-2.4.14]# vi /opt/torque/pbs_mom.init

    5. Copy the node_install.sh script into the torque install directory. It
       will be used to install pbs_mom on the computing nodes.

                        Listing 2: node_install.sh
#!/bin/bash
# /opt/torque/node_install.sh

# http://epico.escience-lab.org
# mailto:baro@democritos.it

TORQUEHOME=/opt/torque/
 TORQUEBIN=$TORQUEHOME/bin
   MAUIBIN=/opt/maui/bin
     SPOOL=/var/spool/torque

mkdir -vp $SPOOL

cd $SPOOL || exit

#===========================================================#

mkdir -vp aux mom_priv/jobs mom_logs checkpoint spool undelivered

chmod -v 1777 spool undelivered

for s in prologue epilogue
do
        test -e $TORQUEHOME/scripts/$s && \
                ln -sv $TORQUEHOME/scripts/$s $SPOOL/mom_priv/
done

#===========================================================#

cat << EOF > pbs_environment
PATH=/bin:/usr/bin
LANG=C
EOF

#===========================================================#

echo master > server_name

#===========================================================#

cat << EOF > mom_priv/config
\$clienthost    master
\$logevent      0x7f
\$usecp         *:/u /u
\$usecp         *:/home /home
\$usecp         *:/scratch /scratch
EOF

#===========================================================#

MOM_INIT=/etc/init.d/pbs_mom

cp -va /opt/torque/pbs_mom.init $MOM_INIT
chmod +x $MOM_INIT

chkconfig --add pbs_mom
chkconfig pbs_mom on

# increase limits for infiniband stuff (pbs_mom is NOT pam_limits aware)

egrep 'ulimit[[:space:]]+.*-l[[:space:]]' $MOM_INIT || \
perl -e '
while (<>) {
        print;
        if (/^[ \t]+start\)/)
        {
            print <<EOF;
            # ---------------------------------------------------#
            # increase limits for infiniband stuff (no-pam_limits-aware)
            # max locked memory, soft and hard limits for all PBS children
            ulimit -H -l unlimited
            ulimit -S -l 4096000
            # stack size, soft and hard limits for all PBS children
            ulimit -H -s unlimited
            ulimit -S -s 1024000
            #---------------------------------------------------#
EOF
        }
}
' -i $MOM_INIT

#===========================================================#

cat << EOF > /etc/profile.d/pbs.sh
export PATH=$TORQUEBIN:$MAUIBIN:\$PATH
EOF

#EOF


6. In an editor of your choice, enter the fully qualified domain name of your
   master node in the file below.

  [root@master torque-2.4.14]# vi /var/spool/torque/server_name
  master.cluster

  7. Add your nodes and their properties to the nodes file as shown below.

  [root@master torque-2.4.14]# vi /var/spool/torque/server_priv/nodes
  node01 np=4
  node02 np=4
  node03 np=4

  8. Initialize the serverdb file and start the TORQUE pbs_server as shown
     below.

  [root@master ~]# pbs_server -t create
  [root@master ~]# service pbs_server start
  Starting TORQUE Server:                                        [   OK   ]

  9. Create one or more queues to suit your configuration and make at least
     one of them the default using the torque qmgr command. An easier way
     would be to create a file as below.

  [root@master ~]# vi qmgr.cluster
  create queue default
  set queue default queue_type = Execution
  set queue default Priority = 60
  set queue default max_running = 128
  set queue default resources_max.walltime = 168:00:00
  set queue default resources_default.walltime = 01:00:00
  set queue default max_user_run = 12
  set queue default enabled = True
  set queue default started = True

  set   server   scheduling = True
  set   server   managers = maui@master
  set   server   managers += root@master
  set   server   operators = maui@master
  set   server   operators += root@master
  set   server   default_queue = default


 10. Load the file containing the qmgr configuration as illustrated below.

    [root@master ~]# qmgr -c < qmgr.cluster

 11. A printout of the pbs_server configuration looks as below.

    [root@master ~]# qmgr -c ’p s’
    #
    # Create queues and set their attributes.
    #
    #
    # Create and define queue default
    #
    create queue default
    set queue default queue_type = Execution
    set queue default Priority = 60
    set queue default max_running = 128
    set queue default resources_max.walltime = 168:00:00
    set queue default resources_default.walltime = 01:00:00
    set queue default max_user_run = 12
    set queue default enabled = True
    set queue default started = True
    #
    # Set server attributes.
    #
    set server scheduling = True
    set server acl_hosts = master.cluster
    set server managers = maui@master
    set server managers += root@master
    set server operators = maui@master
    set server operators += root@master
    set server default_queue = default
    set server log_events = 511
    set server mail_from = adm
    set server query_other_jobs = True
    set server scheduler_iteration = 600
    set server node_check_rate = 150
    set server tcp_timeout = 6
    set server next_job_number = 26

 12. Restart both the pbs_server on the master node and the pbs_mom on the
     nodes, then execute pbsnodes to see a printout of all free nodes.

    [root@master ~]# pbsnodes
    node01
         state = free
         np = 2
         ntype = cluster
         status = rectime=1308321567,varattr=,jobs=,state=free,
    netload=1205591,gres=,loadave=0.18,ncpus=4,physmem=4051184
    kb,availmem=5021068kb,totmem=5103400kb,idletime=0,nusers=0,
    nsessions=? 0,sessions=? 0,uname=Linux node01 2.6.18-238.
    el5 #1 SMP Thu Jan 13 15:51:15 EST 2011 x86_64,opsys=linux


node02
           state = free
           np = 2
           ntype = cluster
           status = rectime=1308321569,varattr=,jobs=,state=free,
      netload=1209868,gres=,loadave=0.38,ncpus=4,physmem=4051184
      kb,availmem=5020892kb,totmem=5103400kb,idletime=0,nusers=0,
      nsessions=? 0,sessions=? 0,uname=Linux node02 2.6.18-238.
      el5 #1 SMP Thu Jan 13 15:51:15 EST 2011 x86_64,opsys=linux

      node03
           state = free
           np = 2
           ntype = cluster
           status = rectime=1308321569,varattr=,jobs=,state=free,
      netload=1209868,gres=,loadave=0.38,ncpus=4,physmem=4051184
      kb,availmem=5020892kb,totmem=5103400kb,idletime=0,nusers=0,
      nsessions=? 0,sessions=? 0,uname=Linux node02 2.6.18-238.
      el5 #1 SMP Thu Jan 13 15:51:15 EST 2011 x86_64,opsys=linux


9     Maui configuration
    1. Untar, configure, make binaries and install maui from source as shown
       in the next sequence of steps.

      [root@master ~]# tar xvfz maui-3.3.1.tar.gz
      [root@master ~]# cd maui-3.3.1
      [root@master maui-3.3.1]# ./configure --help
      [root@master maui-3.3.1]# ./configure --prefix=/opt/maui
       --with-spooldir=/var/spool/maui --with-pbs=/opt/torque/
      [root@master maui-3.3.1]# make
      [root@master maui-3.3.1]# make install

    2. Create a system user maui through which maui shall be run.

      [root@master maui-3.3.1]# useradd -d /var/spool/maui -r -g daemon \
      maui

    3. Edit the maui.cfg file, changing the SERVERHOST, ADMIN1, ADMIN3 and
       resource manager definition (RMCFG) as shown in the snippet below.

      [root@master maui-3.3.1]# vi /var/spool/maui/maui.cfg
      # maui.cfg 3.3.1

      SERVERHOST            master
      # primary admin must be first in list
      ADMIN1                maui root
      ADMIN3       ALL

      # Resource Manager Definition

      RMCFG[MASTER] TYPE=PBS


# Allocation Manager Definition

   AMCFG[bank]   TYPE=NONE
   ....
   EOF

  4. Copy the init script in the maui source package to /etc/init.d/ and edit
     the file, changing MAUI_PREFIX to point to your installation directory.

     [root@master maui-3.3.1]# cp contrib/service-scripts/redhat.maui.d /etc/init.d/maui
   [root@master maui-3.3.1]# vi /etc/init.d/maui
   [root@master maui-3.3.1]# cat /etc/init.d/maui
   #!/bin/sh
   #
   # maui This script will start and stop the MAUI Scheduler
   #
   # chkconfig: 345 85 85
   # description: maui
   #
   ulimit -n 32768
   # Source the library functions
   . /etc/rc.d/init.d/functions

   MAUI_PREFIX=/opt/maui

   # let see how we were called
   case "$1" in
         start)
                  echo -n "Starting MAUI Scheduler: "
                  daemon --user maui $MAUI_PREFIX/sbin/maui
                  echo
                  ;;
         stop)
                  echo -n "Shutting down MAUI Scheduler: "
                  killproc maui
                  echo
                  ;;
         status)
                  status maui
                  ;;
         restart)
                  $0 stop
                  $0 start
         ;;
         *)
                  echo "Usage: maui {start|stop|restart|status}"
                  exit 1
   esac

  5. Create a file maui.sh in the /etc/profile.d directory, add to it the
     environment variables PATH, INCLUDE and LD_LIBRARY_PATH, and make it
     executable, as sketched below.


[root@master maui]# vi /etc/profile.d/maui.sh
       [root@master maui]# chmod +x /etc/profile.d/maui.sh
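
     A minimal maui.sh along those lines, assuming the /opt/maui prefix used
     earlier, could be:

     [root@master maui]# cat /etc/profile.d/maui.sh
     # /etc/profile.d/maui.sh - a sketch; adjust paths to your prefix
     export PATH=/opt/maui/bin:$PATH
     export INCLUDE=/opt/maui/include:$INCLUDE
     export LD_LIBRARY_PATH=/opt/maui/lib:$LD_LIBRARY_PATH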


10      Compiler Installation
Compilers are necessary in a cluster as they aid in turning source code into
executables that can be run by the computer. Of interest are C, C++ and
Fortran compilers, the most popular of which are the GCC and Intel compilers.
Another option is the PGI compilers, which we shall not install.

10.1     GCC Compilers
From the CentOS repositories we shall install the GCC compilers using the yum
package management utility.

[root@master src]# yum -y install gcc.x86_64 gcc-gfortran.x86_64 \
libstdc++.x86_64 libstdc++-devel.x86_64 libgcj.x86_64 \
compat-libstdc++.x86_64


10.2     Intel Compilers
For the Intel compilers, which may give better results depending on the
scenario, we shall proceed with the installation as outlined below:
  1. Visit the Intel Website in your preferred web browser, register and down-
     load the Intel compilers for non-commercial use.

  2. Move to the directory into which you downloaded the Intel C compilers
     and Fortran compilers.
  3. Untar the tarballs and change directory into the created directory.

       [root@master   ~]# tar xvfz l_ccompxe_2011.4.191.tgz
       [root@master   ~]# cd l_ccompxe_2011.4.191
       [root@master   l_ccompxe_2011.4.191]# ./install.sh
       [root@master   ~]# tar xvfz l_fcompxe_2011.4.191.tgz
       [root@master   ~]# cd l_fcompxe_2011.4.191
       [root@master   l_fcompxe_2011.4.191]# ./install.sh

  4. Execute the install.sh script and proceed as prompted.


11      OpenMPI installation
OpenMPI is an open source implementation of the Message Passing Interface
(MPI-2) library and facilitates communication/message interchange between
processes in a high-performance computing environment.




11.1     OpenMPI Compiled with GCC Compilers
  1. Untar and compile the sources

       [root@master src]# tar xvfj openmpi-1.4.2.tar.bz2
       [root@master src]# cd openmpi-1.4.2
       [root@master openmpi-1.4.2]# mkdir build
       [root@master openmpi-1.4.2]# cd build/
       [root@master build]# ../configure CC=gcc CXX=g++ FC=gfortran \
       F77=gfortran --prefix=/opt/openmpi/1.4.2/gcc/4.1.2 \
       --with-tm=/opt/torque/

  2. Create binaries by running "make".

       [root@master build]# make

  3. Finally, install the binaries into the system

       [root@master build]# make install


11.2     OpenMPI Compiled with Intel Compilers
  1. Untar and compile the sources as above. However, take keen notice of the
     value of the variables CC, CXX, FC and F77 as compared to the same
     step when compiled with the GCC compilers above.

       [root@master src]# tar xvfj openmpi-1.4.2.tar.bz2
       [root@master src]# cd openmpi-1.4.2
       [root@master openmpi-1.4.2]# mkdir build
       [root@master openmpi-1.4.2]# cd build/
       [root@master build]# ../configure CC=icc CXX=icpc FC=ifort \
       F77=ifort --prefix=/opt/openmpi/1.4.2/intel/12.0.4 \
       --with-tm=/opt/torque/

  2. Create binaries by running "make".

       [root@master build]# make

  3. Finally, install the binaries into the system

       [root@master build]# make install
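
   As a quick smoke test of either build, you could compile and run a trivial
MPI program (a sketch; hello.c is a made-up file name and the paths assume
the GCC build prefix above):

[root@master ~]# cat << 'EOF' > hello.c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);               /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process' rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total process count */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF
[root@master ~]# /opt/openmpi/1.4.2/gcc/4.1.2/bin/mpicc hello.c -o hello
[root@master ~]# /opt/openmpi/1.4.2/gcc/4.1.2/bin/mpirun -np 4 ./hello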


12      Environment Modules installation
  1. Obtain the environment modules source file, uncompress it and change
     directory into the created directory as below.

       [root@master src]# tar xvfz modules-3.2.8a.tar.gz
       [root@master src]# cd modules-3.2.8

  2. Then configure the sources, specifying a prefix where the software
     should be installed.


[root@master modules-3.2.8]# ./configure --prefix=/opt

     Should you be running a 64-bit system and encounter an error indicating
     the tcl lib and include directories cannot be found, proceed as below.

     [root@master modules-3.2.8]# ./configure --with-tcl-lib=/usr/lib64/ \
      --with-tcl-inc=/usr/include/ --prefix=/opt

 3. Then create binaries and install.

     [root@master modules-3.2.8]# make
     [root@master modules-3.2.8]# make install

  4. Finally, copy the init scripts to the /etc/profile.d directory to make
     the module command available system-wide.

     [root@master modules-3.2.8]# cp /opt/Modules/3.2.8/init/bash \
     /etc/profile.d/modules.sh
     [root@master modules-3.2.8]# cp /opt/Modules/3.2.8/init/bash_completion \
     /etc/profile.d/modules_bash_completion.sh
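
   Environment Modules becomes useful once modulefiles exist for the software
installed above. A minimal sketch of one for the GCC build of OpenMPI (the
file name and location here are illustrative, not prescribed) could be:

[root@master ~]# cat /opt/Modules/3.2.8/modulefiles/openmpi/1.4.2-gcc
#%Module1.0
## hypothetical modulefile for the GCC build of OpenMPI 1.4.2
proc ModulesHelp { } { puts stderr "OpenMPI 1.4.2 built with GCC" }
prepend-path PATH            /opt/openmpi/1.4.2/gcc/4.1.2/bin
prepend-path LD_LIBRARY_PATH /opt/openmpi/1.4.2/gcc/4.1.2/lib
prepend-path MANPATH         /opt/openmpi/1.4.2/gcc/4.1.2/share/man

   With such a file in place, users would load the software with "module load
openmpi/1.4.2-gcc".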


13    C3 Tools installation
 1. Uncompress the C3 tools source package and execute the install script

     [root@master src]# tar xvfz c3-4.0.1.tar.gz
     [root@master src]# cd c3-4.0.1
     [root@master c3-4.0.1]# ./Install-c3

 2. Create a c3.conf configuration file defining a cluster name, the master node
    and nodes in the cluster.

     [root@master c3-4.0.1]# vi /etc/c3.conf
     [root@master c3-4.0.1]# cat /etc/c3.conf
     cluster cluster1 {
     master:master
     node0[1-3]
     }

  3. Create ssh keys to be used for passwordless login into the nodes of the
     cluster.

     [root@master ~]# ssh-keygen -t dsa
     Generating public/private dsa key pair.
     Enter file in which to save the key (/root/.ssh/id_dsa):
     Created directory ’/root/.ssh’.
     Enter passphrase (empty for no passphrase):
     Enter same passphrase again:
     Your identification has been saved in /root/.ssh/id_dsa.
     Your public key has been saved in /root/.ssh/id_dsa.pub.
     The key fingerprint is:
     46:6d:e5:e5:e2:5c:b5:72:16:bc:04:6f:59:2c:b5:32 root@master
     .cluster


  4. Copy the ~/.ssh/id_dsa.pub contents to the authorized_keys file of all
     nodes in the cluster. This is how to do it for a single node.
      [root@master ~]# ssh-copy-id -i ~/.ssh/id_dsa.pub root@node01
       The authenticity of host ’node01 (192.168.10.2)’ can’t be es
       tablished. DSA key fingerprint is fe:8d:bf:6e:de:f4:94:d3:c4:
       d7:ee:74:6c:8c:dd:da. Are you sure you want to continue conn-
       ecting (yes/no)? yes
       Warning: Permanently added ’node01,192.168.10.2’ (RSA) to the
        list of known hosts.
       root@node01’s password:
       Now try logging into the machine, with "ssh ’root@node01’",
       and check in:

          .ssh/authorized_keys

       to make sure we haven’t added extra keys that you weren’t
       expecting.

  5. Test if the key was successfully registered by attempting to log in to
     node01.
       [root@master ~]# ssh node01
       Last login: Fri Jun 17 12:53:28 2011
       [root@node01 ~]# exit
       logout
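
   With the keys distributed, the C3 tools can address every node at once;
for example, a quick check that all nodes respond:

[root@master ~]# cexec hostname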


14       Password Syncing
User accounts and passwords should be the same on all nodes forming the
cluster; however, we cannot have users create their passwords on every
machine that makes up the cluster. We shall therefore create a script to
effect this. In our case, we shall use the cpush command from the C3 tools
package installed earlier.
                     Listing 3: /etc/password-push.sh
#!/bin/bash
#
# Sync /etc/passwd, /etc/shadow and /etc/group
# File: /root/bin
# Cron: min hour dom month dow root /etc/password-push.sh

for f in passwd shadow group; do
  /opt/c3-4/cpush /etc/"${f}" > /dev/null
done



  However, bear in mind that rsync could be used to achieve the same.
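
  A rough rsync equivalent, pushing the same three files to each node (a
sketch, relying on the root ssh keys set up earlier):

[root@master ~]# for n in node01 node02 node03; do
>   rsync -a /etc/passwd /etc/shadow /etc/group ${n}:/etc/
> done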


15       NetCDF, HDF5 and GrADs installation
GrADS requires NetCDF and HDF5 as dependencies for its installation.
Therefore, we shall install them all as a pack from the EPEL repositories.


[root@master ~]# yum -y install netcdf hdf5 grads


16     NCL and NCO installation
These too we shall install using the yum package manager, as below.

[root@master ~]# yum -y install ncl nco


17     R Statistical package installation
The R statistical package will be installed from the EPEL repositories to
save us from the agony of installing a myriad of dependencies and for easy
updating of the packages.
[root@master ~]# yum -y install R.x86_64 R-core.x86_64 R-devel.x86_64 \
  libRmath.x86_64 libRmath-devel.x86_64




Part III
Computing Node Installation
18     Node OS installation
With the master node setup complete, installation of the nodes should just be
a push of a button. However, a little understanding of the node-ks.cfg file
is essential. It marks the packages tftp, openssh-server, openssh,
xorg-x11-xauth, mc and strace for installation, and those with a preceding
"-" sign for exclusion.

  Thereafter, the post-installation section is executed, which removes
unwanted services, creates a local repository, and installs on the nodes the
GCC compilers available in the CentOS repositories.

                        Listing 4: node-ks.cfg
tftp
openssh-server
openssh
xorg-x11-xauth
mc
strace
-cups
-cups-libs
-bluez-utils
-bluez-gnome
-rp-pppoe
-ppp

%post --log=/root/ks-post.log
MASTER=192.168.10.1

# Delete unwanted services
for i in sendmail;
do
    chkconfig --del "${i}"
done

# Remove default repos
tar cvfz yum.repos.d.tar.gz /etc/yum.repos.d
rm -rf /etc/yum.repos.d/*

# Mount /distro from master node
mkdir -p /distro
mount -t nfs $MASTER:/distro /distro

# Add mount to fstab
echo -e "192.168.10.1:/distro\t/distro\t\tnfs\tdefaults\t0 0" | tee -a /etc/fstab

# Add master node's /opt to fstab
echo -e "192.168.10.1:/opt\t/opt\t\tnfs\tdefaults\t0 0" | tee -a /etc/fstab

# Add master node's /home to fstab
echo -e "192.168.10.1:/home\t/home\t\tnfs\tdefaults\t0 0" | tee -a /etc/fstab

# Execute the node_install.sh script to install pbs_mom
/opt/torque/node_install.sh

# Create local repo
mkdir -p /distro/centos
echo -e "[Local]\nname=CentOS-$releasever - Local\nbaseurl=file:///distro/centos\ngpgcheck=0\nenabled=1" | tee /etc/yum.repos.d/CentOS-Local.repo

yum clean all
yum makecache

# GCC compilers
yum -y install gcc.x86_64 gcc-gfortran.x86_64 libstdc++.x86_64 \
    libstdc++-devel.x86_64 libgcj.x86_64 compat-libstdc++.x86_64



  Once the installation is complete, you could have a look at the ks-post.log
in root’s home directory for any errors while executing the post section of the
kickstart file.
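
  For instance, a quick scan for problems on a freshly installed node could
be (the log path matches the %post --log option above):

[root@node01 ~]# grep -i error /root/ks-post.log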


19        Name resolution
Finally, ensure that all the nodes in the cluster can resolve the names of
the other nodes. You can either set up DNS on the master node or use the
/etc/hosts file, as sketched below.
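
  A minimal /etc/hosts for this cluster, pushed to every node (for example
with cpush), could read:

192.168.10.1    master.cluster    master
192.168.10.2    node01.cluster    node01
192.168.10.3    node02.cluster    node02
192.168.10.4    node03.cluster    node03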
   Should you need help setting up a DNS server, post your requests in the
comments below.





Weitere ähnliche Inhalte

Ähnlich wie Linux hpc-cluster-setup-guide

How to Use GSM/3G/4G in Embedded Linux Systems
How to Use GSM/3G/4G in Embedded Linux SystemsHow to Use GSM/3G/4G in Embedded Linux Systems
How to Use GSM/3G/4G in Embedded Linux SystemsToradex
 
7 hands on
7 hands on7 hands on
7 hands onvideos
 
OpenWRT manual
OpenWRT manualOpenWRT manual
OpenWRT manualfosk
 
2.5.1.2 packet tracer configure cisco routers for syslog, ntp, and ssh oper...
2.5.1.2 packet tracer   configure cisco routers for syslog, ntp, and ssh oper...2.5.1.2 packet tracer   configure cisco routers for syslog, ntp, and ssh oper...
2.5.1.2 packet tracer configure cisco routers for syslog, ntp, and ssh oper...Salem Trabelsi
 
Network Automation Tools
Network Automation ToolsNetwork Automation Tools
Network Automation ToolsEdwin Beekman
 
9 creating cent_os 7_mages_for_dpdk_training
9 creating cent_os 7_mages_for_dpdk_training9 creating cent_os 7_mages_for_dpdk_training
9 creating cent_os 7_mages_for_dpdk_trainingvideos
 
Sharing your-internet-connection-on-linux
Sharing your-internet-connection-on-linuxSharing your-internet-connection-on-linux
Sharing your-internet-connection-on-linuxjasembo
 
Tutorial WiFi driver code - Opening Nuts and Bolts of Linux WiFi Subsystem
Tutorial WiFi driver code - Opening Nuts and Bolts of Linux WiFi SubsystemTutorial WiFi driver code - Opening Nuts and Bolts of Linux WiFi Subsystem
Tutorial WiFi driver code - Opening Nuts and Bolts of Linux WiFi SubsystemDheryta Jaisinghani
 
101 apend. networking linux
101 apend. networking linux101 apend. networking linux
101 apend. networking linuxAcácio Oliveira
 
NWI FOR OLATUNDE ISMAILA (G10B)
NWI FOR OLATUNDE ISMAILA (G10B)NWI FOR OLATUNDE ISMAILA (G10B)
NWI FOR OLATUNDE ISMAILA (G10B)olatunde ismaila
 
Known basic of NFV Features
Known basic of NFV FeaturesKnown basic of NFV Features
Known basic of NFV FeaturesRaul Leite
 
Free radius billing server with practical vpn exmaple
Free radius billing server with practical vpn exmapleFree radius billing server with practical vpn exmaple
Free radius billing server with practical vpn exmapleChanaka Lasantha
 
26.1.7 lab snort and firewall rules
26.1.7 lab   snort and firewall rules26.1.7 lab   snort and firewall rules
26.1.7 lab snort and firewall rulesFreddy Buenaño
 
ccna 1 chapter 2 v5.0 exam answers 2014
ccna 1 chapter 2 v5.0 exam answers 2014ccna 1 chapter 2 v5.0 exam answers 2014
ccna 1 chapter 2 v5.0 exam answers 2014Đồng Quốc Vương
 

Ähnlich wie Linux hpc-cluster-setup-guide (20)

How to Use GSM/3G/4G in Embedded Linux Systems
How to Use GSM/3G/4G in Embedded Linux SystemsHow to Use GSM/3G/4G in Embedded Linux Systems
How to Use GSM/3G/4G in Embedded Linux Systems
 
7 hands on
7 hands on7 hands on
7 hands on
 
Cloud RPI4 tomcat ARM64
Cloud RPI4 tomcat ARM64Cloud RPI4 tomcat ARM64
Cloud RPI4 tomcat ARM64
 
OpenWRT manual
OpenWRT manualOpenWRT manual
OpenWRT manual
 
ENSA_Module_10.pptx
ENSA_Module_10.pptxENSA_Module_10.pptx
ENSA_Module_10.pptx
 
2.5.1.2 packet tracer configure cisco routers for syslog, ntp, and ssh oper...
2.5.1.2 packet tracer   configure cisco routers for syslog, ntp, and ssh oper...2.5.1.2 packet tracer   configure cisco routers for syslog, ntp, and ssh oper...
2.5.1.2 packet tracer configure cisco routers for syslog, ntp, and ssh oper...
 
Network Automation Tools
Network Automation ToolsNetwork Automation Tools
Network Automation Tools
 
9 creating cent_os 7_mages_for_dpdk_training
9 creating cent_os 7_mages_for_dpdk_training9 creating cent_os 7_mages_for_dpdk_training
9 creating cent_os 7_mages_for_dpdk_training
 
Sharing your-internet-connection-on-linux
Sharing your-internet-connection-on-linuxSharing your-internet-connection-on-linux
Sharing your-internet-connection-on-linux
 
Kwfsbs67 en-v1
Kwfsbs67 en-v1Kwfsbs67 en-v1
Kwfsbs67 en-v1
 
Tutorial WiFi driver code - Opening Nuts and Bolts of Linux WiFi Subsystem
Tutorial WiFi driver code - Opening Nuts and Bolts of Linux WiFi SubsystemTutorial WiFi driver code - Opening Nuts and Bolts of Linux WiFi Subsystem
Tutorial WiFi driver code - Opening Nuts and Bolts of Linux WiFi Subsystem
 
101 apend. networking linux
101 apend. networking linux101 apend. networking linux
101 apend. networking linux
 
Howto Pxeboot
Howto PxebootHowto Pxeboot
Howto Pxeboot
 
Apend. networking linux
Apend. networking linuxApend. networking linux
Apend. networking linux
 
NWI FOR OLATUNDE ISMAILA (G10B)
NWI FOR OLATUNDE ISMAILA (G10B)NWI FOR OLATUNDE ISMAILA (G10B)
NWI FOR OLATUNDE ISMAILA (G10B)
 
Known basic of NFV Features
Known basic of NFV FeaturesKnown basic of NFV Features
Known basic of NFV Features
 
Free radius billing server with practical vpn exmaple
Free radius billing server with practical vpn exmapleFree radius billing server with practical vpn exmaple
Free radius billing server with practical vpn exmaple
 
26.1.7 lab snort and firewall rules
26.1.7 lab   snort and firewall rules26.1.7 lab   snort and firewall rules
26.1.7 lab snort and firewall rules
 
CCNA CheatSheet
CCNA CheatSheetCCNA CheatSheet
CCNA CheatSheet
 
ccna 1 chapter 2 v5.0 exam answers 2014
ccna 1 chapter 2 v5.0 exam answers 2014ccna 1 chapter 2 v5.0 exam answers 2014
ccna 1 chapter 2 v5.0 exam answers 2014
 

Kürzlich hochgeladen

Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Kürzlich hochgeladen (20)

Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
1 Network configuration

1.1 Internal interface configuration

Set the network interface through which the DHCP service will listen for IP address requests to be static and to start on system boot. The configuration should appear similar to the one below.

1. With a text editor of your choice, edit the master node's network configuration for the interface that will be used to communicate with the other nodes in your cluster.

[root@master ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Broadcom Corporation NetXtreme BCM5715 Gigabit Ethernet
DEVICE=eth0
#BOOTPROTO=dhcp
BOOTPROTO=static
HWADDR=00:16:36:E7:8B:A3
IPADDR=192.168.10.1
NETMASK=255.255.255.0
ONBOOT=yes
DHCP_HOSTNAME=master.cluster

2. Once the changes have been made, save the file and start the interface.

3. Finally, invoke the ifconfig command to confirm the settings are active, as illustrated below.

[root@master ~]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:16:36:E7:8B:A3
          inet addr:192.168.10.1  Bcast:192.168.10.127  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:74 Memory:fdfc0000-fdfd0000

1.2 External interface configuration

The eth1 interface shall be connected to the organizational network and will acquire its network configuration via DHCP. To bring the interface up, all that needs to be done is to set the ONBOOT option in /etc/sysconfig/network-scripts/ifcfg-eth1 and connect a cable to the interface. A minimal sketch of such a file is shown below.
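This is a minimal sketch of ifcfg-eth1, assuming the external interface is named eth1 and that the organizational network provides DHCP; the commented HWADDR value is a placeholder for your own card's address.

[root@master ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
# External interface: configuration is obtained from the site DHCP server
DEVICE=eth1
BOOTPROTO=dhcp
ONBOOT=yes
# Placeholder -- substitute your interface's MAC address if you pin it
#HWADDR=00:16:36:E7:8B:A4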
2 MAC address acquisition

The MAC address acquisition step is important as it allows the master node to uniquely identify the nodes that make up the cluster and, as a result, give them customized configuration.

Each network interface has a unique MAC address, which can be obtained either from the system manuals/documentation or by listening to the network traffic on the master node interface on which the DHCP daemon will be listening.

2.1 System Documentation / Manuals

The MAC address may be printed on the hardware itself, as is the case on Sun servers and a couple of HP servers I've seen, or in the booklets provided alongside the server. However, this can at times be deceiving. If that is the case, you can always listen on the network to obtain the desired MAC address.

2.2 Network Traffic Monitoring

Using the tcpdump command, we can acquire the network interfaces' MAC addresses. For easy identification, only one node should be turned on at any given time during the MAC address collection process.

From the tcpdump output below, we can identify the network interface MAC address of the first node as 00:1b:24:3d:f1:a3, since the column just before the second "greater than" symbol is 0.0.0.0.68 - which basically means it has no IP address yet and expects a response on UDP port 68.

[root@master ~]# tcpdump -i eth0 -nn -qtep port bootpc and port bootps and ip broadcast
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
00:1b:24:3d:f1:a3 > ff:ff:ff:ff:ff:ff, IPv4, length 590: 0.0.0.0.68 > 255.255.255.255.67: UDP, length 548
00:16:36:e7:8b:a3 > ff:ff:ff:ff:ff:ff, IPv4, length 342: 192.168.10.1.67 > 255.255.255.255.68: UDP, length 300

Repeat the above process for all nodes to which you would like to issue static IP addresses.

2.3 TFTP Configuration

The TFTP service is essential for a PXE server to work, as it serves a netinstall kernel and a ramdisk to the clients when they attempt a network boot. By default, tftp, which runs under xinetd, is disabled. You can enable it by opening the configuration file and changing the value of the option "disable" from yes to no. Your completed configuration file should be similar to the one shown below.

1. Enable tftp, which is part of the xinetd stack.

[root@master ~]# vi /etc/xinetd.d/tftp
[root@master ~]# cat /etc/xinetd.d/tftp
# default: off
service tftp
{
        socket_type     = dgram
        protocol        = udp
        wait            = yes
        user            = root
        server          = /usr/sbin/in.tftpd
        server_args     = -s /tftpboot
        disable         = no
        per_source      = 11
        cps             = 100 2
        flags           = IPv4
}

2. Once done, restart the xinetd service to start tftp alongside the other xinetd services.

[root@master ~]# service xinetd restart
Stopping xinetd: [ OK ]
Starting xinetd: [ OK ]

3. Check that a tftpboot directory has been created on the root directory tree, as shown below.

[root@master ~]# file /tftpboot/
/tftpboot/: directory

4. Create a directory tree into which the PXE files shall be placed.

[root@master ~]# mkdir -p /tftpboot/pxe/pxelinux.cfg

5. Copy the netboot kernel image and the initial ramdisk. Note that there must be no space after the comma in the brace expansion.

[root@master ~]# ls /distro/centos/images/pxeboot/
initrd.img README TRANS.TBL vmlinuz
[root@master ~]# cp /distro/centos/images/pxeboot/{vmlinuz,initrd.img} /tftpboot/pxe/

6. Locate the pxelinux.0 file and copy it to the /tftpboot/pxe directory, from where it should be accessible via the tftp daemon.

[root@master ~]# locate pxelinux.0
/usr/lib/syslinux/pxelinux.0
[root@master ~]# cp -av /usr/lib/syslinux/pxelinux.0 /tftpboot/pxe/
‘/usr/lib/syslinux/pxelinux.0’ -> ‘/tftpboot/pxe/pxelinux.0’

NOTE: Keenly note the location of the pxelinux.0 file, as its relative path (i.e. from the tftp root directory, /tftpboot) will be used in the DHCP daemon configuration section.

7. Create a default boot configuration file for machines that may not have a specific boot file in the pxelinux.cfg directory.
[root@master ~]# vi /tftpboot/pxe/pxelinux.cfg/default
[root@master ~]# cat /tftpboot/pxe/pxelinux.cfg/default
# /tftpboot/pxe/pxelinux.cfg/default
prompt 1
timeout 100
default local

label local
        LOCALBOOT 0

label install
        kernel vmlinuz
        append initrd=initrd.img network ip=dhcp lang=en_US keymap=us ksdevice=eth0 ks=http://192.168.10.1/ks/node-ks.cfg loadramdisk=1 prompt_ramdisk=0 ramdisksize=16384 vga=normal selinux=0

8. Get the hexadecimal equivalent of each node's IP address; it is used to create a per-client PXE configuration.

[root@master pxelinux.cfg]# gethostip node01
node01 192.168.10.2 C0A80A02
[root@master pxelinux.cfg]# cp default C0A80A02

9. Copy the default file to a file named with the hex equivalent obtained above. Open the file and change the line "default local" to "default install". This should commence installation on rebooting node01. The same should be done for all other nodes.

[root@master ~]# cp /tftpboot/pxe/pxelinux.cfg/default /tftpboot/pxe/pxelinux.cfg/C0A80A02

3 DHCP configuration

To issue static IP addresses via the DHCP daemon, the network interface hardware (or MAC) addresses collected in the MAC address acquisition section will be necessary. DHCP daemon configuration for the cluster should be carried out as outlined in the steps below.

1. Enter the name of the interface on which the DHCP daemon will be listening.

[root@master ~]# cat /etc/sysconfig/dhcpd
# Command line options here
DHCPDARGS="eth0"

2. Create your DHCP configuration file from the sample file in the location below.
[root@master ~]# cp /usr/share/doc/dhcp-3.0.5/dhcpd.conf.sample /etc/dhcpd.conf
cp: overwrite ‘/etc/dhcpd.conf’? y

3. Edit your configuration to look more or less like the one below, which issues addresses to the desired hosts using their MAC addresses.

[root@master ~]# cat /etc/dhcpd.conf
ddns-update-style interim;
ignore client-updates;
allow booting;
allow bootp;

subnet 192.168.10.0 netmask 255.255.255.0 {
        # --- default gateway
#       option routers             192.168.0.1;
        option subnet-mask         255.255.255.0;
#       option nis-domain          "domain.org";
        option domain-name         "cluster";
        option domain-name-servers 192.168.10.1;
        option time-offset         10800; # EAT
#       option ntp-servers         192.168.1.1;
#       option netbios-name-servers 192.168.1.1;
#       range dynamic-bootp 192.168.10.4 192.168.10.20;
        default-lease-time 21600;
        max-lease-time 43200;
        filename "pxe/pxelinux.0";
        next-server 192.168.10.1;

        # we want the nameserver to appear at a fixed address
        host node01 {
                hardware ethernet 00:1b:24:3d:f1:a3;
                fixed-address 192.168.10.2;
                option host-name "node01";
        }
        host node02 {
                hardware ethernet 00:1b:24:3e:05:d1;
                fixed-address 192.168.10.3;
                option host-name "node02";
        }
        host node03 {
                hardware ethernet 00:1b:24:3e:04:f6;
                fixed-address 192.168.10.4;
                option host-name "node03";
        }
}

4. Finally, save the configuration file and start the server.
[root@master ~]# service dhcpd start
Starting dhcpd: [ OK ]

5. Should the starting of the DHCP daemon fail, you can look at the logs in /var/log/messages and identify any DHCP-related errors. This can be done with a text editor, but for easier troubleshooting as the nodes boot, I'd proceed as below.

[root@master ~]# tail -f /var/log/messages

4 Local Repository

A local repository is very crucial in cases of poor Internet connectivity.

1. Create a directory on the system and copy all the contents of the installation disc into it.

[root@master ~]# mkdir -p /distro/centos
[root@master ~]# cp -ar /media/CentOS_5.6_Final/* /distro/centos

2. Create a new repository file pointing to the location created above.

[root@master ~]# cat /etc/yum.repos.d/CentOS-Local.repo
[Local]
name=CentOS- - Local
baseurl=file:///distro/centos
gpgcheck=0
enabled=1

3. Clear the cache and any other repository information saved locally.

[root@master ~]# yum clean all

4. Make a cache of the newly available repositories.

[root@master ~]# yum makecache

A quick check that the new repository is usable is sketched below.
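This is a small verification sketch, assuming the repository file above was named Local; the first command lists the repositories yum can see, and the second lists a few packages served from the local one only.

[root@master ~]# yum repolist
[root@master ~]# yum --disablerepo="*" --enablerepo="Local" list available | head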
5 EPEL Repository

The addition of the EPEL (Extra Packages for Enterprise Linux) repository is crucial as it facilitates the installation of some of the software needed in the cluster whose installation from source is not quite a simple process. These are:

1. R - R Statistical package http://www.r-project.org/
2. NCO - NetCDF Operators http://nco.sourceforge.net/
3. CDO - Climate Data Operators
4. NCL - NCAR Command Language http://www.ncl.ucar.edu/Applications/rcm.shtml
5. GrADS - Grid Analysis and Display System http://www.iges.org/

This is done as illustrated below:

[root@master ~]# rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm
Retrieving http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm
warning: /var/tmp/rpm-xfer.Ln8ILG: Header V3 DSA signature: NOKEY, key ID 217521f6
Preparing...                ########################################### [100%]
   1:epel-release           ########################################### [100%]

6 NFS configuration

We shall export some of the master node's filesystems to reduce the need for repetitive configuration.

1. Populate the /etc/exports configuration file with the directories you wish to have exported via NFS.

[root@master ~]# vi /etc/exports
/distro         *(ro,root_squash)
/home           *(rw,root_squash)
/distro/centos  *(ro,root_squash)
/distro/ks      *(ro,root_squash)
/opt            *(ro,root_squash)
/usr/local      *(ro,root_squash)
/scratch        *(rw,root_squash)

2. Start the NFS daemon, which should start successfully if your configuration is correct.

[root@master ~]# service nfs start
Starting NFS services: [ OK ]
Starting NFS quotas: [ OK ]
Starting NFS daemon: [ OK ]
Starting NFS mountd: [ OK ]

3. Make the NFS daemon start automatically on system boot, then re-export the filesystems.

[root@master ~]# chkconfig nfs on
[root@master ~]# exportfs -vra
exporting *:/distro/centos
exporting *:/distro/ks
exporting *:/usr/local
exporting *:/scratch
exporting *:/distro
exporting *:/home
exporting *:/opt

The exports can be verified from a client as sketched below.
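This is a minimal verification sketch, assuming a node that can already reach the master at 192.168.10.1; showmount queries the export list, and the mount command performs a test mount of the shared /home.

[root@node01 ~]# showmount -e 192.168.10.1
[root@node01 ~]# mkdir -p /home && mount -t nfs 192.168.10.1:/home /home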
7 SSH Key Generation Script

To allow jobs to be successfully submitted to the cluster, passwordless SSH login should be possible for all users on the cluster. The script below creates a key pair and appends the public key to the authorized_keys file in the .ssh/ directory of each user's home directory.

This shall be automated by the script below, which we shall place in the system-wide /etc/profile.d directory.

[root@master modulefiles]# cat /etc/profile.d/passwordless-ssh.sh

Listing 1: /etc/profile.d/passwordless-ssh.sh

#!/bin/bash
#
# /etc/profile.d/passwordless-ssh.sh
#
if [ ! -d "${HOME}"/.ssh/ -o ! -f "${HOME}"/.ssh/id_dsa.pub ]
then
    echo -ne "Generating ssh keys:\t"
    ssh-keygen -t dsa -N "" -f "${HOME}"/.ssh/id_dsa
    if [ "$?" -eq 0 ]; then
        echo -e "[\033[32;1m done \033[0m]"
        cat "${HOME}"/.ssh/id_dsa.pub >> "${HOME}"/.ssh/authorized_keys
        chmod -R u+rwX,go= "${HOME}"/.ssh/
    else
        echo -e "[\033[35;1m failed \033[0m]"
    fi
fi
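Once the nodes are installed (Part III), this can be verified end to end. A sketch with a hypothetical user named testuser, whose first login triggers the key generation and whose NFS-shared home directory (section 6) makes the key valid cluster-wide:

[root@master ~]# useradd testuser && passwd testuser
[root@master ~]# su - testuser
[testuser@master ~]$ ssh node01 hostname   # should print node01 without a password prompt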
Part II
Software and Compiler installation and configuration
8 Torque configuration

1. Untar the source and execute the configure script with the options below.

[root@master src]# tar xvfz torque-2.4.14.tar.gz
[root@master src]# cd torque-2.4.14
[root@master torque-2.4.14]# mkdir build
[root@master torque-2.4.14]# cd build
[root@master build]# ../configure --help
[root@master build]# ../configure --prefix=/opt/torque --enable-server --enable-mom --enable-clients --disable-gui --with-rcp=scp

2. Compile the code to create the binary files by executing "make", followed by "make install" to install the binaries.

[root@master build]# make
[root@master build]# make install

3. Add the path for the sbin directory to the root user's .bashrc file.

[root@master torque-2.4.14]# echo "export PATH=/opt/torque/sbin:$PATH" >> /root/.bashrc
[root@master torque-2.4.14]# tail -n 1 ~/.bashrc
export PATH=/opt/torque/sbin:$PATH

4. Copy the pbs_mom script from the contrib/init.d directory of the installation source to /opt/torque/pbs_mom.init. Open the file in an editor of your choice and amend any erroneous paths.

[root@master torque-2.4.14]# cp contrib/init.d/pbs_mom /opt/torque/pbs_mom.init
[root@master torque-2.4.14]# vi /opt/torque/pbs_mom.init

5. Copy the node_install.sh script into the Torque install directory. It will be used to install pbs_mom on the computing nodes.

Listing 2: node_install.sh

#!/bin/bash
# /opt/torque/node_install.sh
# http://epico.escience-lab.org
# mailto: baro@democritos.it
TORQUEHOME=/opt/torque
TORQUEBIN=$TORQUEHOME/bin
MAUIBIN=/opt/maui/bin
SPOOL=/var/spool/torque
mkdir -vp $SPOOL
cd $SPOOL || exit
#===========================================================#
mkdir -vp aux mom_priv/jobs mom_logs checkpoint spool undelivered
chmod -v 1777 spool undelivered
for s in prologue epilogue
do
    test -e $TORQUEHOME/scripts/$s && ln -sv $TORQUEHOME/scripts/$s $SPOOL/mom_priv/
done
#===========================================================#
cat << EOF > pbs_environment
PATH=/bin:/usr/bin
LANG=C
EOF
#===========================================================#
echo master > server_name
#===========================================================#
# quoted delimiter so the $-prefixed mom directives are written literally
cat << "EOF" > mom_priv/config
$clienthost master
$logevent 0x7f
$usecp *:/u /u
$usecp *:/home /home
$usecp *:/scratch /scratch
EOF
#===========================================================#
MOM_INIT=/etc/init.d/pbs_mom
cp -va /opt/torque/pbs_mom.init $MOM_INIT
chmod +x $MOM_INIT
chkconfig --add pbs_mom
chkconfig pbs_mom on
# increase limits for infiniband stuff (pbs_mom is NOT pam_limits aware)
egrep 'ulimit[[:space:]]+.*-l[[:space:]]' $MOM_INIT || perl -i -e 'while (<>) {
    print;
    if (/^[\t ]+start\)/) {
        print << "EOF";
# ---------------------------------------------------------#
# increase limits for infiniband stuff (no pam_limits aware)
# max locked memory, soft and hard limits for all PBS children
ulimit -H -l unlimited
ulimit -S -l 4096000
# stack size, soft and hard limits for all PBS children
ulimit -H -s unlimited
ulimit -S -s 1024000
# ---------------------------------------------------------#
EOF
    }
}' $MOM_INIT
#===========================================================#
cat << EOF > /etc/profile.d/pbs.sh
export PATH=$TORQUEBIN:$MAUIBIN:\$PATH
EOF
#EOF

6. In an editor of your choice, enter the fully qualified domain name of your master node in the file below.

[root@master torque-2.4.14]# vi /var/spool/torque/server_name
master.cluster

7. Add your nodes and their properties to the nodes file as shown below.

[root@master torque-2.4.14]# vi /var/spool/torque/server_priv/nodes
node01 np=4
node02 np=4
node03 np=4

8. Initialize the serverdb and start the Torque pbs_server as shown below.

[root@master ~]# pbs_server -t create
[root@master ~]# service pbs_server start
Starting TORQUE Server: [ OK ]

9. Create a queue (or queues) to suit your configuration and make at least one of them the default using the Torque qmgr command. An easy way is to create a file as below.

[root@master ~]# vi qmgr.cluster
create queue default
set queue default queue_type = Execution
set queue default Priority = 60
set queue default max_running = 128
set queue default resources_max.walltime = 168:00:00
set queue default resources_default.walltime = 01:00:00
set queue default max_user_run = 12
set queue default enabled = True
set queue default started = True
set server scheduling = True
set server managers = maui@master
set server managers += root@master
set server operators = maui@master
set server operators += root@master
set server default_queue = default
10. Load the file containing the qmgr configuration as illustrated below.

[root@master ~]# qmgr -c < qmgr.cluster

11. A printout of the pbs_server configuration looks as below.

[root@master ~]# qmgr -c 'p s'
#
# Create queues and set their attributes.
#
#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default Priority = 60
set queue default max_running = 128
set queue default resources_max.walltime = 168:00:00
set queue default resources_default.walltime = 01:00:00
set queue default max_user_run = 12
set queue default enabled = True
set queue default started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = master.cluster
set server managers = maui@master
set server managers += root@master
set server operators = maui@master
set server operators += root@master
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server next_job_number = 26

12. Restart both the pbs_server on the master node and the pbs_mom on the nodes, then execute pbsnodes to see a printout of all free nodes.

[root@master ~]# pbsnodes
node01
  state = free
  np = 2
  ntype = cluster
  status = rectime=1308321567,varattr=,jobs=,state=free,netload=1205591,gres=,loadave=0.18,ncpus=4,physmem=4051184kb,availmem=5021068kb,totmem=5103400kb,idletime=0,nusers=0,nsessions=? 0,sessions=? 0,uname=Linux node01 2.6.18-238.el5 #1 SMP Thu Jan 13 15:51:15 EST 2011 x86_64,opsys=linux
node02
  state = free
  np = 2
  ntype = cluster
  status = rectime=1308321569,varattr=,jobs=,state=free,netload=1209868,gres=,loadave=0.38,ncpus=4,physmem=4051184kb,availmem=5020892kb,totmem=5103400kb,idletime=0,nusers=0,nsessions=? 0,sessions=? 0,uname=Linux node02 2.6.18-238.el5 #1 SMP Thu Jan 13 15:51:15 EST 2011 x86_64,opsys=linux
node03
  state = free
  np = 2
  ntype = cluster
  status = rectime=1308321569,varattr=,jobs=,state=free,netload=1209868,gres=,loadave=0.38,ncpus=4,physmem=4051184kb,availmem=5020892kb,totmem=5103400kb,idletime=0,nusers=0,nsessions=? 0,sessions=? 0,uname=Linux node03 2.6.18-238.el5 #1 SMP Thu Jan 13 15:51:15 EST 2011 x86_64,opsys=linux

9 Maui configuration

1. Untar, configure, make the binaries and install Maui from source as shown in the next sequence of steps.

[root@master ~]# tar xvfz maui-3.3.1.tar.gz
[root@master ~]# cd maui-3.3.1
[root@master maui-3.3.1]# ./configure --help
[root@master maui-3.3.1]# ./configure --prefix=/opt/maui --with-spooldir=/var/spool/maui --with-pbs=/opt/torque/
[root@master maui-3.3.1]# make
[root@master maui-3.3.1]# make install

2. Create a system user maui under which Maui shall run.

[root@master maui-3.3.1]# useradd -d /var/spool/maui -r -g daemon maui

3. Edit the maui.cfg file, changing the SERVERHOST, ADMIN1, ADMIN3 and resource manager definition (RMCFG) as shown in the snippet below.

[root@master maui-3.3.1]# vi /var/spool/maui/maui.cfg
# maui.cfg 3.3.1
SERVERHOST master
# primary admin must be first in list
ADMIN1 maui root
ADMIN3 ALL
# Resource Manager Definition
RMCFG[MASTER] TYPE=PBS
# Allocation Manager Definition
AMCFG[bank] TYPE=NONE
....
EOF

4. Copy the init script from the Maui source package to /etc/init.d/ and edit the file, changing MAUI_PREFIX to point to your installation directory.

[root@master maui-3.3.1]# cp contrib/service-scripts/redhat.maui.d /etc/init.d/maui
[root@master maui-3.3.1]# vi /etc/init.d/maui
[root@master maui-3.3.1]# cat /etc/init.d/maui
#!/bin/sh
#
# maui  This script will start and stop the MAUI Scheduler
#
# chkconfig: 345 85 85
# description: maui
#
ulimit -n 32768
# Source the library functions
. /etc/rc.d/init.d/functions

MAUI_PREFIX=/opt/maui

# let see how we were called
case "$1" in
    start)
        echo -n "Starting MAUI Scheduler: "
        daemon --user maui $MAUI_PREFIX/sbin/maui
        echo
        ;;
    stop)
        echo -n "Shutting down MAUI Scheduler: "
        killproc maui
        echo
        ;;
    status)
        status maui
        ;;
    restart)
        $0 stop
        $0 start
        ;;
    *)
        echo "Usage: maui {start|stop|restart|status}"
        exit 1
esac

5. Create a file maui.sh in the /etc/profile.d directory, add to it the environment variables PATH, INCLUDE and LD_LIBRARY_PATH, and make it executable. A sketch of such a file follows this step.

[root@master maui]# vi /etc/profile.d/maui.sh
[root@master maui]# chmod +x /etc/profile.d/maui.sh
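The guide does not show the contents of maui.sh, so this is a minimal sketch under the assumption that Maui was installed under the /opt/maui prefix used above; adjust the paths if you chose a different prefix.

# /etc/profile.d/maui.sh -- assumed contents, matching the /opt/maui prefix
export PATH=/opt/maui/bin:$PATH
export INCLUDE=/opt/maui/include:$INCLUDE
export LD_LIBRARY_PATH=/opt/maui/lib:$LD_LIBRARY_PATH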
10 Compiler Installation

Compilers are necessary in a cluster as they turn source code into executables that the computer can run. Of interest are C, C++ and Fortran compilers, the most popular of which are the GCC and Intel compilers. Another option is the PGI compilers, which we shall not install.

10.1 GCC Compilers

From the CentOS repositories we shall install the GCC compilers using the yum package management utility.

[root@master src]# yum -y install gcc.x86_64 gcc-gfortran.x86_64 libstdc++.x86_64 libstdc++-devel.x86_64 libgcj.x86_64 compat-libstdc++.x86_64

10.2 Intel Compilers

For the Intel compilers, which may give better results depending on the scenario, we shall proceed with the installation as outlined below:

1. Visit the Intel website in your preferred web browser, register and download the Intel compilers for non-commercial use.

2. Move to the directory into which you downloaded the Intel C and Fortran compilers.

3. Untar the tarballs and change into the created directories.

[root@master ~]# tar xvfz l_ccompxe_2011.4.191.tgz
[root@master ~]# cd l_ccompxe_2011.4.191
[root@master l_ccompxe_2011.4.191]# ./install.sh
[root@master ~]# tar xvfz l_fcompxe_2011.4.191.tgz
[root@master ~]# cd l_fcompxe_2011.4.191
[root@master l_fcompxe_2011.4.191]# ./install.sh

4. Execute the install.sh script and proceed as prompted.

11 OpenMPI installation

OpenMPI is an open source implementation of the Message Passing Interface (MPI-2) library and facilitates communication/message interchange between processes in a high-performance computing environment.
11.1 OpenMPI Compiled with GCC Compilers

1. Untar and configure the sources.

[root@master src]# tar xvfj openmpi-1.4.2.tar.bz2
[root@master src]# cd openmpi-1.4.2
[root@master openmpi-1.4.2]# mkdir build
[root@master openmpi-1.4.2]# cd build/
[root@master build]# ../configure CC=gcc CXX=g++ FC=gfortran F77=gfortran --prefix=/opt/openmpi/1.4.2/gcc/4.1.2 --with-tm=/opt/torque/

2. Create the binaries by running "make".

[root@master build]# make

3. Finally, install the binaries into the system.

[root@master build]# make install

11.2 OpenMPI Compiled with Intel Compilers

1. Untar and configure the sources as above. However, take keen notice of the values of the variables CC, CXX, FC and F77 as compared to the same step when compiling with the GCC compilers above.

[root@master src]# tar xvfj openmpi-1.4.2.tar.bz2
[root@master src]# cd openmpi-1.4.2
[root@master openmpi-1.4.2]# mkdir build
[root@master openmpi-1.4.2]# cd build/
[root@master build]# ../configure CC=icc CXX=icpc FC=ifort F77=ifort --prefix=/opt/openmpi/1.4.2/intel/12.0.4 --with-tm=/opt/torque/

2. Create the binaries by running "make".

[root@master build]# make

3. Finally, install the binaries into the system.

[root@master build]# make install

A small end-to-end test of the Torque, Maui and OpenMPI stack is sketched below.
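This sketch assumes the GCC build's bin directory (/opt/openmpi/1.4.2/gcc/4.1.2/bin) is on your PATH and that the default queue from the Torque section exists; the file names hello.c and hello.sh and the resource requests are made up for illustration.

[user@master ~]$ cat > hello.c << 'EOF'
/* hello.c: each MPI rank reports its number and the node it runs on */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);
    printf("Hello from rank %d of %d on %s\n", rank, size, host);
    MPI_Finalize();
    return 0;
}
EOF
[user@master ~]$ mpicc -o hello hello.c
[user@master ~]$ cat > hello.sh << 'EOF'
#!/bin/bash
#PBS -N hello
#PBS -q default
#PBS -l nodes=2:ppn=2,walltime=00:05:00
cd $PBS_O_WORKDIR
# OpenMPI was built with --with-tm, so mpirun obtains its host list from Torque
mpirun ./hello
EOF
[user@master ~]$ qsub hello.sh
[user@master ~]$ qstat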
12 Environment Modules installation

1. Obtain the Environment Modules source file, uncompress it and change into the created directory as below.

[root@master src]# tar xvfz modules-3.2.8a.tar.gz
[root@master src]# cd modules-3.2.8

2. Then configure the sources, specifying a prefix under which they should be installed.

[root@master modules-3.2.8]# ./configure --prefix=/opt

Should you be running a 64-bit system and encounter an error indicating that the Tcl lib and include directories cannot be found, proceed as below.

[root@master modules-3.2.8]# ./configure --with-tcl-lib=/usr/lib64/ --with-tcl-inc=/usr/include/ --prefix=/opt

3. Then create the binaries and install them.

[root@master modules-3.2.8]# make
[root@master modules-3.2.8]# make install

4. Finally, copy the init scripts to the /etc/profile.d directory to make the module command available system-wide.

[root@master modules-3.2.8]# cp /opt/Modules/3.2.8/init/bash /etc/profile.d/modules.sh
[root@master modules-3.2.8]# cp /opt/Modules/3.2.8/init/bash_completion /etc/profile.d/modules_bash_completion.sh

A sketch of a simple modulefile for the OpenMPI build from the previous section follows.
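The guide does not show a modulefile, so this is a minimal sketch; the file name openmpi/1.4.2-gcc and its location under /opt/Modules/3.2.8/modulefiles are assumptions matching the prefixes used above.

[root@master ~]# mkdir -p /opt/Modules/3.2.8/modulefiles/openmpi
[root@master ~]# cat /opt/Modules/3.2.8/modulefiles/openmpi/1.4.2-gcc
#%Module1.0
## OpenMPI 1.4.2 built with GCC; paths assume the prefix from section 11.1
prepend-path PATH            /opt/openmpi/1.4.2/gcc/4.1.2/bin
prepend-path LD_LIBRARY_PATH /opt/openmpi/1.4.2/gcc/4.1.2/lib
prepend-path MANPATH         /opt/openmpi/1.4.2/gcc/4.1.2/share/man

Users can then run "module avail" to list it and "module load openmpi/1.4.2-gcc" to bring the build into their environment.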
13 C3 Tools installation

1. Uncompress the C3 tools source package and execute the install script.

[root@master src]# tar xvfz c3-4.0.1.tar.gz
[root@master src]# cd c3-4.0.1
[root@master c3-4.0.1]# ./Install-c3

2. Create a c3.conf configuration file defining a cluster name, the master node and the nodes in the cluster.

[root@master c3-4.0.1]# vi /etc/c3.conf
[root@master c3-4.0.1]# cat /etc/c3.conf
cluster cluster1 {
        master:master
        node0[1-3]
}

3. Create SSH keys to be used for passwordless root login on the nodes of the cluster.

[root@master ~]# ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/root/.ssh/id_dsa):
Created directory ’/root/.ssh’.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
46:6d:e5:e5:e2:5c:b5:72:16:bc:04:6f:59:2c:b5:32 root@master.cluster

4. Copy the ~/.ssh/id_dsa.pub contents to the authorized_keys file of all nodes in the cluster. This is how to do it for a single node.

[root@master ~]# ssh-copy-id -i ~/.ssh/id_dsa.pub root@node01
The authenticity of host ’node01 (192.168.10.2)’ can’t be established.
DSA key fingerprint is fe:8d:bf:6e:de:f4:94:d3:c4:d7:ee:74:6c:8c:dd:da.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added ’node01,192.168.10.2’ (RSA) to the list of known hosts.
root@node01’s password:
Now try logging into the machine, with "ssh ’root@node01’", and check in:
.ssh/authorized_keys
to make sure we haven’t added extra keys that you weren’t expecting.

5. Test whether the key was successfully registered by attempting to log in to node01.

[root@master ~]# ssh node01
Last login: Fri Jun 17 12:53:28 2011
[root@node01 ~]# exit
logout

With the keys in place, the C3 tools can run commands and copy files across the whole cluster, as sketched below.
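A quick sanity check, assuming the cluster1 definition in /etc/c3.conf above and the default C3 install location of /opt/c3-4; cexec runs a command on every node and cpush copies a file to every node.

[root@master ~]# /opt/c3-4/cexec hostname    # should answer from node01 through node03
[root@master ~]# /opt/c3-4/cpush /etc/hosts  # push a file to the same path on all nodes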
14 Password Syncing

User accounts and passwords should be the same on all nodes forming the cluster. However, we cannot have users create their passwords on every machine that makes up the cluster, so we shall create a script to effect this. In our case we shall use the cpush command from the C3 tools package installed earlier.

Listing 3: /etc/password-push.sh

#!/bin/bash
#
# Sync /etc/passwd, /etc/shadow and /etc/group
# File: /root/bin
# Cron: min hour dom month dow root /etc/password-push.sh
for f in passwd shadow group; do
    /opt/c3-4/cpush /etc/"${f}" > /dev/null
done

However, have in mind that rsync could be used to achieve the same.

15 NetCDF, HDF5 and GrADS installation

GrADS requires NetCDF and HDF5 as dependencies for its installation. Therefore, we shall install them all as a pack from the EPEL repositories.

[root@master ~]# yum -y install netcdf hdf5 grads

16 NCL and NCO installation

These too we shall install using the yum package manager, as below.

[root@master ~]# yum -y install ncl nco

17 R Statistical package installation

The R statistical package will be installed from the EPEL repositories to save us from the agony of installing a myriad of dependencies and for easy updating of the packages.

[root@master ~]# yum -y install R.x86_64 R-core.x86_64 R-devel.x86_64 libRmath.x86_64 libRmath-devel.x86_64
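A quick sketch to confirm the analysis stack is in place; each command should resolve without errors once the yum installations above have completed.

[root@master ~]# R --version | head -n 1
[root@master ~]# which grads ncl ncks ncdump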
Part III
Computing Node Installation
18 Node OS installation

With the master node setup complete, installation of the nodes should be just a push of a button. However, a little understanding of the node-ks.cfg kickstart file is essential. It marks the packages tftp, openssh-server, openssh, xorg-x11-xauth, mc and strace for installation, and those with a preceding - sign for exclusion. Thereafter, the post-installation section is executed, which removes unwanted services, mounts the master node's exports, installs pbs_mom, creates a local repository, and installs the GCC compilers, which are available on the CentOS repositories, on the nodes.

Listing 4: node-ks.cfg

%packages
tftp
openssh-server
openssh
xorg-x11-xauth
mc
strace
-cups
-cups-libs
-bluez-utils
-bluez-gnome
-rp-pppoe
-ppp

%post --log=/root/ks-post.log
MASTER=192.168.10.1

# Delete unwanted services
for i in sendmail; do
    chkconfig --del "${i}"
done

# Remove default repos
tar cvfz yum.repos.d.tar.gz /etc/yum.repos.d
rm -rf /etc/yum.repos.d/*

# Mount /distro from master node
mkdir -p /distro
mount -t nfs $MASTER:/distro /distro

# Add mount to fstab
echo -e "192.168.10.1:/distro\t/distro\t\tnfs\tdefaults\t0 0" | tee -a /etc/fstab

# Add master node's /opt to fstab
echo -e "192.168.10.1:/opt\t/opt\t\tnfs\tdefaults\t0 0" | tee -a /etc/fstab

# Add master node's /home to fstab
echo -e "192.168.10.1:/home\t/home\t\tnfs\tdefaults\t0 0" | tee -a /etc/fstab

# Execute the node_install.sh script to install pbs_mom
/opt/torque/node_install.sh

# Create local repo
mkdir -p /distro/centos
echo -e "[Local]\nname=CentOS-\$releasever - Local\nbaseurl=file:///distro/centos\ngpgcheck=0\nenabled=1" | tee /etc/yum.repos.d/CentOS-Local.repo
yum clean all
yum makecache

# GCC compilers
yum -y install gcc.x86_64 gcc-gfortran.x86_64 libstdc++.x86_64 libstdc++-devel.x86_64 libgcj.x86_64 compat-libstdc++.x86_64

Once the installation is complete, you can look at ks-post.log in root's home directory for any errors encountered while executing the %post section of the kickstart file.

19 Name resolution

Finally, ensure that all the nodes in the cluster can resolve the names of the nodes in the cluster. You can either set up DNS on the master node or use the /etc/hosts file, as sketched below. Should you need help setting up a DNS server, post your requests in the comments below.
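A minimal /etc/hosts sketch matching the addresses issued by the DHCP configuration in Part I; push it out with cpush so that every node carries the same copy.

[root@master ~]# cat /etc/hosts
127.0.0.1       localhost.localdomain localhost
192.168.10.1    master.cluster master
192.168.10.2    node01.cluster node01
192.168.10.3    node02.cluster node02
192.168.10.4    node03.cluster node03
[root@master ~]# /opt/c3-4/cpush /etc/hosts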