From angel at miami.edu Mon Dec 1 10:25:34 2003
From: angel at miami.edu (Angel Li)
Date: Mon, 01 Dec 2003 13:25:34 -0500
Subject: [Rocks-Discuss]cluster-fork
Message-ID: <3FCB879E.8050905@miami.edu>

Hi,

I recently installed Rocks 3.0 on a Linux cluster and when I run the
command "cluster-fork" I get this error:

apple* cluster-fork ls
Traceback (innermost last):
  File "/opt/rocks/sbin/cluster-fork", line 88, in ?
    import rocks.pssh
  File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
    import gmon.encoder
ImportError: Bad magic number in
/usr/lib/python1.5/site-packages/gmon/encoder.pyc

Any thoughts? I'm also wondering where to find the python sources for
files in /usr/lib/python1.5/site-packages/gmon.

Thanks,

Angel



From jghobrial at uh.edu Mon Dec 1 11:35:06 2003
From: jghobrial at uh.edu (Joseph)
Date: Mon, 1 Dec 2003 13:35:06 -0600 (CST)
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <3FCB879E.8050905@miami.edu>
References: <3FCB879E.8050905@miami.edu>
Message-ID: <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu>

On Mon, 1 Dec 2003, Angel Li wrote:
Hello Angel, I have the same problem; I posted about this a month ago and
so far there has been no response.

Is your frontend an AMD setup??

I am thinking this is an AMD problem.

Thanks,
Joseph


>   Hi,
>
>   I recently installed Rocks 3.0 on a Linux cluster and when I run the
>   command "cluster-fork" I get this error:
>
>   apple* cluster-fork ls
>   Traceback (innermost last):
>     File "/opt/rocks/sbin/cluster-fork", line 88, in ?
>       import rocks.pssh
>     File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
>       import gmon.encoder
>   ImportError: Bad magic number in
>   /usr/lib/python1.5/site-packages/gmon/encoder.pyc
>
>   Any thoughts? I'm also wondering where to find the python sources for
>   files in /usr/lib/python1.5/site-packages/gmon.
>
>   Thanks,
>
>   Angel
>


From tim.carlson at pnl.gov Mon Dec 1 14:58:54 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Mon, 01 Dec 2003 14:58:54 -0800 (PST)
Subject: [Rocks-Discuss]odd kickstart problem
In-Reply-To: <76AC0F5E-2025-11D8-804D-000393A4725A@sdsc.edu>
Message-ID: <Pine.LNX.4.44.0312011453020.22892-100000@scorpion.emsl.pnl.gov>

Trying to bring up an old dead node on a Rocks 2.3.2 cluster and I get the
following error in /var/log/httpd/error_log


Traceback (innermost last):
  File "/opt/rocks/sbin/kgen", line 530, in ?
    app.run()
  File "/opt/rocks/sbin/kgen", line 497, in run
    doc = FromXmlStream(file)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line
386, in FromXmlStream
    return reader.fromStream(stream, ownerDocument)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line
372, in fromStream
    self.parser.parse(s)
  File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line 58,
in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python1.5/site-packages/xml/sax/xmlreader.py", line 125,
in parse
    self.close()
  File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line
154, in close
    self.feed("", isFinal = 1)
  File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line
148, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line
340, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <stdin>:3298:0: no element found


Doing a wget of
http://frontend-0/install/kickstart.cgi?arch=i386&np=2&project=rocks
on one of the working internal nodes yields the same error.
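
For reference, the same fetch with the URL quoted so the shell does not
treat the "&" characters as background operators (a minimal sketch; the
frontend name and CGI arguments are as given above):

   wget -O /tmp/ks.cfg "http://frontend-0/install/kickstart.cgi?arch=i386&np=2&project=rocks"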

Any thoughts on this?
I've also done a fresh
rocks-dist dist

Tim



From sjenks at uci.edu Mon Dec 1 15:35:54 2003
From: sjenks at uci.edu (Stephen Jenks)
Date: Mon, 1 Dec 2003 15:35:54 -0800
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu>
Message-ID: <1B15A45F-2457-11D8-A374-00039389B580@uci.edu>

FYI, I have a dual Athlon frontend and didn't have that problem. I know
that doesn't exactly help you, but at least it doesn't fail on all AMD
machines.

It looks like the .pyc file might be corrupt in your installation. The
source .py file (encoder.py) is in the
/usr/lib/python1.5/site-packages/gmon directory, so perhaps removing
the .pyc file would let Python regenerate it (if you run cluster-fork as root?)

The md5sum for encoder.pyc on my system is:
459c78750fe6e065e9ed464ab23ab73d encoder.pyc
So you can check if yours is different.
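
A minimal sketch of that check and the suggested removal, using the path
from the traceback above (run as root on the frontend so the .pyc can be
rewritten):

   md5sum /usr/lib/python1.5/site-packages/gmon/encoder.pyc
   # if it differs, move the compiled file aside and let Python rebuild it
   mv /usr/lib/python1.5/site-packages/gmon/encoder.pyc /tmp/
   cluster-fork ls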

Steve Jenks


On Dec 1, 2003, at 11:35 AM, Joseph wrote:

> On Mon, 1 Dec 2003, Angel Li wrote:
> Hello Angel, I have the same problem and so far there is no response
> when
> I posted about this a month ago.
>
> Is your frontend an AMD setup??
>
> I am thinking this is an AMD problem.
>
> Thanks,
> Joseph
>
>
>> Hi,
>>
>> I recently installed Rocks 3.0 on a Linux cluster and when I run the
>> command "cluster-fork" I get this error:
>>
>> apple* cluster-fork ls
>> Traceback (innermost last):
>>   File "/opt/rocks/sbin/cluster-fork", line 88, in ?
>>     import rocks.pssh
>>   File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
>>     import gmon.encoder
>> ImportError: Bad magic number in
>>   /usr/lib/python1.5/site-packages/gmon/encoder.pyc
>>
>>   Any thoughts? I'm also wondering where to find the python sources for
>>   files in /usr/lib/python1.5/site-packages/gmon.
>>
>>   Thanks,
>>
>>   Angel
>>



From mjk at sdsc.edu Mon Dec 1 19:03:16 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Mon, 1 Dec 2003 19:03:16 -0800
Subject: [Rocks-Discuss]odd kickstart problem
In-Reply-To: <Pine.LNX.4.44.0312011453020.22892-100000@scorpion.emsl.pnl.gov>
References: <Pine.LNX.4.44.0312011453020.22892-100000@scorpion.emsl.pnl.gov>
Message-ID: <132DD626-2474-11D8-A7A4-000A95DA5638@sdsc.edu>

You'll need to run the kpp and kgen steps (what kickstart.cgi does for
you) manually to find out whether this is an XML error.

       # cd /home/install/profiles/current
       # kpp compute

This will generate a kickstart file for a compute node, although some
information will be missing since it isn't specific to a particular node
(unlike what ./kickstart.cgi --client=node-name generates). But what this
does do is traverse the XML graph and build a monolithic XML kickstart
profile. If this step works you can then pipe ("|") the output into kgen
to convert the XML to kickstart syntax. Something in this procedure
should fail and point to the error.
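
A sketch of the full manual pipeline described above, run on the
frontend (the output file name here is arbitrary):

   cd /home/install/profiles/current
   kpp compute | kgen > /tmp/compute.ks
   # any XML error should surface here instead of in /var/log/httpd/error_log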

       -mjk

On Dec 1, 2003, at 2:58 PM, Tim Carlson wrote:

>   Trying to bring up an old dead node on a Rocks 2.3.2 cluster and I get
>   the
>   following error in /var/log/httpd/error_log
>
>
>   Traceback (innermost last):
>     File "/opt/rocks/sbin/kgen", line 530, in ?
>        app.run()
>     File "/opt/rocks/sbin/kgen", line 497, in run
>        doc = FromXmlStream(file)
>     File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py",
>   line
>   386, in FromXmlStream
>        return reader.fromStream(stream, ownerDocument)
>     File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py",
>   line
>   372, in fromStream
>        self.parser.parse(s)
>     File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line
>   58,
>   in parse
>        xmlreader.IncrementalParser.parse(self, source)
>     File "/usr/lib/python1.5/site-packages/xml/sax/xmlreader.py", line
>   125,
>   in parse
>        self.close()
>     File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line
>   154, in close
>        self.feed("", isFinal = 1)
>     File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line
>   148, in feed
>        self._err_handler.fatalError(exc)
>     File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py",
>   line
>   340, in fatalError
>        raise exception
>   xml.sax._exceptions.SAXParseException: <stdin>:3298:0: no element found
>
>
>   Doing a wget of
>   http://frontend-0/install/kickstart.cgi?
>   arch=i386&np=2&project=rocks
>   on one of the working internal nodes yields the same error.
>
>   Any thoughts on this?
>
>   I've also done a fresh
>   rocks-dist dist
>
>   Tim



From tim.carlson at pnl.gov Mon Dec 1 20:42:51 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Mon, 01 Dec 2003 20:42:51 -0800 (PST)
Subject: [Rocks-Discuss]odd kickstart problem
In-Reply-To: <132DD626-2474-11D8-A7A4-000A95DA5638@sdsc.edu>
Message-ID: <Pine.GSO.4.44.0312012040250.3148-100000@paradox.emsl.pnl.gov>

On Mon, 1 Dec 2003, Mason J. Katz wrote:

> You'll need to run the kpp and kgen steps (what kickstart.cgi does for
> your) manually to find if this is an XML error.
>
>     # cd /home/install/profiles/current
>     # kpp compute

That was the trick. This sent me down the correct path. I had uninstalled
SGE on the frontend (I was having problems with SGE and wanted to start
from scratch). Adding the two SGE XML files back to
/home/install/profiles/2.3.2/nodes/ fixed everything.

Thanks!

Tim
From landman at scalableinformatics.com Tue Dec 2 04:15:07 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 02 Dec 2003 07:15:07 -0500
Subject: [Rocks-Discuss]supermicro based MB's
Message-ID: <3FCC824B.5060406@scalableinformatics.com>

Folks:

  Working on integrating a Supermicro MB based cluster. Discovered early
on that all of the compute nodes have an Intel based NIC that RedHat
doesn't know anything about (any version of RH). Some of the
administrative nodes have other, similar issues. I am seeing a
surprising amount of misdetected or undetected hardware across the
collection of MBs.

  Anyone have advice on where to get modules/module source for RedHat
for these things? It looks like I will need to rebuild the boot CD,
though the several times I have tried this previously have failed to
produce a working/bootable system. It looks like new modules need to be
created/inserted into the boot-process kernels (head node and cluster
nodes), as well as into the installable kernels.

   Has anyone done this for a Supermicro MB based system? Thanks.

Joe

--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
phone: +1 734 612 4615




From jghobrial at uh.edu Tue Dec 2 08:28:08 2003
From: jghobrial at uh.edu (Joseph)
Date: Tue, 2 Dec 2003 10:28:08 -0600 (CST)
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <1B15A45F-2457-11D8-A374-00039389B580@uci.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu>
 <1B15A45F-2457-11D8-A374-00039389B580@uci.edu>
Message-ID: <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>

Indeed my md5sum is different for encoder.pyc. However, when I pulled the
file and ran "cluster-fork", python reported an import problem. So it
seems that regeneration did not occur. Is there a flag I need to pass?

I have also tried to figure out what package provides encoder and
reinstall the package, but an rpm query reveals nothing.

If this is a generated file, what generates it?

It seems that an rpm file query on ganglia shows that the files in the
directory belong to the package, but encoder.pyc does not.

Thanks,
Joseph



On Mon, 1 Dec 2003, Stephen Jenks wrote:
> FYI, I have a dual Athlon frontend and didn't have that problem. I know
> that doesn't exactly help you, but at least it doesn't fail on all AMD
> machines.
>
> It looks like the .pyc file might be corrupt in your installation. The
> source .py file (encoder.py) is in the
> /usr/lib/python1.5/site-packages/gmon directory, so perhaps removing
> the .pyc file would regenerate it (if you run cluster-fork as root?)
>
> The md5sum for encoder.pyc on my system is:
> 459c78750fe6e065e9ed464ab23ab73d encoder.pyc
> So you can check if yours is different.
>
> Steve Jenks
>
>
> On Dec 1, 2003, at 11:35 AM, Joseph wrote:
>
> > On Mon, 1 Dec 2003, Angel Li wrote:
> > Hello Angel, I have the same problem and so far there is no response
> > when
> > I posted about this a month ago.
> >
> > Is your frontend an AMD setup??
> >
> > I am thinking this is an AMD problem.
> >
> > Thanks,
> > Joseph
> >
> >
> >> Hi,
> >>
> >> I recently installed Rocks 3.0 on a Linux cluster and when I run the
> >> command "cluster-fork" I get this error:
> >>
> >> apple* cluster-fork ls
> >> Traceback (innermost last):
> >>   File "/opt/rocks/sbin/cluster-fork", line 88, in ?
> >>     import rocks.pssh
> >>   File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
> >>     import gmon.encoder
> >> ImportError: Bad magic number in
> >> /usr/lib/python1.5/site-packages/gmon/encoder.pyc
> >>
> >> Any thoughts? I'm also wondering where to find the python sources for
> >> files in /usr/lib/python1.5/site-packages/gmon.
> >>
> >> Thanks,
> >>
> >> Angel
> >>
>
From angel at miami.edu Tue Dec 2 09:02:55 2003
From: angel at miami.edu (Angel Li)
Date: Tue, 02 Dec 2003 12:02:55 -0500
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8-
A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>
Message-ID: <3FCCC5BF.3030903@miami.edu>

Joseph wrote:

>Indeed my md5sum is different for encoder.pyc. However, when I pulled the
>file and run "cluster-fork" python responds about an import problem. So it
>seems that regeneration did not occur. Is there a flag I need to pass?
>
>I have also tried to figure out what package provides encoder and
>reinstall the package, but an rpm query reveals nothing.
>
>If this is a generated file, what generates it?
>
>It seems that an rpm file query on ganglia show that files in the
>directory belong to the package, but encoder.pyc does not.
>
>Thanks,
>Joseph
>
>
>
>
I have finally found the python sources in the HPC rolls CD, filename
ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it
seems python "compiles" the .py files to ".pyc" and then deletes the
source file the first time they are referenced? I also noticed that
there are two versions of python installed. Maybe the pyc files from one
version won't load into the other one?
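
One way to test that hypothesis is to compare the interpreter's bytecode
magic number with the first four bytes of the .pyc (a rough sketch; the
path is taken from the traceback earlier in the thread, and "python" here
means whichever interpreter is first on your PATH):

   python -c 'import imp; print repr(imp.get_magic())'
   python -c 'print repr(open("/usr/lib/python1.5/site-packages/gmon/encoder.pyc").read(4))'
   # if the two values differ, the .pyc was byte-compiled by a different Python version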

Angel




From mjk at sdsc.edu Tue Dec 2 15:52:52 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 2 Dec 2003 15:52:52 -0800
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <3FCCC5BF.3030903@miami.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8-
A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>
<3FCCC5BF.3030903@miami.edu>
Message-ID: <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu>

Python creates the .pyc files for you, and does not remove the original
.py file. I would be extremely surprised if two "identical" .pyc files
had the same md5 checksum. I'd expect this to be more like a C .o file,
which always contains random data to pad out to the end of a page and to
32/64 bit word sizes. Still, this is just a guess; the real point is
you can always remove the .pyc files and they will be regenerated from
the .py when imported (although standard UNIX file/dir permissions still apply).

What is the import error you get from cluster-fork?

     -mjk

On Dec 2, 2003, at 9:02 AM, Angel Li wrote:

> Joseph wrote:
>
>> Indeed my md5sum is different for encoder.pyc. However, when I pulled
>> the file and run "cluster-fork" python responds about an import
>> problem. So it seems that regeneration did not occur. Is there a flag
>> I need to pass?
>>
>> I have also tried to figure out what package provides encoder and
>> reinstall the package, but an rpm query reveals nothing.
>>
>> If this is a generated file, what generates it?
>>
>> It seems that an rpm file query on ganglia show that files in the
>> directory belong to the package, but encoder.pyc does not.
>>
>> Thanks,
>> Joseph
>>
>>
>>
> I have finally found the python sources in the HPC rolls CD, filename
> ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it
> seems python "compiles" the .py files to ".pyc" and then deletes the
> source file the first time they are referenced? I also noticed that
> there are two versions of python installed. Maybe the pyc files from
> one version won't load into the other one?
>
> Angel
>
>



From vrowley at ucsd.edu Mon Dec 1 14:27:03 2003
From: vrowley at ucsd.edu (V. Rowley)
Date: Mon, 01 Dec 2003 14:27:03 -0800
Subject: [Rocks-Discuss]PXE boot problems
Message-ID: <3FCBC037.5000302@ucsd.edu>

We have installed a ROCKS 3.0.0 frontend on a DL380 and are trying to
install a compute node via PXE. We are getting an error similar to the
one mentioned in the archives, e.g.

> Loading initrd.img....
> Ready
>
> Failed to free base memory
>
We have upgraded to syslinux-2.07-1, per the suggestion in the archives,
but continue to get the same error. Any ideas?

--
Vicky Rowley                              email: vrowley at ucsd.edu
Biomedical Informatics Research Network      work: (858) 536-5980
University of California, San Diego           fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715


See pictures from our trip to China at http://www.sagacitech.com/Chinaweb



From naihh at imcb.a-star.edu.sg Tue Dec 2 18:50:55 2003
From: naihh at imcb.a-star.edu.sg (Nai Hong Hwa Francis)
Date: Wed, 3 Dec 2003 10:50:55 +0800
Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included inRocks 3 for
Itanium?
Message-ID: <5E118EED7CC277468A275F11EEEC39B94CCC22@EXIMCB2.imcb.a-star.edu.sg>


Hi Laurence,

I just downloaded the Rocks3.0 for IA32 and installed it but SGE is
still not working.

Any idea?

Nai Hong Hwa Francis
Institute of Molecular and Cell Biology (A*STAR)
30 Medical Drive
Singapore 117609.
DID: (65) 6874-6196

-----Original Message-----
From: Laurence Liew [mailto:laurence at scalablesys.com]
Sent: Thursday, November 20, 2003 2:53 PM
To: Nai Hong Hwa Francis
Cc: npaci-rocks-discussion at sdsc.edu
Subject: Re: [Rocks-Discuss]RE: When will Sun Grid Engine be included
inRocks 3 for Itanium?

Hi Francis

GridEngine roll is ready for ia32. We will get an ia64 native version
ready as soon as we get back from SC2003. It will be released in a few
weeks' time.

Globus GT2.4 is included in the Grid Roll

Cheers!
Laurence


On Thu, 2003-11-20 at 10:13, Nai Hong Hwa Francis wrote:
>
> Hi,
>
> Does anyone have any idea when will Sun Grid Engine be included as
part
> of Rocks 3 distribution.
>
> I am a newbie to Grid Computing.
> Anyone have any idea on how to invoke Globus in Rocks to setup a Grid?
>
> Regards
>
> Nai Hong Hwa Francis
>
> Institute of Molecular and Cell Biology (A*STAR)
> 30 Medical Drive
> Singapore 117609
> DID: 65-6874-6196
>
> -----Original Message-----
> From: npaci-rocks-discussion-request at sdsc.edu
> [mailto:npaci-rocks-discussion-request at sdsc.edu]
> Sent: Thursday, November 20, 2003 4:01 AM
> To: npaci-rocks-discussion at sdsc.edu
> Subject: npaci-rocks-discussion digest, Vol 1 #613 - 3 msgs
>
> Send npaci-rocks-discussion mailing list submissions to
>     npaci-rocks-discussion at sdsc.edu
>
> To subscribe or unsubscribe via the World Wide Web, visit
>
> http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
> or, via email, send a message with subject or body 'help' to
>     npaci-rocks-discussion-request at sdsc.edu
>
> You can reach the person managing the list at
>     npaci-rocks-discussion-admin at sdsc.edu
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of npaci-rocks-discussion digest..."
>
>
> Today's Topics:
>
>    1. top500 cluster installation movie (Greg Bruno)
>    2. Re: Running Normal Application on Rocks Cluster -
>         Newbie Question (Laurence Liew)
>
> --__--__--
>
> Message: 1
> To: npaci-rocks-discussion at sdsc.edu
> From: Greg Bruno <bruno at rocksclusters.org>
> Date: Tue, 18 Nov 2003 13:41:15 -0800
> Subject: [Rocks-Discuss]top500 cluster installation movie
>
> here's a crew of 7, installing the 201st fastest supercomputer in the
> world in under two hours on the showroom floor at SC 03:
>
> http://www.rocksclusters.org/rocks.mov
>
> warning: the above file is ~65MB.
>
>    - gb
>
>
> --__--__--
>
> Message: 2
> Subject: Re: [Rocks-Discuss]Running Normal Application on Rocks
Cluster
> -
>      Newbie Question
> From: Laurence Liew <laurenceliew at yahoo.com.sg>
> To: Leong Chee Shian <chee-shian.leong at schenker.com>
> Cc: npaci-rocks-discussion at sdsc.edu
> Date: Wed, 19 Nov 2003 12:31:18 +0800
>
> Chee Shian,
>
> Thanks for your call. We will take this off list and visit you next
week
> in your office as you requested.
>
> Cheers!
> laurence
>
>
>
> On Tue, 2003-11-18 at 17:29, Leong Chee Shian wrote:
> > I have just installed Rocks 3.0 with one frontend and two compute
> > node.
> >
> > A normal file based application is installed on the frontend and is
> > NFS shared to the compute nodes .
> >
> > Question is : When run 5 sessions of my applications , the CPU
> > utilization is all concentrated on the frontend node , nothing is
> > being passed on to the compute nodes . How do I make these 3
computers
> > to function as one and share the load ?
> >
> > Thanks everyone as I am really new to this clustering stuff..
> >
> > PS : The idea of exploring rocks cluster is to use a few inexpensive
> > intel machines to replace our existing multi CPU sun server,
> > suggestions and recommendations are greatly appreciated.
> >
> >
> > Leong
> >
> >
> >
>
>
>
> --__--__--
>
> _______________________________________________
> npaci-rocks-discussion mailing list
> npaci-rocks-discussion at sdsc.edu
> http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
>
>
> End of npaci-rocks-discussion Digest
>
>
> DISCLAIMER:
> This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please
do not copy or use it for any purpose, or disclose its contents to any
other person as it may be an offence under the Official Secrets Act.
Thank you.
--
Laurence Liew
CTO, Scalable Systems Pte Ltd
7 Bedok South Road
Singapore 469272
Tel   : 65 6827 3953
Fax    : 65 6827 3922
Mobile: 65 9029 4312
Email : laurence at scalablesys.com
http://www.scalablesys.com



DISCLAIMER:
This email is confidential and may be privileged. If you are not the intended
recipient, please delete it and notify us immediately. Please do not copy or use it
for any purpose, or disclose its contents to any other person as it may be an
offence under the Official Secrets Act. Thank you.


From laurence at scalablesys.com Tue Dec 2 19:10:08 2003
From: laurence at scalablesys.com (Laurence Liew)
Date: Wed, 03 Dec 2003 11:10:08 +0800
Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included
      inRocks 3 for Itanium?
In-Reply-To: <5E118EED7CC277468A275F11EEEC39B94CCC22@EXIMCB2.imcb.a-star.edu.sg>
References:
       <5E118EED7CC277468A275F11EEEC39B94CCC22@EXIMCB2.imcb.a-star.edu.sg>
Message-ID: <1070421007.2452.51.camel@scalable>

Hi,

SGE is in the SGE roll.

You need to download the base, hpc and sge roll.

The install is now different from V2.3.x

Cheers!
laurence



On Wed, 2003-12-03 at 10:50, Nai Hong Hwa Francis wrote:
> Hi Laurence,
>
>   I just downloaded the Rocks3.0 for IA32 and installed it but SGE is
>   still not working.
>
>   Any idea?
>
>   Nai Hong Hwa Francis
>   Institute of Molecular and Cell Biology (A*STAR)
>   30 Medical Drive
>   Singapore 117609.
>   DID: (65) 6874-6196
>
>   -----Original Message-----
>   From: Laurence Liew [mailto:laurence at scalablesys.com]
>   Sent: Thursday, November 20, 2003 2:53 PM
>   To: Nai Hong Hwa Francis
>   Cc: npaci-rocks-discussion at sdsc.edu
>   Subject: Re: [Rocks-Discuss]RE: When will Sun Grid Engine be included
>   inRocks 3 for Itanium?
>
>   Hi Francis
>
>   GridEngine roll is ready for ia32. We will get a ia64 native version
>   ready as soon as we get back from SC2003. It will be released in a few
>   weeks time.
>
>   Globus GT2.4 is included in the Grid Roll
>
>   Cheers!
>   Laurence
>
>
>   On Thu, 2003-11-20 at 10:13, Nai Hong Hwa Francis wrote:
>   >
>   > Hi,
>   >
>   > Does anyone have any idea when will Sun Grid Engine be included as
>   part
>   > of Rocks 3 distribution.
>   >
>   > I am a newbie to Grid Computing.
>   > Anyone have any idea on how to invoke Globus in Rocks to setup a Grid?
>   >
>   > Regards
>   >
>   > Nai Hong Hwa Francis
>   >
>   > Institute of Molecular and Cell Biology (A*STAR)
>   > 30 Medical Drive
>   > Singapore 117609
>   > DID: 65-6874-6196
>   >
>   > -----Original Message-----
>   > From: npaci-rocks-discussion-request at sdsc.edu
>   > [mailto:npaci-rocks-discussion-request at sdsc.edu]
>   > Sent: Thursday, November 20, 2003 4:01 AM
>   > To: npaci-rocks-discussion at sdsc.edu
>   > Subject: npaci-rocks-discussion digest, Vol 1 #613 - 3 msgs
>   >
>   > Send npaci-rocks-discussion mailing list submissions to
>   >   npaci-rocks-discussion at sdsc.edu
>   >
>   > To subscribe or unsubscribe via the World Wide Web, visit
>   >
>   > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
>   > or, via email, send a message with subject or body 'help' to
>   >   npaci-rocks-discussion-request at sdsc.edu
>   >
>   > You can reach the person managing the list at
>   >   npaci-rocks-discussion-admin at sdsc.edu
>   >
>   > When replying, please edit your Subject line so it is more specific
>   > than "Re: Contents of npaci-rocks-discussion digest..."
>   >
>   >
>   > Today's Topics:
>   >
>   >     1. top500 cluster installation movie (Greg Bruno)
>   >     2. Re: Running Normal Application on Rocks Cluster -
>   >         Newbie Question (Laurence Liew)
>   >
>   > --__--__--
>   >
>   > Message: 1
>   > To: npaci-rocks-discussion at sdsc.edu
>   > From: Greg Bruno <bruno at rocksclusters.org>
>   > Date: Tue, 18 Nov 2003 13:41:15 -0800
>   > Subject: [Rocks-Discuss]top500 cluster installation movie
>   >
>   > here's a crew of 7, installing the 201st fastest supercomputer in the
>   > world in under two hours on the showroom floor at SC 03:
>   >
>   > http://www.rocksclusters.org/rocks.mov
>   >
>   > warning: the above file is ~65MB.
>   >
>   >    - gb
>   >
>   >
>   > --__--__--
>   >
>   > Message: 2
>   > Subject: Re: [Rocks-Discuss]Running Normal Application on Rocks
>   Cluster
>   > -
>   >   Newbie Question
>   > From: Laurence Liew <laurenceliew at yahoo.com.sg>
>   > To: Leong Chee Shian <chee-shian.leong at schenker.com>
>   > Cc: npaci-rocks-discussion at sdsc.edu
>   > Date: Wed, 19 Nov 2003 12:31:18 +0800
>   >
>   > Chee Shian,
>   >
>   > Thanks for your call. We will take this off list and visit you next
>   week
>   > in your office as you requested.
>   >
>   > Cheers!
>   > laurence
> >
> >
> >
> > On Tue, 2003-11-18 at 17:29, Leong Chee Shian wrote:
> > > I have just installed Rocks 3.0 with one frontend and two compute
> > > node.
> > >
> > > A normal file based application is installed on the frontend and is
> > > NFS shared to the compute nodes .
> > >
> > > Question is : When run 5 sessions of my applications , the CPU
> > > utilization is all concentrated on the frontend node , nothing is
> > > being passed on to the compute nodes . How do I make these 3
> computers
> > > to function as one and share the load ?
> > >
> > > Thanks everyone as I am really new to this clustering stuff..
> > >
> > > PS : The idea of exploring rocks cluster is to use a few inexpensive
> > > intel machines to replace our existing multi CPU sun server,
> > > suggestions and recommendations are greatly appreciated.
> > >
> > >
> > > Leong
> > >
> > >
> > >
> >
> >
> >
> > --__--__--
> >
> > _______________________________________________
> > npaci-rocks-discussion mailing list
> > npaci-rocks-discussion at sdsc.edu
> > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
> >
> >
> > End of npaci-rocks-discussion Digest
> >
> >
> > DISCLAIMER:
> > This email is confidential and may be privileged. If you are not the
> intended recipient, please delete it and notify us immediately. Please
> do not copy or use it for any purpose, or disclose its contents to any
> other person as it may be an offence under the Official Secrets Act.
> Thank you.
--
Laurence Liew
CTO, Scalable Systems Pte Ltd
7 Bedok South Road
Singapore 469272
Tel   : 65 6827 3953
Fax    : 65 6827 3922
Mobile: 65 9029 4312
Email : laurence at scalablesys.com
http://www.scalablesys.com
From DGURGUL at PARTNERS.ORG Wed Dec 3 07:24:29 2003
From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.)
Date: Wed, 3 Dec 2003 10:24:29 -0500
Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included inRocks 3 for Itanium?
Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE157F7@phsexch7.mgh.harvard.edu>

Where do we find the SGE roll?   Under Lhoste at http://rocks.npaci.edu/Rocks/
there is a "Grid" roll listed.   Is SGE in that? The userguide doesn't mention
SGE.

Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169


-----Original Message-----
From: npaci-rocks-discussion-admin at sdsc.edu
[mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Laurence Liew
Sent: Tuesday, December 02, 2003 10:10 PM
To: Nai Hong Hwa Francis
Cc: npaci-rocks-discussion at sdsc.edu
Subject: RE: [Rocks-Discuss]RE: When will Sun Grid Engine be included
inRocks 3 for Itanium?


Hi,

SGE is in the SGE roll.

You need to download the base, hpc and sge roll.

The install is now different from V2.3.x

Cheers!
laurence



On Wed, 2003-12-03 at 10:50, Nai Hong Hwa Francis wrote:
> Hi Laurence,
>
> I just downloaded the Rocks3.0 for IA32 and installed it but SGE is
> still not working.
>
> Any idea?
>
> Nai Hong Hwa Francis
> Institute of Molecular and Cell Biology (A*STAR)
> 30 Medical Drive
> Singapore 117609.
> DID: (65) 6874-6196
>
> -----Original Message-----
> From: Laurence Liew [mailto:laurence at scalablesys.com]
> Sent: Thursday, November 20, 2003 2:53 PM
>   To: Nai Hong Hwa Francis
>   Cc: npaci-rocks-discussion at sdsc.edu
>   Subject: Re: [Rocks-Discuss]RE: When will Sun Grid Engine be included
>   inRocks 3 for Itanium?
>
>   Hi Francis
>
>   GridEngine roll is ready for ia32. We will get a ia64 native version
>   ready as soon as we get back from SC2003. It will be released in a few
>   weeks time.
>
>   Globus GT2.4 is included in the Grid Roll
>
>   Cheers!
>   Laurence
>
>
>   On Thu, 2003-11-20 at 10:13, Nai Hong Hwa Francis wrote:
>   >
>   > Hi,
>   >
>   > Does anyone have any idea when will Sun Grid Engine be included as
>   part
>   > of Rocks 3 distribution.
>   >
>   > I am a newbie to Grid Computing.
>   > Anyone have any idea on how to invoke Globus in Rocks to setup a Grid?
>   >
>   > Regards
>   >
>   > Nai Hong Hwa Francis
>   >
>   > Institute of Molecular and Cell Biology (A*STAR)
>   > 30 Medical Drive
>   > Singapore 117609
>   > DID: 65-6874-6196
>   >
>   > -----Original Message-----
>   > From: npaci-rocks-discussion-request at sdsc.edu
>   > [mailto:npaci-rocks-discussion-request at sdsc.edu]
>   > Sent: Thursday, November 20, 2003 4:01 AM
>   > To: npaci-rocks-discussion at sdsc.edu
>   > Subject: npaci-rocks-discussion digest, Vol 1 #613 - 3 msgs
>   >
>   > Send npaci-rocks-discussion mailing list submissions to
>   >   npaci-rocks-discussion at sdsc.edu
>   >
>   > To subscribe or unsubscribe via the World Wide Web, visit
>   >
>   > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
>   > or, via email, send a message with subject or body 'help' to
>   >   npaci-rocks-discussion-request at sdsc.edu
>   >
>   > You can reach the person managing the list at
>   >   npaci-rocks-discussion-admin at sdsc.edu
>   >
>   > When replying, please edit your Subject line so it is more specific
>   > than "Re: Contents of npaci-rocks-discussion digest..."
>   >
>   >
>   > Today's Topics:
>   >
>   >     1. top500 cluster installation movie (Greg Bruno)
>   >     2. Re: Running Normal Application on Rocks Cluster -
>   >         Newbie Question (Laurence Liew)
>   >
>   > --__--__--
>   >
>   > Message: 1
>   > To: npaci-rocks-discussion at sdsc.edu
>   > From: Greg Bruno <bruno at rocksclusters.org>
>   > Date: Tue, 18 Nov 2003 13:41:15 -0800
>   > Subject: [Rocks-Discuss]top500 cluster installation movie
>   >
>   > here's a crew of 7, installing the 201st fastest supercomputer in the
>   > world in under two hours on the showroom floor at SC 03:
>   >
>   > http://www.rocksclusters.org/rocks.mov
>   >
>   > warning: the above file is ~65MB.
>   >
>   >    - gb
>   >
>   >
>   > --__--__--
>   >
>   > Message: 2
>   > Subject: Re: [Rocks-Discuss]Running Normal Application on Rocks
>   Cluster
>   > -
>   >   Newbie Question
>   > From: Laurence Liew <laurenceliew at yahoo.com.sg>
>   > To: Leong Chee Shian <chee-shian.leong at schenker.com>
>   > Cc: npaci-rocks-discussion at sdsc.edu
>   > Date: Wed, 19 Nov 2003 12:31:18 +0800
>   >
>   > Chee Shian,
>   >
>   > Thanks for your call. We will take this off list and visit you next
>   week
>   > in your office as you requested.
>   >
>   > Cheers!
>   > laurence
>   >
>   >
>   >
>   > On Tue, 2003-11-18 at 17:29, Leong Chee Shian wrote:
>   > > I have just installed Rocks 3.0 with one frontend and two compute
>   > > node.
>   > >
>   > > A normal file based application is installed on the frontend and is
>   > > NFS shared to the compute nodes .
>   > >
>   > > Question is : When run 5 sessions of my applications , the CPU
>   > > utilization is all concentrated on the frontend node , nothing is
>   > > being passed on to the compute nodes . How do I make these 3
>   computers
> > > to function as one and share the load ?
> > >
> > > Thanks everyone as I am really new to this clustering stuff..
> > >
> > > PS : The idea of exploring rocks cluster is to use a few inexpensive
> > > intel machines to replace our existing multi CPU sun server,
> > > suggestions and recommendations are greatly appreciated.
> > >
> > >
> > > Leong
> > >
> > >
> > >
> >
> >
> >
> > --__--__--
> >
> > _______________________________________________
> > npaci-rocks-discussion mailing list
> > npaci-rocks-discussion at sdsc.edu
> > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
> >
> >
> > End of npaci-rocks-discussion Digest
> >
> >
> > DISCLAIMER:
> > This email is confidential and may be privileged. If you are not the
> intended recipient, please delete it and notify us immediately. Please
> do not copy or use it for any purpose, or disclose its contents to any
> other person as it may be an offence under the Official Secrets Act.
> Thank you.
--
Laurence Liew
CTO, Scalable Systems Pte Ltd
7 Bedok South Road
Singapore 469272
Tel   : 65 6827 3953
Fax    : 65 6827 3922
Mobile: 65 9029 4312
Email : laurence at scalablesys.com
http://www.scalablesys.com


From bruno at rocksclusters.org Wed Dec 3 07:32:14 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 3 Dec 2003 07:32:14 -0800
Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included inRocks 3 for Itanium?
In-Reply-To: <BC447F1AD529D311B4DE0008C71BF2EB0AE157F7@phsexch7.mgh.harvard.edu>
References: <BC447F1AD529D311B4DE0008C71BF2EB0AE157F7@phsexch7.mgh.harvard.edu>
Message-ID: <DF132702-25A5-11D8-86E6-000A95C4E3B4@rocksclusters.org>

>   Where do we find the SGE roll?   Under Lhoste at
>   http://rocks.npaci.edu/Rocks/
>   there is a "Grid" roll listed.   Is SGE in that?   The userguide doesn't
>   mention
>   SGE.
the SGE roll will be available in the upcoming v3.1.0 release.
scheduled release date is december 15th.

  - gb



From jlkaiser at fnal.gov Wed Dec 3 08:35:18 2003
From: jlkaiser at fnal.gov (Joe Kaiser)
Date: Wed, 03 Dec 2003 10:35:18 -0600
Subject: [Rocks-Discuss]supermicro based MB's
In-Reply-To: <3FCC824B.5060406@scalableinformatics.com>
References: <3FCC824B.5060406@scalableinformatics.com>
Message-ID: <1070469318.12324.13.camel@nietzsche.fnal.gov>

Hi,

You don't say what version of Rocks you are using. The following is for
the X5DPA-GG board and Rocks 3.0. It requires modifying only the
pcitable in the boot image on the tftp server. I believe the procedure
for 2.3.2 requires a heck of a lot more work (but it may not). I would
have to dig deep for the notes about changing 2.3.2.

This should be done on the frontend:

cd /tftpboot/X86PC/UNDI/pxelinux/
cp initrd.img initrd.img.orig
cp initrd.img /tmp
cd /tmp
mv initrd.img initrd.gz
gunzip initrd.gz
mkdir /mnt/loop
mount -o loop initrd /mnt/loop
cd /mnt/loop/modules/
vi pcitable

Search for the e1000 drivers and add the following line:

0x8086  0x1013  "e1000" "Intel Corp.|82546EB Gigabit Ethernet Controller"

write the file

cd /tmp
umount /mnt/loop
gzip initrd
mv initrd.gz initrd.img
mv initrd.img /tftpboot/X86PC/UNDI/pxelinux/

Then boot the node.
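
If your board revision has a different Intel NIC, the numeric vendor and
device IDs for the pcitable entry can be read off the node itself (a
sketch, assuming lspci is available there, e.g. from a working install or
a rescue environment):

   lspci | grep -i ethernet    # find the NIC's bus address
   lspci -n                    # the matching line shows the IDs as vendor:device, e.g. 8086:xxxx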

Hope this helps.

Thanks,

Joe

On Tue, 2003-12-02 at 06:15, Joe Landman wrote:
> Folks:
>
>   Working on integrating a Supermicro MB based cluster. Discovered early
> on that all of the compute nodes have an Intel based NIC that RedHat
> doesn't know anything about (any version of RH). Some of the
> administrative nodes have other similar issues. I am seeing simply a
> suprising number of mis/un detected hardware across the collection of MBs.
>
>   Anyone have advice on where to get modules/module source for Redhat
> for these things? It looks like I will need to rebuild the boot CD,
> though the several times I have tried this previously have failed to
> produce a working/bootable system. It looks like new modules need to be
> created/inserted into the boot process (head node and cluster nodes)
> kernels, as well as into the installable kernels.
>
>     Has anyone done this for a Supermicro MB based system?  Thanks .
>
> Joe
--
===================================================================
Joe Kaiser - Systems Administrator

Fermi Lab
CD/OSS-SCS                Never laugh at live dragons.
630-840-6444
jlkaiser at fnal.gov
===================================================================



From jghobrial at uh.edu Wed Dec 3 08:59:15 2003
From: jghobrial at uh.edu (Joseph)
Date: Wed, 3 Dec 2003 10:59:15 -0600 (CST)
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu>
 <1B15A45F-2457-11D8-A374-00039389B580@uci.edu>
<Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>
 <3FCCC5BF.3030903@miami.edu> <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu>
Message-ID: <Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu>

Here is the error I receive when I remove the file encoder.pyc and run the
command cluster-fork

Traceback (innermost last):
  File "/opt/rocks/sbin/cluster-fork", line 88, in ?
    import rocks.pssh
  File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
    import gmon.encoder
ImportError: No module named encoder

Thanks,
Joseph


On Tue, 2 Dec 2003, Mason J. Katz wrote:

> Python creates the .pyc files for you, and does not remove the original
>   .py file. I would be extremely surprised it two "identical" .pyc files
>   had the same md5 checksum. I'd expect this to be more like C .o file
>   which always contain random data to pad out to the end of a page and
>   32/64 bit word sizes. Still this is just a guess, the real point is
>   you can always remove the .pyc files and the .py will regenerate it
>   when imported (although standard UNIX file/dir permission still apply).
>
>   What is the import error you get from cluster-fork?
>
>      -mjk
>
>   On Dec 2, 2003, at 9:02 AM, Angel Li wrote:
>
>   > Joseph wrote:
>   >
>   >> Indeed my md5sum is different for encoder.pyc. However, when I pulled
>   >> the file and run "cluster-fork" python responds about an import
>   >> problem. So it seems that regeneration did not occur. Is there a flag
>   >> I need to pass?
>   >>
>   >> I have also tried to figure out what package provides encoder and
>   >> reinstall the package, but an rpm query reveals nothing.
>   >>
>   >> If this is a generated file, what generates it?
>   >>
>   >> It seems that an rpm file query on ganglia show that files in the
>   >> directory belong to the package, but encoder.pyc does not.
>   >>
>   >> Thanks,
>   >> Joseph
>   >>
>   >>
>   >>
>   > I have finally found the python sources in the HPC rolls CD, filename
>   > ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it
>   > seems python "compiles" the .py files to ".pyc" and then deletes the
>   > source file the first time they are referenced? I also noticed that
>   > there are two versions of python installed. Maybe the pyc files from
>   > one version won't load into the other one?
>   >
>   > Angel
>   >
>   >
>


From mjk at sdsc.edu Wed Dec 3 15:19:38 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Wed, 3 Dec 2003 15:19:38 -0800
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8-
A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>
<3FCCC5BF.3030903@miami.edu> <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu>
<Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu>
Message-ID: <2A332131-25E7-11D8-A641-000A95DA5638@sdsc.edu>

This file comes from a ganglia package. What does
# rpm -q ganglia-receptor

return?

     -mjk


On Dec 3, 2003, at 8:59 AM, Joseph wrote:

> Here is the error I receive when I remove the file encoder.pyc and run
> the
> command cluster-fork
>
> Traceback (innermost last):
>    File "/opt/rocks/sbin/cluster-fork", line 88, in ?
>      import rocks.pssh
>    File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
>      import gmon.encoder
> ImportError: No module named encoder
>
> Thanks,
> Joseph
>
>
> On Tue, 2 Dec 2003, Mason J. Katz wrote:
>
>> Python creates the .pyc files for you, and does not remove the
>> original
>> .py file. I would be extremely surprised it two "identical" .pyc
>> files
>> had the same md5 checksum. I'd expect this to be more like C .o file
>> which always contain random data to pad out to the end of a page and
>> 32/64 bit word sizes. Still this is just a guess, the real point is
>> you can always remove the .pyc files and the .py will regenerate it
>> when imported (although standard UNIX file/dir permission still
>> apply).
>>
>> What is the import error you get from cluster-fork?
>>
>>     -mjk
>>
>> On Dec 2, 2003, at 9:02 AM, Angel Li wrote:
>>
>>> Joseph wrote:
>>>
>>>> Indeed my md5sum is different for encoder.pyc. However, when I
>>>> pulled
>>>> the file and run "cluster-fork" python responds about an import
>>>> problem. So it seems that regeneration did not occur. Is there a
>>>> flag
>>>> I need to pass?
>>>>
>>>> I have also tried to figure out what package provides encoder and
>>>> reinstall the package, but an rpm query reveals nothing.
>>>>
>>>> If this is a generated file, what generates it?
>>>>
>>>> It seems that an rpm file query on ganglia show that files in the
>>>> directory belong to the package, but encoder.pyc does not.
>>>>
>>>> Thanks,
>>>> Joseph
>>>>
>>>>
>>>>
>>> I have finally found the python sources in the HPC rolls CD, filename
>>> ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it
>>> seems python "compiles" the .py files to ".pyc" and then deletes the
>>> source file the first time they are referenced? I also noticed that
>>> there are two versions of python installed. Maybe the pyc files from
>>> one version won't load into the other one?
>>>
>>> Angel
>>>
>>>
>>



From csamuel at vpac.org Wed Dec 3 18:09:26 2003
From: csamuel at vpac.org (Chris Samuel)
Date: Thu, 4 Dec 2003 13:09:26 +1100
Subject: [Rocks-Discuss]Confirmation of Rocks 3.1.0 Opteron support & RHEL
trademark removal ?
Message-ID: <200312041309.27986.csamuel@vpac.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi folks,

Can someone confirm that the next Rocks release will support Opteron, please?

Also, I noticed that the current Rocks release on Itanium, based on RHEL, still
has a lot of mentions of RedHat in it, which from my reading of their
trademark guidelines is not permitted. Is that fixed in the new version?

cheers!
Chris
- --
 Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/zpdWO2KABBYQAh8RAqB8AJ9FG+IjIeem21qlFS6XYIHamIMPmwCghVTV
AgjAlVHWgdv/KzYQinHGPxs=
=IAWU
-----END PGP SIGNATURE-----



From bruno at rocksclusters.org Wed Dec 3 18:46:30 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 3 Dec 2003 18:46:30 -0800
Subject: [Rocks-Discuss]Confirmation of Rocks 3.1.0 Opteron support & RHEL
trademark removal ?
In-Reply-To: <200312041309.27986.csamuel@vpac.org>
References: <200312041309.27986.csamuel@vpac.org>
Message-ID: <10AD9827-2604-11D8-86E6-000A95C4E3B4@rocksclusters.org>

> Can someone confirm that the next Rocks release will support Opteron
> please ?

yes, it will support opteron.

>   Also, I noticed that the current Rocks release on Itanium based on
>   RHEL still
>   has a lot of mentions of RedHat in it, which from my reading of their
>   trademark guidelines is not permitted, is that fixed in the new
>   version ?

and yes (even though it doesn't feel like the right thing to do, as
redhat has offered the community some outstanding technologies that
we'd like to credit), all redhat trademarks will be removed from 3.1.0.

    - gb



From fds at sdsc.edu Thu Dec 4 06:46:32 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Thu, 4 Dec 2003 06:46:32 -0800
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8-
A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>
<3FCCC5BF.3030903@miami.edu> <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu>
<Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu>
Message-ID: <A69923FA-2668-11D8-804D-000393A4725A@sdsc.edu>

Please install the
http://www.rocksclusters.org/errata/3.0.0/ganglia-python-3.0.1-2.i386.rpm
package, which includes the correct encoder.py file. (This package is
listed on the 3.0.0 errata page.)
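
A sketch of applying that errata package on the frontend (rpm can fetch
the URL directly; the package name is as given above):

   rpm -Uvh http://www.rocksclusters.org/errata/3.0.0/ganglia-python-3.0.1-2.i386.rpm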

-Federico

On Dec 3, 2003, at 8:59 AM, Joseph wrote:

>   Here is the error I receive when I remove the file encoder.pyc and run
>   the
>   command cluster-fork
>
>   Traceback (innermost last):
>     File "/opt/rocks/sbin/cluster-fork", line 88, in ?
>       import rocks.pssh
>     File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
>       import gmon.encoder
>   ImportError: No module named encoder
>
>   Thanks,
>   Joseph
>
>
> On Tue, 2 Dec 2003, Mason J. Katz wrote:
>
>> Python creates the .pyc files for you, and does not remove the
>> original
>> .py file. I would be extremely surprised it two "identical" .pyc
>> files
>> had the same md5 checksum. I'd expect this to be more like C .o file
>> which always contain random data to pad out to the end of a page and
>> 32/64 bit word sizes. Still this is just a guess, the real point is
>> you can always remove the .pyc files and the .py will regenerate it
>> when imported (although standard UNIX file/dir permission still
>> apply).
>>
>> What is the import error you get from cluster-fork?
>>
>>    -mjk
>>
>> On Dec 2, 2003, at 9:02 AM, Angel Li wrote:
>>
>>> Joseph wrote:
>>>
>>>> Indeed my md5sum is different for encoder.pyc. However, when I
>>>> pulled
>>>> the file and run "cluster-fork" python responds about an import
>>>> problem. So it seems that regeneration did not occur. Is there a
>>>> flag
>>>> I need to pass?
>>>>
>>>> I have also tried to figure out what package provides encoder and
>>>> reinstall the package, but an rpm query reveals nothing.
>>>>
>>>> If this is a generated file, what generates it?
>>>>
>>>> It seems that an rpm file query on ganglia show that files in the
>>>> directory belong to the package, but encoder.pyc does not.
>>>>
>>>> Thanks,
>>>> Joseph
>>>>
>>>>
>>>>
>>> I have finally found the python sources in the HPC rolls CD, filename
>>> ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it
>>> seems python "compiles" the .py files to ".pyc" and then deletes the
>>> source file the first time they are referenced? I also noticed that
>>> there are two versions of python installed. Maybe the pyc files from
>>> one version won't load into the other one?
>>>
>>> Angel
>>>
>>>
>>
>>
Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA
From jghobrial at uh.edu Thu Dec 4 07:14:21 2003
From: jghobrial at uh.edu (Joseph)
Date: Thu, 4 Dec 2003 09:14:21 -0600 (CST)
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <A69923FA-2668-11D8-804D-000393A4725A@sdsc.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu>
 <1B15A45F-2457-11D8-A374-00039389B580@uci.edu>
<Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>
 <3FCCC5BF.3030903@miami.edu> <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu>
 <Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu>
 <A69923FA-2668-11D8-804D-000393A4725A@sdsc.edu>
Message-ID: <Pine.LNX.4.56.0312040913110.13972@mail.tlc2.uh.edu>

Thank you very much this solved the problem.

Joseph


On Thu, 4 Dec 2003, Federico Sacerdoti wrote:

>   Please install the
>   http://www.rocksclusters.org/errata/3.0.0/ganglia-python-3.0.1
>   -2.i386.rpm package, which includes the correct encoder.py file. (This
>   package is listed on the 3.0.0 errata page)
>
>   -Federico
>
>   On Dec 3, 2003, at 8:59 AM, Joseph wrote:
>
>   > Here is the error I receive when I remove the file encoder.pyc and run
>   > the
>   > command cluster-fork
>   >
>   > Traceback (innermost last):
>   >   File "/opt/rocks/sbin/cluster-fork", line 88, in ?
>   >     import rocks.pssh
>   >   File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
>   >     import gmon.encoder
>   > ImportError: No module named encoder
>   >
>   > Thanks,
>   > Joseph
>   >
>   >
>   > On Tue, 2 Dec 2003, Mason J. Katz wrote:
>   >
>   >> Python creates the .pyc files for you, and does not remove the
>   >> original
>   >> .py file. I would be extremely surprised it two "identical" .pyc
>   >> files
>   >> had the same md5 checksum. I'd expect this to be more like C .o file
>   >> which always contain random data to pad out to the end of a page and
>   >> 32/64 bit word sizes. Still this is just a guess, the real point is
>   >> you can always remove the .pyc files and the .py will regenerate it
>   >> when imported (although standard UNIX file/dir permission still
>   >> apply).
>   >>
>   >> What is the import error you get from cluster-fork?
>   >>
>   >> -mjk
>   >>
>   >> On Dec 2, 2003, at 9:02 AM, Angel Li wrote:
>   >>
>   >>> Joseph wrote:
>   >>>
>   >>>> Indeed my md5sum is different for encoder.pyc. However, when I
>   >>>> pulled
>   >>>> the file and run "cluster-fork" python responds about an import
>   >>>> problem. So it seems that regeneration did not occur. Is there a
>   >>>> flag
>   >>>> I need to pass?
>   >>>>
>   >>>> I have also tried to figure out what package provides encoder and
>   >>>> reinstall the package, but an rpm query reveals nothing.
>   >>>>
>   >>>> If this is a generated file, what generates it?
>   >>>>
>   >>>> It seems that an rpm file query on ganglia show that files in the
>   >>>> directory belong to the package, but encoder.pyc does not.
>   >>>>
>   >>>> Thanks,
>   >>>> Joseph
>   >>>>
>   >>>>
>   >>>>
>   >>> I have finally found the python sources in the HPC rolls CD, filename
>   >>> ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it
>   >>> seems python "compiles" the .py files to ".pyc" and then deletes the
>   >>> source file the first time they are referenced? I also noticed that
>   >>> there are two versions of python installed. Maybe the pyc files from
>   >>> one version won't load into the other one?
>   >>>
>   >>> Angel
>   >>>
>   >>>
>   >>
>   >>
>   Federico
>
>   Rocks Cluster Group, San Diego Supercomputing Center, CA
>


From vrowley at ucsd.edu Thu Dec 4 12:29:55 2003
From: vrowley at ucsd.edu (V. Rowley)
Date: Thu, 04 Dec 2003 12:29:55 -0800
Subject: [Rocks-Discuss]Re: PXE boot problems
In-Reply-To: <3FCBC037.5000302@ucsd.edu>
References: <3FCBC037.5000302@ucsd.edu>
Message-ID: <3FCF9943.1020806@ucsd.edu>

Uh, nevermind. We had upgraded syslinux on our frontend, not the node
we were trying to PXE boot. Sigh.

V. Rowley wrote:
> We have installed a ROCKS 3.0.0 frontend on a DL380 and are trying to
> install a compute node via PXE. We are getting an error similar to the
> one mentioned in the archives, e.g.
>
>> Loading initrd.img....
>> Ready
>>
>> Failed to free base memory
>>
>
> We have upgraded to syslinux-2.07-1, per the suggestion in the archives,
> but continue to get the same error. Any ideas?
>

--
Vicky Rowley                              email: vrowley at ucsd.edu
Biomedical Informatics Research Network      work: (858) 536-5980
University of California, San Diego           fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715


See pictures from our trip to China at http://www.sagacitech.com/Chinaweb



From cdwan at mail.ahc.umn.edu Fri Dec 5 08:16:07 2003
From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB))
Date: Fri, 5 Dec 2003 10:16:07 -0600 (CST)
Subject: [Rocks-Discuss]Private NIS master
Message-ID: <Pine.GSO.4.58.0312042305070.18193@lenti.med.umn.edu>

Hello all. Long time listener, first time caller.    Thanks for all the
great work.

I'm integrating a Rocks cluster into an existing NIS domain. I noticed
that while the cluster database now supports a PrivateNISMaster, that
variable doesn't make it into the /etc/yp.conf on the compute nodes. They
remain broadcast.

Assume that, for whatever reason, I don't want to set up a repeater
(slave) ypserv process on my frontend.    I added the option "--nisserver
<var name="Kickstart_PrivateNISMaster"/>" to the
"profiles/3.0.0/nodes/nis-client.xml" file, removed the ypserver on my
frontend, and it works like I want it to.
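
A rough sketch of that change (the exact element layout of nis-client.xml
is assumed here, not quoted from the Rocks source; back the file up first):

   cd /home/install/profiles/3.0.0/nodes
   cp nis-client.xml nis-client.xml.orig
   # add  --nisserver <var name="Kickstart_PrivateNISMaster"/>  to the
   # ypbind/authconfig options in nis-client.xml, then reinstall a compute
   # node and check that /etc/yp.conf names the NIS master instead of broadcast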

Am I missing anything fundamental here?

-Chris Dwan
 University of Minnesota


From wyzhong78 at msn.com Mon Dec 8 06:18:34 2003
From: wyzhong78 at msn.com (zhong wenyu)
Date: Mon, 08 Dec 2003 22:18:34 +0800
Subject: [Rocks-Discuss]3.0.0 problem: not able to boot up
Message-ID: <BAY3-F14uFqD45TpNO40002c14c@hotmail.com>

Hi, everyone!

I installed Rocks 3.0.0 with the default settings; there wasn't any trouble
during the install. But I haven't been able to boot: it stops at the
beginning, the message "GRUB" shows on the screen, and it just waits....
   My hardware is dual Xeon 2.4G, MSI 9138, Seagate SCSI disk.
   Any advice is welcome!




From angelini at vki.ac.be Mon Dec 8 06:20:45 2003
From: angelini at vki.ac.be (Angelini Giuseppe)
Date: Mon, 08 Dec 2003 15:20:45 +0100
Subject: [Rocks-Discuss]How to use MPICH with ssh
Message-ID: <3FD488BD.3EBBDB8D@vki.ac.be>

Dear rocks folk,


I have recently installed mpich with Lahey Fortran, and now that I can
compile and link I would like to run, but it seems that I have another
problem. In fact I get the following error message when I try to run:

[panara at compute-0-7 ~]$ mpirun -np $NPROC -machinefile $PBS_NODEFILE
$DPT/hybflow
p0_13226: p4_error: Path to program is invalid while starting
/dc_03_04/panara/PREPRO_TESTS/hybflow with /usr/bin/rsh on compute-0-7:
-1
    p4_error: latest msg from perror: No such file or directory
p0_13226: p4_error: Child process exited while making connection to
remote process on compute-0-6: 0
p0_13226: (6.025133) net_send: could not write to fd=4, errno = 32
p0_13226: (6.025231) net_send: could not write to fd=4, errno = 32

I am wondering why it is looking for /usr/bin/rsh for the communication;
I expected it to use ssh and not rsh.

Any help will be welcome.


Regards.


Giuseppe Angelini



From casuj at cray.com Mon Dec 8 07:31:21 2003
From: casuj at cray.com (John Casu)
Date: Mon, 8 Dec 2003 07:31:21 -0800
Subject: [Rocks-Discuss]How to use MPICH with ssh
In-Reply-To: <3FD488BD.3EBBDB8D@vki.ac.be>; from Angelini Giuseppe on Mon, Dec 08,
2003 at 03:20:45PM +0100
References: <3FD488BD.3EBBDB8D@vki.ac.be>
Message-ID: <20031208073121.A10151@stemp3.wc.cray.com>

On Mon, Dec 08, 2003 at 03:20:45PM +0100, Angelini Giuseppe wrote:
>
> Dear rocks folk,
>
>
> I have recently installed mpich with Lahay Fortran and now that I can
> compile and link,
> I would like to run but it seems that I have another problem. In fact I
> have the following
> error message when I try to run:
>
> [panara at compute-0-7 ~]$ mpirun -np $NPROC -machinefile $PBS_NODEFILE
> $DPT/hybflow
> p0_13226: p4_error: Path to program is invalid while starting
> /dc_03_04/panara/PREPRO_TESTS/hybflow with /usr/bin/rsh on compute-0-7:
> -1
>     p4_error: latest msg from perror: No such file or directory
> p0_13226: p4_error: Child process exited while making connection to
> remote process on compute-0-6: 0
> p0_13226: (6.025133) net_send: could not write to fd=4, errno = 32
> p0_13226: (6.025231) net_send: could not write to fd=4, errno = 32
>
> I am wondering why it is looking for /usr/bin/rsh for the communication,
>
> I expected to use ssh and not rsh.
>
> Any help will be welcome.
>


build mpich thus:

RSHCOMMAND=ssh ./configure .....
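
For example, a complete build along these lines might look like the
following (only a sketch: the install prefix and the device flag are
assumptions, so adjust them for your site):

RSHCOMMAND=ssh ./configure --prefix=/usr/local/mpich-ssh --with-device=ch_p4
make
make install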


>
> Regards.
>
>
> Giuseppe Angelini

--
"Roses are red, Violets are blue,
 You lookin' at me ?
 YOU LOOKIN' AT ME ?!"    -- Get Fuzzy.
=======================================================================
John Casu
Cray Inc.                                           casuj at cray.com
411 First Avenue South, Suite 600                   Tel: (206) 701-2173
Seattle, WA 98104-2860                              Fax: (206) 701-2500
=======================================================================


From davidow at molbio.mgh.harvard.edu Mon Dec 8 08:12:53 2003
From: davidow at molbio.mgh.harvard.edu (Lance Davidow)
Date: Mon, 8 Dec 2003 11:12:53 -0500
Subject: [Rocks-Discuss]How to use MPICH with ssh
In-Reply-To: <3FD488BD.3EBBDB8D@vki.ac.be>
References: <3FD488BD.3EBBDB8D@vki.ac.be>
Message-ID: <p06002001bbfa51fea005@[132.183.190.222]>

Giuseppe,

Here's an answer from a newbie who just faced the same problem.

You are using the wrong flavor of mpich (and mpirun). There are
several different distributions which work differently in ROCKS. The
one you are using in the default path expects serv_p4 daemons and
.rhosts files in your home directory. The different flavors may be
more compatible with different compilers as well.

[lance at rescluster2 lance]$ which   mpirun
/opt/mpich-mpd/gnu/bin/mpirun

The one you probably want is
/opt/mpich/gnu/bin/mpirun

[lance at rescluster2 lance]$ locate mpirun
...
/opt/mpich-mpd/gnu/bin/mpirun
...
/opt/mpich/myrinet/gnu/bin/mpirun
...
/opt/mpich/gnu/bin/mpirun
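
To pick that one up, either call it by full path or put it first in your
PATH, e.g. (the paths are the stock Rocks locations shown above; adjust
them if yours differ):

export PATH=/opt/mpich/gnu/bin:$PATH
which mpirun    # should now report /opt/mpich/gnu/bin/mpirun
mpirun -np $NPROC -machinefile $PBS_NODEFILE $DPT/hybflow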

Cheers,
Lance


At 3:20 PM +0100 12/8/03, Angelini Giuseppe wrote:
>Dear rocks folk,
>
>
>I have recently installed mpich with Lahay Fortran and now that I can
>compile and link,
>I would like to run but it seems that I have another problem. In fact I
>have the following
>error message when I try to run:
>
>[panara at compute-0-7 ~]$ mpirun -np $NPROC -machinefile $PBS_NODEFILE
>$DPT/hybflow
>p0_13226: p4_error: Path to program is invalid while starting
>/dc_03_04/panara/PREPRO_TESTS/hybflow with /usr/bin/rsh on compute-0-7:
>-1
>     p4_error: latest msg from perror: No such file or directory
>p0_13226: p4_error: Child process exited while making connection to
>remote process on compute-0-6: 0
>p0_13226: (6.025133) net_send: could not write to fd=4, errno = 32
>p0_13226: (6.025231) net_send: could not write to fd=4, errno = 32
>
>I am wondering why it is looking for /usr/bin/rsh for the communication,
>
>I expected to use ssh and not rsh.
>
>Any help will be welcome.
>
>
>Regards.
>
>Giuseppe Angelini


--
Lance Davidow, PhD
Director of Bioinformatics
Dept of Molecular Biology
Mass General Hospital
Boston MA 02114
davidow at molbio.mgh.harvard.edu
617.726-5955
Fax: 617.726-6893


From rscarce at caci.com Fri Dec 5 16:43:00 2003
From: rscarce at caci.com (Reed Scarce)
Date: Fri, 5 Dec 2003 19:43:00 -0500
Subject: [Rocks-Discuss]PXE and system images
Message-ID: <OFF783DCCA.8F016562-ON85256DF3.008001FC-85256DF7.00043E45@caci.com>

We want to initialize new hardware with a known good image from identical
hardware currently in use. The process imagined would be to PXE boot to a
disk image server, PXE would create a RAM system that would request the
system disk image from the server, which would push the desired system
disk image to the requesting system. Upon completion the system would be
available as a cluster member.

The lab configuration is a PC grade frontend with two 3Com 905s and a
single server grade cluster node with integrated Intel 82551 (10/100)(the
only PXE interface) and two integrated Intel 82546 (10/100/1000). The
cluster node is one of the stock of nodes for the expansion. The stock of
nodes have a Linux OS pre-installed, which would be eliminated in the
process.

Currently the node will PXE boot from the 10/100 and pickup an
installation boot from one of the g-bit interfaces. From there kickstart
wants to take over.

Any recommendations how to get kickstart to push an image to the disk?

Thanks,

Reed Scarce
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.sdsc.edu/pipermail/npaci-rocks-
discussion/attachments/20031205/dad04521/attachment-0001.html

From wyzhong78 at msn.com Mon Dec 8 05:36:37 2003
From: wyzhong78 at msn.com (zhong wenyu)
Date: Mon, 08 Dec 2003 21:36:37 +0800
Subject: [Rocks-Discuss]Rocks 3.0.0 problem:not able to boot up
Message-ID: <BAY3-F9yOi5AgJQlDrR0002a5da@hotmail.com>

Hi, everyone!
I have installed Rocks 3.0.0 with the default options successfully; there
was no trouble at all. But when I boot it up, it stops at the very
beginning, just shows "GRUB" on the screen, and waits...
Thanks for your help!




From daniel.kidger at quadrics.com Mon Dec 8 09:54:53 2003
From: daniel.kidger at quadrics.com (daniel.kidger at quadrics.com)
Date: Mon, 8 Dec 2003 17:54:53 -0000
Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0)
Message-ID: <30062B7EA51A9045B9F605FAAC1B4F622357C7@tardis0.quadrics.com>

Dear all,
    Previously I have been installing a custom kernel on the compute nodes
with an "extend-compute.xml" and an "/etc/init.d/qsconfigure" (to fix grub.conf).

However I am now trying to do it the 'proper' way. So I do (on :
# cp qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm 
  /home/install/rocks-dist/7.3/en/os/i386/force/RPMS
# cd /home/install
# rocks-dist dist
# SSH_NO_PASSWD=1 shoot-node compute-0-0

Hence:
# find /home/install/ |xargs -l grep -nH qsnet
shows me that hdlist and hdlist2 now contain this RPM. (and indeed If I duplicate
my rpm in that directory rocks-dist notices this and warns me.)

However the node always ends up with "2.4.20-20.7smp" again.
anaconda-ks.cfg contains just "kernel-smp" and install.log has "Installing kernel-
smp-2.4.20-20.7."

So my question is:
   It looks like my RPM has a name that Rocks doesn't understand properly.
   What is wrong with my name ?
   and what are the rules for getting the correct name ?
     (.i686.rpm is of course correct, but I don't have -smp. in the name Is this
the problem ?)

cf. Greg Bruno's wisdom:
  https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-April/001770.html


Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------

>


From DGURGUL at PARTNERS.ORG Mon Dec 8 11:09:27 2003
From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.)
Date: Mon, 8 Dec 2003 14:09:27 -0500
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE15840@phsexch7.mgh.harvard.edu>

I just did "cluster-fork -Uvh /sourcedir/ganglia-python-3.0.1-2.i386.rpm" and
then "cluster-fork service gschedule restart" (not sure I had to do the last).
I also put 3.0.1-2 and restarted gschedule on the frontend.

Now I run "cluster-fork --mpd w".

I currently have a user who ssh'd to compute-0-8 from the frontend and one who
ssh'd into compute-0-17 from the front end.

But the return shows the users on lines for 17 (for the user on 0-8) and 10 (for
the user on 0-17):

17:   1:58pm up 24 days, 3:20, 1 user, load average: 0.00, 0.00, 0.03
17: USER     TTY     FROM             LOGIN@   IDLE  JCPU  PCPU WHAT
17: lance    pts/0   rescluster2.mgh. 1:31pm 40.00s 0.02s 0.02s -bash

10:   1:58pm up 24 days, 3:21, 1 user, load average: 0.02, 0.04, 0.07
10: USER     TTY     FROM             LOGIN@   IDLE  JCPU  PCPU WHAT
10: dennis   pts/0   rescluster2.mgh. 1:57pm 17.00s 0.02s 0.02s -bash

When I do "cluster-fork w" (without the --mpd) the users show up on the correct
nodes.

Do the numbers on the left of the -mpd output correspond to the node names?

Thanks.

Dennis

Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169



From DGURGUL at PARTNERS.ORG Mon Dec 8 11:28:30 2003
From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.)
Date: Mon, 8 Dec 2003 14:28:30 -0500
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE15843@phsexch7.mgh.harvard.edu>

Maybe this is a better description of the "strangeness".

I did "cluster-fork --mpd hostname":

1:   compute-0-0.local
2:   compute-0-1.local
3:   compute-0-3.local
4:   compute-0-13.local
5:   compute-0-11.local
6:   compute-0-15.local
7:   compute-0-16.local
8:   compute-0-19.local
9:   compute-0-21.local
10: compute-0-17.local
11: compute-0-5.local
12: compute-0-20.local
13: compute-0-18.local
14: compute-0-12.local
15: compute-0-9.local
16: compute-0-4.local
17: compute-0-8.local
18: compute-0-14.local
19: compute-0-2.local
20: compute-0-6.local
0: compute-0-7.local
21: compute-0-10.local

Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169


-----Original Message-----
From: npaci-rocks-discussion-admin at sdsc.edu
[mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Gurgul,
Dennis J.
Sent: Monday, December 08, 2003 2:09 PM
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness


I just did "cluster-fork -Uvh /sourcedir/ganglia-python-3.0.1-2.i386.rpm"
and
then "cluster-fork service gschedule restart" (not sure I had to do the
last).
I also put 3.0.1-2 and restarted gschedule on the frontend.

Now I run "cluster-fork --mpd w".

I currently have a user who ssh'd to compute-0-8 from the frontend and one
who
ssh'd into compute-0-17 from the front end.

But the return shows the users on lines for 17 (for the user on 0-8) and 10
(for
the user on 0-17):

17:   1:58pm up 24 days, 3:20, 1 user, load average: 0.00, 0.00, 0.03
17: USER     TTY     FROM             LOGIN@   IDLE  JCPU  PCPU WHAT
17: lance    pts/0   rescluster2.mgh. 1:31pm 40.00s 0.02s 0.02s -bash

10:   1:58pm up 24 days, 3:21, 1 user, load average: 0.02, 0.04, 0.07
10: USER     TTY     FROM             LOGIN@   IDLE  JCPU  PCPU WHAT
10: dennis   pts/0   rescluster2.mgh. 1:57pm 17.00s 0.02s 0.02s -bash

When I do "cluster-fork w" (without the --mpd) the users show up on the
correct
nodes.

Do the numbers on the left of the -mpd output correspond to the node names?
Thanks.

Dennis

Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169


From tim.carlson at pnl.gov Mon Dec 8 12:35:16 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Mon, 08 Dec 2003 12:35:16 -0800 (PST)
Subject: [Rocks-Discuss]PXE and system images
In-Reply-To:
 <OFF783DCCA.8F016562-ON85256DF3.008001FC-85256DF7.00043E45@caci.com>
Message-ID: <Pine.LNX.4.44.0312081226270.19031-100000@scorpion.emsl.pnl.gov>

On Fri, 5 Dec 2003, Reed Scarce wrote:

>   We want to initialize new hardware with a known good image from identical
>   hardware currently in use. The process imagined would be to PXE boot to a
>   disk image server, PXE would create a RAM system that would request the
>   system disk image from the server, which would push the desired system
>   disk image to the requesting system. Upon completion the system would be
>   available as a cluster member.
>
>   The lab configuration is a PC grade frontend with two 3Com 905s and a
>   single server grade cluster node with integrated Intel 82551 (10/100)(the
>   only PXE interface) and two integrated Intel 82546 (10/100/1000). The
>   cluster node is one of the stock of nodes for the expansion. The stock of
>   nodes have a Linux OS pre-installed, which would be eliminated in the
>   process.
>
>   Currently the node will PXE boot from the 10/100 and pickup an
>   installation boot from one of the g-bit interfaces. From there kickstart
>   wants to take over.
>
>   Any recommendations how to get kickstart to push an image to the disk?

This sounds like you want to use Oscar instead of ROCKS.

http://oscar.openclustergroup.org/tiki-index.php

I'm not exactly sure why you think that the kickstart process won't give
you exactly the same image on every machine. If the hardware is the same,
you'll get the same image on each machine.

We have boxes with the same setup, 10/100 PXE, and then dual gigabit. Our
method for installing ROCKS on this type of hardware is the following

1) Run insert-ethers and choose "manager" type of node.
2) Connect all the PXE interfaces to the switch and boot them all. Do not
   connect the gigabit interface
3) Once all of the nodes have PXE booted, exit insert-ethers. Start
   insert-ethers again and this time choose compute node
4) Hook up the gigabit interface and the PXE interface to your nodes. All
of your machines will now install.
5) In our case, we now quickly disconnect the PXE interface because we
   don't want to have the machine continually install. The real ROCKS
   method would have you choose (HD/net) for booting in the BIOS, but if you
   already have an OS on your machine, you would have to go into the BIOS
   twice before the compute nodes were installed. We disable rocks-grub (a
   sketch of how follows just below) and just connect up the PXE cable if we
   need to reinstall.
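
Roughly, the disabling can be done from the frontend like this (a sketch,
untested as written; it assumes rocks-grub is managed as an ordinary
chkconfig-style init service):

   cluster-fork '/sbin/chkconfig rocks-grub off'
   cluster-fork '/sbin/service rocks-grub stop'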

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support



From tim.carlson at pnl.gov Mon Dec 8 12:42:23 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Mon, 08 Dec 2003 12:42:23 -0800 (PST)
Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0)
In-Reply-To: <30062B7EA51A9045B9F605FAAC1B4F622357C7@tardis0.quadrics.com>
Message-ID: <Pine.LNX.4.44.0312081238270.19031-100000@scorpion.emsl.pnl.gov>

On Mon, 8 Dec 2003 daniel.kidger at quadrics.com wrote:

I've gotten confused from time to time as to where to place custom RPMS
(it's changed between releases), so my not-so-clean method is to just rip
out the kernels in /home/install/rocks-dist/7.3/en/os/i386/Redhat/RPMS
and drop my own in. Then do a

cd /home/install
rocks-dist dist
shoot-node
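
Spelled out, the rip-out-and-drop-in step is just something like the
following (a sketch; the stock kernel file names are assumptions based on
the 2.4.20-20.7 version mentioned below, so check what is actually in the
directory first):

cd /home/install/rocks-dist/7.3/en/os/i386/Redhat/RPMS
mkdir -p /root/stock-kernels
mv kernel-2.4.20-20.7.i686.rpm kernel-smp-2.4.20-20.7.i686.rpm /root/stock-kernels/
cp /path/to/your/kernel-*.i686.rpm .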

You are probably running into an issue where the "force" directory is more
of an "in addition to" directory and your 2.4.18 kernel is being noted,
but ignored since the 2.4.20 kernel is newer. I assume your nodes get both
an SMP and a UP version of 2.4.20 and that your custom 2.4.18 is nowhere to
be found on the compute node.

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

>       Previously I have been installing a custom kernel on the compute nodes
>   with an "extend-compute.xml" and an "/etc/init.d/qsconfigure" (to fix grub.conf).
>
>   However I am now trying to do it the 'proper' way. So I do (on :
>   # cp qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm 
>     /home/install/rocks-dist/7.3/en/os/i386/force/RPMS
>   # cd /home/install
>   # rocks-dist dist
>   # SSH_NO_PASSWD=1 shoot-node compute-0-0
>
>   Hence:
>   # find /home/install/ |xargs -l grep -nH qsnet
> shows me that hdlist and hdlist2 now contain this RPM. (and indeed If I duplicate
my rpm in that directory rocks-dist notices this and warns me.)
>
> However the node always ends up with "2.4.20-20.7smp" again.
> anaconda-ks.cfg contains just "kernel-smp" and install.log has "Installing
kernel-smp-2.4.20-20.7."
>
> So my question is:
>    It looks like my RPM has a name that Rocks doesn't understand properly.
>    What is wrong with my name ?
>    and what are the rules for getting the correct name ?
>      (.i686.rpm is of course correct, but I don't have -smp. in the name Is this
the problem ?)
>
> cf. Greg Bruno's wisdom:
>   https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-April/001770.html
>
>
> Yours,
> Daniel.



From fds at sdsc.edu Mon Dec 8 12:51:12 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Mon, 8 Dec 2003 12:51:12 -0800
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
In-Reply-To: <BC447F1AD529D311B4DE0008C71BF2EB0AE15843@phsexch7.mgh.harvard.edu>
References: <BC447F1AD529D311B4DE0008C71BF2EB0AE15843@phsexch7.mgh.harvard.edu>
Message-ID: <423D0494-29C0-11D8-804D-000393A4725A@sdsc.edu>

You are right, and I think this is a shortcoming of MPD. There is no
obvious way to force the MPD numbering to correspond to the order the
nodes were called out on the command line (cluster-fork --mpd actually
makes a shell call to mpirun and it calls out all the node names
explicitly). MPD seems to number the output differently, as you found
out.

So mpd for now may be more useful for jobs that are not sensitive to
this. If enough of you find this shortcoming to be a real annoyance, we
could work on putting the node name label on the output by explicitly
calling "hostname" or similar.
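
In the meantime a crude workaround is to have the command itself print the
hostname, e.g. (an untested sketch; whether the compound command survives
the underlying mpirun invocation depends on how it gets quoted):

   cluster-fork --mpd 'echo -n "$(hostname): "; w'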

Good ideas are welcome :)
-Federico

On Dec 8, 2003, at 11:28 AM, Gurgul, Dennis J. wrote:

>   Maybe this is a better description of the "strangeness".
>
>   I did "cluster-fork --mpd hostname":
>
>   1:   compute-0-0.local
>   2:   compute-0-1.local
>   3:   compute-0-3.local
>   4:   compute-0-13.local
>   5:   compute-0-11.local
>   6:   compute-0-15.local
>   7:   compute-0-16.local
>   8: compute-0-19.local
>   9: compute-0-21.local
>   10: compute-0-17.local
>   11: compute-0-5.local
>   12: compute-0-20.local
>   13: compute-0-18.local
>   14: compute-0-12.local
>   15: compute-0-9.local
>   16: compute-0-4.local
>   17: compute-0-8.local
>   18: compute-0-14.local
>   19: compute-0-2.local
>   20: compute-0-6.local
>   0: compute-0-7.local
>   21: compute-0-10.local
>
>   Dennis J. Gurgul
>   Partners Health Care System
>   Research Management
>   Research Computing Core
>   617.724.3169
>
>
>   -----Original Message-----
>   From: npaci-rocks-discussion-admin at sdsc.edu
>   [mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Gurgul,
>   Dennis J.
>   Sent: Monday, December 08, 2003 2:09 PM
>   To: npaci-rocks-discussion at sdsc.edu
>   Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
>
>
>   I just did "cluster-fork -Uvh
>   /sourcedir/ganglia-python-3.0.1-2.i386.rpm"
>   and
>   then "cluster-fork service gschedule restart" (not sure I had to do the
>   last).
>   I also put 3.0.1-2 and restarted gschedule on the frontend.
>
>   Now I run "cluster-fork --mpd w".
>
>   I currently have a user who ssh'd to compute-0-8 from the frontend and
>   one
>   who
>   ssh'd into compute-0-17 from the front end.
>
>   But the return shows the users on lines for 17 (for the user on 0-8)
>   and 10
>   (for
>   the user on 0-17):
>
>   17:   1:58pm up 24 days, 3:20, 1 user, load average: 0.00, 0.00,
>   0.03
>   17: USER     TTY     FROM             LOGIN@   IDLE  JCPU  PCPU
>   WHAT
>   17: lance    pts/0   rescluster2.mgh. 1:31pm 40.00s 0.02s 0.02s
>   -bash
>
>   10:   1:58pm   up 24 days,   3:21,   1 user,   load average: 0.02, 0.04,
> 0.07
> 10: USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU
> WHAT
> 10: dennis   pts/0    rescluster2.mgh. 1:57pm 17.00s 0.02s 0.02s
> -bash
>
> When I do "cluster-fork w" (without the --mpd) the users show up on the
> correct
> nodes.
>
> Do the numbers on the left of the -mpd output correspond to the node
> names?
>
> Thanks.
>
> Dennis
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169
>
Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA



From DGURGUL at PARTNERS.ORG Mon Dec 8 12:55:13 2003
From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.)
Date: Mon, 8 Dec 2003 15:55:13 -0500
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE15847@phsexch7.mgh.harvard.edu>

Thanks.

On a related note, when I did "cluster-fork service gschedule restart" gschedule
started with the "OK" output, but then the fork process hung on each node and I
had to ^c out for it to go on to the next node.

I tried to ssh to a node and then did the gschedule restart. Even then, after I
tried to "exit" out of the node, the session hung and I had to log back in and
kill it from the frontend.


Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169


-----Original Message-----
From: Federico Sacerdoti [mailto:fds at sdsc.edu]
Sent: Monday, December 08, 2003 3:51 PM
To: Gurgul, Dennis J.
Cc: npaci-rocks-discussion at sdsc.edu
Subject: Re: [Rocks-Discuss]cluster-fork --mpd strangeness
You are right, and I think this is a shortcoming of MPD. There is no
obvious way to force the MPD numbering to correspond to the order the
nodes were called out on the command line (cluster-fork --mpd actually
makes a shell call to mpirun and it calls out all the node names
explicitly). MPD seems to number the output differently, as you found
out.

So mpd for now may be more useful for jobs that are not sensitive to
this. If enough of you find this shortcoming to be a real annoyance, we
could work on putting the node name label on the output by explicitly
calling "hostname" or similar.

Good ideas are welcome :)
-Federico

On Dec 8, 2003, at 11:28 AM, Gurgul, Dennis J. wrote:

>   Maybe this is a better description of the "strangeness".
>
>   I did "cluster-fork --mpd hostname":
>
>   1: compute-0-0.local
>   2: compute-0-1.local
>   3: compute-0-3.local
>   4: compute-0-13.local
>   5: compute-0-11.local
>   6: compute-0-15.local
>   7: compute-0-16.local
>   8: compute-0-19.local
>   9: compute-0-21.local
>   10: compute-0-17.local
>   11: compute-0-5.local
>   12: compute-0-20.local
>   13: compute-0-18.local
>   14: compute-0-12.local
>   15: compute-0-9.local
>   16: compute-0-4.local
>   17: compute-0-8.local
>   18: compute-0-14.local
>   19: compute-0-2.local
>   20: compute-0-6.local
>   0: compute-0-7.local
>   21: compute-0-10.local
>
>   Dennis J. Gurgul
>   Partners Health Care System
>   Research Management
>   Research Computing Core
>   617.724.3169
>
>
>   -----Original Message-----
>   From: npaci-rocks-discussion-admin at sdsc.edu
>   [mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Gurgul,
>   Dennis J.
>   Sent: Monday, December 08, 2003 2:09 PM
>   To: npaci-rocks-discussion at sdsc.edu
> Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
>
>
> I just did "cluster-fork -Uvh
> /sourcedir/ganglia-python-3.0.1-2.i386.rpm"
> and
> then "cluster-fork service gschedule restart" (not sure I had to do the
> last).
> I also put 3.0.1-2 and restarted gschedule on the frontend.
>
> Now I run "cluster-fork --mpd w".
>
> I currently have a user who ssh'd to compute-0-8 from the frontend and
> one
> who
> ssh'd into compute-0-17 from the front end.
>
> But the return shows the users on lines for 17 (for the user on 0-8)
> and 10
> (for
> the user on 0-17):
>
> 17:    1:58pm up 24 days, 3:20, 1 user, load average: 0.00, 0.00,
> 0.03
> 17: USER      TTY     FROM              LOGIN@   IDLE   JCPU   PCPU
> WHAT
> 17: lance     pts/0   rescluster2.mgh. 1:31pm 40.00s 0.02s 0.02s
> -bash
>
> 10:    1:58pm up 24 days, 3:21, 1 user, load average: 0.02, 0.04,
> 0.07
> 10: USER      TTY     FROM              LOGIN@   IDLE   JCPU   PCPU
> WHAT
> 10: dennis    pts/0   rescluster2.mgh. 1:57pm 17.00s 0.02s 0.02s
> -bash
>
> When I do "cluster-fork w" (without the --mpd) the users show up on the
> correct
> nodes.
>
> Do the numbers on the left of the -mpd output correspond to the node
> names?
>
> Thanks.
>
> Dennis
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169
>
Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA


From mjk at sdsc.edu Mon Dec 8 12:58:22 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Mon, 8 Dec 2003 12:58:22 -0800
Subject: [Rocks-Discuss]PXE and system images
In-Reply-To: <Pine.LNX.4.44.0312081226270.19031-100000@scorpion.emsl.pnl.gov>
References: <Pine.LNX.4.44.0312081226270.19031-100000@scorpion.emsl.pnl.gov>
Message-ID: <4261C250-29C1-11D8-AECB-000A95DA5638@sdsc.edu>

On Dec 8, 2003, at 12:35 PM, Tim Carlson wrote:

> 5) In our case, we now quickly disconnect the PXE interface because we
>    don't want to have the machine continually install. The real ROCKS
>    method would have you choose (HD/net) for booting in the BIOS, but
> if you already
>    have an OS on your machine, you would have to go into the BIOS twice
>    before the compute nodes were installed. We disable rocks-grub and
> just
>    connect up the PXE cable if we need to reinstall.
>

For most boxes we've seen that support PXE there is an option to hit
<F12> to force a network PXE boot, this allows you to force a PXE even
when a valid OS/Boot block exists on your hard disk. If you don't have
this you do indeed need to go into BIOS twice -- a pain.


       -mjk



From fds at sdsc.edu Mon Dec 8 13:26:46 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Mon, 8 Dec 2003 13:26:46 -0800
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
In-Reply-To: <BC447F1AD529D311B4DE0008C71BF2EB0AE15847@phsexch7.mgh.harvard.edu>
References: <BC447F1AD529D311B4DE0008C71BF2EB0AE15847@phsexch7.mgh.harvard.edu>
Message-ID: <39CC5B05-29C5-11D8-804D-000393A4725A@sdsc.edu>

I've seen this before as well. I believe it has something to do with
the way the color "[ OK ]" characters are interacting with the ssh
session from the normal cluster-fork. We have yet to characterize this
bug adequately.
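
Until it is characterized, a workaround that often helps when init scripts
hang over ssh is to detach the restart from the terminal entirely, e.g.
(an untested sketch):

   cluster-fork 'service gschedule restart < /dev/null > /dev/null 2>&1'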

-Federico

On Dec 8, 2003, at 12:55 PM, Gurgul, Dennis J. wrote:

>   Thanks.
>
>   On a related note, when I did "cluster-fork service gschedule restart"
>   gschedule
>   started with the "OK" output, but then the fork process hung on each
>   node and I
>   had to ^c out for it to go on to the next node.
>
>   I tried to ssh to a node and then did the gschedule restart. Even
>   then, after I
>   tried to "exit" out of the node, the session hung and I had to log
>   back in and
>   kill it from the frontend.
>
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169
>
>
> -----Original Message-----
> From: Federico Sacerdoti [mailto:fds at sdsc.edu]
> Sent: Monday, December 08, 2003 3:51 PM
> To: Gurgul, Dennis J.
> Cc: npaci-rocks-discussion at sdsc.edu
> Subject: Re: [Rocks-Discuss]cluster-fork --mpd strangeness
>
>
> You are right, and I think this is a shortcoming of MPD. There is no
> obvious way to force the MPD numbering to correspond to the order the
> nodes were called out on the command line (cluster-fork --mpd actually
> makes a shell call to mpirun and it calls out all the node names
> explicitly). MPD seems to number the output differently, as you found
> out.
>
> So mpd for now may be more useful for jobs that are not sensitive to
> this. If enough of you find this shortcoming to be a real annoyance, we
> could work on putting the node name label on the output by explicitly
> calling "hostname" or similar.
>
> Good ideas are welcome :)
> -Federico
>
> On Dec 8, 2003, at 11:28 AM, Gurgul, Dennis J. wrote:
>
>> Maybe this is a better description of the "strangeness".
>>
>> I did "cluster-fork --mpd hostname":
>>
>> 1: compute-0-0.local
>> 2: compute-0-1.local
>> 3: compute-0-3.local
>> 4: compute-0-13.local
>> 5: compute-0-11.local
>> 6: compute-0-15.local
>> 7: compute-0-16.local
>> 8: compute-0-19.local
>> 9: compute-0-21.local
>> 10: compute-0-17.local
>> 11: compute-0-5.local
>> 12: compute-0-20.local
>> 13: compute-0-18.local
>> 14: compute-0-12.local
>> 15: compute-0-9.local
>> 16: compute-0-4.local
>> 17: compute-0-8.local
>> 18: compute-0-14.local
>> 19: compute-0-2.local
>> 20: compute-0-6.local
>> 0: compute-0-7.local
>>   21: compute-0-10.local
>>
>>   Dennis J. Gurgul
>>   Partners Health Care System
>>   Research Management
>>   Research Computing Core
>>   617.724.3169
>>
>>
>>   -----Original Message-----
>>   From: npaci-rocks-discussion-admin at sdsc.edu
>>   [mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Gurgul,
>>   Dennis J.
>>   Sent: Monday, December 08, 2003 2:09 PM
>>   To: npaci-rocks-discussion at sdsc.edu
>>   Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
>>
>>
>>   I just did "cluster-fork -Uvh
>>   /sourcedir/ganglia-python-3.0.1-2.i386.rpm"
>>   and
>>   then "cluster-fork service gschedule restart" (not sure I had to do
>>   the
>>   last).
>>   I also put 3.0.1-2 and restarted gschedule on the frontend.
>>
>>   Now I run "cluster-fork --mpd w".
>>
>>   I currently have a user who ssh'd to compute-0-8 from the frontend and
>>   one
>>   who
>>   ssh'd into compute-0-17 from the front end.
>>
>>   But the return shows the users on lines for 17 (for the user on 0-8)
>>   and 10
>>   (for
>>   the user on 0-17):
>>
>>   17:   1:58pm up 24 days, 3:20, 1 user, load average: 0.00, 0.00,
>>   0.03
>>   17: USER     TTY     FROM             LOGIN@   IDLE  JCPU  PCPU
>>   WHAT
>>   17: lance    pts/0   rescluster2.mgh. 1:31pm 40.00s 0.02s 0.02s
>>   -bash
>>
>>   10:   1:58pm up 24 days, 3:21, 1 user, load average: 0.02, 0.04,
>>   0.07
>>   10: USER     TTY     FROM             LOGIN@   IDLE  JCPU  PCPU
>>   WHAT
>>   10: dennis   pts/0   rescluster2.mgh. 1:57pm 17.00s 0.02s 0.02s
>>   -bash
>>
>>   When I do "cluster-fork w" (without the --mpd) the users show up on
>>   the
>>   correct
>>   nodes.
>>
>>   Do the numbers on the left of the -mpd output correspond to the node
>>   names?
>>
>> Thanks.
>>
>> Dennis
>>
>> Dennis J. Gurgul
>> Partners Health Care System
>> Research Management
>> Research Computing Core
>> 617.724.3169
>>
> Federico
>
> Rocks Cluster Group, San Diego Supercomputing Center, CA
>
Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA



From bruno at rocksclusters.org Mon Dec 8 15:31:08 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 8 Dec 2003 15:31:08 -0800
Subject: [Rocks-Discuss]Rocks 3.0.0 problem:not able to boot up
In-Reply-To: <BAY3-F9yOi5AgJQlDrR0002a5da@hotmail.com>
References: <BAY3-F9yOi5AgJQlDrR0002a5da@hotmail.com>
Message-ID: <9979F090-29D6-11D8-9715-000A95C4E3B4@rocksclusters.org>

> I have installed Rocks 3.0.0 with default options successful,there was
> not any trouble.But I boot it up,it stopped at beginning,just show
> "GRUB" on the screen and waiting...

when you built the frontend, did you start with the rocks base CD then
add the HPC roll?

    - gb



From bruno at rocksclusters.org Mon Dec 8 15:37:46 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 8 Dec 2003 15:37:46 -0800
Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0)
In-Reply-To: <30062B7EA51A9045B9F605FAAC1B4F622357C7@tardis0.quadrics.com>
References: <30062B7EA51A9045B9F605FAAC1B4F622357C7@tardis0.quadrics.com>
Message-ID: <8700A2BE-29D7-11D8-9715-000A95C4E3B4@rocksclusters.org>

>       Previously I have been installing a custom kernel on the compute
>   nodes
>   with an "extend-compute.xml" and an "/etc/init.d/qsconfigure" (to fix
>   grub.conf).
>
>   However I am now trying to do it the 'proper' way. So I do (on :
>   # cp qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm 
>     /home/install/rocks-dist/7.3/en/os/i386/force/RPMS
>   # cd /home/install
>   # rocks-dist dist
>   # SSH_NO_PASSWD=1 shoot-node compute-0-0
>
>   Hence:
>   # find /home/install/ |xargs -l grep -nH qsnet
>   shows me that hdlist and hdlist2 now contain this RPM. (and indeed If
>   I duplicate my rpm in that directory rocks-dist notices this and warns
>   me.)
>
>   However the node always ends up with "2.4.20-20.7smp" again.
>   anaconda-ks.cfg contains just "kernel-smp" and install.log has
>   "Installing kernel-smp-2.4.20-20.7."
>
>   So my question is:
>      It looks like my RPM has a name that Rocks doesn't understand
>   properly.
>      What is wrong with my name ?
>      and what are the rules for getting the correct name ?
>        (.i686.rpm is of course correct, but I don't have -smp. in the
>   name Is this the problem ?)

the anaconda installer looks for kernel packages with a specific format:

       kernel-<kernel ver>-<redhat ver>.i686.rpm

and for smp nodes:

       kernel-smp-<kernel ver>-<redhat ver>.i686.rpm
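
so for the qsnet kernel above, the packages would need to come out named
roughly like this (hypothetical names that simply follow the pattern; the
version strings are whatever your spec file produces):

       kernel-2.4.18-27.3.10qsnet.i686.rpm
       kernel-smp-2.4.18-27.3.10qsnet.i686.rpm

a package named qsnet-RedHat-kernel-<version>.i686.rpm is not recognized
as a kernel package by the installer.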

we have made the necessary patches to files under /usr/src/linux-2.4 in
order to produce redhat-compliant kernels. see:

http://www.rocksclusters.org/rocks-documentation/3.0.0/customization-
kernel.html

also, would you be interested in making your changes for the quadrics
interconnect available to the general rocks community?

    - gb



From purikk at hotmail.com Mon Dec 8 20:23:35 2003
From: purikk at hotmail.com (purushotham komaravolu)
Date: Mon, 8 Dec 2003 23:23:35 -0500
Subject: [Rocks-Discuss]AMD Opteron
References: <200312082001.hB8K1KJ24139@postal.sdsc.edu>
Message-ID: <BAY1-DAV65Bp80SiEmA00005c14@hotmail.com>

Hello,
            I am a newbie to ROCKS clusters. I wanted to set up clusters on
32-bit architectures (Intel and AMD) and 64-bit architectures (Intel and
AMD).
I found the 64-bit download for Intel on the website but not for AMD. Does
it work for the AMD Opteron? If not, what is the ETA for AMD-64?
We are planning to buy AMD-64 machines shortly, and I would like to
volunteer for beta testing if needed.
Thanks
Regards,
Puru
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December
2003 December

Weitere ähnliche Inhalte

Was ist angesagt?

Vmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is bootedVmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is bootedAdrian Huang
 
Opendaylight app development
Opendaylight app developmentOpendaylight app development
Opendaylight app developmentvjanandr
 
Crossing into Kernel Space
Crossing into Kernel SpaceCrossing into Kernel Space
Crossing into Kernel SpaceDavid Evans
 
Workload Isolation - Asya Kamsky
Workload Isolation - Asya KamskyWorkload Isolation - Asya Kamsky
Workload Isolation - Asya KamskyMongoDB
 
110864103 adventures-in-bug-hunting
110864103 adventures-in-bug-hunting110864103 adventures-in-bug-hunting
110864103 adventures-in-bug-huntingbob dobbs
 
PuppetCamp SEA 1 - Use of Puppet
PuppetCamp SEA 1 - Use of PuppetPuppetCamp SEA 1 - Use of Puppet
PuppetCamp SEA 1 - Use of PuppetWalter Heck
 
44 con slides
44 con slides44 con slides
44 con slidesgeeksec80
 
Replication and Replica Sets
Replication and Replica SetsReplication and Replica Sets
Replication and Replica SetsMongoDB
 
Replica Sets (NYC NoSQL Meetup)
Replica Sets (NYC NoSQL Meetup)Replica Sets (NYC NoSQL Meetup)
Replica Sets (NYC NoSQL Meetup)MongoDB
 
Building OpenDNS Stats
Building OpenDNS StatsBuilding OpenDNS Stats
Building OpenDNS StatsGeorge Ang
 
Federating clusters
Federating clustersFederating clusters
Federating clustersChris Dwan
 
Go Programming Patterns
Go Programming PatternsGo Programming Patterns
Go Programming PatternsHao Chen
 
PuppetDB: Sneaking Clojure into Operations
PuppetDB: Sneaking Clojure into OperationsPuppetDB: Sneaking Clojure into Operations
PuppetDB: Sneaking Clojure into Operationsgrim_radical
 
Linux Bash Shell Cheat Sheet for Beginners
Linux Bash Shell Cheat Sheet for BeginnersLinux Bash Shell Cheat Sheet for Beginners
Linux Bash Shell Cheat Sheet for BeginnersDavide Ciambelli
 
High Availability Server with DRBD in linux
High Availability Server with DRBD in linuxHigh Availability Server with DRBD in linux
High Availability Server with DRBD in linuxAli Rachman
 
Python twisted
Python twistedPython twisted
Python twistedMahendra M
 
2345014 unix-linux-bsd-cheat-sheets-i
2345014 unix-linux-bsd-cheat-sheets-i2345014 unix-linux-bsd-cheat-sheets-i
2345014 unix-linux-bsd-cheat-sheets-iLogesh Kumar Anandhan
 
SD, a P2P bug tracking system
SD, a P2P bug tracking systemSD, a P2P bug tracking system
SD, a P2P bug tracking systemJesse Vincent
 
Kubernetes Tutorial
Kubernetes TutorialKubernetes Tutorial
Kubernetes TutorialCi Jie Li
 

Was ist angesagt? (19)

Vmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is bootedVmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is booted
 
Opendaylight app development
Opendaylight app developmentOpendaylight app development
Opendaylight app development
 
Crossing into Kernel Space
Crossing into Kernel SpaceCrossing into Kernel Space
Crossing into Kernel Space
 
Workload Isolation - Asya Kamsky
Workload Isolation - Asya KamskyWorkload Isolation - Asya Kamsky
Workload Isolation - Asya Kamsky
 
110864103 adventures-in-bug-hunting
110864103 adventures-in-bug-hunting110864103 adventures-in-bug-hunting
110864103 adventures-in-bug-hunting
 
PuppetCamp SEA 1 - Use of Puppet
PuppetCamp SEA 1 - Use of PuppetPuppetCamp SEA 1 - Use of Puppet
PuppetCamp SEA 1 - Use of Puppet
 
44 con slides
44 con slides44 con slides
44 con slides
 
Replication and Replica Sets
Replication and Replica SetsReplication and Replica Sets
Replication and Replica Sets
 
Replica Sets (NYC NoSQL Meetup)
Replica Sets (NYC NoSQL Meetup)Replica Sets (NYC NoSQL Meetup)
Replica Sets (NYC NoSQL Meetup)
 
Building OpenDNS Stats
Building OpenDNS StatsBuilding OpenDNS Stats
Building OpenDNS Stats
 
Federating clusters
Federating clustersFederating clusters
Federating clusters
 
Go Programming Patterns
Go Programming PatternsGo Programming Patterns
Go Programming Patterns
 
PuppetDB: Sneaking Clojure into Operations
PuppetDB: Sneaking Clojure into OperationsPuppetDB: Sneaking Clojure into Operations
PuppetDB: Sneaking Clojure into Operations
 
Linux Bash Shell Cheat Sheet for Beginners
Linux Bash Shell Cheat Sheet for BeginnersLinux Bash Shell Cheat Sheet for Beginners
Linux Bash Shell Cheat Sheet for Beginners
 
High Availability Server with DRBD in linux
High Availability Server with DRBD in linuxHigh Availability Server with DRBD in linux
High Availability Server with DRBD in linux
 
Python twisted
Python twistedPython twisted
Python twisted
 
2345014 unix-linux-bsd-cheat-sheets-i
2345014 unix-linux-bsd-cheat-sheets-i2345014 unix-linux-bsd-cheat-sheets-i
2345014 unix-linux-bsd-cheat-sheets-i
 
SD, a P2P bug tracking system
SD, a P2P bug tracking systemSD, a P2P bug tracking system
SD, a P2P bug tracking system
 
Kubernetes Tutorial
Kubernetes TutorialKubernetes Tutorial
Kubernetes Tutorial
 

Andere mochten auch

BPRRL Generation 11 Prologue
BPRRL Generation 11 PrologueBPRRL Generation 11 Prologue
BPRRL Generation 11 Prologuendainye
 
Aardvark Final Www2010
Aardvark Final Www2010Aardvark Final Www2010
Aardvark Final Www2010guestcc519e
 
Elisha bc days 3 to 6
Elisha bc days 3 to 6Elisha bc days 3 to 6
Elisha bc days 3 to 6ndainye
 
Ownership and control in multinational joint ventures
Ownership and control in multinational joint venturesOwnership and control in multinational joint ventures
Ownership and control in multinational joint venturesanushreeg0
 
Ill Be There For You Part Two
Ill Be There For You Part TwoIll Be There For You Part Two
Ill Be There For You Part Twondainye
 
Cisp Payment Application Best Practices
Cisp Payment Application Best PracticesCisp Payment Application Best Practices
Cisp Payment Application Best Practicesguestcc519e
 
Case Study Aardvark
Case Study AardvarkCase Study Aardvark
Case Study AardvarkFM Signal
 

Andere mochten auch (7)

BPRRL Generation 11 Prologue
BPRRL Generation 11 PrologueBPRRL Generation 11 Prologue
BPRRL Generation 11 Prologue
 
Aardvark Final Www2010
Aardvark Final Www2010Aardvark Final Www2010
Aardvark Final Www2010
 
Elisha bc days 3 to 6
Elisha bc days 3 to 6Elisha bc days 3 to 6
Elisha bc days 3 to 6
 
Ownership and control in multinational joint ventures
Ownership and control in multinational joint venturesOwnership and control in multinational joint ventures
Ownership and control in multinational joint ventures
 
Ill Be There For You Part Two
Ill Be There For You Part TwoIll Be There For You Part Two
Ill Be There For You Part Two
 
Cisp Payment Application Best Practices
Cisp Payment Application Best PracticesCisp Payment Application Best Practices
Cisp Payment Application Best Practices
 
Case Study Aardvark
Case Study AardvarkCase Study Aardvark
Case Study Aardvark
 

Ähnlich wie 2003 December

Velocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard WayVelocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard WayCosimo Streppone
 
Slackware Demystified [SELF 2011]
Slackware Demystified [SELF 2011]Slackware Demystified [SELF 2011]
Slackware Demystified [SELF 2011]Vincent Batts
 
Don't dump thread dumps
Don't dump thread dumpsDon't dump thread dumps
Don't dump thread dumpsTier1app
 
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...StampedeCon
 
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...DataStax Academy
 
(Fun clojure)
(Fun clojure)(Fun clojure)
(Fun clojure)Timo Sulg
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterprisePatrick McFadin
 
Installing spark 2
Installing spark 2Installing spark 2
Installing spark 2Ahmed Mekawy
 
Don't dump thread dumps
Don't dump thread dumpsDon't dump thread dumps
Don't dump thread dumpsTier1 App
 
Oracle to Postgres Migration - part 2
Oracle to Postgres Migration - part 2Oracle to Postgres Migration - part 2
Oracle to Postgres Migration - part 2PgTraining
 
High performance PHP8 at Scale - PhpersSummit 2023
High performance PHP8 at Scale - PhpersSummit 2023 High performance PHP8 at Scale - PhpersSummit 2023
High performance PHP8 at Scale - PhpersSummit 2023 Max Małecki
 
The post release technologies of Crysis 3 (Slides Only) - Stewart Needham
The post release technologies of Crysis 3 (Slides Only) - Stewart NeedhamThe post release technologies of Crysis 3 (Slides Only) - Stewart Needham
The post release technologies of Crysis 3 (Slides Only) - Stewart NeedhamStewart Needham
 
파이썬 개발환경 구성하기의 끝판왕 - Docker Compose
파이썬 개발환경 구성하기의 끝판왕 - Docker Compose파이썬 개발환경 구성하기의 끝판왕 - Docker Compose
파이썬 개발환경 구성하기의 끝판왕 - Docker Composeraccoony
 
Next.ml Boston: Data Science Dev Ops
Next.ml Boston: Data Science Dev OpsNext.ml Boston: Data Science Dev Ops
Next.ml Boston: Data Science Dev OpsEric Chiang
 

Ähnlich wie 2003 December (20)

Velocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard WayVelocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard Way
 
Sge
SgeSge
Sge
 
Slackware Demystified [SELF 2011]
Slackware Demystified [SELF 2011]Slackware Demystified [SELF 2011]
Slackware Demystified [SELF 2011]
 
Don't dump thread dumps
Don't dump thread dumpsDon't dump thread dumps
Don't dump thread dumps
 
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
 
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
 
(Fun clojure)
(Fun clojure)(Fun clojure)
(Fun clojure)
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
 
Operation outbreak
Operation outbreakOperation outbreak
Operation outbreak
 
Installing spark 2
Installing spark 2Installing spark 2
Installing spark 2
 
Don't dump thread dumps
Don't dump thread dumpsDon't dump thread dumps
Don't dump thread dumps
 
Oracle to Postgres Migration - part 2
Oracle to Postgres Migration - part 2Oracle to Postgres Migration - part 2
Oracle to Postgres Migration - part 2
 
Move Over, Rsync
Move Over, RsyncMove Over, Rsync
Move Over, Rsync
 
High performance PHP8 at Scale - PhpersSummit 2023
High performance PHP8 at Scale - PhpersSummit 2023 High performance PHP8 at Scale - PhpersSummit 2023
High performance PHP8 at Scale - PhpersSummit 2023
 
The post release technologies of Crysis 3 (Slides Only) - Stewart Needham
The post release technologies of Crysis 3 (Slides Only) - Stewart NeedhamThe post release technologies of Crysis 3 (Slides Only) - Stewart Needham
The post release technologies of Crysis 3 (Slides Only) - Stewart Needham
 
Cabra Arretado Aperriando o WordPress
Cabra Arretado Aperriando o WordPressCabra Arretado Aperriando o WordPress
Cabra Arretado Aperriando o WordPress
 
Talk NullByteCon 2015
Talk NullByteCon 2015Talk NullByteCon 2015
Talk NullByteCon 2015
 
Hacking the swisscom modem
Hacking the swisscom modemHacking the swisscom modem
Hacking the swisscom modem
 
파이썬 개발환경 구성하기의 끝판왕 - Docker Compose
파이썬 개발환경 구성하기의 끝판왕 - Docker Compose파이썬 개발환경 구성하기의 끝판왕 - Docker Compose
파이썬 개발환경 구성하기의 끝판왕 - Docker Compose
 
Next.ml Boston: Data Science Dev Ops
Next.ml Boston: Data Science Dev OpsNext.ml Boston: Data Science Dev Ops
Next.ml Boston: Data Science Dev Ops
 

2003 December

  • 1. From angel at miami.edu Mon Dec 1 10:25:34 2003 From: angel at miami.edu (Angel Li) Date: Mon, 01 Dec 2003 13:25:34 -0500 Subject: [Rocks-Discuss]cluster-fork Message-ID: <3FCB879E.8050905@miami.edu> Hi, I recently installed Rocks 3.0 on a Linux cluster and when I run the command "cluster-fork" I get this error: apple* cluster-fork ls Traceback (innermost last): File "/opt/rocks/sbin/cluster-fork", line 88, in ? import rocks.pssh File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ? import gmon.encoder ImportError: Bad magic number in /usr/lib/python1.5/site-packages/gmon/encoder.pyc Any thoughts? I'm also wondering where to find the python sources for files in /usr/lib/python1.5/site-packages/gmon. Thanks, Angel From jghobrial at uh.edu Mon Dec 1 11:35:06 2003 From: jghobrial at uh.edu (Joseph) Date: Mon, 1 Dec 2003 13:35:06 -0600 (CST) Subject: [Rocks-Discuss]cluster-fork In-Reply-To: <3FCB879E.8050905@miami.edu> References: <3FCB879E.8050905@miami.edu> Message-ID: <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> On Mon, 1 Dec 2003, Angel Li wrote: Hello Angel, I have the same problem and so far there is no response when I posted about this a month ago. Is your frontend an AMD setup?? I am thinking this is an AMD problem. Thanks, Joseph > Hi, > > I recently installed Rocks 3.0 on a Linux cluster and when I run the > command "cluster-fork" I get this error: > > apple* cluster-fork ls > Traceback (innermost last): > File "/opt/rocks/sbin/cluster-fork", line 88, in ? > import rocks.pssh > File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
  • 2. > import gmon.encoder > ImportError: Bad magic number in > /usr/lib/python1.5/site-packages/gmon/encoder.pyc > > Any thoughts? I'm also wondering where to find the python sources for > files in /usr/lib/python1.5/site-packages/gmon. > > Thanks, > > Angel > From tim.carlson at pnl.gov Mon Dec 1 14:58:54 2003 From: tim.carlson at pnl.gov (Tim Carlson) Date: Mon, 01 Dec 2003 14:58:54 -0800 (PST) Subject: [Rocks-Discuss]odd kickstart problem In-Reply-To: <76AC0F5E-2025-11D8-804D-000393A4725A@sdsc.edu> Message-ID: <Pine.LNX.4.44.0312011453020.22892-100000@scorpion.emsl.pnl.gov> Trying to bring up an old dead node on a Rocks 2.3.2 cluster and I get the following error in /var/log/httpd/error_log Traceback (innermost last): File "/opt/rocks/sbin/kgen", line 530, in ? app.run() File "/opt/rocks/sbin/kgen", line 497, in run doc = FromXmlStream(file) File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 386, in FromXmlStream return reader.fromStream(stream, ownerDocument) File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 372, in fromStream self.parser.parse(s) File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line 58, in parse xmlreader.IncrementalParser.parse(self, source) File "/usr/lib/python1.5/site-packages/xml/sax/xmlreader.py", line 125, in parse self.close() File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line 154, in close self.feed("", isFinal = 1) File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line 148, in feed self._err_handler.fatalError(exc) File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 340, in fatalError raise exception xml.sax._exceptions.SAXParseException: <stdin>:3298:0: no element found Doing a wget of http://frontend-0/install/kickstart.cgi? arch=i386&np=2&project=rocks on one of the working internal nodes yields the same error. Any thoughts on this?
  • 3. I've also done a fresh rocks-dist dist Tim From sjenks at uci.edu Mon Dec 1 15:35:54 2003 From: sjenks at uci.edu (Stephen Jenks) Date: Mon, 1 Dec 2003 15:35:54 -0800 Subject: [Rocks-Discuss]cluster-fork In-Reply-To: <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> References: <3FCB879E.8050905@miami.edu> <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> Message-ID: <1B15A45F-2457-11D8-A374-00039389B580@uci.edu> FYI, I have a dual Athlon frontend and didn't have that problem. I know that doesn't exactly help you, but at least it doesn't fail on all AMD machines. It looks like the .pyc file might be corrupt in your installation. The source .py file (encoder.py) is in the /usr/lib/python1.5/site-packages/gmon directory, so perhaps removing the .pyc file would regenerate it (if you run cluster-fork as root?) The md5sum for encoder.pyc on my system is: 459c78750fe6e065e9ed464ab23ab73d encoder.pyc So you can check if yours is different. Steve Jenks On Dec 1, 2003, at 11:35 AM, Joseph wrote: > On Mon, 1 Dec 2003, Angel Li wrote: > Hello Angel, I have the same problem and so far there is no response > when > I posted about this a month ago. > > Is your frontend an AMD setup?? > > I am thinking this is an AMD problem. > > Thanks, > Joseph > > >> Hi, >> >> I recently installed Rocks 3.0 on a Linux cluster and when I run the >> command "cluster-fork" I get this error: >> >> apple* cluster-fork ls >> Traceback (innermost last): >> File "/opt/rocks/sbin/cluster-fork", line 88, in ? >> import rocks.pssh >> File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ? >> import gmon.encoder >> ImportError: Bad magic number in
  • 4. >> /usr/lib/python1.5/site-packages/gmon/encoder.pyc >> >> Any thoughts? I'm also wondering where to find the python sources for >> files in /usr/lib/python1.5/site-packages/gmon. >> >> Thanks, >> >> Angel >> From mjk at sdsc.edu Mon Dec 1 19:03:16 2003 From: mjk at sdsc.edu (Mason J. Katz) Date: Mon, 1 Dec 2003 19:03:16 -0800 Subject: [Rocks-Discuss]odd kickstart problem In-Reply-To: <Pine.LNX.4.44.0312011453020.22892-100000@scorpion.emsl.pnl.gov> References: <Pine.LNX.4.44.0312011453020.22892-100000@scorpion.emsl.pnl.gov> Message-ID: <132DD626-2474-11D8-A7A4-000A95DA5638@sdsc.edu> You'll need to run the kpp and kgen steps (what kickstart.cgi does for your) manually to find if this is an XML error. # cd /home/install/profiles/current # kpp compute This will generate a kickstart file for a compute nodes, although some information will be missing since it isn't specific to a node (not like what ./kickstart.cgi --client=node-name generates). But what this does do is traverse the XML graph and build a monolithic XML kickstart profile. If this step works you can then "|" pipe the output into kgen to convert the XML to kickstart syntax. Something in this procedure should fail and point to the error. -mjk On Dec 1, 2003, at 2:58 PM, Tim Carlson wrote: > Trying to bring up an old dead node on a Rocks 2.3.2 cluster and I get > the > following error in /var/log/httpd/error_log > > > Traceback (innermost last): > File "/opt/rocks/sbin/kgen", line 530, in ? > app.run() > File "/opt/rocks/sbin/kgen", line 497, in run > doc = FromXmlStream(file) > File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", > line > 386, in FromXmlStream > return reader.fromStream(stream, ownerDocument) > File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", > line > 372, in fromStream > self.parser.parse(s) > File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line > 58, > in parse
  • 5. > xmlreader.IncrementalParser.parse(self, source) > File "/usr/lib/python1.5/site-packages/xml/sax/xmlreader.py", line > 125, > in parse > self.close() > File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line > 154, in close > self.feed("", isFinal = 1) > File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line > 148, in feed > self._err_handler.fatalError(exc) > File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", > line > 340, in fatalError > raise exception > xml.sax._exceptions.SAXParseException: <stdin>:3298:0: no element found > > > Doing a wget of > http://frontend-0/install/kickstart.cgi? > arch=i386&np=2&project=rocks > on one of the working internal nodes yields the same error. > > Any thoughts on this? > > I've also done a fresh > rocks-dist dist > > Tim From tim.carlson at pnl.gov Mon Dec 1 20:42:51 2003 From: tim.carlson at pnl.gov (Tim Carlson) Date: Mon, 01 Dec 2003 20:42:51 -0800 (PST) Subject: [Rocks-Discuss]odd kickstart problem In-Reply-To: <132DD626-2474-11D8-A7A4-000A95DA5638@sdsc.edu> Message-ID: <Pine.GSO.4.44.0312012040250.3148-100000@paradox.emsl.pnl.gov> On Mon, 1 Dec 2003, Mason J. Katz wrote: > You'll need to run the kpp and kgen steps (what kickstart.cgi does for > your) manually to find if this is an XML error. > > # cd /home/install/profiles/current > # kpp compute That was the trick. This sent me down the correct path. I had uninstalled SGE on the frontend (I was having problems with SGE and wanted to start from scratch) Adding the 2 SGE XML files back to /home/install/profiles/2.3.2/nodes/ fixed everything Thanks! Tim
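For reference, the kpp/kgen debugging loop described above can be run end to end on the frontend as root. This is only a sketch collecting the steps already given in the thread; the profiles directory is release-specific (2.3.2 in Tim's case, "current" on other installs), so adjust the path to your own tree.

  # cd /home/install/profiles/current
  # kpp compute > /tmp/compute.xml
  # kpp compute | kgen > /tmp/compute.ks

Whichever step fails first points at the problem. A kpp failure (for example from a node file that was removed from the graph, like the two SGE XML files here) leaves kgen with little or no input, which is consistent with the SAX "no element found" error in the original traceback; if kpp succeeds but kgen still fails, the assembled XML profile itself is malformed.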
  • 6. From landman at scalableinformatics.com Tue Dec 2 04:15:07 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 02 Dec 2003 07:15:07 -0500 Subject: [Rocks-Discuss]supermicro based MB's Message-ID: <3FCC824B.5060406@scalableinformatics.com> Folks: Working on integrating a Supermicro MB based cluster. Discovered early on that all of the compute nodes have an Intel based NIC that RedHat doesn't know anything about (any version of RH). Some of the administrative nodes have other similar issues. I am seeing simply a suprising number of mis/un detected hardware across the collection of MBs. Anyone have advice on where to get modules/module source for Redhat for these things? It looks like I will need to rebuild the boot CD, though the several times I have tried this previously have failed to produce a working/bootable system. It looks like new modules need to be created/inserted into the boot process (head node and cluster nodes) kernels, as well as into the installable kernels. Has anyone done this for a Supermicro MB based system? Thanks . Joe -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 From jghobrial at uh.edu Tue Dec 2 08:28:08 2003 From: jghobrial at uh.edu (Joseph) Date: Tue, 2 Dec 2003 10:28:08 -0600 (CST) Subject: [Rocks-Discuss]cluster-fork In-Reply-To: <1B15A45F-2457-11D8-A374-00039389B580@uci.edu> References: <3FCB879E.8050905@miami.edu> <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8-A374-00039389B580@uci.edu> Message-ID: <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu> Indeed my md5sum is different for encoder.pyc. However, when I pulled the file and run "cluster-fork" python responds about an import problem. So it seems that regeneration did not occur. Is there a flag I need to pass? I have also tried to figure out what package provides encoder and reinstall the package, but an rpm query reveals nothing. If this is a generated file, what generates it? It seems that an rpm file query on ganglia show that files in the directory belong to the package, but encoder.pyc does not. Thanks,
  • 7. Joseph On Mon, 1 Dec 2003, Stephen Jenks wrote: > FYI, I have a dual Athlon frontend and didn't have that problem. I know > that doesn't exactly help you, but at least it doesn't fail on all AMD > machines. > > It looks like the .pyc file might be corrupt in your installation. The > source .py file (encoder.py) is in the > /usr/lib/python1.5/site-packages/gmon directory, so perhaps removing > the .pyc file would regenerate it (if you run cluster-fork as root?) > > The md5sum for encoder.pyc on my system is: > 459c78750fe6e065e9ed464ab23ab73d encoder.pyc > So you can check if yours is different. > > Steve Jenks > > > On Dec 1, 2003, at 11:35 AM, Joseph wrote: > > > On Mon, 1 Dec 2003, Angel Li wrote: > > Hello Angel, I have the same problem and so far there is no response > > when > > I posted about this a month ago. > > > > Is your frontend an AMD setup?? > > > > I am thinking this is an AMD problem. > > > > Thanks, > > Joseph > > > > > >> Hi, > >> > >> I recently installed Rocks 3.0 on a Linux cluster and when I run the > >> command "cluster-fork" I get this error: > >> > >> apple* cluster-fork ls > >> Traceback (innermost last): > >> File "/opt/rocks/sbin/cluster-fork", line 88, in ? > >> import rocks.pssh > >> File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ? > >> import gmon.encoder > >> ImportError: Bad magic number in > >> /usr/lib/python1.5/site-packages/gmon/encoder.pyc > >> > >> Any thoughts? I'm also wondering where to find the python sources for > >> files in /usr/lib/python1.5/site-packages/gmon. > >> > >> Thanks, > >> > >> Angel > >> >
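As a quick way to act on Steve's suggestion, the checks below look at what is actually on disk in the gmon directory before deleting anything. This is only a sketch: it assumes the paths from the traceback, and that /usr/bin/python is the same Python 1.5 interpreter rocks.pssh runs under, since a .pyc written by a different Python version is exactly what produces a "bad magic number" error.

  # cd /usr/lib/python1.5/site-packages/gmon
  # ls -l encoder.py encoder.pyc        (is the .py source present at all?)
  # rpm -qf encoder.py encoder.pyc      (which package, if any, owns each file)
  # md5sum encoder.pyc                  (compare against Steve's checksum)
  # python -c "import gmon.encoder"     (as root, so a fresh .pyc can be written)

If encoder.py is missing, removing the .pyc cannot regenerate anything, and the import will fail outright rather than with a bad magic number.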
  • 8. From angel at miami.edu Tue Dec 2 09:02:55 2003 From: angel at miami.edu (Angel Li) Date: Tue, 02 Dec 2003 12:02:55 -0500 Subject: [Rocks-Discuss]cluster-fork In-Reply-To: <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu> References: <3FCB879E.8050905@miami.edu> <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8- A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu> Message-ID: <3FCCC5BF.3030903@miami.edu> Joseph wrote: >Indeed my md5sum is different for encoder.pyc. However, when I pulled the >file and run "cluster-fork" python responds about an import problem. So it >seems that regeneration did not occur. Is there a flag I need to pass? > >I have also tried to figure out what package provides encoder and >reinstall the package, but an rpm query reveals nothing. > >If this is a generated file, what generates it? > >It seems that an rpm file query on ganglia show that files in the >directory belong to the package, but encoder.pyc does not. > >Thanks, >Joseph > > > > I have finally found the python sources in the HPC rolls CD, filename ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it seems python "compiles" the .py files to ".pyc" and then deletes the source file the first time they are referenced? I also noticed that there are two versions of python installed. Maybe the pyc files from one version won't load into the other one? Angel From mjk at sdsc.edu Tue Dec 2 15:52:52 2003 From: mjk at sdsc.edu (Mason J. Katz) Date: Tue, 2 Dec 2003 15:52:52 -0800 Subject: [Rocks-Discuss]cluster-fork In-Reply-To: <3FCCC5BF.3030903@miami.edu> References: <3FCB879E.8050905@miami.edu> <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8- A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu> <3FCCC5BF.3030903@miami.edu> Message-ID: <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu> Python creates the .pyc files for you, and does not remove the original .py file. I would be extremely surprised it two "identical" .pyc files had the same md5 checksum. I'd expect this to be more like C .o file which always contain random data to pad out to the end of a page and
  • 9. 32/64 bit word sizes. Still this is just a guess, the real point is you can always remove the .pyc files and the .py will regenerate it when imported (although standard UNIX file/dir permission still apply). What is the import error you get from cluster-fork? -mjk On Dec 2, 2003, at 9:02 AM, Angel Li wrote: > Joseph wrote: > >> Indeed my md5sum is different for encoder.pyc. However, when I pulled >> the file and run "cluster-fork" python responds about an import >> problem. So it seems that regeneration did not occur. Is there a flag >> I need to pass? >> >> I have also tried to figure out what package provides encoder and >> reinstall the package, but an rpm query reveals nothing. >> >> If this is a generated file, what generates it? >> >> It seems that an rpm file query on ganglia show that files in the >> directory belong to the package, but encoder.pyc does not. >> >> Thanks, >> Joseph >> >> >> > I have finally found the python sources in the HPC rolls CD, filename > ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it > seems python "compiles" the .py files to ".pyc" and then deletes the > source file the first time they are referenced? I also noticed that > there are two versions of python installed. Maybe the pyc files from > one version won't load into the other one? > > Angel > > From vrowley at ucsd.edu Mon Dec 1 14:27:03 2003 From: vrowley at ucsd.edu (V. Rowley) Date: Mon, 01 Dec 2003 14:27:03 -0800 Subject: [Rocks-Discuss]PXE boot problems Message-ID: <3FCBC037.5000302@ucsd.edu> We have installed a ROCKS 3.0.0 frontend on a DL380 and are trying to install a compute node via PXE. We are getting an error similar to the one mentioned in the archives, e.g. > Loading initrd.img.... > Ready > > Failed to free base memory >
  • 10. We have upgraded to syslinux-2.07-1, per the suggestion in the archives, but continue to get the same error. Any ideas? -- Vicky Rowley email: vrowley at ucsd.edu Biomedical Informatics Research Network work: (858) 536-5980 University of California, San Diego fax: (858) 822-0828 9500 Gilman Drive La Jolla, CA 92093-0715 See pictures from our trip to China at http://www.sagacitech.com/Chinaweb From naihh at imcb.a-star.edu.sg Tue Dec 2 18:50:55 2003 From: naihh at imcb.a-star.edu.sg (Nai Hong Hwa Francis) Date: Wed, 3 Dec 2003 10:50:55 +0800 Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included inRocks 3 for Itanium? Message-ID: <5E118EED7CC277468A275F11EEEC39B94CCC22@EXIMCB2.imcb.a-star.edu.sg> Hi Laurence, I just downloaded the Rocks3.0 for IA32 and installed it but SGE is still not working. Any idea? Nai Hong Hwa Francis Institute of Molecular and Cell Biology (A*STAR) 30 Medical Drive Singapore 117609. DID: (65) 6874-6196 -----Original Message----- From: Laurence Liew [mailto:laurence at scalablesys.com] Sent: Thursday, November 20, 2003 2:53 PM To: Nai Hong Hwa Francis Cc: npaci-rocks-discussion at sdsc.edu Subject: Re: [Rocks-Discuss]RE: When will Sun Grid Engine be included inRocks 3 for Itanium? Hi Francis GridEngine roll is ready for ia32. We will get a ia64 native version ready as soon as we get back from SC2003. It will be released in a few weeks time. Globus GT2.4 is included in the Grid Roll Cheers! Laurence On Thu, 2003-11-20 at 10:13, Nai Hong Hwa Francis wrote: > > Hi,
  • 11. > > Does anyone have any idea when will Sun Grid Engine be included as part > of Rocks 3 distribution. > > I am a newbie to Grid Computing. > Anyone have any idea on how to invoke Globus in Rocks to setup a Grid? > > Regards > > Nai Hong Hwa Francis > > Institute of Molecular and Cell Biology (A*STAR) > 30 Medical Drive > Singapore 117609 > DID: 65-6874-6196 > > -----Original Message----- > From: npaci-rocks-discussion-request at sdsc.edu > [mailto:npaci-rocks-discussion-request at sdsc.edu] > Sent: Thursday, November 20, 2003 4:01 AM > To: npaci-rocks-discussion at sdsc.edu > Subject: npaci-rocks-discussion digest, Vol 1 #613 - 3 msgs > > Send npaci-rocks-discussion mailing list submissions to > npaci-rocks-discussion at sdsc.edu > > To subscribe or unsubscribe via the World Wide Web, visit > > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion > or, via email, send a message with subject or body 'help' to > npaci-rocks-discussion-request at sdsc.edu > > You can reach the person managing the list at > npaci-rocks-discussion-admin at sdsc.edu > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of npaci-rocks-discussion digest..." > > > Today's Topics: > > 1. top500 cluster installation movie (Greg Bruno) > 2. Re: Running Normal Application on Rocks Cluster - > Newbie Question (Laurence Liew) > > --__--__-- > > Message: 1 > To: npaci-rocks-discussion at sdsc.edu > From: Greg Bruno <bruno at rocksclusters.org> > Date: Tue, 18 Nov 2003 13:41:15 -0800 > Subject: [Rocks-Discuss]top500 cluster installation movie > > here's a crew of 7, installing the 201st fastest supercomputer in the > world in under two hours on the showroom floor at SC 03: > > http://www.rocksclusters.org/rocks.mov >
  • 12. > warning: the above file is ~65MB. > > - gb > > > --__--__-- > > Message: 2 > Subject: Re: [Rocks-Discuss]Running Normal Application on Rocks Cluster > - > Newbie Question > From: Laurence Liew <laurenceliew at yahoo.com.sg> > To: Leong Chee Shian <chee-shian.leong at schenker.com> > Cc: npaci-rocks-discussion at sdsc.edu > Date: Wed, 19 Nov 2003 12:31:18 +0800 > > Chee Shian, > > Thanks for your call. We will take this off list and visit you next week > in your office as you requested. > > Cheers! > laurence > > > > On Tue, 2003-11-18 at 17:29, Leong Chee Shian wrote: > > I have just installed Rocks 3.0 with one frontend and two compute > > node. > > > > A normal file based application is installed on the frontend and is > > NFS shared to the compute nodes . > > > > Question is : When run 5 sessions of my applications , the CPU > > utilization is all concentrated on the frontend node , nothing is > > being passed on to the compute nodes . How do I make these 3 computers > > to function as one and share the load ? > > > > Thanks everyone as I am really new to this clustering stuff.. > > > > PS : The idea of exploring rocks cluster is to use a few inexpensive > > intel machines to replace our existing multi CPU sun server, > > suggestions and recommendations are greatly appreciated. > > > > > > Leong > > > > > > > > > > --__--__-- > > _______________________________________________ > npaci-rocks-discussion mailing list
  • 13. > npaci-rocks-discussion at sdsc.edu > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion > > > End of npaci-rocks-discussion Digest > > > DISCLAIMER: > This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person as it may be an offence under the Official Secrets Act. Thank you. -- Laurence Liew CTO, Scalable Systems Pte Ltd 7 Bedok South Road Singapore 469272 Tel : 65 6827 3953 Fax : 65 6827 3922 Mobile: 65 9029 4312 Email : laurence at scalablesys.com http://www.scalablesys.com DISCLAIMER: This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person as it may be an offence under the Official Secrets Act. Thank you. From laurence at scalablesys.com Tue Dec 2 19:10:08 2003 From: laurence at scalablesys.com (Laurence Liew) Date: Wed, 03 Dec 2003 11:10:08 +0800 Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included inRocks 3 for Itanium? In-Reply-To: <5E118EED7CC277468A275F11EEEC39B94CCC22@EXIMCB2.imcb.a-star.edu.sg> References: <5E118EED7CC277468A275F11EEEC39B94CCC22@EXIMCB2.imcb.a-star.edu.sg> Message-ID: <1070421007.2452.51.camel@scalable> Hi, SGE is in the SGE roll. You need to download the base, hpc and sge roll. The install is now different from V2.3.x Cheers! laurence On Wed, 2003-12-03 at 10:50, Nai Hong Hwa Francis wrote: > Hi Laurence, >
  • 14. > I just downloaded the Rocks3.0 for IA32 and installed it but SGE is > still not working. > > Any idea? > > Nai Hong Hwa Francis > Institute of Molecular and Cell Biology (A*STAR) > 30 Medical Drive > Singapore 117609. > DID: (65) 6874-6196 > > -----Original Message----- > From: Laurence Liew [mailto:laurence at scalablesys.com] > Sent: Thursday, November 20, 2003 2:53 PM > To: Nai Hong Hwa Francis > Cc: npaci-rocks-discussion at sdsc.edu > Subject: Re: [Rocks-Discuss]RE: When will Sun Grid Engine be included > inRocks 3 for Itanium? > > Hi Francis > > GridEngine roll is ready for ia32. We will get a ia64 native version > ready as soon as we get back from SC2003. It will be released in a few > weeks time. > > Globus GT2.4 is included in the Grid Roll > > Cheers! > Laurence > > > On Thu, 2003-11-20 at 10:13, Nai Hong Hwa Francis wrote: > > > > Hi, > > > > Does anyone have any idea when will Sun Grid Engine be included as > part > > of Rocks 3 distribution. > > > > I am a newbie to Grid Computing. > > Anyone have any idea on how to invoke Globus in Rocks to setup a Grid? > > > > Regards > > > > Nai Hong Hwa Francis > > > > Institute of Molecular and Cell Biology (A*STAR) > > 30 Medical Drive > > Singapore 117609 > > DID: 65-6874-6196 > > > > -----Original Message----- > > From: npaci-rocks-discussion-request at sdsc.edu > > [mailto:npaci-rocks-discussion-request at sdsc.edu] > > Sent: Thursday, November 20, 2003 4:01 AM > > To: npaci-rocks-discussion at sdsc.edu > > Subject: npaci-rocks-discussion digest, Vol 1 #613 - 3 msgs > > > > Send npaci-rocks-discussion mailing list submissions to
  • 15. > > npaci-rocks-discussion at sdsc.edu > > > > To subscribe or unsubscribe via the World Wide Web, visit > > > > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion > > or, via email, send a message with subject or body 'help' to > > npaci-rocks-discussion-request at sdsc.edu > > > > You can reach the person managing the list at > > npaci-rocks-discussion-admin at sdsc.edu > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of npaci-rocks-discussion digest..." > > > > > > Today's Topics: > > > > 1. top500 cluster installation movie (Greg Bruno) > > 2. Re: Running Normal Application on Rocks Cluster - > > Newbie Question (Laurence Liew) > > > > --__--__-- > > > > Message: 1 > > To: npaci-rocks-discussion at sdsc.edu > > From: Greg Bruno <bruno at rocksclusters.org> > > Date: Tue, 18 Nov 2003 13:41:15 -0800 > > Subject: [Rocks-Discuss]top500 cluster installation movie > > > > here's a crew of 7, installing the 201st fastest supercomputer in the > > world in under two hours on the showroom floor at SC 03: > > > > http://www.rocksclusters.org/rocks.mov > > > > warning: the above file is ~65MB. > > > > - gb > > > > > > --__--__-- > > > > Message: 2 > > Subject: Re: [Rocks-Discuss]Running Normal Application on Rocks > Cluster > > - > > Newbie Question > > From: Laurence Liew <laurenceliew at yahoo.com.sg> > > To: Leong Chee Shian <chee-shian.leong at schenker.com> > > Cc: npaci-rocks-discussion at sdsc.edu > > Date: Wed, 19 Nov 2003 12:31:18 +0800 > > > > Chee Shian, > > > > Thanks for your call. We will take this off list and visit you next > week > > in your office as you requested. > > > > Cheers! > > laurence
  • 16. > > > > > > > > On Tue, 2003-11-18 at 17:29, Leong Chee Shian wrote: > > > I have just installed Rocks 3.0 with one frontend and two compute > > > node. > > > > > > A normal file based application is installed on the frontend and is > > > NFS shared to the compute nodes . > > > > > > Question is : When run 5 sessions of my applications , the CPU > > > utilization is all concentrated on the frontend node , nothing is > > > being passed on to the compute nodes . How do I make these 3 > computers > > > to function as one and share the load ? > > > > > > Thanks everyone as I am really new to this clustering stuff.. > > > > > > PS : The idea of exploring rocks cluster is to use a few inexpensive > > > intel machines to replace our existing multi CPU sun server, > > > suggestions and recommendations are greatly appreciated. > > > > > > > > > Leong > > > > > > > > > > > > > > > > > --__--__-- > > > > _______________________________________________ > > npaci-rocks-discussion mailing list > > npaci-rocks-discussion at sdsc.edu > > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion > > > > > > End of npaci-rocks-discussion Digest > > > > > > DISCLAIMER: > > This email is confidential and may be privileged. If you are not the > intended recipient, please delete it and notify us immediately. Please > do not copy or use it for any purpose, or disclose its contents to any > other person as it may be an offence under the Official Secrets Act. > Thank you. -- Laurence Liew CTO, Scalable Systems Pte Ltd 7 Bedok South Road Singapore 469272 Tel : 65 6827 3953 Fax : 65 6827 3922 Mobile: 65 9029 4312 Email : laurence at scalablesys.com http://www.scalablesys.com
  • 17. From DGURGUL at PARTNERS.ORG Wed Dec 3 07:24:29 2003 From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.) Date: Wed, 3 Dec 2003 10:24:29 -0500 Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included inRo cks 3 for Itanium? Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE157F7@phsexch7.mgh.harvard.edu> Where do we find the SGE roll? Under Lhoste at http://rocks.npaci.edu/Rocks/ there is a "Grid" roll listed. Is SGE in that? The userguide doesn't mention SGE. Dennis J. Gurgul Partners Health Care System Research Management Research Computing Core 617.724.3169 -----Original Message----- From: npaci-rocks-discussion-admin at sdsc.edu [mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Laurence Liew Sent: Tuesday, December 02, 2003 10:10 PM To: Nai Hong Hwa Francis Cc: npaci-rocks-discussion at sdsc.edu Subject: RE: [Rocks-Discuss]RE: When will Sun Grid Engine be included inRocks 3 for Itanium? Hi, SGE is in the SGE roll. You need to download the base, hpc and sge roll. The install is now different from V2.3.x Cheers! laurence On Wed, 2003-12-03 at 10:50, Nai Hong Hwa Francis wrote: > Hi Laurence, > > I just downloaded the Rocks3.0 for IA32 and installed it but SGE is > still not working. > > Any idea? > > Nai Hong Hwa Francis > Institute of Molecular and Cell Biology (A*STAR) > 30 Medical Drive > Singapore 117609. > DID: (65) 6874-6196 > > -----Original Message----- > From: Laurence Liew [mailto:laurence at scalablesys.com] > Sent: Thursday, November 20, 2003 2:53 PM
  • 18. > To: Nai Hong Hwa Francis > Cc: npaci-rocks-discussion at sdsc.edu > Subject: Re: [Rocks-Discuss]RE: When will Sun Grid Engine be included > inRocks 3 for Itanium? > > Hi Francis > > GridEngine roll is ready for ia32. We will get a ia64 native version > ready as soon as we get back from SC2003. It will be released in a few > weeks time. > > Globus GT2.4 is included in the Grid Roll > > Cheers! > Laurence > > > On Thu, 2003-11-20 at 10:13, Nai Hong Hwa Francis wrote: > > > > Hi, > > > > Does anyone have any idea when will Sun Grid Engine be included as > part > > of Rocks 3 distribution. > > > > I am a newbie to Grid Computing. > > Anyone have any idea on how to invoke Globus in Rocks to setup a Grid? > > > > Regards > > > > Nai Hong Hwa Francis > > > > Institute of Molecular and Cell Biology (A*STAR) > > 30 Medical Drive > > Singapore 117609 > > DID: 65-6874-6196 > > > > -----Original Message----- > > From: npaci-rocks-discussion-request at sdsc.edu > > [mailto:npaci-rocks-discussion-request at sdsc.edu] > > Sent: Thursday, November 20, 2003 4:01 AM > > To: npaci-rocks-discussion at sdsc.edu > > Subject: npaci-rocks-discussion digest, Vol 1 #613 - 3 msgs > > > > Send npaci-rocks-discussion mailing list submissions to > > npaci-rocks-discussion at sdsc.edu > > > > To subscribe or unsubscribe via the World Wide Web, visit > > > > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion > > or, via email, send a message with subject or body 'help' to > > npaci-rocks-discussion-request at sdsc.edu > > > > You can reach the person managing the list at > > npaci-rocks-discussion-admin at sdsc.edu > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of npaci-rocks-discussion digest..." > >
  • 19. > > > > Today's Topics: > > > > 1. top500 cluster installation movie (Greg Bruno) > > 2. Re: Running Normal Application on Rocks Cluster - > > Newbie Question (Laurence Liew) > > > > --__--__-- > > > > Message: 1 > > To: npaci-rocks-discussion at sdsc.edu > > From: Greg Bruno <bruno at rocksclusters.org> > > Date: Tue, 18 Nov 2003 13:41:15 -0800 > > Subject: [Rocks-Discuss]top500 cluster installation movie > > > > here's a crew of 7, installing the 201st fastest supercomputer in the > > world in under two hours on the showroom floor at SC 03: > > > > http://www.rocksclusters.org/rocks.mov > > > > warning: the above file is ~65MB. > > > > - gb > > > > > > --__--__-- > > > > Message: 2 > > Subject: Re: [Rocks-Discuss]Running Normal Application on Rocks > Cluster > > - > > Newbie Question > > From: Laurence Liew <laurenceliew at yahoo.com.sg> > > To: Leong Chee Shian <chee-shian.leong at schenker.com> > > Cc: npaci-rocks-discussion at sdsc.edu > > Date: Wed, 19 Nov 2003 12:31:18 +0800 > > > > Chee Shian, > > > > Thanks for your call. We will take this off list and visit you next > week > > in your office as you requested. > > > > Cheers! > > laurence > > > > > > > > On Tue, 2003-11-18 at 17:29, Leong Chee Shian wrote: > > > I have just installed Rocks 3.0 with one frontend and two compute > > > node. > > > > > > A normal file based application is installed on the frontend and is > > > NFS shared to the compute nodes . > > > > > > Question is : When run 5 sessions of my applications , the CPU > > > utilization is all concentrated on the frontend node , nothing is > > > being passed on to the compute nodes . How do I make these 3 > computers
  • 20. > > > to function as one and share the load ? > > > > > > Thanks everyone as I am really new to this clustering stuff.. > > > > > > PS : The idea of exploring rocks cluster is to use a few inexpensive > > > intel machines to replace our existing multi CPU sun server, > > > suggestions and recommendations are greatly appreciated. > > > > > > > > > Leong > > > > > > > > > > > > > > > > > --__--__-- > > > > _______________________________________________ > > npaci-rocks-discussion mailing list > > npaci-rocks-discussion at sdsc.edu > > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion > > > > > > End of npaci-rocks-discussion Digest > > > > > > DISCLAIMER: > > This email is confidential and may be privileged. If you are not the > intended recipient, please delete it and notify us immediately. Please > do not copy or use it for any purpose, or disclose its contents to any > other person as it may be an offence under the Official Secrets Act. > Thank you. -- Laurence Liew CTO, Scalable Systems Pte Ltd 7 Bedok South Road Singapore 469272 Tel : 65 6827 3953 Fax : 65 6827 3922 Mobile: 65 9029 4312 Email : laurence at scalablesys.com http://www.scalablesys.com From bruno at rocksclusters.org Wed Dec 3 07:32:14 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Wed, 3 Dec 2003 07:32:14 -0800 Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included inRo cks 3 for Itanium? In-Reply-To: <BC447F1AD529D311B4DE0008C71BF2EB0AE157F7@phsexch7.mgh.harvard.edu> References: <BC447F1AD529D311B4DE0008C71BF2EB0AE157F7@phsexch7.mgh.harvard.edu> Message-ID: <DF132702-25A5-11D8-86E6-000A95C4E3B4@rocksclusters.org> > Where do we find the SGE roll? Under Lhoste at > http://rocks.npaci.edu/Rocks/ > there is a "Grid" roll listed. Is SGE in that? The userguide doesn't > mention > SGE.
  • 21. the SGE roll will be available in the upcoming v3.1.0 release. scheduled release date is december 15th. - gb From jlkaiser at fnal.gov Wed Dec 3 08:35:18 2003 From: jlkaiser at fnal.gov (Joe Kaiser) Date: Wed, 03 Dec 2003 10:35:18 -0600 Subject: [Rocks-Discuss]supermicro based MB's In-Reply-To: <3FCC824B.5060406@scalableinformatics.com> References: <3FCC824B.5060406@scalableinformatics.com> Message-ID: <1070469318.12324.13.camel@nietzsche.fnal.gov> Hi, You don't say what version of Rocks you are using. The following is for the X5DPA-GG board and Rocks 3.0. It requires modifying only the pcitable in the boot image on the tftp server. I believe the procedure for 2.3.2 requires a heck of a lot more work, (but it may not). I would have to dig deep for the notes about the changing 2.3.2. This should be done on the frontend: cd /tftpboot/X86PC/UNDI/pxelinux/ cp initrd.img initrd.img.orig cp initrd.img /tmp cd /tmp mv initrd.img initrd.gz gunzip initrd.gz mkdir /mnt/loop mount -o loop initrd /mnt/loop cd /mnt/loop/modules/ vi pcitable Search for the e1000 drivers and add the following line: 0x8086 0x1013 "e1000" "Intel Corp.|82546EB Gigabit Ethernet Controller" write the file cd /tmp umount /mnt/loop gzip initrd mv initrd.gz initrd.img mv initrd.img /tftpboot/X86PC/UNDI/pxelinux/ Then boot the node. Hope this helps. Thanks, Joe On Tue, 2003-12-02 at 06:15, Joe Landman wrote:
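Joe's pcitable procedure, gathered into one sequence for convenience. This is only a restatement of his steps, not an official Rocks recipe: it assumes the Rocks 3.0 /tftpboot/X86PC/UNDI/pxelinux layout, and the vendor/device pair 0x8086 0x1013 is the one from his post for that board's NIC. Check your own card's IDs with lspci -n before adding a line, and keep the backup so a bad edit is easy to undo.

  # cd /tftpboot/X86PC/UNDI/pxelinux/
  # cp initrd.img initrd.img.orig
  # cp initrd.img /tmp && cd /tmp
  # mv initrd.img initrd.gz && gunzip initrd.gz
  # mkdir -p /mnt/loop && mount -o loop initrd /mnt/loop
  # vi /mnt/loop/modules/pcitable
      (add: 0x8086 0x1013 "e1000" "Intel Corp.|82546EB Gigabit Ethernet Controller")
  # umount /mnt/loop
  # gzip initrd
  # mv initrd.gz /tftpboot/X86PC/UNDI/pxelinux/initrd.img

Staying out of /mnt/loop while editing (rather than cd'ing into it as in the original steps) avoids a "device is busy" error at umount time. Then PXE-boot the node as before.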
  • 22. > Folks: > > Working on integrating a Supermicro MB based cluster. Discovered early > on that all of the compute nodes have an Intel based NIC that RedHat > doesn't know anything about (any version of RH). Some of the > administrative nodes have other similar issues. I am seeing simply a > suprising number of mis/un detected hardware across the collection of MBs. > > Anyone have advice on where to get modules/module source for Redhat > for these things? It looks like I will need to rebuild the boot CD, > though the several times I have tried this previously have failed to > produce a working/bootable system. It looks like new modules need to be > created/inserted into the boot process (head node and cluster nodes) > kernels, as well as into the installable kernels. > > Has anyone done this for a Supermicro MB based system? Thanks . > > Joe -- =================================================================== Joe Kaiser - Systems Administrator Fermi Lab CD/OSS-SCS Never laugh at live dragons. 630-840-6444 jlkaiser at fnal.gov =================================================================== From jghobrial at uh.edu Wed Dec 3 08:59:15 2003 From: jghobrial at uh.edu (Joseph) Date: Wed, 3 Dec 2003 10:59:15 -0600 (CST) Subject: [Rocks-Discuss]cluster-fork In-Reply-To: <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu> References: <3FCB879E.8050905@miami.edu> <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8-A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu> <3FCCC5BF.3030903@miami.edu> <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu> Message-ID: <Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu> Here is the error I receive when I remove the file encoder.pyc and run the command cluster-fork Traceback (innermost last): File "/opt/rocks/sbin/cluster-fork", line 88, in ? import rocks.pssh File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ? import gmon.encoder ImportError: No module named encoder Thanks, Joseph On Tue, 2 Dec 2003, Mason J. Katz wrote: > Python creates the .pyc files for you, and does not remove the original
  • 23. > .py file. I would be extremely surprised it two "identical" .pyc files > had the same md5 checksum. I'd expect this to be more like C .o file > which always contain random data to pad out to the end of a page and > 32/64 bit word sizes. Still this is just a guess, the real point is > you can always remove the .pyc files and the .py will regenerate it > when imported (although standard UNIX file/dir permission still apply). > > What is the import error you get from cluster-fork? > > -mjk > > On Dec 2, 2003, at 9:02 AM, Angel Li wrote: > > > Joseph wrote: > > > >> Indeed my md5sum is different for encoder.pyc. However, when I pulled > >> the file and run "cluster-fork" python responds about an import > >> problem. So it seems that regeneration did not occur. Is there a flag > >> I need to pass? > >> > >> I have also tried to figure out what package provides encoder and > >> reinstall the package, but an rpm query reveals nothing. > >> > >> If this is a generated file, what generates it? > >> > >> It seems that an rpm file query on ganglia show that files in the > >> directory belong to the package, but encoder.pyc does not. > >> > >> Thanks, > >> Joseph > >> > >> > >> > > I have finally found the python sources in the HPC rolls CD, filename > > ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it > > seems python "compiles" the .py files to ".pyc" and then deletes the > > source file the first time they are referenced? I also noticed that > > there are two versions of python installed. Maybe the pyc files from > > one version won't load into the other one? > > > > Angel > > > > > From mjk at sdsc.edu Wed Dec 3 15:19:38 2003 From: mjk at sdsc.edu (Mason J. Katz) Date: Wed, 3 Dec 2003 15:19:38 -0800 Subject: [Rocks-Discuss]cluster-fork In-Reply-To: <Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu> References: <3FCB879E.8050905@miami.edu> <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8- A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu> <3FCCC5BF.3030903@miami.edu> <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu> <Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu> Message-ID: <2A332131-25E7-11D8-A641-000A95DA5638@sdsc.edu> This file come from a ganglia package, what does
  • 24. # rpm -q ganglia-receptor Return? -mjk On Dec 3, 2003, at 8:59 AM, Joseph wrote: > Here is the error I receive when I remove the file encoder.pyc and run > the > command cluster-fork > > Traceback (innermost last): > File "/opt/rocks/sbin/cluster-fork", line 88, in ? > import rocks.pssh > File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ? > import gmon.encoder > ImportError: No module named encoder > > Thanks, > Joseph > > > On Tue, 2 Dec 2003, Mason J. Katz wrote: > >> Python creates the .pyc files for you, and does not remove the >> original >> .py file. I would be extremely surprised it two "identical" .pyc >> files >> had the same md5 checksum. I'd expect this to be more like C .o file >> which always contain random data to pad out to the end of a page and >> 32/64 bit word sizes. Still this is just a guess, the real point is >> you can always remove the .pyc files and the .py will regenerate it >> when imported (although standard UNIX file/dir permission still >> apply). >> >> What is the import error you get from cluster-fork? >> >> -mjk >> >> On Dec 2, 2003, at 9:02 AM, Angel Li wrote: >> >>> Joseph wrote: >>> >>>> Indeed my md5sum is different for encoder.pyc. However, when I >>>> pulled >>>> the file and run "cluster-fork" python responds about an import >>>> problem. So it seems that regeneration did not occur. Is there a >>>> flag >>>> I need to pass? >>>> >>>> I have also tried to figure out what package provides encoder and >>>> reinstall the package, but an rpm query reveals nothing. >>>> >>>> If this is a generated file, what generates it? >>>> >>>> It seems that an rpm file query on ganglia show that files in the
  • 25. >>>> directory belong to the package, but encoder.pyc does not. >>>> >>>> Thanks, >>>> Joseph >>>> >>>> >>>> >>> I have finally found the python sources in the HPC rolls CD, filename >>> ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it >>> seems python "compiles" the .py files to ".pyc" and then deletes the >>> source file the first time they are referenced? I also noticed that >>> there are two versions of python installed. Maybe the pyc files from >>> one version won't load into the other one? >>> >>> Angel >>> >>> >> From csamuel at vpac.org Wed Dec 3 18:09:26 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 4 Dec 2003 13:09:26 +1100 Subject: [Rocks-Discuss]Confirmation of Rocks 3.1.0 Opteron support & RHEL trademark removal ? Message-ID: <200312041309.27986.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi folks, Can someone confirm that the next Rocks release will support Opteron please ? Also, I noticed that the current Rocks release on Itanium based on RHEL still has a lot of mentions of RedHat in it, which from my reading of their trademark guidelines is not permitted, is that fixed in the new version ? cheers! Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/zpdWO2KABBYQAh8RAqB8AJ9FG+IjIeem21qlFS6XYIHamIMPmwCghVTV AgjAlVHWgdv/KzYQinHGPxs= =IAWU -----END PGP SIGNATURE----- From bruno at rocksclusters.org Wed Dec 3 18:46:30 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Wed, 3 Dec 2003 18:46:30 -0800
• 26. Subject: [Rocks-Discuss]Confirmation of Rocks 3.1.0 Opteron support & RHEL trademark removal ? In-Reply-To: <200312041309.27986.csamuel@vpac.org> References: <200312041309.27986.csamuel@vpac.org> Message-ID: <10AD9827-2604-11D8-86E6-000A95C4E3B4@rocksclusters.org> > Can someone confirm that the next Rocks release will support Opteron > please ? yes, it will support opteron. > Also, I noticed that the current Rocks release on Itanium based on > RHEL still > has a lot of mentions of RedHat in it, which from my reading of their > trademark guidelines is not permitted, is that fixed in the new > version ? and yes, (even though it doesn't feel like the right thing to do, as redhat has offered to the community some outstanding technologies that we'd like to credit), all redhat trademarks will be removed from 3.1.0. - gb From fds at sdsc.edu Thu Dec 4 06:46:32 2003 From: fds at sdsc.edu (Federico Sacerdoti) Date: Thu, 4 Dec 2003 06:46:32 -0800 Subject: [Rocks-Discuss]cluster-fork In-Reply-To: <Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu> References: <3FCB879E.8050905@miami.edu> <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8-A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu> <3FCCC5BF.3030903@miami.edu> <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu> <Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu> Message-ID: <A69923FA-2668-11D8-804D-000393A4725A@sdsc.edu> Please install the http://www.rocksclusters.org/errata/3.0.0/ganglia-python-3.0.1-2.i386.rpm package, which includes the correct encoder.py file. (This package is listed on the 3.0.0 errata page) -Federico On Dec 3, 2003, at 8:59 AM, Joseph wrote: > Here is the error I receive when I remove the file encoder.pyc and run > the > command cluster-fork > > Traceback (innermost last): > File "/opt/rocks/sbin/cluster-fork", line 88, in ? > import rocks.pssh > File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ? > import gmon.encoder > ImportError: No module named encoder > > Thanks, > Joseph
  • 27. > > > On Tue, 2 Dec 2003, Mason J. Katz wrote: > >> Python creates the .pyc files for you, and does not remove the >> original >> .py file. I would be extremely surprised it two "identical" .pyc >> files >> had the same md5 checksum. I'd expect this to be more like C .o file >> which always contain random data to pad out to the end of a page and >> 32/64 bit word sizes. Still this is just a guess, the real point is >> you can always remove the .pyc files and the .py will regenerate it >> when imported (although standard UNIX file/dir permission still >> apply). >> >> What is the import error you get from cluster-fork? >> >> -mjk >> >> On Dec 2, 2003, at 9:02 AM, Angel Li wrote: >> >>> Joseph wrote: >>> >>>> Indeed my md5sum is different for encoder.pyc. However, when I >>>> pulled >>>> the file and run "cluster-fork" python responds about an import >>>> problem. So it seems that regeneration did not occur. Is there a >>>> flag >>>> I need to pass? >>>> >>>> I have also tried to figure out what package provides encoder and >>>> reinstall the package, but an rpm query reveals nothing. >>>> >>>> If this is a generated file, what generates it? >>>> >>>> It seems that an rpm file query on ganglia show that files in the >>>> directory belong to the package, but encoder.pyc does not. >>>> >>>> Thanks, >>>> Joseph >>>> >>>> >>>> >>> I have finally found the python sources in the HPC rolls CD, filename >>> ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it >>> seems python "compiles" the .py files to ".pyc" and then deletes the >>> source file the first time they are referenced? I also noticed that >>> there are two versions of python installed. Maybe the pyc files from >>> one version won't load into the other one? >>> >>> Angel >>> >>> >> >> Federico Rocks Cluster Group, San Diego Supercomputing Center, CA
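For anyone following along, applying Federico's errata package on the frontend takes only two commands; this assumes wget is available and that the URL, re-joined here from the line wrap above, is still valid.

  # wget http://www.rocksclusters.org/errata/3.0.0/ganglia-python-3.0.1-2.i386.rpm
  # rpm -Uvh ganglia-python-3.0.1-2.i386.rpm

Pushing the same package out to the compute nodes is covered a few messages below, where Dennis does it with cluster-fork.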
  • 28. From jghobrial at uh.edu Thu Dec 4 07:14:21 2003 From: jghobrial at uh.edu (Joseph) Date: Thu, 4 Dec 2003 09:14:21 -0600 (CST) Subject: [Rocks-Discuss]cluster-fork In-Reply-To: <A69923FA-2668-11D8-804D-000393A4725A@sdsc.edu> References: <3FCB879E.8050905@miami.edu> <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8-A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu> <3FCCC5BF.3030903@miami.edu> <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu> <Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu> <A69923FA-2668-11D8-804D-000393A4725A@sdsc.edu> Message-ID: <Pine.LNX.4.56.0312040913110.13972@mail.tlc2.uh.edu> Thank you very much this solved the problem. Joseph On Thu, 4 Dec 2003, Federico Sacerdoti wrote: > Please install the > http://www.rocksclusters.org/errata/3.0.0/ganglia-python-3.0.1 > -2.i386.rpm package, which includes the correct encoder.py file. (This > package is listed on the 3.0.0 errata page) > > -Federico > > On Dec 3, 2003, at 8:59 AM, Joseph wrote: > > > Here is the error I receive when I remove the file encoder.pyc and run > > the > > command cluster-fork > > > > Traceback (innermost last): > > File "/opt/rocks/sbin/cluster-fork", line 88, in ? > > import rocks.pssh > > File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ? > > import gmon.encoder > > ImportError: No module named encoder > > > > Thanks, > > Joseph > > > > > > On Tue, 2 Dec 2003, Mason J. Katz wrote: > > > >> Python creates the .pyc files for you, and does not remove the > >> original > >> .py file. I would be extremely surprised it two "identical" .pyc > >> files > >> had the same md5 checksum. I'd expect this to be more like C .o file > >> which always contain random data to pad out to the end of a page and > >> 32/64 bit word sizes. Still this is just a guess, the real point is > >> you can always remove the .pyc files and the .py will regenerate it > >> when imported (although standard UNIX file/dir permission still > >> apply).
  • 29. > >> > >> What is the import error you get from cluster-fork? > >> > >> -mjk > >> > >> On Dec 2, 2003, at 9:02 AM, Angel Li wrote: > >> > >>> Joseph wrote: > >>> > >>>> Indeed my md5sum is different for encoder.pyc. However, when I > >>>> pulled > >>>> the file and run "cluster-fork" python responds about an import > >>>> problem. So it seems that regeneration did not occur. Is there a > >>>> flag > >>>> I need to pass? > >>>> > >>>> I have also tried to figure out what package provides encoder and > >>>> reinstall the package, but an rpm query reveals nothing. > >>>> > >>>> If this is a generated file, what generates it? > >>>> > >>>> It seems that an rpm file query on ganglia show that files in the > >>>> directory belong to the package, but encoder.pyc does not. > >>>> > >>>> Thanks, > >>>> Joseph > >>>> > >>>> > >>>> > >>> I have finally found the python sources in the HPC rolls CD, filename > >>> ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it > >>> seems python "compiles" the .py files to ".pyc" and then deletes the > >>> source file the first time they are referenced? I also noticed that > >>> there are two versions of python installed. Maybe the pyc files from > >>> one version won't load into the other one? > >>> > >>> Angel > >>> > >>> > >> > >> > Federico > > Rocks Cluster Group, San Diego Supercomputing Center, CA > From vrowley at ucsd.edu Thu Dec 4 12:29:55 2003 From: vrowley at ucsd.edu (V. Rowley) Date: Thu, 04 Dec 2003 12:29:55 -0800 Subject: [Rocks-Discuss]Re: PXE boot problems In-Reply-To: <3FCBC037.5000302@ucsd.edu> References: <3FCBC037.5000302@ucsd.edu> Message-ID: <3FCF9943.1020806@ucsd.edu> Uh, nevermind. We had upgraded syslinux on our frontend, not the node we were trying to PXE boot. Sigh. V. Rowley wrote:
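To propagate the same fix to the compute nodes, the pattern Dennis reports later in the thread looks roughly like the sketch below. The /sourcedir path is his placeholder for any location every node can read (for example a directory shared out from the frontend), and whether the rpm command needs to be quoted depends on how cluster-fork separates its own options from the remote command, so treat the exact quoting as an assumption.

  # cluster-fork "rpm -Uvh /sourcedir/ganglia-python-3.0.1-2.i386.rpm"
  # cluster-fork "rpm -q ganglia-python"
  # cluster-fork service gschedule restart

The second command simply confirms that every node now reports the 3.0.1-2 package, and the gschedule restart mirrors what Dennis does after updating the frontend.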
  • 30. > We have installed a ROCKS 3.0.0 frontend on a DL380 and are trying to > install a compute node via PXE. We are getting an error similar to the > one mentioned in the archives, e.g. > >> Loading initrd.img.... >> Ready >> >> Failed to free base memory >> > > We have upgraded to syslinux-2.07-1, per the suggestion in the archives, > but continue to get the same error. Any ideas? > -- Vicky Rowley email: vrowley at ucsd.edu Biomedical Informatics Research Network work: (858) 536-5980 University of California, San Diego fax: (858) 822-0828 9500 Gilman Drive La Jolla, CA 92093-0715 See pictures from our trip to China at http://www.sagacitech.com/Chinaweb From cdwan at mail.ahc.umn.edu Fri Dec 5 08:16:07 2003 From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB)) Date: Fri, 5 Dec 2003 10:16:07 -0600 (CST) Subject: [Rocks-Discuss]Private NIS master Message-ID: <Pine.GSO.4.58.0312042305070.18193@lenti.med.umn.edu> Hello all. Long time listener, first time caller. Thanks for all the great work. I'm integrating a Rocks cluster into an existing NIS domain. I noticed that while the cluster database now supports a PrivateNISMaster, that variable doesn't make it into the /etc/yp.conf on the compute nodes. They remain broadcast. Assume that, for whatever reason, I don't want to set up a repeater (slave) ypserv process on my frontend. I added the option "--nisserver <var name="Kickstart_PrivateNISMaster"/>" to the "profiles/3.0.0/nodes/nis-client.xml" file, removed the ypserver on my frontend, and it works like I want it to. Am I missing anything fundamental here? -Chris Dwan University of Minnesota From wyzhong78 at msn.com Mon Dec 8 06:18:34 2003 From: wyzhong78 at msn.com (zhong wenyu) Date: Mon, 08 Dec 2003 22:18:34 +0800 Subject: [Rocks-Discuss]3.0.0 problem: not able to boot up Message-ID: <BAY3-F14uFqD45TpNO40002c14c@hotmail.com> Hi,everyone!
• 31. I installed Rocks 3.0.0 with the default options and there wasn't any trouble during the install, but I haven't been able to boot: it stops at the very beginning with the message "GRUB" on the screen and then just waits... My hardware is dual Xeon 2.4G, an MSI 9138 board, and a Seagate SCSI disk. Any advice is welcome! _________________________________________________________________ MSN Explorer: http://explorer.msn.com/lccn/ From angelini at vki.ac.be Mon Dec 8 06:20:45 2003 From: angelini at vki.ac.be (Angelini Giuseppe) Date: Mon, 08 Dec 2003 15:20:45 +0100 Subject: [Rocks-Discuss]How to use MPICH with ssh Message-ID: <3FD488BD.3EBBDB8D@vki.ac.be> Dear rocks folk, I have recently installed mpich with Lahey Fortran and now that I can compile and link, I would like to run but it seems that I have another problem. In fact I have the following error message when I try to run: [panara at compute-0-7 ~]$ mpirun -np $NPROC -machinefile $PBS_NODEFILE $DPT/hybflow p0_13226: p4_error: Path to program is invalid while starting /dc_03_04/panara/PREPRO_TESTS/hybflow with /usr/bin/rsh on compute-0-7: -1 p4_error: latest msg from perror: No such file or directory p0_13226: p4_error: Child process exited while making connection to remote process on compute-0-6: 0 p0_13226: (6.025133) net_send: could not write to fd=4, errno = 32 p0_13226: (6.025231) net_send: could not write to fd=4, errno = 32 I am wondering why it is looking for /usr/bin/rsh for the communication, I expected to use ssh and not rsh. Any help will be welcome. Regards. Giuseppe Angelini From casuj at cray.com Mon Dec 8 07:31:21 2003 From: casuj at cray.com (John Casu) Date: Mon, 8 Dec 2003 07:31:21 -0800 Subject: [Rocks-Discuss]How to use MPICH with ssh In-Reply-To: <3FD488BD.3EBBDB8D@vki.ac.be>; from Angelini Giuseppe on Mon, Dec 08, 2003 at 03:20:45PM +0100 References: <3FD488BD.3EBBDB8D@vki.ac.be> Message-ID: <20031208073121.A10151@stemp3.wc.cray.com>
  • 32. On Mon, Dec 08, 2003 at 03:20:45PM +0100, Angelini Giuseppe wrote: > > Dear rocks folk, > > > I have recently installed mpich with Lahay Fortran and now that I can > compile and link, > I would like to run but it seems that I have another problem. In fact I > have the following > error message when I try to run: > > [panara at compute-0-7 ~]$ mpirun -np $NPROC -machinefile $PBS_NODEFILE > $DPT/hybflow > p0_13226: p4_error: Path to program is invalid while starting > /dc_03_04/panara/PREPRO_TESTS/hybflow with /usr/bin/rsh on compute-0-7: > -1 > p4_error: latest msg from perror: No such file or directory > p0_13226: p4_error: Child process exited while making connection to > remote process on compute-0-6: 0 > p0_13226: (6.025133) net_send: could not write to fd=4, errno = 32 > p0_13226: (6.025231) net_send: could not write to fd=4, errno = 32 > > I am wondering why it is looking for /usr/bin/rsh for the communication, > > I expected to use ssh and not rsh. > > Any help will be welcome. > build mpich thus: RSHCOMMAND=ssh ./configure ..... > > Regards. > > > Giuseppe Angelini -- "Roses are red, Violets are blue, You lookin' at me ? YOU LOOKIN' AT ME ?!" -- Get Fuzzy. ======================================================================= John Casu Cray Inc. casuj at cray.com 411 First Avenue South, Suite 600 Tel: (206) 701-2173 Seattle, WA 98104-2860 Fax: (206) 701-2500 ======================================================================= From davidow at molbio.mgh.harvard.edu Mon Dec 8 08:12:53 2003 From: davidow at molbio.mgh.harvard.edu (Lance Davidow) Date: Mon, 8 Dec 2003 11:12:53 -0500 Subject: [Rocks-Discuss]How to use MPICH with ssh In-Reply-To: <3FD488BD.3EBBDB8D@vki.ac.be>
  • 33. References: <3FD488BD.3EBBDB8D@vki.ac.be> Message-ID: <p06002001bbfa51fea005@[132.183.190.222]> Giuseppe, Here's an answer from a newbie who just faced the same problem. You are using the wrong flavor of mpich (and mpirun). There are several different distributions which work differently in ROCKS. the one you are using in the default path expects serv_p4 demons and .rhosts files in your home directory. The different flavors may be more compatible with different compilers as well. [lance at rescluster2 lance]$ which mpirun /opt/mpich-mpd/gnu/bin/mpirun the one you probably want is /opt/mpich/gnu/bin/mpirun [lance at rescluster2 lance]$ locate mpirun ... /opt/mpich-mpd/gnu/bin/mpirun ... /opt/mpich/myrinet/gnu/bin/mpirun ... /opt/mpich/gnu/bin/mpirun Cheers, Lance At 3:20 PM +0100 12/8/03, Angelini Giuseppe wrote: >Dear rocks folk, > > >I have recently installed mpich with Lahay Fortran and now that I can >compile and link, >I would like to run but it seems that I have another problem. In fact I >have the following >error message when I try to run: > >[panara at compute-0-7 ~]$ mpirun -np $NPROC -machinefile $PBS_NODEFILE >$DPT/hybflow >p0_13226: p4_error: Path to program is invalid while starting >/dc_03_04/panara/PREPRO_TESTS/hybflow with /usr/bin/rsh on compute-0-7: >-1 > p4_error: latest msg from perror: No such file or directory >p0_13226: p4_error: Child process exited while making connection to >remote process on compute-0-6: 0 >p0_13226: (6.025133) net_send: could not write to fd=4, errno = 32 >p0_13226: (6.025231) net_send: could not write to fd=4, errno = 32 > >I am wondering why it is looking for /usr/bin/rsh for the communication, > >I expected to use ssh and not rsh. > >Any help will be welcome. > >
  • 34. >Regards. > >Giuseppe Angelini -- Lance Davidow, PhD Director of Bioinformatics Dept of Molecular Biology Mass General Hospital Boston MA 02114 davidow at molbio.mgh.harvard.edu 617.726-5955 Fax: 617.726-6893 From rscarce at caci.com Fri Dec 5 16:43:00 2003 From: rscarce at caci.com (Reed Scarce) Date: Fri, 5 Dec 2003 19:43:00 -0500 Subject: [Rocks-Discuss]PXE and system images Message-ID: <OFF783DCCA.8F016562-ON85256DF3.008001FC-85256DF7.00043E45@caci.com> We want to initialize new hardware with a known good image from identical hardware currently in use. The process imagined would be to PXE boot to a disk image server, PXE would create a RAM system that would request the system disk image from the server, which would push the desired system disk image to the requesting system. Upon completion the system would be available as a cluster member. The lab configuration is a PC grade frontend with two 3Com 905s and a single server grade cluster node with integrated Intel 82551 (10/100)(the only PXE interface) and two integrated Intel 82546 (10/100/1000). The cluster node is one of the stock of nodes for the expansion. The stock of nodes have a Linux OS pre-installed, which would be eliminated in the process. Currently the node will PXE boot from the 10/100 and pickup an installation boot from one of the g-bit interfaces. From there kickstart wants to take over. Any recommendations how to get kickstart to push an image to the disk? Thanks, Reed Scarce -------------- next part -------------- An HTML attachment was scrubbed... URL: https://lists.sdsc.edu/pipermail/npaci-rocks- discussion/attachments/20031205/dad04521/attachment-0001.html From wyzhong78 at msn.com Mon Dec 8 05:36:37 2003 From: wyzhong78 at msn.com (zhong wenyu) Date: Mon, 08 Dec 2003 21:36:37 +0800 Subject: [Rocks-Discuss]Rocks 3.0.0 problem:not able to boot up Message-ID: <BAY3-F9yOi5AgJQlDrR0002a5da@hotmail.com> Hi,everyone! I have installed Rocks 3.0.0 with default options successful,there was not any trouble.But I boot it up,it stopped at beginning,just show "GRUB" on
  • 35. the screen and waiting... Thanks for your help! _________________________________________________________________ ???? MSN Explorer: http://explorer.msn.com/lccn/ From daniel.kidger at quadrics.com Mon Dec 8 09:54:53 2003 From: daniel.kidger at quadrics.com (daniel.kidger at quadrics.com) Date: Mon, 8 Dec 2003 17:54:53 -0000 Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0) Message-ID: <30062B7EA51A9045B9F605FAAC1B4F622357C7@tardis0.quadrics.com> Dear all, Previously I have been installing a custom kernel on the compute nodes with an "extend-compute.xml" and an "/etc/init.d/qsconfigure" (to fix grub.conf). However I am now trying to do it the 'proper' way. So I do (on : # cp qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm /home/install/rocks-dist/7.3/en/os/i386/force/RPMS # cd /home/install # rocks-dist dist # SSH_NO_PASSWD=1 shoot-node compute-0-0 Hence: # find /home/install/ |xargs -l grep -nH qsnet shows me that hdlist and hdlist2 now contain this RPM. (and indeed If I duplicate my rpm in that directory rocks-dist notices this and warns me.) However the node always ends up with "2.4.20-20.7smp" again. anaconda-ks.cfg contains just "kernel-smp" and install.log has "Installing kernel- smp-2.4.20-20.7." So my question is: It looks like my RPM has a name that Rocks doesn't understand properly. What is wrong with my name ? and what are the rules for getting the correct name ? (.i686.rpm is of course correct, but I don't have -smp. in the name Is this the problem ?) cf. Greg Bruno's wisdom: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-April/001770.html Yours, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- > From DGURGUL at PARTNERS.ORG Mon Dec 8 11:09:27 2003 From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.) Date: Mon, 8 Dec 2003 14:09:27 -0500
  • 36. Subject: [Rocks-Discuss]cluster-fork --mpd strangeness Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE15840@phsexch7.mgh.harvard.edu> I just did "cluster-fork -Uvh /sourcedir/ganglia-python-3.0.1-2.i386.rpm" and then "cluster-fork service gschedule restart" (not sure I had to do the last). I also put 3.0.1-2 and restarted gschedule on the frontend. Now I run "cluster-fork --mpd w". I currently have a user who ssh'd to compute-0-8 from the frontend and one who ssh'd into compute-0-17 from the front end. But the return shows the users on lines for 17 (for the user on 0-8) and 10 (for the user on 0-17): 17: 1:58pm up 24 days, 3:20, 1 user, load average: 0.00, 0.00, 0.03 17: USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT 17: lance pts/0 rescluster2.mgh. 1:31pm 40.00s 0.02s 0.02s -bash 10: 1:58pm up 24 days, 3:21, 1 user, load average: 0.02, 0.04, 0.07 10: USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT 10: dennis pts/0 rescluster2.mgh. 1:57pm 17.00s 0.02s 0.02s -bash When I do "cluster-fork w" (without the --mpd) the users show up on the correct nodes. Do the numbers on the left of the -mpd output correspond to the node names? Thanks. Dennis Dennis J. Gurgul Partners Health Care System Research Management Research Computing Core 617.724.3169 From DGURGUL at PARTNERS.ORG Mon Dec 8 11:28:30 2003 From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.) Date: Mon, 8 Dec 2003 14:28:30 -0500 Subject: [Rocks-Discuss]cluster-fork --mpd strangeness Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE15843@phsexch7.mgh.harvard.edu> Maybe this is a better description of the "strangeness". I did "cluster-fork --mpd hostname": 1: compute-0-0.local 2: compute-0-1.local 3: compute-0-3.local 4: compute-0-13.local 5: compute-0-11.local 6: compute-0-15.local 7: compute-0-16.local 8: compute-0-19.local 9: compute-0-21.local
From DGURGUL at PARTNERS.ORG Mon Dec 8 11:28:30 2003
From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.)
Date: Mon, 8 Dec 2003 14:28:30 -0500
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE15843@phsexch7.mgh.harvard.edu>

Maybe this is a better description of the "strangeness".

I did "cluster-fork --mpd hostname":

1: compute-0-0.local
2: compute-0-1.local
3: compute-0-3.local
4: compute-0-13.local
5: compute-0-11.local
6: compute-0-15.local
7: compute-0-16.local
8: compute-0-19.local
9: compute-0-21.local
10: compute-0-17.local
11: compute-0-5.local
12: compute-0-20.local
13: compute-0-18.local
14: compute-0-12.local
15: compute-0-9.local
16: compute-0-4.local
17: compute-0-8.local
18: compute-0-14.local
19: compute-0-2.local
20: compute-0-6.local
0: compute-0-7.local
21: compute-0-10.local

Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169


-----Original Message-----
From: npaci-rocks-discussion-admin at sdsc.edu
[mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Gurgul, Dennis J.
Sent: Monday, December 08, 2003 2:09 PM
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness


I just did "cluster-fork -Uvh /sourcedir/ganglia-python-3.0.1-2.i386.rpm" and
then "cluster-fork service gschedule restart" (not sure I had to do the last).
I also put 3.0.1-2 and restarted gschedule on the frontend.

Now I run "cluster-fork --mpd w".

I currently have a user who ssh'd to compute-0-8 from the frontend and one who
ssh'd into compute-0-17 from the front end.

But the return shows the users on lines for 17 (for the user on 0-8) and 10
(for the user on 0-17):

17:   1:58pm  up 24 days,  3:20,  1 user,  load average: 0.00, 0.00, 0.03
17: USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
17: lance    pts/0    rescluster2.mgh.  1:31pm  40.00s  0.02s  0.02s  -bash

10:   1:58pm  up 24 days,  3:21,  1 user,  load average: 0.02, 0.04, 0.07
10: USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
10: dennis   pts/0    rescluster2.mgh.  1:57pm  17.00s  0.02s  0.02s  -bash

When I do "cluster-fork w" (without the --mpd) the users show up on the
correct nodes.

Do the numbers on the left of the -mpd output correspond to the node names?

Thanks.

Dennis

Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169
From tim.carlson at pnl.gov Mon Dec 8 12:35:16 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Mon, 08 Dec 2003 12:35:16 -0800 (PST)
Subject: [Rocks-Discuss]PXE and system images
In-Reply-To: <OFF783DCCA.8F016562-ON85256DF3.008001FC-85256DF7.00043E45@caci.com>
Message-ID: <Pine.LNX.4.44.0312081226270.19031-100000@scorpion.emsl.pnl.gov>

On Fri, 5 Dec 2003, Reed Scarce wrote:

> We want to initialize new hardware with a known good image from identical
> hardware currently in use. The process imagined would be to PXE boot to a
> disk image server, PXE would create a RAM system that would request the
> system disk image from the server, which would push the desired system
> disk image to the requesting system. Upon completion the system would be
> available as a cluster member.
>
> The lab configuration is a PC grade frontend with two 3Com 905s and a
> single server grade cluster node with integrated Intel 82551 (10/100) (the
> only PXE interface) and two integrated Intel 82546 (10/100/1000). The
> cluster node is one of the stock of nodes for the expansion. The stock of
> nodes have a Linux OS pre-installed, which would be eliminated in the
> process.
>
> Currently the node will PXE boot from the 10/100 and pick up an
> installation boot from one of the g-bit interfaces. From there kickstart
> wants to take over.
>
> Any recommendations how to get kickstart to push an image to the disk?

This sounds like you want to use Oscar instead of ROCKS.

http://oscar.openclustergroup.org/tiki-index.php

I'm not exactly sure why you think that the kickstart process won't give
you exactly the same image on every machine. If the hardware is the same,
you'll get the same image on each machine.

We have boxes with the same setup, 10/100 PXE, and then dual gigabit.
Our method for installing ROCKS on this type of hardware is the following:

1) Run insert-ethers and choose "manager" type of node.
2) Connect all the PXE interfaces to the switch and boot them all. Do not
   connect the gigabit interface.
3) Once all of the nodes have PXE booted, exit insert-ethers. Start
   insert-ethers again and this time choose compute node.
4) Hook up the gigabit interface and the PXE interface to your nodes. All
   of your machines will now install.
5) In our case, we now quickly disconnect the PXE interface because we
   don't want to have the machine continually install. The real ROCKS
   method would have you choose (HD/net) for booting in the BIOS, but if
   you already have an OS on your machine, you would have to go into the
   BIOS twice before the compute nodes were installed. We disable
   rocks-grub and just connect up the PXE cable if we need to reinstall.

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support
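Tim's step 5 ("we disable rocks-grub") can also be driven from the frontend;
a sketch, assuming rocks-grub is registered as an ordinary init service on
the compute nodes -- worth verifying with the --list form first, since
service names have shifted between Rocks releases:

    # cluster-fork 'chkconfig --list rocks-grub'
    # cluster-fork 'chkconfig rocks-grub off'

With the service off, a node keeps booting from its local disk and only
reinstalls when the PXE cable is plugged back in, which is the behaviour
Tim describes.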
From tim.carlson at pnl.gov Mon Dec 8 12:42:23 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Mon, 08 Dec 2003 12:42:23 -0800 (PST)
Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0)
In-Reply-To: <30062B7EA51A9045B9F605FAAC1B4F622357C7@tardis0.quadrics.com>
Message-ID: <Pine.LNX.4.44.0312081238270.19031-100000@scorpion.emsl.pnl.gov>

On Mon, 8 Dec 2003 daniel.kidger at quadrics.com wrote:

I've gotten confused from time to time as to where to place custom RPMS
(it's changed between releases), so my not-so-clean method is to just rip
out the kernels in /home/install/rocks-dist/7.3/en/os/i386/Redhat/RPMS and
drop my own in. Then do a

cd /home/install
rocks-dist dist
shoot-node

You are probably running into an issue where the "force" directory is more
of an "in addition to" directory and your 2.4.18 kernel is being noted, but
ignored since the 2.4.20 kernel is newer. I assume your nodes get both an
SMP and a UP version of 2.4.20 and that your custom 2.4.18 is nowhere to be
found on the compute node.

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

> Previously I have been installing a custom kernel on the compute nodes
> with an "extend-compute.xml" and an "/etc/init.d/qsconfigure" (to fix
> grub.conf).
>
> However I am now trying to do it the 'proper' way. So I do (on :
> # cp qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm
> /home/install/rocks-dist/7.3/en/os/i386/force/RPMS
> # cd /home/install
> # rocks-dist dist
> # SSH_NO_PASSWD=1 shoot-node compute-0-0
>
> Hence:
> # find /home/install/ |xargs -l grep -nH qsnet
> shows me that hdlist and hdlist2 now contain this RPM. (and indeed If I
> duplicate my rpm in that directory rocks-dist notices this and warns me.)
>
> However the node always ends up with "2.4.20-20.7smp" again.
> anaconda-ks.cfg contains just "kernel-smp" and install.log has
> "Installing kernel-smp-2.4.20-20.7."
>
> So my question is:
> It looks like my RPM has a name that Rocks doesn't understand properly.
> What is wrong with my name ?
> and what are the rules for getting the correct name ?
> (.i686.rpm is of course correct, but I don't have -smp. in the name
> Is this the problem ?)
>
> cf. Greg Bruno's wisdom:
> https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-April/001770.html
>
> Yours,
> Daniel.
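Spelled out, Tim's "not-so-clean" swap is roughly the following sketch. The
directory is the one from his note; the source path of the custom RPM and the
wildcards for the stock 2.4.20 kernels are placeholders, and Greg Bruno's
reply further down adds the caveat that the replacement still has to be named
like a Red Hat kernel package:

    # cd /home/install/rocks-dist/7.3/en/os/i386/Redhat/RPMS
    # rm -f kernel-2.4.20-*.i686.rpm kernel-smp-2.4.20-*.i686.rpm
    # cp /path/to/qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm .
    # cd /home/install
    # rocks-dist dist
    # shoot-node compute-0-0

Removing the stock 2.4.20 packages before rocks-dist rebuilds the
distribution is what keeps the newer version from shadowing the older custom
kernel.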
From fds at sdsc.edu Mon Dec 8 12:51:12 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Mon, 8 Dec 2003 12:51:12 -0800
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
In-Reply-To: <BC447F1AD529D311B4DE0008C71BF2EB0AE15843@phsexch7.mgh.harvard.edu>
References: <BC447F1AD529D311B4DE0008C71BF2EB0AE15843@phsexch7.mgh.harvard.edu>
Message-ID: <423D0494-29C0-11D8-804D-000393A4725A@sdsc.edu>

You are right, and I think this is a shortcoming of MPD. There is no
obvious way to force the MPD numbering to correspond to the order the
nodes were called out on the command line (cluster-fork --mpd actually
makes a shell call to mpirun and it calls out all the node names
explicitly). MPD seems to number the output differently, as you found out.

So mpd for now may be more useful for jobs that are not sensitive to
this. If enough of you find this shortcoming to be a real annoyance, we
could work on putting the node name label on the output by explicitly
calling "hostname" or similar.

Good ideas are welcome :)
-Federico

On Dec 8, 2003, at 11:28 AM, Gurgul, Dennis J. wrote:

> Maybe this is a better description of the "strangeness".
>
> I did "cluster-fork --mpd hostname":
>
> 1: compute-0-0.local
> 2: compute-0-1.local
> 3: compute-0-3.local
> 4: compute-0-13.local
> 5: compute-0-11.local
> 6: compute-0-15.local
> 7: compute-0-16.local
> 8: compute-0-19.local
> 9: compute-0-21.local
> 10: compute-0-17.local
> 11: compute-0-5.local
> 12: compute-0-20.local
> 13: compute-0-18.local
> 14: compute-0-12.local
> 15: compute-0-9.local
> 16: compute-0-4.local
> 17: compute-0-8.local
> 18: compute-0-14.local
> 19: compute-0-2.local
> 20: compute-0-6.local
> 0: compute-0-7.local
> 21: compute-0-10.local
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169
>
> -----Original Message-----
> From: npaci-rocks-discussion-admin at sdsc.edu
> [mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Gurgul, Dennis J.
> Sent: Monday, December 08, 2003 2:09 PM
> To: npaci-rocks-discussion at sdsc.edu
> Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
>
> I just did "cluster-fork -Uvh /sourcedir/ganglia-python-3.0.1-2.i386.rpm"
> and then "cluster-fork service gschedule restart" (not sure I had to do
> the last). I also put 3.0.1-2 and restarted gschedule on the frontend.
>
> Now I run "cluster-fork --mpd w".
>
> I currently have a user who ssh'd to compute-0-8 from the frontend and
> one who ssh'd into compute-0-17 from the front end.
>
> But the return shows the users on lines for 17 (for the user on 0-8)
> and 10 (for the user on 0-17):
>
> 17:   1:58pm  up 24 days,  3:20,  1 user,  load average: 0.00, 0.00, 0.03
> 17: USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
> 17: lance    pts/0    rescluster2.mgh.  1:31pm  40.00s  0.02s  0.02s  -bash
>
> 10:   1:58pm  up 24 days,  3:21,  1 user,  load average: 0.02, 0.04, 0.07
> 10: USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
> 10: dennis   pts/0    rescluster2.mgh.  1:57pm  17.00s  0.02s  0.02s  -bash
>
> When I do "cluster-fork w" (without the --mpd) the users show up on the
> correct nodes.
>
> Do the numbers on the left of the -mpd output correspond to the node
> names?
>
> Thanks.
>
> Dennis
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169

Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA
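Until cluster-fork grows that labelling, the same idea works from the command
line by having each node print its own name; a sketch, assuming the quoted
string is handed to a shell on each node (the plain ssh-based cluster-fork
does this; whether mpirun under --mpd does is worth testing on one node
first):

    # cluster-fork --mpd 'echo "$(hostname): $(uptime)"'

The MPD rank numbers still arrive in whatever order MPD chooses, but each
line now carries the node name, so the ordering stops mattering.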
From DGURGUL at PARTNERS.ORG Mon Dec 8 12:55:13 2003
From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.)
Date: Mon, 8 Dec 2003 15:55:13 -0500
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE15847@phsexch7.mgh.harvard.edu>

Thanks.

On a related note, when I did "cluster-fork service gschedule restart"
gschedule started with the "OK" output, but then the fork process hung on
each node and I had to ^c out for it to go on to the next node.

I tried to ssh to a node and then did the gschedule restart. Even then,
after I tried to "exit" out of the node, the session hung and I had to log
back in and kill it from the frontend.

Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169


-----Original Message-----
From: Federico Sacerdoti [mailto:fds at sdsc.edu]
Sent: Monday, December 08, 2003 3:51 PM
To: Gurgul, Dennis J.
Cc: npaci-rocks-discussion at sdsc.edu
Subject: Re: [Rocks-Discuss]cluster-fork --mpd strangeness


You are right, and I think this is a shortcoming of MPD. There is no
obvious way to force the MPD numbering to correspond to the order the
nodes were called out on the command line (cluster-fork --mpd actually
makes a shell call to mpirun and it calls out all the node names
explicitly). MPD seems to number the output differently, as you found out.

So mpd for now may be more useful for jobs that are not sensitive to
this. If enough of you find this shortcoming to be a real annoyance, we
could work on putting the node name label on the output by explicitly
calling "hostname" or similar.

Good ideas are welcome :)
-Federico

On Dec 8, 2003, at 11:28 AM, Gurgul, Dennis J. wrote:

> Maybe this is a better description of the "strangeness".
>
> I did "cluster-fork --mpd hostname":
>
> 1: compute-0-0.local
> 2: compute-0-1.local
> 3: compute-0-3.local
> 4: compute-0-13.local
> 5: compute-0-11.local
> 6: compute-0-15.local
> 7: compute-0-16.local
> 8: compute-0-19.local
> 9: compute-0-21.local
> 10: compute-0-17.local
> 11: compute-0-5.local
> 12: compute-0-20.local
> 13: compute-0-18.local
> 14: compute-0-12.local
> 15: compute-0-9.local
> 16: compute-0-4.local
> 17: compute-0-8.local
> 18: compute-0-14.local
> 19: compute-0-2.local
> 20: compute-0-6.local
> 0: compute-0-7.local
> 21: compute-0-10.local
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169
>
> -----Original Message-----
> From: npaci-rocks-discussion-admin at sdsc.edu
> [mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Gurgul, Dennis J.
> Sent: Monday, December 08, 2003 2:09 PM
> To: npaci-rocks-discussion at sdsc.edu
> Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
>
> I just did "cluster-fork -Uvh /sourcedir/ganglia-python-3.0.1-2.i386.rpm"
> and then "cluster-fork service gschedule restart" (not sure I had to do
> the last). I also put 3.0.1-2 and restarted gschedule on the frontend.
>
> Now I run "cluster-fork --mpd w".
>
> I currently have a user who ssh'd to compute-0-8 from the frontend and
> one who ssh'd into compute-0-17 from the front end.
>
> But the return shows the users on lines for 17 (for the user on 0-8)
> and 10 (for the user on 0-17):
>
> 17:   1:58pm  up 24 days,  3:20,  1 user,  load average: 0.00, 0.00, 0.03
> 17: USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
> 17: lance    pts/0    rescluster2.mgh.  1:31pm  40.00s  0.02s  0.02s  -bash
>
> 10:   1:58pm  up 24 days,  3:21,  1 user,  load average: 0.02, 0.04, 0.07
> 10: USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
> 10: dennis   pts/0    rescluster2.mgh.  1:57pm  17.00s  0.02s  0.02s  -bash
>
> When I do "cluster-fork w" (without the --mpd) the users show up on the
> correct nodes.
>
> Do the numbers on the left of the -mpd output correspond to the node
> names?
>
> Thanks.
>
> Dennis
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169

Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA
From mjk at sdsc.edu Mon Dec 8 12:58:22 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Mon, 8 Dec 2003 12:58:22 -0800
Subject: [Rocks-Discuss]PXE and system images
In-Reply-To: <Pine.LNX.4.44.0312081226270.19031-100000@scorpion.emsl.pnl.gov>
References: <Pine.LNX.4.44.0312081226270.19031-100000@scorpion.emsl.pnl.gov>
Message-ID: <4261C250-29C1-11D8-AECB-000A95DA5638@sdsc.edu>

On Dec 8, 2003, at 12:35 PM, Tim Carlson wrote:

> 5) In our case, we now quickly disconnect the PXE interface because we
> don't want to have the machine continually install. The real ROCKS
> method would have you choose (HD/net) for booting in the BIOS, but if
> you already have an OS on your machine, you would have to go into the
> BIOS twice before the compute nodes were installed. We disable
> rocks-grub and just connect up the PXE cable if we need to reinstall.

For most boxes we've seen that support PXE there is an option to hit
<F12> to force a network PXE boot; this allows you to force a PXE boot
even when a valid OS/boot block exists on your hard disk. If you don't
have this you do indeed need to go into the BIOS twice -- a pain.

 -mjk


From fds at sdsc.edu Mon Dec 8 13:26:46 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Mon, 8 Dec 2003 13:26:46 -0800
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
In-Reply-To: <BC447F1AD529D311B4DE0008C71BF2EB0AE15847@phsexch7.mgh.harvard.edu>
References: <BC447F1AD529D311B4DE0008C71BF2EB0AE15847@phsexch7.mgh.harvard.edu>
Message-ID: <39CC5B05-29C5-11D8-804D-000393A4725A@sdsc.edu>

I've seen this before as well. I believe it has something to do with the
way the color "[ OK ]" characters are interacting with the ssh session
from the normal cluster-fork. We have yet to characterize this bug
adequately.

-Federico

On Dec 8, 2003, at 12:55 PM, Gurgul, Dennis J. wrote:

> Thanks.
>
> On a related note, when I did "cluster-fork service gschedule restart"
> gschedule started with the "OK" output, but then the fork process hung
> on each node and I had to ^c out for it to go on to the next node.
>
> I tried to ssh to a node and then did the gschedule restart. Even then,
> after I tried to "exit" out of the node, the session hung and I had to
> log back in and kill it from the frontend.
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169
>
> -----Original Message-----
> From: Federico Sacerdoti [mailto:fds at sdsc.edu]
> Sent: Monday, December 08, 2003 3:51 PM
> To: Gurgul, Dennis J.
> Cc: npaci-rocks-discussion at sdsc.edu
> Subject: Re: [Rocks-Discuss]cluster-fork --mpd strangeness
>
> You are right, and I think this is a shortcoming of MPD. There is no
> obvious way to force the MPD numbering to correspond to the order the
> nodes were called out on the command line (cluster-fork --mpd actually
> makes a shell call to mpirun and it calls out all the node names
> explicitly). MPD seems to number the output differently, as you found
> out.
>
> So mpd for now may be more useful for jobs that are not sensitive to
> this. If enough of you find this shortcoming to be a real annoyance, we
> could work on putting the node name label on the output by explicitly
> calling "hostname" or similar.
>
> Good ideas are welcome :)
> -Federico
>
> On Dec 8, 2003, at 11:28 AM, Gurgul, Dennis J. wrote:
>
>> Maybe this is a better description of the "strangeness".
>>
>> I did "cluster-fork --mpd hostname":
>>
>> 1: compute-0-0.local
>> 2: compute-0-1.local
>> 3: compute-0-3.local
>> 4: compute-0-13.local
>> 5: compute-0-11.local
>> 6: compute-0-15.local
>> 7: compute-0-16.local
>> 8: compute-0-19.local
>> 9: compute-0-21.local
>> 10: compute-0-17.local
>> 11: compute-0-5.local
>> 12: compute-0-20.local
>> 13: compute-0-18.local
>> 14: compute-0-12.local
>> 15: compute-0-9.local
>> 16: compute-0-4.local
>> 17: compute-0-8.local
>> 18: compute-0-14.local
>> 19: compute-0-2.local
>> 20: compute-0-6.local
>> 0: compute-0-7.local
>> 21: compute-0-10.local
>>
>> Dennis J. Gurgul
>> Partners Health Care System
>> Research Management
>> Research Computing Core
>> 617.724.3169
>>
>> -----Original Message-----
>> From: npaci-rocks-discussion-admin at sdsc.edu
>> [mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Gurgul, Dennis J.
>> Sent: Monday, December 08, 2003 2:09 PM
>> To: npaci-rocks-discussion at sdsc.edu
>> Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
>>
>> I just did "cluster-fork -Uvh /sourcedir/ganglia-python-3.0.1-2.i386.rpm"
>> and then "cluster-fork service gschedule restart" (not sure I had to do
>> the last). I also put 3.0.1-2 and restarted gschedule on the frontend.
>>
>> Now I run "cluster-fork --mpd w".
>>
>> I currently have a user who ssh'd to compute-0-8 from the frontend and
>> one who ssh'd into compute-0-17 from the front end.
>>
>> But the return shows the users on lines for 17 (for the user on 0-8)
>> and 10 (for the user on 0-17):
>>
>> 17:   1:58pm  up 24 days,  3:20,  1 user,  load average: 0.00, 0.00, 0.03
>> 17: USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
>> 17: lance    pts/0    rescluster2.mgh.  1:31pm  40.00s  0.02s  0.02s  -bash
>>
>> 10:   1:58pm  up 24 days,  3:21,  1 user,  load average: 0.02, 0.04, 0.07
>> 10: USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
>> 10: dennis   pts/0    rescluster2.mgh.  1:57pm  17.00s  0.02s  0.02s  -bash
>>
>> When I do "cluster-fork w" (without the --mpd) the users show up on the
>> correct nodes.
>>
>> Do the numbers on the left of the -mpd output correspond to the node
>> names?
>>
>> Thanks.
>>
>> Dennis
>>
>> Dennis J. Gurgul
>> Partners Health Care System
>> Research Management
>> Research Computing Core
>> 617.724.3169
>
> Federico
>
> Rocks Cluster Group, San Diego Supercomputing Center, CA

Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA
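While that bug is being characterized, one way to keep a remote service
restart from wedging the session is to detach it from the terminal
completely; a hedged sketch (the log path is arbitrary):

    # cluster-fork 'service gschedule restart < /dev/null > /tmp/gschedule-restart.log 2>&1'

With stdin closed and the coloured "[ OK ]" output going to a file instead of
the ssh channel, there is nothing left for the session to wait on, which is
one common reason this kind of restart appears to hang.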
From bruno at rocksclusters.org Mon Dec 8 15:31:08 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 8 Dec 2003 15:31:08 -0800
Subject: [Rocks-Discuss]Rocks 3.0.0 problem:not able to boot up
In-Reply-To: <BAY3-F9yOi5AgJQlDrR0002a5da@hotmail.com>
References: <BAY3-F9yOi5AgJQlDrR0002a5da@hotmail.com>
Message-ID: <9979F090-29D6-11D8-9715-000A95C4E3B4@rocksclusters.org>

> I have installed Rocks 3.0.0 with default options successfully; there was
> not any trouble. But when I boot it up, it stops at the beginning, just
> showing "GRUB" on the screen and waiting...

when you built the frontend, did you start with the rocks base CD then
add the HPC roll?

 - gb


From bruno at rocksclusters.org Mon Dec 8 15:37:46 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 8 Dec 2003 15:37:46 -0800
Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0)
In-Reply-To: <30062B7EA51A9045B9F605FAAC1B4F622357C7@tardis0.quadrics.com>
References: <30062B7EA51A9045B9F605FAAC1B4F622357C7@tardis0.quadrics.com>
Message-ID: <8700A2BE-29D7-11D8-9715-000A95C4E3B4@rocksclusters.org>

> Previously I have been installing a custom kernel on the compute nodes
> with an "extend-compute.xml" and an "/etc/init.d/qsconfigure" (to fix
> grub.conf).
>
> However I am now trying to do it the 'proper' way. So I do (on :
> # cp qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm
> /home/install/rocks-dist/7.3/en/os/i386/force/RPMS
> # cd /home/install
> # rocks-dist dist
> # SSH_NO_PASSWD=1 shoot-node compute-0-0
>
> Hence:
> # find /home/install/ |xargs -l grep -nH qsnet
> shows me that hdlist and hdlist2 now contain this RPM. (and indeed If
> I duplicate my rpm in that directory rocks-dist notices this and warns
> me.)
>
> However the node always ends up with "2.4.20-20.7smp" again.
> anaconda-ks.cfg contains just "kernel-smp" and install.log has
> "Installing kernel-smp-2.4.20-20.7."
>
> So my question is:
> It looks like my RPM has a name that Rocks doesn't understand properly.
> What is wrong with my name ?
> and what are the rules for getting the correct name ?
> (.i686.rpm is of course correct, but I don't have -smp. in the name
> Is this the problem ?)

the anaconda installer looks for kernel packages with a specific format:

    kernel-<kernel ver>-<redhat ver>.i686.rpm

and for smp nodes:

    kernel-smp-<kernel ver>-<redhat ver>.i686.rpm

we have made the necessary patches to files under /usr/src/linux-2.4 in
order to produce redhat-compliant kernels. see:

http://www.rocksclusters.org/rocks-documentation/3.0.0/customization-kernel.html

also, would you be interested in making your changes for the quadrics
interconnect available to the general rocks community?

 - gb


From purikk at hotmail.com Mon Dec 8 20:23:35 2003
From: purikk at hotmail.com (purushotham komaravolu)
Date: Mon, 8 Dec 2003 23:23:35 -0500
Subject: [Rocks-Discuss]AMD Opteron
References: <200312082001.hB8K1KJ24139@postal.sdsc.edu>
Message-ID: <BAY1-DAV65Bp80SiEmA00005c14@hotmail.com>

Hello,

I am a newbie to ROCKS cluster. I wanted to set up clusters on 32-bit
architectures (Intel and AMD) and 64-bit architectures (Intel and AMD).
I found the 64-bit download for Intel on the website, but not for AMD.
Does it work for AMD Opteron? If not, what is the ETA for AMD-64?

We are planning to buy AMD-64 bit machines shortly, and I would like to
volunteer for the beta testing if needed.

Thanks,
Regards,
Puru