


Re: Running "MPI" program on "Cluster"

From blmblm@myrealbox.com <blmblm.myrealbox@gmail.com>
Newsgroups comp.parallel.mpi
Subject Re: Running "MPI" program on "Cluster"
Date 2013-09-28 17:33 +0000
Organization None
Message-ID <baoi7lFje3dU1@mid.individual.net>
References <320c9a76-fe5c-4ff1-b674-5248db10759d@googlegroups.com> <c9c984d7-528d-4f20-8565-db299ad11120@googlegroups.com>



In article <c9c984d7-528d-4f20-8565-db299ad11120@googlegroups.com>,
Meenal Chougule  <meenal.chougule@gmail.com> wrote:
> > On Thursday, September 26, 2013 2:17:19 PM UTC+8, Meenal Chougule wrote:
> > Hello everyone,
> > 
> >      I have a program having Master and Slave kind of nature. I want to execute those on a cluster. 
> > 
> >      for cluster there is a master and 2 slaves. cluster master does decomposition of work and slave executes that.
> > 
> >      i know IP`s of both slave but i want to know the command by which i can execute the or options in mpirun.


I still don't know what you mean by "the or options" here.


> > 
> > Thanku,
> > 
> > M D C
> 
> The command i used is
> 
> mpirun -np ./manager <cnf file as a input> ./worker "no of worker"



Is this the actual command?  I ask because I thought "-np" needed to
be followed by a number of processes.  But I notice from one of your
other responses [*] that you're using MPICH, and my recent experience
has been with Open MPI, and I suppose the arguments could be different.

But if this actually launches one copy of a "master" program and
two copies of a "slave" program, well, you've already solved the problem
I thought you were having (how to accomplish that).

[*] Is there a reason you didn't put all the information in one reply
rather than spreading it out?  Well, probably not that important.
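In case it's useful: with MPICH's mpiexec, the usual way to start different
executables in one job is the "MPMD" colon syntax, where each colon-separated
block gets its own -np.  (The config-file name and the host list below are
placeholders I made up, not something I know about your setup.)

```shell
# One manager process plus two workers, MPMD-style (MPICH mpiexec).
# "your.cnf" and the host names are placeholders for your actual setup.
mpiexec -np 1 ./manager your.cnf : -np 2 ./worker

# To pin processes to specific machines, add a host list (or -f hostfile):
mpiexec -hosts master_ip,slave1_ip,slave2_ip \
        -np 1 ./manager your.cnf : -np 2 ./worker
```

If that's roughly what your command already expands to, then the launch
itself isn't the problem and the crash is happening inside the programs.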



> 
> The error i got by unusual termination is this,
> 
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   EXIT CODE: 139
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> [proxy:1:0@minal] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:883): assert (!closed) failed
> [proxy:1:0@minal] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:1:0@minal] main (./pm/pmiserv/pmip.c:210): demux engine error waiting for event
> [mpiexec@minal] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
> [mpiexec@minal] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
> [mpiexec@minal] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:216): launcher returned error waiting for completion
> [mpiexec@minal] main (./ui/mpich/mpiexec.c:325): process manager error waiting for completion
> 
> 

Without knowing anything about your code I can't say much!  *Maybe*
the mention of "waiting for completion" means that process B was
waiting for process A to send it something, but process A ended
without sending it.  But that's at best a guess.
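
For what it's worth, the "EXIT CODE: 139" line is itself a clue:
launchers report 128 plus the signal number when a process dies from
a signal, and 139 - 128 = 11 is SIGSEGV.  So one of your processes
most likely crashed on a bad memory access (stray pointer, array
out of bounds, ...) rather than exiting cleanly.  A quick sanity
check of that arithmetic, in plain Python (nothing MPI-specific):

```python
import signal

def decode_exit_code(code):
    """Interpret a shell-style exit code: values above 128 usually
    mean the process was killed by signal (code - 128)."""
    if code > 128:
        sig = code - 128
        return f"killed by signal {sig} ({signal.Signals(sig).name})"
    return f"exited normally with status {code}"

print(decode_exit_code(139))  # killed by signal 11 (SIGSEGV)
```

So I'd look for memory bugs in the manager/worker code before
suspecting the mpirun command line.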

Have you successfully run other MPI programs (even simple "hello world" 
ones) on this cluster?  

-- 
B. L. Massingill
ObDisclaimer:  I don't speak for my employers; they return the favor.
