Re: better way to get PID of mpirun?

From: Eric Roman (ERoman_at_lbl_dot_gov)
Date: Fri Apr 23 2004 - 10:43:42 PDT

  • Next message: jcduell_at_lbl_dot_gov: "Re: blcr compile fix.."
    If you just want to track the PIDs, you can use the PBS job ID to look
    up the session ID on the node (I forget exactly how, but it's easy if your
    nodes aren't time-shared: just log in), and then use pstree to find the PID.
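
The session-ID lookup above can be sketched in Python. This is a minimal
sketch, not part of blcr or PBS: it assumes a Linux node, assumes you already
have the job's session ID, and assumes the launcher's command name is exactly
"mpirun". The function names (parse_stat, pids_in_session) and the proc_root
parameter are made up for illustration. It reads /proc/<pid>/stat, where the
session ID is the fourth field after the parenthesized command name.

```python
import os


def parse_stat(text):
    """Parse the start of a Linux /proc/<pid>/stat line.

    Returns (pid, comm, ppid, session). The command name is wrapped in
    parentheses and may itself contain spaces or ')', so split on the
    last ')' rather than on whitespace.
    """
    left, right = text.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    fields = right.split()  # state, ppid, pgrp, session, ...
    return int(pid_str), comm, int(fields[1]), int(fields[3])


def pids_in_session(session_id, name="mpirun", proc_root="/proc"):
    """Return sorted PIDs in the given session whose command name is `name`.

    `proc_root` is a hypothetical knob so the scan can be pointed at a
    test directory instead of the real /proc.
    """
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, _ppid, sid = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or entry is unreadable
        if sid == session_id and comm == name:
            matches.append(pid)
    return matches if not matches else sorted(matches)
```

On a compute node you would call something like pids_in_session(sid) with
the session ID reported for the job. Anything it returns still needs a sanity
check before you checkpoint it, since a session can hold more than one
process with the same command name.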
    
    You could also modify mpirun itself, since users usually don't build
    their own MPI libraries.
    
     - E
    
    On Fri, Apr 23, 2004 at 10:30:58AM -0700, JCDuell_at_lbl_dot_gov wrote:
    > On Fri, Apr 23, 2004 at 11:36:59AM -0500, Kevin wrote:
    > > Dear All,
    > > 
    > > When we use blcr to checkpoint an MPI program, the current pid of
    > > mpirun is needed. But in practice, we cannot ask all the
    > > programmers who write MPI applications to print out the pid of mpirun in
    > > their programs. So is there some suggestion for getting the pid of mpirun
    > > without modifying the user's MPI application source code, while we
    > > use blcr's command-line functionality?
    >  
    > Tingyu,
    > 
    > The normal case will eventually be for our checkpoint/restart software to
    > be integrated into the batch system on a machine (like PBS,
    > LoadLeveler, etc.), so you'd use a utility like 'qstat' to get your
    > job's ID, and then you could checkpoint/restart it with that ID.
    > 
    > We are adding such support to the Scalable Software Systems
    > project's software:
    > 
    >     http://www.scidac.org/ScalableSystems/
    > 
    > And we hope at some point to see OpenPBS support too (and support from
    > commercial vendors).
    > 
    > 
    > -- 
    > Jason Duell             Future Technologies Group
    > <jcduell_at_lbl_dot_gov>       Computational Research Division
    > Tel: +1-510-495-2354    Lawrence Berkeley National Laboratory
    
    -- 
    Eric Roman                       Computational Research Division
    510-486-6420                     Berkeley Lab
    
