Error in exec

From: Kevin (tz9_at_msstate.edu)
Date: Wed May 19 2004 - 08:07:21 PDT

  • Next message: Eric Roman: "Re: Error in exec"
    Dear Sir, 
    
    I am using LAM 7.0.4 together with blcr-0.2.0 to checkpoint MPI programs. Until now it worked fine, both with a single (non-MPI) program and with an MPI program running on one node. Today I tried to checkpoint an MPI program (the "hello" program in the example directory of the LAM package) running on one node of our cluster. The program was checkpointed and the context file was saved, but when I try to restart it, "Error in exec" is printed to the screen. I can't figure out where the problem is. Could you please give me some suggestions?
    
    Below is some information about the steps I ran and my configuration:
    
    [kevin@Sparrow-01-02 ~/src]mpirun C ./hello 
    // This works fine; the output is displayed on console 1.
    
    [kevin@Sparrow-01-02 ~/src] getpid mpirun 
    // From console 2, I got the PID of mpirun with a script called "getpid"; assume it is 344.
    
    [kevin@Sparrow-01-02 ~/src]cr_checkpoint 344
    // Checkpoint ./hello from console 2; this works fine, and context.344 is saved to disk.
    
    [kevin@Sparrow-01-02 ~/src]cr_restart context.344
    Error in exec
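
    For reference, the steps above can be collected into one small script. This is only a sketch: the "getpid" script itself is not shown in this message, so the getpid function below is a hypothetical stand-in built on ps and awk, and match_pid is an illustrative helper name. The cr_checkpoint and cr_restart calls simply repeat the commands from the session.

```shell
#!/bin/sh
# Sketch of the session above as a script. Assumptions are marked.

# match_pid (hypothetical name): reads "PID COMMAND" lines on stdin and
# prints the PID of every line whose command name equals $1.
match_pid() {
    awk -v name="$1" '$2 == name { print $1 }'
}

# getpid: hypothetical stand-in for the "getpid" script, which is not shown.
# Lists the current user's processes and filters them by command name.
getpid() {
    ps -u "$(id -u)" -o pid= -o comm= | match_pid "$1"
}

# Usage on the node where mpirun is running (left commented out so the
# functions can be sourced without LAM or BLCR installed):
#   pid=$(getpid mpirun)
#   cr_checkpoint "$pid"        # writes context.$pid, as in the session above
#   cr_restart "context.$pid"
```

    If more than one mpirun is running under the same user, getpid prints one PID per line, so the caller would need to pick the right one before checkpointing.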
    
    ---below are configurations----------------------------------
    [kevin@Sparrow-01-02 ~/src]lamnodes
    n0      Sparrow-01-02.ERC.MsState.Edu:1:origin,this_node
    
    [kevin@Sparrow-01-02 ~/src]laminfo
               LAM/MPI: 7.0.4
                Prefix: /home/kevin/LAM
          Architecture: i686-pc-linux-gnu
         Configured by: kevin
         Configured on: Mon May  3 15:45:08 CDT 2004
        Configure host: Sparrow-01-01.ERC.MsState.Edu
            C bindings: yes
          C++ bindings: yes
      Fortran bindings: yes
           C profiling: yes
         C++ profiling: yes
     Fortran profiling: yes
         ROMIO support: yes
          IMPI support: no
         Debug support: no
          Purify clean: no
              SSI boot: globus (Module v0.5)
              SSI boot: rsh (Module v1.0)
              SSI coll: lam_basic (Module v7.0)
              SSI coll: smp (Module v1.0)
               SSI rpi: crtcp (Module v1.0.1)
               SSI rpi: lamd (Module v7.0)
               SSI rpi: sysv (Module v7.0)
               SSI rpi: tcp (Module v7.0)
               SSI rpi: usysv (Module v7.0)
                SSI cr: blcr (Module v1.0.1)
    
    
     
    
