Re: blcr-question

jcduell_at_lbl_dot_gov
Date: Fri Jul 23 2004 - 13:14:16 PDT

    On Tue, Jul 20, 2004 at 06:52:45PM +0200, Karel Semsch wrote:
    > My name is Karel Semsch and I am a student at the Czech Technical
    >  University. I am trying to get BLCR
    >  working on a 2-processor machine by running this command:
    >  mpirun -np 2 -ssi rpi crtcp -ssi cr blcr PROGRAM_NAME
    >  Then I did:
    >  cr_checkpoint MPIRUN_PID
    >  Up to that point everything worked fine: there were three context
    >  files in my home directory. But when I then ran:
    >  cr_restart context.MPIRUN_PID
    >  only one node was working; the other was not. The copy of the
    >  program on the second node always showed a runtime of 0:00, so I
    >  had to kill the whole application because it never started on that
    >  node. Can you give me some advice, please?
    
    You need to run cr_restart as an MPI application, i.e., launch it
    through mpirun:

        mpirun -np 2 -ssi rpi crtcp -ssi cr blcr cr_restart context.MPIRUN_PID
    
    You may not need all of the '-ssi' arguments; whether you do depends
    on your LAM installation/configuration.
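    A dry-run sketch of the whole launch/checkpoint/restart cycle from this
    thread. The shell functions launch_cmd, checkpoint_cmd and restart_cmd
    are hypothetical helpers (not part of LAM or BLCR) that only print the
    commands; they execute nothing. 12345 is a placeholder PID, and the
    '-ssi' arguments are the ones used above, which your installation may
    not all require.

```shell
#!/bin/sh
# Hypothetical helpers (not part of LAM or BLCR): they print the commands
# for a LAM/MPI + BLCR checkpoint/restart cycle and execute nothing.
# The -ssi arguments are the ones used in this thread; some may be
# unnecessary depending on your LAM installation/configuration.

SSI_ARGS="-ssi rpi crtcp -ssi cr blcr"

# Launch command. $1 = number of processes, $2 = program to run.
launch_cmd() {
    printf 'mpirun -np %s %s %s\n' "$1" "$SSI_ARGS" "$2"
}

# Checkpoint command: checkpoint the mpirun process.
# $1 = PID of mpirun.
checkpoint_cmd() {
    printf 'cr_checkpoint %s\n' "$1"
}

# Restart command: cr_restart is itself launched through mpirun, i.e. run
# as an MPI application, instead of being invoked directly on one node.
# $1 = number of processes, $2 = mpirun PID that was checkpointed.
restart_cmd() {
    printf 'mpirun -np %s %s cr_restart context.%s\n' "$1" "$SSI_ARGS" "$2"
}

# Example dry run with a placeholder PID.
launch_cmd 2 ./PROGRAM_NAME
checkpoint_cmd 12345
restart_cmd 2 12345
```

    Printing the commands rather than running them keeps the sketch safe
    to try on a machine without LAM or BLCR; once the output looks right,
    substitute the real mpirun PID and run the commands by hand.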
    
    -- 
    Jason Duell             Future Technologies Group
    <jcduell_at_lbl_dot_gov>       Computational Research Division
    Tel: +1-510-495-2354    Lawrence Berkeley National Laboratory
    
