Problems with BLCR?

From: Jeff Squyres
Date: Mon Jul 25 2005 - 06:24:05 PDT

  • Next message: Pradeep Padala: "Re: Problems with BLCR?"
    A user was having problems with LAM + BLCR, so I got a guest account on 
    his cluster and gave it a whirl.  With my own build of LAM/MPI, I'm 
    able to checkpoint just fine (i.e., I get N+1 checkpoint files).  But 
    when I try to restart, I get the following error:
    [jeff@linf1 ~]$ cr_restart context.4037
    cri_syscall(CR_OP_RSTRT_REQ, &req): Device or resource busy
    cri_syscall(CR_OP_RSTRT_REQ, &req): Device or resource busy
    cri_syscall(CR_OP_RSTRT_REQ, &req): Device or resource busy
    cri_syscall(CR_OP_RSTRT_REQ, &req): Device or resource busy
    What does this mean?
    I had checkpointed a simple "hello world" MPI application (4 MPI 
    processes) on a single node.
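    A rough way to double-check the "N+1 checkpoint files" count before
    attempting a restart is sketched below. It assumes the checkpoint files
    sit in one directory and that their names start with "context.", as the
    file passed to cr_restart above does; the function names are
    hypothetical and not part of LAM/MPI or BLCR.

```python
from pathlib import Path


def list_context_files(directory, prefix="context."):
    """Return the sorted names of regular files whose names start with prefix.

    The "context." prefix is an assumption taken from the file name used
    in the cr_restart command in this report.
    """
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )


def has_expected_checkpoints(directory, num_procs):
    """True if the directory holds exactly num_procs + 1 checkpoint files.

    The N + 1 expectation (one file more than the number of MPI processes)
    comes from the report above, not from any BLCR or LAM/MPI interface.
    """
    return len(list_context_files(directory)) == num_procs + 1
```

    For the 4-process "hello world" run described here,
    has_expected_checkpoints(".", 4) would be expected to return True when
    5 matching files are present.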
    The user has already been in contact with Paul -- from his initial post 
    on the LAM list:
    "P.S. I am using a patched version of blcr to make it work on FC4. The
    patch was given to me by Paul Hargrove."
    The specific version of BLCR in use is:
    [jeff@linf1 ~]$ cr_restart --version
    cr_restart version 0.4.pre1_snapshot_2005_06_27
    Sidenote: I notice that cr_checkpoint has a "--version" switch, but it 
    is not listed in the output of "cr_checkpoint --help" (which was 
    somewhat confusing).  Ditto for cr_run.
    {+} Jeff Squyres
    {+} The Open MPI Project
