Re: problems with cr_checkpoint: ioctl(/proc/checkpoint/ctrl, CR_OP_CHKPT_REAP):Input/output error

From: José M. Martín (jmartin_at_onsager.ugr.es)
Date: Thu Feb 21 2008 - 00:51:53 PST

  • Next message: Paul H. Hargrove: "Re: problems with cr_checkpoint: ioctl(/proc/checkpoint/ctrl, CR_OP_CHKPT_REAP):Input/output error"
    I have done some additional tests.
    
    It only fails on a volume mounted with GlusterFS, a distributed FS. On a
    local drive, it works. So it must be an issue with this FS.
    
    There are no entries about the error in /var/log/messages or in dmesg.
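
Since, as noted in the reply quoted below, BLCR reports nearly all failed write() calls as EIO, one way to look for the underlying cause might be to write to the same directory from user space and see which errno comes back. The following is only a rough sketch: the function name probe_write and the scratch file name are made up for illustration, and a user-space write is not the same code path as the write BLCR performs, so a clean result here does not rule out a filesystem-specific problem.

```python
import errno
import os


def probe_write(directory, nbytes=1 << 20):
    """Write and fsync `nbytes` of zeros into a scratch file in `directory`.

    Returns None on success, or the symbolic errno name (for example
    "ENOSPC" or "EDQUOT") of the first call that fails.
    Hypothetical helper; not part of BLCR.
    """
    path = os.path.join(directory, "blcr_write_probe.%d" % os.getpid())
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))

    result = None
    try:
        try:
            chunk = b"\0" * 65536
            remaining = nbytes
            while remaining > 0:
                # os.write may write fewer bytes than requested.
                remaining -= os.write(fd, chunk[:remaining])
            # Force the data out so deferred errors surface here.
            os.fsync(fd)
        finally:
            # close() can also report a delayed write error.
            os.close(fd)
    except OSError as exc:
        result = errno.errorcode.get(exc.errno, str(exc.errno))

    # Clean up the scratch file; ignore failures at this point.
    try:
        os.unlink(path)
    except OSError:
        pass
    return result
```

Calling probe_write() with the directory where the checkpoint context file is written (on the GlusterFS volume, and again on the local drive for comparison) returns None if the write succeeds, or a name such as "ENOSPC" if it does not.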
    
    Thanks,
    
    José
    
    
    
    
    On Wednesday 20 February 2008 18:07:30 Paul H. Hargrove wrote:
    > José,
    >
    >   Sorry the error reporting isn't very clear.  That is one of the weaker
    > parts of BLCR right now.
    >   Since the testsuite passes, the most likely reason for the message you
    > see is an actual I/O failure when trying to write out the checkpoint
    > context file for your application.  The BLCR code will map (nearly) all
    > failed write() calls to EIO, even if the actual cause was an
    > out-of-space or over-quota error.
    >   You might find some useful information in /var/log/messages, or via
    > dmesg, about what BLCR was doing at the time of the error.  If you can
    > send us those messages, we may be able to narrow down what the problem is.
    >
    > -Paul
    >
    > P.S.
    > I will ensure the next release of BLCR produces a less confusing error
    > message, such as "cr_checkpoint: checkpoint failed: Input/output
    > error".  There really should be no reference to the internal ioctl() call.
    >
    > José M. Martín wrote:
    > > Hello,
    > >
    > > first, thanks for this project.
    > >
    > > I tried to set up blcr, but I have a problem. When I launch a program and
    > > checkpoint it, I get the following error:
    > > ioctl(/proc/checkpoint/ctrl, CR_OP_CHKPT_REAP): Input/output error
    > >
    > > I have tried with kernels 2.6.20 (vanilla) and 2.6.18.8-0.8 (opensuse
    > > 10.2 default) on a node. On both, I get the same error.
    > > However, on another node with opensuse 10.2 and kernel 2.6.23.1, it
    > > runs without problems.
    > >
    > > I have passed the testsuite:
    > > ======================
    > > All 34 tests passed
    > > (1 tests were not run)
    > > ======================
    > >
    > > No hugetlbfs mount point found (test skipped)
    > > SKIP: hugetlbfs.ct
    > >
    > > I can load the blcr modules without problem, execute binaries, link
    > > libraries,...
    > >
    > > I'm using version 0.6.4
    > > Nodes are x86 (Pentium 4)
    > >
    > > Any help will be appreciated.
    > >
    > > Thanks in advance
    
