Re: Error while using cr_checkpoint on ARM

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Aug 06 2008 - 10:28:01 PDT

  • Next message: Manish Dwivedi: "Re: Error while using cr_checkpoint on ARM"
    Manish,
    
      I am sorry to hear that you are having problems.  From the information 
    you provide below, it is hard to say what the problem is, other than to 
    guess that your ARM system is low on memory.
      I am aware of a kernel-side memory leak in blcr-0.7.2, which should be 
    fixed in the 0.7.3 release expected later this week or early next week.  
    So, I'd like to know whether the failure you describe happens on the very 
    first use of cr_checkpoint or only after BLCR has been used 
    several times (for instance by running "make check").  If it works for a 
    while and then begins to fail, I'd suspect the known memory leak and 
    suggest that you wait for blcr-0.7.3.
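
    One rough way to tell the two cases apart is to watch free memory 
    across repeated checkpoints.  The sketch below is only an illustration: 
    the helper name mem_free_kb is made up here, and it assumes a Linux 
    /proc/meminfo with a "MemFree:" line.  MemFree moves for many unrelated 
    reasons, so a steady drop is a hint of the leak, not proof.

```sh
# Print the MemFree value (in kB) from a /proc/meminfo-style file.
# With no argument it reads /proc/meminfo itself.
# mem_free_kb is a hypothetical helper name, not part of BLCR.
mem_free_kb() {
    awk '/^MemFree:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Usage: note the value, repeat the failing checkpoint a few times,
# then compare.
#   mem_free_kb
#   cr_checkpoint --term <pid>    # once per fresh test process
#   mem_free_kb
```

    If the number falls by a similar amount after every checkpoint and 
    never recovers, that is consistent with the known leak.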
      If you are seeing failure on the very first attempt to use blcr, then 
    I suggest that you rebuild blcr with debugging enabled and send me the 
    information dumped to the system logs (run dmesg or see 
    /var/log/messages to find the logs).  To do this, you'll need to start 
    at the beginning of the configure/make/install process and pass the 
    "--enable-debug" option to configure, and then proceed with the rest of 
    the build/install process.  Be sure to "make insmod" (or manually rmmod 
    the old modules and insmod/modprobe the new ones); otherwise the kernel 
    modules from your previous (non-debug) build may still be running.  With 
    the new kernel modules loaded, you should retry your failing command and 
    then look for messages with "blcr: " in them in the system logs.
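
    The rebuild steps above can be sketched as a small shell script.  
    This is a sketch under assumptions, not a tested recipe: the function 
    names are made up here, it assumes you are at the top of the unpacked 
    blcr source tree, that "make install" and "make insmod" are run with 
    root privileges, and that any configure options from your original 
    (for example cross-compile) build are passed again as arguments.

```sh
#!/bin/sh
# Sketch only: run from the top of the unpacked blcr source tree.
# Arguments are forwarded to configure, so repeat the options you
# used for the original build.  The function names are hypothetical.
rebuild_blcr_debug() {
    ./configure --enable-debug "$@" &&
    make &&
    make install &&   # typically needs root
    make insmod       # needs root; loads the freshly built modules
}

# Keep only the log lines that contain the "blcr: " prefix.
# Reads the log text from standard input.
blcr_log_lines() {
    grep -F 'blcr: '
}

# Usage, after retrying the failing cr_checkpoint command:
#   dmesg | blcr_log_lines
#   blcr_log_lines < /var/log/messages
```

    Nothing is built until rebuild_blcr_debug is actually called, so the 
    file can be sourced safely to get just the log filter.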
    
      I also should tell you that there is an ARM-specific mailing list 
    (very low volume) for BLCR that may help you reach other ARM users.  You 
    can find list info and subscribe (required to post) at 
    https://hpcrdm.lbl.gov/mailman/listinfo/blcr-arm
    
    -Paul
    
    Manish Dwivedi wrote:
    > Hi All,
    >
    > I am trying to use BLCR on ARM. When I run cr_checkpoint on a 
    > hello.c program, it fails with the error below:
    >
    > cr_checkpoint --term <pid> (command run)
    > Checkpoint failed: Cannot allocate memory
    >
    > I have compiled hello.c on the same kernel mentioned in the release 
    > notes, and I am using blcr-0.7.2.tar.gz.
    >
    > Could anyone help me resolve this issue so that I can test it? 
    > It works fine for me on an x86 machine.
    >
    > Regards,
    > Manish
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
    
