Re: Checkpoint failed: support missing from application

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Fri Sep 23 2005 - 16:21:43 PDT

    Adolfo,
       It is as simple as "BLCR does not support statically linked binaries".
    Not only does cr_run not perform its magic with a statically linked
    binary, but we've never really tested the case of libcr statically
    linked into an application.  Since we don't generate a libcr.a by
    default, however, I think linking explicitly with -static will either
    fail or result in a binary that is dynamically linked to BLCR; I am
    not sure which.
    
    -Paul
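
For reference, cr_run has no effect on a static binary because LD_PRELOAD is
honored by the dynamic loader, and a statically linked executable never
invokes that loader. Below is a minimal sketch in Python (not part of BLCR)
of one way to check for this ahead of time. It looks for a PT_INTERP program
header in the ELF file, which is how a dynamically linked executable names
its loader. The function name has_interpreter is hypothetical, and the
header offsets assume the standard ELF32/ELF64 layouts.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def has_interpreter(elf_bytes: bytes) -> bool:
    """Return True if the ELF image contains a PT_INTERP program header.

    Hypothetical helper: a result of False suggests a statically linked
    executable, for which an LD_PRELOAD-based wrapper cannot work.
    """
    if elf_bytes[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = elf_bytes[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if elf_bytes[5] == 1 else ">"   # EI_DATA: 1 = little-endian
    if is_64:
        (e_phoff,) = struct.unpack_from(endian + "Q", elf_bytes, 0x20)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", elf_bytes, 0x36)
    else:
        (e_phoff,) = struct.unpack_from(endian + "I", elf_bytes, 0x1C)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", elf_bytes, 0x2A)
    # p_type is the first 32-bit field of every program header entry.
    for i in range(e_phnum):
        (p_type,) = struct.unpack_from(endian + "I", elf_bytes, e_phoff + i * e_phentsize)
        if p_type == PT_INTERP:
            return True
    return False
```

For example, if has_interpreter(open("./a.out", "rb").read()) returns False,
that would suggest "cr_run ./a.out" cannot preload the BLCR library.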
    
    Adolfo J. Banchio wrote:
    > Paul,
    > 
    > thanks for your prompt reply.
    > 
    > The program was run using cr_run, and then I checked
    > /proc/<PID>/maps and there is no line for
    > the BLCR libraries, so this is the reason.
    > 
    > The only difference from other codes (compiled with
    > the same compiler) is that this one (and another one
    > I recompiled for testing) is compiled with the -static flag.
    > 
    > Is it as simple as that: no statically compiled program will
    > work with cr_run for checkpointing? In other words, for
    > statically linked codes you have to include the libraries
    > at link time. Is this true?
    > 
    > 
    > thanks for your help
    > 
    > 
    > best regards,
    > 
    > adolfo
    > 
    > On Fri, 2005-09-23 at 14:08, Paul H. Hargrove wrote:
    > 
    >>  Checkpointing with BLCR requires that a small stub library be linked 
    >>into an application.  The message you are seeing is the one generated 
    >>when a checkpoint request is issued for an application that does not 
    >>include this support.
    >>
    >>  A LAM/MPI built with BLCR support will automatically link this 
    >>library into the applications it compiles.  Other applications may do so 
    >>explicitly when they are built, or more typically via an LD_PRELOAD done 
    >>by the "cr_run" utility we provide.  For instance, "cr_run ./a.out" 
    >>would run a.out with the BLCR library loaded.
    >>
    >>  It is also possible that the application is correctly linked with the 
    >>library, but is somehow disabling the BLCR hook.  One can look for 
    >>"libcr.so" in /proc/<pid>/maps to determine if the process with the 
    >>given pid has the BLCR library loaded.  If it is loaded and you still 
    >>get the "support missing from application" messages, then we can discuss 
    >>how to determine the cause of the interference.
    >>
    >>-Paul
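
A minimal sketch in Python of the /proc/<pid>/maps check described above,
assuming a Linux /proc filesystem. The function names are hypothetical, and
matching only file names that begin with "libcr.so" follows the name given
in the message; other BLCR library names, if any, are not covered.

```python
def loaded_blcr_libraries(maps_text: str) -> list:
    """Return the distinct mapped paths whose file name starts with 'libcr.so'."""
    found = []
    for line in maps_text.splitlines():
        # Each line reads: address perms offset dev inode [path]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        name = path.rsplit("/", 1)[-1]
        if name.startswith("libcr.so") and path not in found:
            found.append(path)
    return found


def process_has_blcr(pid: int) -> bool:
    """Read /proc/<pid>/maps and report whether libcr.so is mapped."""
    with open(f"/proc/{pid}/maps") as f:
        return bool(loaded_blcr_libraries(f.read()))
```

If process_has_blcr(pid) is False for a process started under cr_run, the
library was never loaded, which matches the "support missing from
application" message.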
    >>
    >>Adolfo J. Banchio wrote:
    >>
    >>
    >>>Hello,
    >>>
    >>>First of all, my apologies if this question has already been answered
    >>>(in that case just point me to the answer); I cannot
    >>>get access to the search page of the archive.
    >>>
    >>>Now, the problem,
    >>>
    >>>I have a process running (started with cr_run)
    >>>
    >>>which gives this error message when checkpointed:
    >>>
    >>>   "Checkpoint failed: support missing from application"
    >>>
    >>>and the exit status of cr_checkpoint is 52.
    >>>
    >>>What could be the reason for this?
    >>>
    >>>By the way, I have BLCR working with SGE, and apart from this
    >>>user, it is working very well for process migration.
    >>>
    >>>best regards,
    >>>
    >>>adolfo
    >>>
    >>>
    >>> 
    >>>
    > 
    > 
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
    
