checkpointing multiple processes []

From: Anton V. Uzunov (anton.uzunov_at_dsto_dot_defence_dot_gov.au)
Date: Thu Nov 02 2006 - 21:19:29 PST

  • Next message: Paul H. Hargrove: "Re: checkpointing multiple processes []"
    Hi, 
    
    I am currently testing BLCR in the hope of using it as our
    checkpoint/restore library, and I have encountered a problem with
    checkpointing multi-process applications. For example, BLCR has trouble
    (or perhaps I am doing something incorrectly?) checkpointing a simple C
    program which uses (a slightly modified version of) the "filecounting"
    example provided with BLCR:
    ...
    pid_t p = fork();
    if (p == 0)
      execlp( "filecounting", ... );
    waitpid( p, ... );
    ...
    (The slight modification to "filecounting" consists of making it
    multi-threaded, as in the other BLCR example, "pthread_counting".)
    In such a case two PIDs are created, one each for the parent and child
    processes, and while both processes can be checkpointed
    using cr_checkpoint PID, neither of them can be restored via
    cr_restart. Perhaps this has to do with BLCR not having implemented
    checkpointing of process groups? If this is the case, do you know
    (approximately) when this functionality will be implemented? Is there
    perhaps a (newer, not entirely stable) CVS snapshot that has (some of)
    this functionality? Or should I perhaps use the library hooks to
    implement multi-process checkpointing myself, if this has not already
    been implemented? I would appreciate any information on this. 
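
    For concreteness, the fork/exec/wait pattern described above can be
    sketched as a self-contained helper. This is only an illustration under
    assumptions, not the exact test program: the name run_child is
    hypothetical, the program to launch is passed in as a parameter, and no
    extra arguments are given to the child (the real arguments are elided
    above).

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, exec `prog` (looked up on PATH) in the child,
 * and wait for it in the parent.
 * Returns the child's exit status, or -1 on error or abnormal termination. */
int run_child(const char *prog)
{
    pid_t p = fork();
    if (p < 0) {
        perror("fork");
        return -1;
    }

    if (p == 0) {
        /* Child: replace this process image with the target program.
         * Assumption: no arguments beyond argv[0] are passed. */
        execlp(prog, prog, (char *)NULL);
        /* Only reached if execlp failed. */
        perror("execlp");
        _exit(127);
    }

    /* Parent: report both PIDs, then block until the child terminates. */
    printf("parent pid %ld, child pid %ld\n", (long)getpid(), (long)p);
    fflush(stdout);

    int status;
    if (waitpid(p, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

    Calling run_child("filecounting") from main would give the two processes
    in question; the two PIDs it prints are the ones passed to
    cr_checkpoint PID as described above.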
    
    Best regards, 
    Anton V. Uzunov
    
    -- 
    Anton V. Uzunov 
    Information Networks Division, ACC Group,
    Defence Science and Technology Organization,
    Edinburgh SA, Australia
    ph: (+061) (08) 8259-7598
    e-mail: Anton.Uzunov_at_dsto_dot_defence_dot_gov.au
    
    IMPORTANT: This e-mail remains the property of the Australian Defence
    Organisation and is subject to the jurisdiction of section 70 of the
    CRIMES ACT 1914. If you have received this e-mail in error, you are
    requested to contact the sender and delete the e-mail.
    
