Re: Fw: BLCR checkpoint sizes

jcduell_at_lbl_dot_gov
Date: Fri Jul 23 2004 - 13:07:02 PDT

    On Fri, Jul 23, 2004 at 02:51:39PM -0400, Grigory Bronevetsky wrote:
    > I'm trying to evaluate BLCR by checkpointing some codes from the SPLASH-2 
    > benchmark suite but I am getting very strange results. In particular, when 
    > I checkpoint radix, which should have at least 800 MB of state, BLCR 
    > produces a checkpoint that is between 237MB and 369MB in size. This 
    > doesn't make sense to me. Are you implementing some kind of optimization 
    > like checkpoint compression or page-touch detection? I can't find any 
    > mention of this in the BLCR papers but given the small checkpoint sizes, I 
    > can't find another explanation.
    
    We do several optimizations.  First, we do not save the program text
    (executable code), nor the text of shared libraries.  This by itself
    may account for the difference.  We also do not save "zero pages",
    i.e. pages that have never been touched and logically contain all 0s
    (memory from calloc calls, large untouched static arrays, etc.).  We
    don't do any compression.
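
    To make the arithmetic concrete, here is a rough sketch of how skipping
    text and zero pages shrinks a checkpoint.  It is not BLCR code: the
    Region type, estimate_checkpoint_bytes(), the 4096-byte page size, and
    all page counts are hypothetical, chosen only to resemble a process
    with about 800 MB mapped of which a fraction has been touched.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical, not read from the system)


@dataclass
class Region:
    """One mapped memory region of a process (illustrative model only)."""
    name: str
    pages: int          # total pages mapped in this region
    touched_pages: int  # pages that have actually been touched
    is_text: bool       # True for program or shared-library code


def estimate_checkpoint_bytes(regions):
    """Estimate bytes saved when text and untouched (zero) pages are skipped."""
    total = 0
    for region in regions:
        if region.is_text:
            continue  # program and library text is not saved
        # Only touched pages are written; untouched zero pages are skipped.
        total += region.touched_pages * PAGE_SIZE
    return total


# Hypothetical process: ~800 MB of calloc'd heap, only part of it touched.
regions = [
    Region("program text", pages=256, touched_pages=256, is_text=True),
    Region("libc text", pages=512, touched_pages=512, is_text=True),
    Region("heap (calloc)", pages=204800, touched_pages=76800, is_text=False),
    Region("stack", pages=64, touched_pages=16, is_text=False),
]

mapped_bytes = sum(r.pages for r in regions) * PAGE_SIZE
saved_bytes = estimate_checkpoint_bytes(regions)

print(f"mapped: {mapped_bytes / 2**20:.1f} MiB")
print(f"saved:  {saved_bytes / 2**20:.1f} MiB")
```

    With these made-up numbers the saved size is well under half of the
    mapped size, with no compression involved.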
    
    In the future we'll support saving the program text, so that programs
    can be migrated onto nodes where the program/libraries may not be
    present.
    
    Do your checkpoints restart successfully?
    
    Cheers,
    
    -- 
    Jason Duell             Future Technologies Group
    <jcduell_at_lbl_dot_gov>       Computational Research Division
    Tel: +1-510-495-2354    Lawrence Berkeley National Laboratory
    
