BLCR 0.5.0_b5 (beta) release now available

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Thu Feb 22 2007 - 16:50:41 PST

  • Next message: Rajagopal Natarajan: "Asynchronous checkpointing support in BLCR"
    A fifth beta release of BLCR 0.5.0 is now available at
    http://mantis.lbl.gov/blcr-dist
    
    Here are the changes since last week's beta4:
     - Added support for open()s of /dev/{null,zero,full,random,urandom}
     - Fix additional bugs found during testing:
       + Bug 1933: crash restoring dup of ignored fd (socket or chrdev)
       + MAP_SHARED mmap()ed regions would become MAP_PRIVATE upon restart
       + Checkpointing a process tree failed for multi-threaded processes
       + Certain failed restart cases would leave unkillable processes
    
    Below is the full NEWS entry, relative to the Nov. 2005 0.4.2 release.
    
    This will become 0.5.0 in about 7 days, unless significant new bugs are
    reported.
    
    -Paul
    
    PS
    You are receiving this either because you are on the checkpoint_at_lbl_dot_gov
    list, or because you've recently sent email to the list (or me directly)
    asking about BLCR status.
    
    0.5.0_b5
    --------
    February 22, 2007
    Functionality and expanded-support release.
     - Expanded kernel coverage
       + 2.6.0 through 2.6.19 for x86 and x86_64
       + 2.4.0 through 2.4.34 for x86 only
     - Multi-process support (related processes and associated pipes)
       + See BLCR_Users_Guide.html and the cr_checkpoint man page
     - Support for 32-bit apps on 64-bit kernels
       + See "--enable-multilib" in BLCR_Admin_Guide.html
     - Support for directories opened with opendir()
     - Support for open()s of /dev/{null,zero,full,random,urandom}
     - Support for checkpoints on Lustre file systems
       + Contributed by Dean Luick <luick_at_cray_dot_com>
     - Support for building static libcr
       + Contributed by Dean Luick <luick_at_cray_dot_com>
     - Fixes to many distclean problems
       + Issues identified by Dean Luick <luick_at_cray_dot_com>
     - I/O aggregation for improved performance
       + Contributed by Qi Gao
     - Additional examples and test cases
     - API addition: cr_get_restart_info()
     - "Retool" of configure code for ease of addition/maintenance
     - Numerous bug fixes, including:
       + Bug 1396: SIGPIPE when restarting w/ stdin/out from/to a pipe
       + Bug 1640: context files > 2GB require O_LARGEFILE
       + Bug 1662: context files open R/W leads to restart failure
       + Bug 1669: checkpoint to a socket fails
       + Bug 1807: unrecognized warning suppression flag passed to gcc
       + Bug 1854: libcr link failure w/ stack-protection-enabled gcc
       + Bug 1925: link failure w/ pthread_atfork() on some glibc versions
       + Bug 1933: crash restoring dup of ignored fd (socket or chrdev)
       + Incorrect treatment of certain anonymous mmap() cases
       + MAP_SHARED mmap()ed regions would become MAP_PRIVATE upon restart
         * NOTE: We still fail to restore any sharing among processes
           when using MAP_ANONYMOUS or when mapping an unlinked file.
           However, children fork()ed after a restart will now correctly
           share with their parent.
           FIXING THE LOST SHARING IS A HIGH-PRIORITY ITEM FOR 0.6.0
       + Wrong parent for restored orphans (children of init)
       + dup()ed file descriptors always restored together
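
For readers unfamiliar with the sharing semantics behind the MAP_SHARED note above, here is a minimal sketch in plain Python (it does not use BLCR or libcr at all). It assumes a Linux/Unix system with fork(); the helper name child_write_visible_to_parent is hypothetical and exists only for illustration. It shows the behavior a restart is expected to preserve: a write made by a fork()ed child is visible to the parent through a MAP_SHARED mapping, but not through a MAP_PRIVATE one.

```python
import mmap
import os
import struct


def child_write_visible_to_parent(flags):
    """Fork a child that writes 42 into an anonymous mapping.

    Returns the integer the parent reads back after the child exits.
    `flags` is mmap.MAP_SHARED or mmap.MAP_PRIVATE (hypothetical helper,
    for illustration only; unrelated to the BLCR API).
    """
    # fileno=-1 requests an anonymous mapping (no backing file).
    region = mmap.mmap(-1, mmap.PAGESIZE, flags=flags | mmap.MAP_ANONYMOUS)
    region[:4] = struct.pack("i", 0)

    pid = os.fork()
    if pid == 0:
        # Child: modify the mapping, then exit without running cleanup.
        region[:4] = struct.pack("i", 42)
        os._exit(0)

    # Parent: wait for the child, then read the same offset.
    os.waitpid(pid, 0)
    (value,) = struct.unpack("i", region[:4])
    region.close()
    return value


if __name__ == "__main__":
    print("MAP_SHARED :", child_write_visible_to_parent(mmap.MAP_SHARED))
    print("MAP_PRIVATE:", child_write_visible_to_parent(mmap.MAP_PRIVATE))
```

With MAP_SHARED the parent sees the child's write; with MAP_PRIVATE the child's write lands in a private copy-on-write page and the parent still reads its original value. A region that silently turns from the first kind into the second across a restart therefore changes program behavior, which is the bug the entry above describes.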
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
    
