Re: checkpoint and hsperfdata

From: Michael Brown (michael_brown_3_at_yahoo.com)
Date: Tue Jan 03 2006 - 13:31:02 PST

  • Next message: Ronny T. Lampert: "blcr 0.4.2 not compiling on 2.6.15"
    Thanks for the explanation.  Now that I understand
    what's happening, it's easy enough for me to copy the
    contents of the hsperfdata directory at this point in
    time.  There doesn't seem to be any machine-specific
    information in this directory.
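
For anyone else hitting the same restart failure, here is a minimal sketch of that copy step in Python. It assumes the directory holds only regular files and that the archive travels to the second host alongside the checkpoint file. The function names pack_dir and unpack_dir and the example paths are illustrative only; they are not part of BLCR.

```python
import tarfile
from pathlib import Path


def pack_dir(src_dir, archive_path):
    """Archive src_dir (e.g. /tmp/hsperfdata_user) at checkpoint time."""
    src = Path(src_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store entries under the directory's own name, not its full path,
        # so it can be recreated under any parent on the other host.
        tar.add(src, arcname=src.name)


def unpack_dir(archive_path, dest_parent):
    """Recreate the archived directory under dest_parent before restart."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_parent)
```

On host0 one would call something like pack_dir("/tmp/hsperfdata_user", "hsperfdata.tar.gz") right after taking the checkpoint, and on host1 unpack_dir("hsperfdata.tar.gz", "/tmp") before attempting the restart.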
    
    Mike
    
    
    --- "Paul H. Hargrove" <PHHargrove_at_lbl_dot_gov> wrote:
    
    > Mike,
    > Whatever /tmp/hsperfdata_user is, it is not something internal to
    > BLCR. So, I can only assume it is created/used by the Matlab code,
    > perhaps indirectly through some library linked into the
    > application. Not knowing specifically what the files are I can't
    > guarantee that they can safely be copied between hosts. You could
    > see problems, for instance, if the files contain information like
    > the IP address of the original host, or license keys tied to the
    > MAC address of the original host.
    >
    > In the present version of BLCR, open files are dealt with only "by
    > reference", and we must blindly assume that the files and
    > containing directories still exist, with unmodified contents, at
    > restart. In the future we will have the option to capture the
    > content of files as well. We are looking at having some
    > configuration or heuristic to distinguish file systems that are
    > local (such as /tmp) from ones that are shared (such as an
    > NFS-mounted /home) to decide when to capture the file content. I
    > have no estimated date for such a feature.
    > 
    > -Paul
    > 
    > Michael Brown wrote:
    > > I'm testing checkpoints for the 32bit linux 2.4 kernel.  I'm
    > > using two hosts with identical hardware and images.  I'm trying
    > > to make sure that I can restore checkpoints between hosts.
    > >
    > > I noticed that although the basic counting example can be
    > > checkpointed on host0 and started on host1, my custom Matlab
    > > code cannot.
    > >
    > > System messages suggested the restart failed because the
    > > /tmp/hsperfdata_user directory didn't exist on host1.  After
    > > copying this directory between hosts, the restart worked
    > > properly.
    > >
    > > I'm wondering if it is safe to do this before I put it in
    > > widespread use.  Are there any other system files that should be
    > > copied also?  What is the hsperfdata directory?  Can this
    > > information be stored in the checkpoint itself?
    > >
    > >Thanks,
    > >
    > >Mike
    > >
    > 
    > 
    > -- 
    > Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    > Future Technologies Group
    > HPC Research Department                   Tel: +1-510-495-2352
    > Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
    > 
    > 
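
The local-versus-shared distinction Paul describes could be approximated along the following lines. This is only a sketch of one possible heuristic, not anything BLCR implements: it assumes mount information in the /proc/mounts format, and the set of "shared" file system types and all function names are hypothetical choices made for illustration.

```python
import os

# Assumed list of file system types treated as shared between hosts.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs", "lustre", "gpfs"}


def parse_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    mounts = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mounts.append((fields[1], fields[2]))
    return mounts


def fs_type_for(path, mounts):
    """Return the fs type of the longest mount point containing path."""
    path = os.path.normpath(path)
    best = ("", None)
    for mount_point, fs_type in mounts:
        mp = mount_point.rstrip("/") or "/"
        inside = path == mp or mp == "/" or path.startswith(mp + "/")
        # ">=" lets a later mount on the same point override an earlier one.
        if inside and len(mp) >= len(best[0]):
            best = (mp, fs_type)
    return best[1]


def should_capture_contents(path, mounts):
    """Capture file contents unless the file lives on a shared file system."""
    return fs_type_for(path, mounts) not in NETWORK_FS_TYPES
```

With a local root file system and an NFS-mounted /home, this would mark /tmp/hsperfdata_user for capture and leave files under /home to be handled by reference. A configuration file listing shared mount points explicitly would be the more predictable alternative to guessing from the type.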
    
    
    
    	
    		
    
