Re: Problems with --enable-restore-ids

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Feb 18 2009 - 13:07:06 PST

  • Next message: Ted Cabeen: "Re: Problems with --enable-restore-ids"
    Ted,
    
      You are right about the nscd cache file being opened as root (or other 
    "system" id).  The program acquires the file descriptor via fd passing 
    from a privileged daemon process. Since we can't safely reopen this file 
    as the user and we are equally unable to reproduce the descriptor 
    passing from the daemon, BLCR is incompatible with nscd (see FAQ:  
    http://mantis.lbl.gov/blcr/doc/html/FAQ.html#nscd ).  If you were to 
    perform the restart as the original user you would encounter this 
    problem with or without --enable-restore-ids.  I am afraid the only 
    known solution is to disable nscd.
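
    For anyone unfamiliar with descriptor passing, here is a minimal
    Python sketch (3.9 or later, Unix only) of the general mechanism.
    It is an illustration only, not nscd's or glibc's actual code: the
    names pass_descriptor and demo are made up, a temporary file stands
    in for /var/cache/nscd/passwd, and both ends run in one unprivileged
    process, so no real permission boundary is crossed.

```python
import os
import socket
import tempfile


def pass_descriptor(path):
    """Open `path`, send the descriptor over a Unix socket pair, and
    return what the receiving end reads through it (hypothetical helper)."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    fd = os.open(path, os.O_RDONLY)
    try:
        # SCM_RIGHTS ancillary data carries the open descriptor, not the path.
        socket.send_fds(sender, [b"x"], [fd])
    finally:
        # The copy in flight stays valid after the sender closes its own.
        os.close(fd)

    _msg, fds, _flags, _addr = socket.recv_fds(receiver, 1, 1)
    sender.close()
    receiver.close()

    received = fds[0]
    try:
        # The receiver never calls open() on the path, so the file's
        # permissions are not checked against the receiver's own uid.
        return os.read(received, 4096)
    finally:
        os.close(received)


def demo():
    # Stand-in for the cache file; the real one is owned by a system id.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"cached passwd data")
        path = f.name
    try:
        return pass_descriptor(path)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

    The receiving side ends up with a working descriptor without ever
    opening the path itself.  At restart there is only the path to go
    on, and an open() of it as the user fails with error 13 (EACCES).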
    
    -Paul
    
    Ted Cabeen wrote:
    > I'm having problems with 0.8.0 with --enable-restore-ids.  When I try 
    > to restart a checkpointed job, I get the following error:
    > - open('/var/cache/nscd/passwd', 0x0) failed: -13
    > - mmap failed: /var/cache/nscd/passwd
    > - thaw_threads returned error, aborting. -13
    > Restart failed: Permission denied
    >
    > If I recompile 0.8.0 without restore-ids, it doesn't have this error.  
    > I think that the problem may be that the nscd cache is opened on 
    > behalf of the program by libc as root, but when BLCR tries to restart 
    > the checkpointed program as the original user, it can't open the nscd 
    > cache.  Is there a way to fix this?
    >
    > --Ted
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 Tel: +1-510-495-2352
    HPC Research Department                   Fax: +1-510-486-6900
    Lawrence Berkeley National Laboratory     
    
