Re: /proc/PID/exe not restored on restart in 0.8.2

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Aug 26 2009 - 13:11:44 PDT

  • Next message: Paul H. Hargrove: "ANNC: Scheduled downtime for BLCR Bugzilla"
    If 0.8.1 + 2.6.18-53 works, I'd be interested in having you verify that 
    0.8.2 works on that kernel too.
    That would help to confirm that this is not dependent on BLCR version.
    
    -Paul
    
    Josh Hursey wrote:
    > I posted the Bug #2620:
    >   http://upc-bugs.lbl.gov/bugzilla/show_bug.cgi?id=2620
    >
    > As another data point, I have access to another machine running BLCR 
    > 0.8.1 and Linux kernel 2.6.18-53, and it is running correctly in this 
    > regard.
    >
    > I will try to downgrade the kernel and see if that helps.
    >
    > Thanks,
    > Josh
    >
    > On Aug 26, 2009, at 3:32 PM, Paul H. Hargrove wrote:
    >
    >> Josh,
    >>
    >> The error you see building 0.8.1 is because it does not support a 
    >> 2.6.29.6 kernel (supported 2.6.29, but the put_fs_struct change took 
    >> place somewhere in the 2.6.26.X series).
    >>
    >> I have tried 0.8.1 and 0.8.2 on 2.6.29 and 2.6.30 kernels and all 
    >> three valid combinations show the invalid /proc/pid/exe link.
    >>
    >> The problem does not appear to be dependent on BLCR version, and a 
    >> quick look at kernel sources suggests the problem may originate in a 
    >> kernel change between 2.6.25 and 2.6.26.  So, I suspect that if you 
    >> use a kernel 2.6.25 or older the symlink will be correct.
    >>
    >> If you have a moment, please enter a bug report for this.  Based on 
    >> the kernel change I noticed, this is probably less than a 1-day job 
    >> to fix and test, but I don't know when I'll be able to start.
    >>
    >> -Paul
    >>
    >> Josh Hursey wrote:
    >>> I have a Linux box running Fedora 11, and BLCR 0.8.2
    >>> ----------
    >>> shell$ uname -a
    >>> Linux cloud9 2.6.29.6-217.2.8.fc11.i586 #1 SMP Sat Aug 15 00:44:39 
    >>> EDT 2009 i686 i686 i386 GNU/Linux
    >>> ----------
    >>>
    >>> I am finding that the /proc/PID/exe link is not restored on 
    >>> cr_restart (I used the counting example in the distribution). It is 
    >>> valid when running normally, but after restart the link is invalid 
    >>> (not pointing to anything).
    >>>
    >>> I believe that 0.8.1 was working correctly in this regard, but I 
    >>> cannot verify on this machine at the moment (build error 
    >>> cr_dest_file.c:189: error: implicit declaration of function 
    >>> 'put_fs_struct').
    >>>
    >>> Have others seen this problem? I can send along more info if that 
    >>> might help.
    >>>
    >>> -- Josh
    >>>
    >>>
    >>
    >>
    >> -- 
    >> Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    >> Future Technologies Group                 Tel: +1-510-495-2352
    >> HPC Research Department                   Fax: +1-510-486-6900
    >> Lawrence Berkeley National Laboratory
    >
    >
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 Tel: +1-510-495-2352
    HPC Research Department                   Fax: +1-510-486-6900
    Lawrence Berkeley National Laboratory     
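
For anyone who wants to collect the same data point on their own
kernel/BLCR combination, the check described in the thread (is the
/proc/PID/exe link valid before checkpoint, and again after cr_restart?)
can be sketched roughly as below. This is only an illustration: the
function name check_exe_link and the proc_root parameter are made up
for this example and are not part of BLCR, and it assumes a Linux
system with /proc mounted.

```python
import os


def check_exe_link(pid, proc_root="/proc"):
    """Inspect <proc_root>/<pid>/exe.

    Returns (target, ok): target is the text of the symlink, or None if
    the link cannot be read at all; ok is True only when the link
    resolves to something that exists.

    Hypothetical helper for illustration; proc_root is a parameter only
    so the function can be exercised against a scratch directory.
    """
    link = os.path.join(proc_root, str(pid), "exe")
    try:
        target = os.readlink(link)
    except OSError:
        # Missing process, permission problem, or a link that cannot be read.
        return None, False
    # os.path.exists() follows the symlink, so a dangling link yields False.
    return target, os.path.exists(link)
```

Calling check_exe_link(pid) on the original process and then on the
restarted one, and comparing the two results, should show whether a
given setup is affected.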
    
