Re: Questions on BLCR..

From: Ladislav Subr (subr-blcr_at_sirrah.troja.mff.cuni.cz)
Date: Thu Jan 13 2005 - 00:41:06 PST

    Dear Paul,
    
    I'm interested in the Linux 2.6 and Opteron support for BLCR. My aim is to use
    it as a migration and backup tool on an Opteron cluster. Would it be possible
    to get your current development version? I'm just about to start testing my
    wrappers, and it would be helpful if I could do that directly on the target
    architecture.
    
    Best regards
    
    	Ladislav
    
    > Tarun,
    >    I am still working on Linux 2.6 and Opteron support.  I had hoped to
    > be done with 2.6 by Jan 1, but am running behind.  At this point BLCR
    > passes the single-threaded tests on an Athlon running SuSE Linux 9.2 (a
    > 2.6.8 kernel), but gets a kernel Oops on the multi-threaded tests.  I
    > believe that there is an uninitialized pointer or a similar problem in
    > the kernel module, which is proving difficult to track down.
    >
    >    I am afraid I don't have an accurate estimate for session or
    > process group support at this time.  I'd certainly like to see this
    > support done in time for an April release.
    >
    >    I am also sorry to tell you that there is currently no way to
    > checkpoint a process tree with BLCR.  The problem is that at restart
    > time there is no "resource naming" that would allow identification of
    > the shared file descriptors (such as the common connection to stdin
    > and stdout, or the pipes between processes).
    >
    > -Paul
    >
    > Tarun Agarwal wrote:
    > > Hi Paul,
    > >
    > > I met you at SC2004. As I mentioned then, I am working on integrating
    > > checkpointing support using BLCR into a batch system here at UIUC.
    > > Saving sessions seems critical to using BLCR for checkpointing. You had
    > > listed that as ongoing work at that time. I'd appreciate it if you could
    > > tell me when this support can be expected. Alternatively, is there some
    > > way of checkpointing a process subtree (say, a shell script and its
    > > forks) in the current version?
    > >
    > > Thanks
    > > Tarun
    > >
    > > On Wed, 3 Nov 2004, Paul H. Hargrove wrote:
    > >>I am hoping to have the 2.6 port for ia32 done by Jan 1.  I expect that
    > >> the Opteron-specific support will be finished at about the same time, or
    > >> soon after that.  The speed with which we can get Opteron support
    > >> implemented will depend in part on the availability of test platforms.
    > >>
    > >>-Paul
    > >>
    > >>Tarun Agarwal wrote:
    > >>>Thanks for the quick response. Is there some time frame that you have in
    > >>>mind for the 2.6 kernel compatible release of BLCR?
    > >>>
    > >>>Thanks
    > >>>Tarun
    > >>>
    > >>>On Tue, 2 Nov 2004, Paul H. Hargrove wrote:
    > >>>>BLCR does not support the Opteron at all at this time.
    > >>>>Support for Opteron will be for the 2.6 kernel only, and that work is
    > >>>>still in progress.
    > >>>>
    > >>>>-Paul
    > >>>>
    > >>>>Tarun Agarwal wrote:
    > >>>>>Hi
    > >>>>>
    > >>>>>I am trying to use BLCR on Linux 2.4 running on an Opteron machine.
    > >>>>>Does BLCR work on the AMD Opteron architecture with a 2.4 kernel? I got
    > >>>>>the following error upon running make:
    > >>>>>
    > >>>>># make
    > >>>>>make  all-recursive
    > >>>>>make[1]: Entering directory `/home/kale/testmpi/tarun/blcr-0.2.3'
    > >>>>>Making all in man
    > >>>>>make[2]: Entering directory `/home/kale/testmpi/tarun/blcr-0.2.3/man'
    > >>>>>make[2]: Nothing to be done for `all'.
    > >>>>>make[2]: Leaving directory `/home/kale/testmpi/tarun/blcr-0.2.3/man'
    > >>>>>Making all in include
    > >>>>>make[2]: Entering directory `/home/kale/testmpi/tarun/blcr-0.2.3/include'
    > >>>>>make[2]: Nothing to be done for `all'.
    > >>>>>make[2]: Leaving directory `/home/kale/testmpi/tarun/blcr-0.2.3/include'
    > >>>>>Making all in cr_module
    > >>>>>make[2]: Entering directory `/home/kale/testmpi/tarun/blcr-0.2.3/cr_module'
    > >>>>>if gcc -DHAVE_CONFIG_H -I. -I. -I.. -I../include -I../include -I../vmadump -I/usr/src/linux-2.4/include -D__KERNEL__ -DMODULE   -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer  -g -O2 -MT cr_dump_self.o -MD -MP -MF ".deps/cr_dump_self.Tpo" \
    > >>>>>  -c -o cr_dump_self.o `test -f 'cr_dump_self.c' || echo './'`cr_dump_self.c; \
    > >>>>>then mv -f ".deps/cr_dump_self.Tpo" ".deps/cr_dump_self.Po"; \
    > >>>>>else rm -f ".deps/cr_dump_self.Tpo"; exit 1; \
    > >>>>>fi
    > >>>>>In file included from cr_dump_self.c:35:
    > >>>>>../vmadump/vmadump.h:84:2: #error VMADUMP does not support this architecture
    > >>>>>cr_dump_self.c: In function `cr_do_coredump':
    > >>>>>cr_dump_self.c:70: warning: implicit declaration of function `get_pt_regs'
    > >>>>>cr_dump_self.c:71: warning: passing arg 2 of pointer to function makes pointer from integer without a cast
    > >>>>>cr_dump_self.c: In function `cr_do_vmadump':
    > >>>>>cr_dump_self.c:1103: warning: passing arg 2 of `vmadump_freeze_threads' makes pointer from integer without a cast
    > >>>>>make[2]: *** [cr_dump_self.o] Error 1
    > >>>>>make[2]: Leaving directory `/home/kale/testmpi/tarun/blcr-0.2.3/cr_module'
    > >>>>>make[1]: *** [all-recursive] Error 1
    > >>>>>make[1]: Leaving directory `/home/kale/testmpi/tarun/blcr-0.2.3'
    > >>>>>make: *** [all] Error 2
    > >>>>>#
    > >>>>>
    > >>>>>Thanks
    > >>>>>Tarun Agarwal
    > >>>>>Graduate Student, CS, UIUC.
    > >>>>
    > >>>>--
    > >>>>Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    > >>>>Future Technologies Group
    > >>>>HPC Research Department                   Tel: +1-510-495-2352
    > >>>>Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
    > >>
    
