Re: Questions on BLCR..

From: Ladislav Subr (subr-blcr_at_sirrah.troja.mff.cuni.cz)
Date: Fri Jan 14 2005 - 00:23:48 PST

  • Next message: Paolo Victor: "Question about the BLCR installation requirements"
    Dear Paul,
    
    sorry, I didn't read your mail to Tarun carefully and thought that you 
    had already passed those single-threaded tests on Opteron... But I'm still 
    ready to help with testing on Opterons; let me know when the time comes :-)
    
    All the best
    
    	Ladislav
    
    > Ladislav,
    >
    > As I described to Tarun, the present version is unstable on IA32 and has
    > not been tried at all on Opteron.  When things progress a little more,
    > I'd be happy to send you something that is stable on IA32.  I'd be very
    > pleased if you could then help by testing on an Opteron, as I don't yet
    > have access to one where I have root access to load the blcr kernel
    > modules.
    >
    > -Paul
    >
    > Ladislav Subr wrote:
    > > Dear Paul,
    > >
    > > I'm interested in the 2.6 + Opteron support for BLCR. My aim is to use it
    > > as a migration & backup tool on an Opteron cluster. Is it possible to get
    > > your current version? I'm just about to start testing my wrappers and it
    > > would be helpful if I could do that directly on the target architecture.
    > >
    > > Best regards
    > >
    > > 	Ladislav
    > >
    > >>Tarun,
    > >>   I am still working on Linux 2.6 and Opteron support.  I had hoped to
    > >>be done w/ 2.6 by Jan 1, but am running behind.  At this point blcr
    > >>passes the single threaded tests on an Athlon running SuSE Linux 9.2 (a
    > >>2.6.8 kernel), but gets a kernel Oops on the multi-threaded tests.  I
    > >>believe that there is an uninitialized pointer or a similar problem in
    > >>the kernel module, which is proving difficult to track down.
    > >>
    > >>   I am afraid I don't have a very accurate estimate on session or
    > >>process group support at this time.  I'd certainly like to see this
    > >>support done in time for an April release.
    > >>
    > >>   I am also sorry to tell you that there is no way to checkpoint a
    > >>process tree with the current BLCR.  The problem is that at
    > >>restart time there is presently no "resource naming" that would allow
    > >>identification of the shared file descriptors (such as the common
    > >>connection to stdin and stdout, or the pipes between processes).
    > >>
    > >>-Paul
    > >>
    > >>Tarun Agarwal wrote:
    > >>>Hi Paul,
    > >>>
    > >>>I met you at SC2004. As I said, I am working on integrating
    > >>>checkpointing support using BLCR in a batch system here at UIUC. Saving
    > >>>sessions seems critical to using BLCR for checkpointing. You had listed
    > >>>that as ongoing work at that time. I'd appreciate it if you could tell
    > >>>me when this support can be expected. Alternatively, is there some way
    > >>>of checkpointing a process subtree (say a shell script and its forks) in
    > >>>the current version?
    > >>>
    > >>>Thanks
    > >>>Tarun
    > >>>
    > >>>On Wed, 3 Nov 2004, Paul H. Hargrove wrote:
    > >>>>I am hoping to have the 2.6 port for ia32 done by Jan 1.  I expect that
    > >>>>the Opteron-specific support will be finished at about the same time, or
    > >>>>soon after that.  The speed with which we can get Opteron support
    > >>>>implemented will depend in part on availability of test platforms.
    > >>>>
    > >>>>-Paul
    > >>>>
    > >>>>Tarun Agarwal wrote:
    > >>>>>Thanks for the quick response. Is there some time frame that you have
    > >>>>> in mind for the 2.6 kernel compatible release of BLCR?
    > >>>>>
    > >>>>>Thanks
    > >>>>>Tarun
    > >>>>>
    > >>>>>On Tue, 2 Nov 2004, Paul H. Hargrove wrote:
    > >>>>>>BLCR does not support the Opteron at all at this time.
    > >>>>>>Support for Opteron will be for the 2.6 kernel only, and that work is
    > >>>>>>still in progress.
    > >>>>>>
    > >>>>>>-Paul
    > >>>>>>
    > >>>>>>Tarun Agarwal wrote:
    > >>>>>>>Hi
    > >>>>>>>
    > >>>>>>>I am trying to use BLCR on Linux 2.4 running on an Opteron machine.
    > >>>>>>>Does BLCR work on the AMD Opteron architecture running a 2.4 kernel?
    > >>>>>>>I got the following error upon running make:
    > >>>>>>>
    > >>>>>>># make
    > >>>>>>>make  all-recursive
    > >>>>>>>make[1]: Entering directory `/home/kale/testmpi/tarun/blcr-0.2.3'
    > >>>>>>>Making all in man
    > >>>>>>>make[2]: Entering directory `/home/kale/testmpi/tarun/blcr-0.2.3/man'
    > >>>>>>>make[2]: Nothing to be done for `all'.
    > >>>>>>>make[2]: Leaving directory `/home/kale/testmpi/tarun/blcr-0.2.3/man'
    > >>>>>>>Making all in include
    > >>>>>>>make[2]: Entering directory `/home/kale/testmpi/tarun/blcr-0.2.3/include'
    > >>>>>>>make[2]: Nothing to be done for `all'.
    > >>>>>>>make[2]: Leaving directory `/home/kale/testmpi/tarun/blcr-0.2.3/include'
    > >>>>>>>Making all in cr_module
    > >>>>>>>make[2]: Entering directory `/home/kale/testmpi/tarun/blcr-0.2.3/cr_module'
    > >>>>>>>if gcc -DHAVE_CONFIG_H -I. -I. -I.. -I../include -I../include
    > >>>>>>>-I../vmadump
    > >>>>>>>-I/usr/src/linux-2.4/include -D__KERNEL__ -DMODULE   -Wall
    > >>>>>>>-Wstrict-prototypes -O2 -fomit-frame-pointer  -g -O2 -MT
    > >>>>>>>cr_dump_self.o -MD
    > >>>>>>>-MP -MF ".deps/cr_dump_self.Tpo" \
    > >>>>>>>-c -o cr_dump_self.o `test -f 'cr_dump_self.c' || echo
    > >>>>>>>'./'`cr_dump_self.c; \
    > >>>>>>>then mv -f ".deps/cr_dump_self.Tpo" ".deps/cr_dump_self.Po"; \
    > >>>>>>>else rm -f ".deps/cr_dump_self.Tpo"; exit 1; \
    > >>>>>>>fi
    > >>>>>>>In file included from cr_dump_self.c:35:
    > >>>>>>>../vmadump/vmadump.h:84:2: #error VMADUMP does not support this
    > >>>>>>>architecture
    > >>>>>>>cr_dump_self.c: In function `cr_do_coredump':
    > >>>>>>>cr_dump_self.c:70: warning: implicit declaration of function
    > >>>>>>>`get_pt_regs'
    > >>>>>>>cr_dump_self.c:71: warning: passing arg 2 of pointer to function
    > >>>>>>> makes pointer from integer without a cast
    > >>>>>>>cr_dump_self.c: In function `cr_do_vmadump':
    > >>>>>>>cr_dump_self.c:1103: warning: passing arg 2 of
    > >>>>>>>`vmadump_freeze_threads' makes pointer from integer without a cast
    > >>>>>>>make[2]: *** [cr_dump_self.o] Error 1
    > >>>>>>>make[2]: Leaving directory
    > >>>>>>>`/home/kale/testmpi/tarun/blcr-0.2.3/cr_module'
    > >>>>>>>make[1]: *** [all-recursive] Error 1
    > >>>>>>>make[1]: Leaving directory `/home/kale/testmpi/tarun/blcr-0.2.3'
    > >>>>>>>make: *** [all] Error 2
    > >>>>>>>#
    > >>>>>>>
    > >>>>>>>Thanks
    > >>>>>>>Tarun Agarwal
    > >>>>>>>Graduate Student, CS, UIUC.
    > >>>>>>
    > >>>>>>--
    > >>>>>>Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    > >>>>>>Future Technologies Group
    > >>>>>>HPC Research Department                   Tel: +1-510-495-2352
    > >>>>>>Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
    > >>>>
    > >>>>--
    > >>>>Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    > >>>>Future Technologies Group
    > >>>>HPC Research Department                   Tel: +1-510-495-2352
    > >>>>Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
    
