lam/mpi blcr problem

From: 任明明 (0110018_at_mail.nankai.edu.cn)
Date: Tue Mar 22 2005 - 01:27:58 PST

  • Next message: Paul H. Hargrove: "Re: lam/mpi blcr problem"
    I cannot use BLCR to checkpoint an MPI program. Can anyone help me?
    
    I used the following command to configure BLCR:
    
    ./configure --prefix=/usr/local/blcr/ --with-linux=/usr/src/linux-2.4.20-8/
    --with-system-map=/boot/System.map-2.4.20-8
    
    and used the following command to configure LAM/MPI:
    
    ./configure --with-threads=posix --with-rpi=crtcp --with-cr-blcr=/usr/local/blcr/
    --prefix=/usr/local/lam-7.1.1/ --with-rsh='ssh -x' 
    
    But when I run cr_checkpoint on an MPI program, it does not generate a
    checkpoint context file for each process; it generates only one context
    file, for the mpirun command. When I run cr_restart on that single context
    file, it prints:
    
    [rmingming@node01 lam]$ cr_restart context.5981
    mpirun (rpwait): Bad file descriptor
    [rmingming@node01 lam]$
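
For comparison, the launch, checkpoint, and restart sequence can be sketched as below. This is a minimal sketch, not a confirmed fix: it assumes the job has to be started with the crtcp RPI and blcr CR modules selected explicitly on the mpirun command line, and that the checkpoint request is sent to the PID of mpirun. The program name ./my_mpi_app, the run wrapper, and the DRY_RUN variable are hypothetical helpers introduced only for illustration; they are not part of LAM/MPI or BLCR.

```shell
#!/bin/sh
# Sketch only. By default the commands are printed, not executed (DRY_RUN=1).
# Set DRY_RUN=0 to actually run them on a booted LAM universe.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Launch the MPI program ($1) with the crtcp RPI and the blcr CR module
# selected explicitly; "C" asks LAM for one process per available CPU.
launch_cmd() {
    run mpirun -ssi rpi crtcp -ssi cr blcr C "$1"
}

# Request a checkpoint of the whole job by naming the PID of mpirun ($1).
checkpoint_cmd() {
    run cr_checkpoint "$1"
}

# Restart from the context file written for mpirun ($1).
restart_cmd() {
    run cr_restart "$1"
}
```

Calling launch_cmd ./my_mpi_app in dry-run mode prints the mpirun command line, so it can be compared with the one actually used to start the job.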
    
    By the way, I followed the instructions at this URL:
    http://mantis.lbl.gov/blcr/doc/html/BLCR_Users_Guide.html
    
    The following is the laminfo output:
     
    [rmingming@node01 lam]$ laminfo -all
                 LAM/MPI: 7.1.1
                SSI boot: globus (SSI v1.0, API v1.1, Module v0.6)
                SSI boot: rsh (SSI v1.0, API v1.1, Module v1.1)
                SSI boot: slurm (SSI v1.0, API v1.1, Module v1.0)
                SSI boot: tm (SSI v1.0, API v1.1, Module v1.1)
                SSI coll: lam_basic (SSI v1.0, API v1.1, Module v7.1)
                SSI coll: shmem (SSI v1.0, API v1.1, Module v1.0)
                SSI coll: smp (SSI v1.0, API v1.1, Module v1.2)
                 SSI rpi: crtcp (SSI v1.0, API v1.1, Module v1.1)
                 SSI rpi: lamd (SSI v1.0, API v1.0, Module v7.1)
                 SSI rpi: sysv (SSI v1.0, API v1.0, Module v7.1)
                 SSI rpi: tcp (SSI v1.0, API v1.0, Module v7.1)
                 SSI rpi: usysv (SSI v1.0, API v1.0, Module v7.1)
                  SSI cr: blcr (SSI v1.0, API v1.0, Module v1.1)
                  SSI cr: self (SSI v1.0, API v1.0, Module v1.0)
                  Prefix: /usr/local/lam-7.1.1/
                  Bindir: /usr/local/lam-7.1.1//bin
                  Libdir: /usr/local/lam-7.1.1//lib
                  Incdir: /usr/local/lam-7.1.1//include
               Pkglibdir: /usr/local/lam-7.1.1//lib/lam
              Sysconfdir: /usr/local/lam-7.1.1//etc
            Architecture: i686-pc-linux-gnu
           Configured by: root
           Configured on: Tue Mar 22 14:21:29 CST 2005
          Configure host: node01
          Memory manager: ptmalloc2
              C bindings: yes
            C++ bindings: yes
        Fortran bindings: yes
              C compiler: gcc
             C char size: 1
             C bool size: 1
            C short size: 2
              C int size: 4
             C long size: 4
            C float size: 4
           C double size: 8
          C pointer size: 4
            C char align: 1
            C bool align: 1
             C int align: 4
           C float align: 4
          C double align: 4
            C++ compiler: g++
        Fortran compiler: g77
         Fortran symbols: double_underscore
       Fort integer size: 4
          Fort real size: 4
      Fort dbl prec size: 4
          Fort cplx size: 4
      Fort dbl cplx size: 4
      Fort integer align: 4
         Fort real align: 4
     Fort dbl prec align: 4
         Fort cplx align: 4
     Fort dbl cplx align: 4
             C profiling: yes
           C++ profiling: yes
       Fortran profiling: yes
          C++ exceptions: no
          Thread support: yes
           ROMIO support: yes
            IMPI support: no
           Debug support: no
            Purify clean: no
                SSI base: parameter "verbose" (default value: <none>)
                 SSI mpi: parameter "mpi_hostmap" (default value:
                          "/usr/local/lam-7.1.1//etc/lam-hostmap.txt")
                SSI base: parameter "base_module_path" (default value:
                          "/usr/local/lam-7.1.1//lib/lam")
                SSI boot: parameter "boot_verbose" (default value: <none>)
                SSI boot: parameter "boot" (default value: <none>)
                SSI boot: parameter "boot_base_promisc" (default value: "0")
                SSI boot: parameter "boot_base_window_size" (default value: "5")
                SSI boot: parameter "boot_globus_priority" (default value: "3")
                SSI boot: parameter "boot_rsh_username" (default value: <none>)
                SSI boot: parameter "boot_rsh_agent" (default value: "ssh -x")
                SSI boot: parameter "boot_rsh_no_n" (default value: "0")
                SSI boot: parameter "boot_rsh_no_profile" (default value: "0")
                SSI boot: parameter "boot_rsh_fast" (default value: "0")
                SSI boot: parameter "boot_rsh_ignore_stderr" (default value: "0")
                SSI boot: parameter "boot_rsh_priority" (default value: "10")
                SSI boot: parameter "boot_slurm_priority" (default value: "50")
                SSI boot: parameter "boot_tm_priority" (default value: "50")
                SSI boot: parameter "boot_tm_first" (default value: "-1")
                 SSI rpi: parameter "rpi_verbose" (default value: <none>)
                 SSI rpi: parameter "rpi" (default value: <none>)
                 SSI rpi: parameter "rpi_crtcp_priority" (default value: "75")
                 SSI rpi: parameter "rpi_crtcp_short" (default value: "65536")
                 SSI rpi: parameter "rpi_crtcp_sockbuf" (default value: "-1")
                 SSI rpi: parameter "rpi_lamd_priority" (default value: "20")
                 SSI rpi: parameter "rpi_sysv_pollyield" (default value: "1")
                 SSI rpi: parameter "rpi_sysv_poolsize" (default value:
                          "16777216")
                 SSI rpi: parameter "rpi_sysv_maxalloc" (default value:
                          "1048576")
                 SSI rpi: parameter "rpi_sysv_short" (default value: "8192")
                 SSI rpi: parameter "rpi_tcp_short" (default value: "65536")
                 SSI rpi: parameter "rpi_tcp_sockbuf" (default value: "-1")
                 SSI rpi: parameter "rpi_sysv_priority" (default value: "30")
                 SSI rpi: parameter "rpi_tcp_priority" (default value: "20")
                 SSI rpi: parameter "rpi_usysv_readlockpoll" (default value:
                          "10000")
                 SSI rpi: parameter "rpi_usysv_writelockpoll" (default value:
                          "10")
                 SSI rpi: parameter "rpi_usysv_pollyield" (default value: "1")
                 SSI rpi: parameter "rpi_usysv_poolsize" (default value:
                          "16777216")
                 SSI rpi: parameter "rpi_usysv_maxalloc" (default value:
                          "1048576")
                 SSI rpi: parameter "rpi_usysv_short" (default value: "8192")
                 SSI rpi: parameter "rpi_usysv_priority" (default value: "40")
                SSI coll: parameter "coll_verbose" (default value: <none>)
                SSI coll: parameter "coll_shmem" (default value: "0")
                  SSI cr: parameter "cr_verbose" (default value: <none>)
                  SSI cr: parameter "cr" (default value: <none>)
                  SSI cr: parameter "cr_blcr_priority" (default value: "50")
                  SSI cr: parameter "cr_self_priority" (default value: "25")
                  SSI cr: parameter "cr_self_do_restart" (default value: "0")
                  SSI cr: parameter "cr_self_prefix" (default value:
                          "lam_cr_self")
                  SSI cr: parameter "cr_self_checkpoint" (default value: <none>)
                  SSI cr: parameter "cr_self_continue" (default value: <none>)
                  SSI cr: parameter "cr_self_restart" (default value: <none>)
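
The listing above includes both the crtcp RPI module and the blcr CR module, which suggests the build itself picked up checkpoint support. A minimal sketch of the same check in script form follows; the function name check_cr_modules is hypothetical, and it only inspects the text that laminfo prints, not whether a running job actually selected those modules.

```shell
#!/bin/sh
# check_cr_modules: read laminfo output on stdin and succeed only if both
# the crtcp RPI module and the blcr CR module are listed.
check_cr_modules() {
    out=$(cat)
    printf '%s\n' "$out" | grep -q 'SSI rpi: crtcp ' &&
    printf '%s\n' "$out" | grep -q 'SSI cr: blcr '
}
```

Typical use would be: laminfo -all | check_cr_modules && echo "crtcp and blcr modules present"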
    
