program segfault after restart

From: Hongjia Cao (hjcao_at_nudt_dot_edu.cn)
Date: Mon Feb 23 2009 - 01:58:39 PST

  • Next message: Alexandre Strube: "What does this error message mean?"
    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1
    
    I encountered a problem with BLCR 0.8.0.
    
    I ran the NPB serial benchmarks on several compute nodes of our cluster
    and checkpointed them. Checkpointing works, and each program can be
    restarted from its context file on the same node where it was
    checkpointed. But if I restart the program on another node, which has
    the same architecture (x86_64), kernel (Linux 2.6.28-8.1.8-el5), and
    executable (shared NFS directory), the program reports a segmentation
    fault after running successfully to the end:
    
    ...
     SP Benchmark Completed.
     Class           =                        B
     Size            =            102x 102x 102
     Iterations      =                      400
     Time in seconds =                   804.56
     Mop/s total     =                   441.25
     Operation type  =           floating point
     Verification    =               SUCCESSFUL
     Version         =                      3.3
     Compile date    =              19 Feb 2009
    
     Compile options:
        F77          = ifort
        FLINK        = $(F77)
        F_LIB        = (none)
        F_INC        = (none)
        FFLAGS       = -O
        FLINKFLAGS   = -O
        RAND         = (none)
    
    
     Please send all errors/feedbacks to:
    
     NPB Development Team
     npb_at_nas_dot_nasa_dot_gov
    
    
    Segmentation fault
    
    
    I wonder if anybody else has run into this problem before.
    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.6 (GNU/Linux)
    Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
    
    iD8DBQFJonNMVgdrmpB/quURAgxfAJ943N1rhRxRdx4idw2M/M7hrcDP1gCfZ4Jo
    JleMdwgccjETsAY0+A79LMY=
    =QbNq
    -----END PGP SIGNATURE-----
    
