Checkpoint/Restart for Linux (update)

From: Eric Roman (eroman_at_lbl.gov)
Date: Tue Mar 26 2002 - 11:46:56 PST


All of you expressed some interest in checkpoint/restart on Linux.  Here's a
quick summary of what's going on.

Checkpoint/Restart web page is up
---------------------------------
The project now has a web page.
  http://www.nersc.gov/research/FTG/checkpoint/index.html

Requirements for Linux Checkpoint/Restart
-----------------------------------------
We've placed our requirements document online.  We'd like to get some feedback
from users, library developers, and kernel developers.  Please have a look!
  http://www.nersc.gov/research/FTG/checkpoint/LBNL-49659.pdf

Checkpoint/Restart for MPI
--------------------------
We've started working with Professor Andrew Lumsdaine and the LAM crew to
add a checkpoint/restart capability to LAM.  (LAM is a popular implementation
of MPI.)  This work will take place during summer 2002.

Checkpoint/Restart mailing list is now available
------------------------------------------------
We've established a mailing list for checkpoint/restart development.
An archive of the list is available on our web page.  To subscribe, send
a message to majordomo_at_lbl_dot_gov with the words
  subscribe checkpoint your-email-address
somewhere in the message body.

Current Work
------------
We are evaluating the CRAK implementation of checkpoint/restart and bproc's
vmadump (designed for process migration, but also usable for
checkpoint/restart).  This survey will lead to a technical report describing
the checkpoint/restart work done for Linux to date.

Our kernel work is making good progress.  We're establishing entry points for
checkpoint/restart in the kernel and in user processes, designing a format
for context files, and setting up our testing environment.  We expect to be
able to checkpoint simple processes in a month or two.

-- 
Eric Roman  <eroman_at_lbl_dot_gov>     Future Technologies Group
510-486-6420                     Lawrence Berkeley National Laboratory