Re: Restarting asynchronous handlers

From: Paul H. Hargrove (PHHargrove_at_lbl.gov)
Date: Mon Jul 01 2002 - 10:26:49 PDT


Eric Roman wrote:
> 
> What's the story for synchronous and asynchronous handlers at restart?  I
> keep forgetting the answer.
> 
> So we want to have the async. handler running concurrently w/ the
> application threads during checkpoint time.  When the async
> handler thread completes its work, it calls back in, then the
> synchronous handlers callback, then the context is dumped.
> 
> True?  So it's
> 
> 1: Async handler runs
> 2: Async handler completes and calls back into kernel
> 3: Application threads interrupted w/ signal
> 4: Synchronous handlers execute
> 5: Synchronous handlers complete checkpoint and call back into kernel
> 6: Context for this thread is written

Yes, that is the way I've implemented it so far.  One key thing to note
(in response to Jeff's note) is that during #1 and #2 the app threads are
still running and may acquire or release locks for which they contend with
the async handler thread.
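
To make the ordering concrete, here is a rough user-space sketch of the
six checkpoint steps.  It is only a model, not the actual implementation:
Python threads stand in for the app and async handler threads, an Event
stands in for the callback into the kernel, and every name in it
(run_checkpoint, shared, async_done) is made up for illustration.

```python
import threading

def run_checkpoint():
    """Toy model of the checkpoint ordering; all names are hypothetical."""
    log = []                         # ordered record of the steps taken
    log_lock = threading.Lock()      # protects the log itself
    shared = threading.Lock()        # lock the app and async handler contend for
    async_done = threading.Event()   # stands in for "async handler called back"
    work = {"app": 0}                # counts app work done during steps 1 and 2

    def record(msg):
        with log_lock:
            log.append(msg)

    def async_handler():
        record("1: async handler runs")
        with shared:                 # may have to wait for the app thread
            pass
        record("2: async handler completes and calls back")
        async_done.set()

    def app_thread():
        # During steps 1 and 2 the app keeps running and taking the lock.
        while True:
            with shared:
                work["app"] += 1
            if async_done.wait(timeout=0.001):
                break
        # Only after the async handler has called back is the app interrupted.
        record("3: app thread interrupted")
        record("4: synchronous handler executes")
        record("5: synchronous handler calls back")
        record("6: context written")

    threads = [threading.Thread(target=app_thread),
               threading.Thread(target=async_handler)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log, work["app"]
```

Calling run_checkpoint() returns the log of steps plus the number of times
the app thread took the shared lock before it was interrupted, which is
always at least one in this model.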

> Now on restart:
> 
> 1: Context for this thread is read
> 2: Synchronous handlers resume execution
> 3: Synchronous handlers complete restart and call back into kernel
> 4: All threads (app + async) allowed to continue execution
> 
> I think what's described above is the correct thing to do.  But, we might
> allow 2, 3, and 4 to take place concurrently, or even in the reverse order.
> 
> Thoughts anyone?

My thought was to allow 2, 3, and 4 to run concurrently (let all threads
begin execution as soon as their shared state is restored).  As long as
any locks held before the checkpoint are restored, the threads will sort
out their own exclusions.  If the checkpoint did not require any locking,
then I see no need to add additional serialization at restart time.

In response to Jeff's comments - note that in #4 the application thread
might initially be blocked waiting for the async thread to complete the
restart code and release some lock.
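
A small sketch of that restart case, under the same caveats as any toy
model: Python threads stand in for the real ones, the lock is "restored"
simply by acquiring it before the threads start, and the names
(run_restart, restored) are invented here.  It relies on the fact that a
plain threading.Lock may be released by a thread other than the one that
acquired it.

```python
import threading

def run_restart():
    """Toy model of a concurrent restart; all names are hypothetical."""
    log = []
    log_lock = threading.Lock()      # protects the log itself

    # Recreate the lock in the state it had at checkpoint time: held,
    # logically on behalf of the async thread.
    restored = threading.Lock()
    restored.acquire()

    def record(msg):
        with log_lock:
            log.append(msg)

    def async_thread():
        record("async: restart code complete")
        restored.release()           # only now can the app thread proceed

    def app_thread():
        with restored:               # blocks until the async thread releases
            record("app: acquired lock, continuing")

    # No extra serialization: both threads are released at once, app first.
    threads = [threading.Thread(target=app_thread),
               threading.Thread(target=async_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Even though the app thread is started first, the restored lock alone makes
it wait for the async thread, so the log always lists the async entry
before the app entry.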

-Paul

> --
> Eric Roman                       Future Technologies Group
> 510-486-6420                     Lawrence Berkeley National Laboratory

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
NERSC Future Technologies Group           Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-495-2998