Re: question about "cr_save_mmaps_data" function

From: Balazs Gerofi (bgerofi_at_il.is.s.u-tokyo.ac.jp)
Date: Thu Mar 25 2010 - 23:08:41 PDT

  • Next message: Balazs Gerofi: "Re: question about "cr_save_mmaps_data" function"
    Hi Tao,
    
    If you go to cr_dump_self(), you will see a call to cr_do_dump() after the
    leader thread is chosen. cr_do_dump() calls cr_do_vmadump(), which calls a
    couple of vmadump functions.
    cr_save_mmaps_maps() is the one where non-shared mappings are dumped.
    
    I recommend using cscope or any other source code tagging package,
    so that you can easily follow the call stack.
    
    Regards,
    Balazs
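
The categories discussed in this thread (shared versus non-shared, anonymous versus file-backed) can be inspected for any Linux process through /proc/<pid>/maps. The sketch below is a user-space illustration only, not part of BLCR or vmadump; the names `map_kind`, `classify_maps_line` and `count_self_mappings` are hypothetical. It assumes the usual `start-end perms offset dev inode [path]` line layout, where the fourth permission character is `p` for private or `s` for shared, and an inode of 0 means there is no backing file.

```c
#include <stdio.h>

/* Hypothetical categories for this sketch; these are not BLCR types. */
enum map_kind {
    KIND_INVALID = -1,
    KIND_ANON_PRIVATE,   /* private mapping with no backing file */
    KIND_FILE_PRIVATE,   /* private (copy-on-write) mapping of a file */
    KIND_SHARED          /* shared mapping of any kind */
};

/* Classify one line in /proc/<pid>/maps format. */
enum map_kind classify_maps_line(const char *line)
{
    unsigned long start, end, offset, inode;
    char perms[5] = "";

    /* Layout: start-end perms offset dev inode [path]; dev is skipped. */
    if (sscanf(line, "%lx-%lx %4s %lx %*s %lu",
               &start, &end, perms, &offset, &inode) != 5)
        return KIND_INVALID;

    if (perms[3] == 's')
        return KIND_SHARED;
    if (perms[3] != 'p')
        return KIND_INVALID;
    return inode == 0 ? KIND_ANON_PRIVATE : KIND_FILE_PRIVATE;
}

/* Count the calling process's mappings per category; counts[] is indexed
 * by enum map_kind. Returns 0 on success, -1 if /proc is unavailable. */
int count_self_mappings(int counts[3])
{
    char line[4096 + 128];
    FILE *f = fopen("/proc/self/maps", "r");

    if (f == NULL)
        return -1;
    counts[0] = counts[1] = counts[2] = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        enum map_kind k = classify_maps_line(line);
        if (k != KIND_INVALID)
            counts[k]++;
    }
    fclose(f);
    return 0;
}
```

Calling `count_self_mappings()` from a small test program gives a quick picture of how many regions of each kind a process has, which may help when reading the code paths that handle each kind.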
    
    On Fri, Mar 26, 2010 at 2:34 PM, Tao Ke <cartoon.ke_at_gmail_dot_com> wrote:
    
    > Thank you so much for your patient and detailed explanation. It is very
    > helpful to me.
    > I have tracked the call path from the beginning. It seems to me that
    > the context of the process is saved inside cr_save_mmaps_data, and the
    > checkpoint appears to end there. I am confused about when vmadump4 is used.
    > From your explanation, vmadump can be used to handle a single thread. I found
    > that the blcr module initializes the vmadump module, but I cannot find where
    > vmadump is used elsewhere. Could you please give me some hints about when
    > the vmadump module is used?
    > Thank you again for your time.
    >
    > On Thu, Mar 25, 2010 at 10:58 PM, Paul H. Hargrove <PHHargrove_at_lbl_dot_gov> wrote:
    >
    >> TK,
    >>
    >> It is not my intent to be rude or condescending but I don't have the time
    >> to describe everything that takes place in a checkpoint.
    >> The simple answer is that "the whole story" is in the source code - which
    >> you have available to examine.
    >>
    >> You have correctly determined that a checkpoint begins with an ioctl()
    >> that invokes cr_dump_self(), and you should be able to trace the rest using
    >> the source code.  I have not memorized which functions call which others in
    >> what order, even though I wrote most of it.  To give you the "whole story" I
    >> would have to take the time to read through the sources and trace the calls.
    >> Instead, I encourage you to read them.  Doing so is likely to give you a
    >> deeper understanding than if I were to try to do it for you.  If after that
    >> you have some specific questions about "how" or "why" things are done, I may
    >> be able to help.
    >>
    >> You may want to look at tools like "cflow" to build a call graph for you,
    >> though I cannot be certain they work well with Linux kernel code.
    >>
    >> I CAN summarize the distinction between the code in cr_module/ and
    >> vmadump4/, which appears to be a significant point of your question.   The
    >> vmadump code is a heavily modified version of software from the BProc
    >> project that predates BLCR (and comes from a different organization).  It
    >> was never able to deal with shared memory, files or multiple processes; nor
    >> does it have the callback mechanisms of BLCR.  So the BLCR project began
    >> with the intent of keeping the changes made to files in vmadump to a minimum
    >> and building the other functionality (e.g. shared memory, files and multiple
    >> processes) separately.  That is why you will find that vmadump handles
    >> "anonymous" pages and non-shared mappings, while the cr_save_mmaps code
    >> handles the shared mappings.
    >>
    >> I hope my answer helps you some, even if I can't provide the answer you
    >> may have been looking for.
    >> -Paul
    >>
    >>
    >> TK wrote:
    >>
    >>> Thanks.
    >>> But when a checkpoint request is issued with the "cr_checkpoint" command, an
    >>> ioctl request is made to /proc/checkpoint/ctrl. I suppose it will be the
    >>> "CR_OP_HAND_CHKPT" request. Then "cr_dump_self" will be called, and finally
    >>> cr_save_mmaps_data will be called, and the memory will be saved there. Am I
    >>> correct? If so, what is the whole story of a checkpoint? When is the
    >>> "vmadump" module used, then?
    >>>
    >>> Thank you very much.
    >>>
    >>> On 03/25/2010 07:20 PM, Paul H. Hargrove wrote:
    >>>
    >>>> TK,
    >>>>
    >>>> I am sorry I didn't get the chance to answer this one when you asked me
    >>>> directly 2 days ago - I am up against some deadlines right now.
    >>>>
    >>>> To answer your question:
    >>>> In the function you ask about we are dealing only with memory regions
    >>>> created by mmap() of a file.  Therefore all the "clean" pages already exist
    >>>> somewhere on disk in the file that has been mmap()ed.  This includes the
    >>>> executable file and shared libraries that were mmap()ed in prior to the
    >>>> start of main().  As with open files, BLCR makes the (optimistic) assumption
    >>>> that the file will still exist, unmodified, at the time of the restart.
    >>>> However, one can ensure that even the "clean" pages will be stored with the
    >>>> checkpoint by passing --save-all.
    >>>>
    >>>> -Paul
    >>>>
    >>>> TK wrote:
    >>>>
    >>>>> Hi all. I am trying to add my own code to BLCR for some
    >>>>> experiments.
    >>>>> When I was reading the code of the "cr_save_mmaps_data" function in
    >>>>> cr_module/cr_mmaps.c, I found the comment /* dump the dirty pages */. I am
    >>>>> wondering whether you dump only the dirty pages. That would not be enough
    >>>>> info for a restart. Or are the other pages dumped elsewhere? If so, where?
    >>>>> Thank you.
    >>>>>
    >>>>
    >>>>
    >>>>
    >>>
    >>
    >> --
    >> Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    >> Future Technologies Group                 Tel: +1-510-495-2352
    >> HPC Research Department                   Fax: +1-510-486-6900
    >> Lawrence Berkeley National Laboratory
    >>
    >
    >
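
The clean/dirty distinction in Paul's earliest reply above can be illustrated from user space. The following is a minimal sketch, not BLCR code: the function name `count_modified_pages` and the temporary-file path are hypothetical, and comparing memory against the file is only an analogy for the page state a kernel tracks. It maps a file with MAP_PRIVATE, writes to one page, and counts the pages whose in-memory contents no longer match the file. Only those pages would need to be stored if the file is assumed to survive unmodified until restart.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a temporary file of `npages` pages, map it MAP_PRIVATE, write to
 * page index `touch` (skipped if touch >= npages), and return the number of
 * pages whose memory contents differ from the file. Returns -1 on error.
 */
long count_modified_pages(size_t npages, size_t touch)
{
    long psz = sysconf(_SC_PAGESIZE);
    if (psz <= 0 || npages == 0)
        return -1;
    size_t page = (size_t)psz;
    size_t len = npages * page;

    char path[] = "/tmp/mmap_sketch_XXXXXX";   /* hypothetical location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);              /* the file persists until fd is closed */

    long modified = -1;
    char *map = MAP_FAILED;
    char *buf = malloc(page);
    if (buf == NULL)
        goto out;

    /* Fill the file with 'A', one page at a time. */
    memset(buf, 'A', page);
    for (size_t i = 0; i < npages; i++)
        if (write(fd, buf, page) != (ssize_t)page)
            goto out;

    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    /* A write to a private mapping changes only this process's copy of
     * the page; the file on disk is left untouched. */
    if (touch < npages)
        map[touch * page] = 'B';

    /* Compare each mapped page with the same page read from the file. */
    modified = 0;
    for (size_t i = 0; i < npages; i++) {
        if (pread(fd, buf, page, (off_t)(i * page)) != (ssize_t)page) {
            modified = -1;
            goto out;
        }
        if (memcmp(buf, map + i * page, page) != 0)
            modified++;
    }

out:
    if (map != MAP_FAILED)
        munmap(map, len);
    free(buf);
    close(fd);
    return modified;
}
```

Under these assumptions, `count_modified_pages(4, 2)` reports one modified page; the other three pages are still identical to the file and could be re-read from it at restart.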
    
