kernel tracing

jcduell_at_lbl.gov
Date: Wed Feb 19 2003 - 13:43:45 PST


At long last, I've added kernel tracing to our checkpoint module.  The
most noticeable immediate effect is that our module no longer
verbigerates incessantly into the system log every time you do a
checkpoint or restart.  Instead, you have to enable its verbigerations
(via './configure --enable-kernel-tracing').
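A configure switch like this usually silences tracing at compile time: the configure script defines a preprocessor symbol, and the trace macros expand to nothing when that symbol is absent. The user-space sketch below illustrates that idea under this assumption. The symbol CR_KTRACE_ENABLED, the CR_KTRACE() macro and the cr_emit() helper are hypothetical, and fprintf() stands in for printk(). Only the CR_ERR name comes from this post.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Counts emitted messages so the behaviour can be checked; illustration only. */
static int cr_msgs_emitted = 0;

/* Hypothetical helper: prints "tag func: message" to stderr. */
static void cr_emit(const char *tag, const char *func, const char *fmt, ...)
{
    va_list ap;

    assert(tag != NULL && func != NULL && fmt != NULL);
    fprintf(stderr, "%s %s: ", tag, func);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    cr_msgs_emitted++;
}

/* Always compiled in: internal errors should never be silenced. */
#define CR_ERR(...)  cr_emit("cr ERROR", __func__, "" __VA_ARGS__)

/* Hypothetical symbol that --enable-kernel-tracing might cause to be defined. */
#ifdef CR_KTRACE_ENABLED
#define CR_KTRACE(...)  cr_emit("cr trace", __func__, "" __VA_ARGS__)
#else
/* Compiled out: the arguments are not even evaluated. */
#define CR_KTRACE(...)  do { } while (0)
#endif
```

Building with -DCR_KTRACE_ENABLED turns the trace calls back on. Without it, a disabled trace costs nothing at run time. The flip side is that any side effects in its arguments silently disappear.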

The system now has three different ways of printing a message; try to
use the right one for new messages (I've changed all the existing
printk's to use whichever seemed appropriate):

1) Messages that should always be printed.  These are mainly messages
   that indicate that an internal logic error, or something equally
   horrible, has happened.  There is now a set of CR_ERR/CR_WARN/CR_INFO
   macros for these.
2) Permanent statements in the code that print at some event that you
   might want to trace.  There is now a batch of different tracing
   event types, each with its own macro.  For example, there's
   CR_KTRACE_FUNC_ENTRY/EXIT, which you can turn on if you want to see a
   tracing message every time a function is entered/exited (assuming
   you've added a tracing message to each function: not all functions
   have them right now).  This event is off by default.  The events
   that seemed to merit being on by default were "high-level" events
   ("phase 2 entered", etc.), bad-parameter or system-limit warnings,
   and "unexpected" events ("can't restore PID").
3) As a special case, I've created a CR_KTRACE_DEBUG() macro that is
   intended to be used only during debugging, i.e., you shouldn't check
   in code that still contains it.  Just use it to track down your own
   immediate bug.  This one is also on by default.

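The three categories above can be sketched as an always-on print path plus a bitmask of trace events, with every event except function entry/exit enabled by default. This is a hypothetical user-space illustration, not the module's actual code. CR_ERR/CR_WARN/CR_INFO, CR_KTRACE_FUNC_ENTRY/EXIT and CR_KTRACE_DEBUG come from the post. The CR_EV_* bits, cr_ktrace_mask, CR_KTRACE_EVENT, CR_KTRACE_HIGH_LVL, CR_KTRACE_UNEXPECTED and cr_print() are invented for the example. Messages are plain strings here to keep the focus on filtering.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical event bits; the real module's names and values may differ. */
#define CR_EV_FUNC_ENTRY  0x01u
#define CR_EV_FUNC_EXIT   0x02u
#define CR_EV_HIGH_LEVEL  0x04u  /* e.g. "phase 2 entered" */
#define CR_EV_BAD_PARM    0x08u  /* bad parameters, system limits */
#define CR_EV_UNEXPECTED  0x10u  /* e.g. "can't restore PID" */
#define CR_EV_DEBUG       0x20u  /* temporary; never checked in */

/* On by default: everything except function entry/exit. */
static unsigned int cr_ktrace_mask =
    CR_EV_HIGH_LEVEL | CR_EV_BAD_PARM | CR_EV_UNEXPECTED | CR_EV_DEBUG;

static int cr_printed = 0;  /* count of messages actually printed */

static void cr_print(const char *tag, const char *msg)
{
    assert(tag != NULL && msg != NULL);
    fprintf(stderr, "%s: %s\n", tag, msg);
    cr_printed++;
}

/* Category 1: always printed, independent of the mask. */
#define CR_ERR(msg)   cr_print("cr ERROR", (msg))
#define CR_WARN(msg)  cr_print("cr WARN", (msg))
#define CR_INFO(msg)  cr_print("cr INFO", (msg))

/* Category 2: printed only if the event's bit is set in the mask. */
#define CR_KTRACE_EVENT(ev, msg)          \
    do {                                  \
        if (cr_ktrace_mask & (ev))        \
            cr_print("cr trace", (msg));  \
    } while (0)

#define CR_KTRACE_FUNC_ENTRY()     CR_KTRACE_EVENT(CR_EV_FUNC_ENTRY, "entry")
#define CR_KTRACE_FUNC_EXIT()      CR_KTRACE_EVENT(CR_EV_FUNC_EXIT, "exit")
#define CR_KTRACE_HIGH_LVL(msg)    CR_KTRACE_EVENT(CR_EV_HIGH_LEVEL, (msg))
#define CR_KTRACE_UNEXPECTED(msg)  CR_KTRACE_EVENT(CR_EV_UNEXPECTED, (msg))

/* Category 3: debugging only, on by default. */
#define CR_KTRACE_DEBUG(msg)       CR_KTRACE_EVENT(CR_EV_DEBUG, (msg))
```

Keeping the mask in a variable, rather than compiling each event in or out, would let tracing be adjusted without a rebuild. The post does not say whether the real module does this.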
All the macros work like printf: you can pass them a format string
and parameters.  You don't need to pass a string, though, if there's
nothing to add (as with the function entry/exit macros).  The function
name, file, line number, and pid are all printed as part of each message.
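A printf-like macro can prefix each message with file, line, function and pid and still work with no format string at all. The sketch below does this with a C99 variadic macro. The "" __VA_ARGS__ trick turns an empty argument list into an empty format and otherwise concatenates "" with the caller's format literal, so formats must be string literals. CR_KTRACE_MSG, cr_trace() and cr_last are hypothetical names, and the prefix layout is an assumption. getpid() stands in for the pid a kernel module would read from the current task.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Last formatted message; kept so callers (and tests) can inspect it. */
static char cr_last[256];

/* Hypothetical helper: formats "file:line func() [pid N]: message"
 * into cr_last and prints it to stderr. */
static void cr_trace(const char *file, int line, const char *func,
                     const char *fmt, ...)
{
    va_list ap;
    size_t used;

    assert(file != NULL && func != NULL && fmt != NULL);
    /* getpid() stands in for the kernel's notion of the current pid. */
    snprintf(cr_last, sizeof cr_last, "%s:%d %s() [pid %d]: ",
             file, line, func, (int)getpid());
    used = strlen(cr_last);  /* correct even if the prefix was truncated */

    va_start(ap, fmt);
    vsnprintf(cr_last + used, sizeof cr_last - used, fmt, ap);
    va_end(ap);

    fprintf(stderr, "%s\n", cr_last);
}

/* "" __VA_ARGS__: CR_KTRACE_MSG() passes the empty format "", and
 * CR_KTRACE_MSG("n=%d", n) passes "" "n=%d" (i.e. "n=%d"), n. */
#define CR_KTRACE_MSG(...) \
    cr_trace(__FILE__, __LINE__, __func__, "" __VA_ARGS__)
```

Because __FILE__, __LINE__ and __func__ expand at the call site, each message names the caller rather than cr_trace() itself.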

I may have gone into overkill mode in the number of different events I
came up with, but what the hell.  It shouldn't do any harm.

I'm trying to make sure I actually spend half my time this semester on
checkpoint/restart.  I figure the next things I ought to work on are:

    1) Documentation: we should talk about what we want done, and what
       format to use.
    2) Getting UML to work with VMADump.  I think it would save us a
       lot of development time going forward if we could use gdb...
    3) Getting VMADump to not buffer pages as they are written.  This
       isn't as big a priority as getting file handles to work (which I
       assume Eric is going to do), but it's close.  People will want
       jobs that take up more than half the RAM on a machine to
       checkpoint in a reasonable amount of time...

Perhaps we ought to have one of our famous "checkpoint club" meetings to
set up a timeline for our development and the first release...

-- 
Jason Duell             Future Technologies Group
<jcduell_at_lbl_dot_gov>       High Performance Computing Research Dept.
Tel: +1-510-495-2354    Lawrence Berkeley National Laboratory