Re: sparc implementation

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Sep 03 2008 - 13:58:40 PDT

  • Next message: Vincentius Robby: "Re: sparc implementation"
    I addressed the syscall question in a separate e-mail, and you indicate 
    the -I issue was resolved, however I need to address your atomic ops too.
    
    In my first e-mail reply to you on the subject a sparc port  (July 28, 
    2008) I stated:
    > In both the BLCR kernel and user-space code we make use of atomic 
    > counters for various purposes, including a reader-writer sort of lock 
    > in the user space code.  In all architectures we've supported so far 
    > we've had support for atomic operations on 32-bit integers.  However, 
    > on SPARC revisions prior to "SPARC V8+" (UltraSPARC) the only support 
    > for atomics is via spinlocks.  The spinlock approach can work in the 
    > kernel because one can block interrupts.  In user-space, however, the 
    > spinlock approach can lead to deadlock in the presence of signal 
    > handlers.  So, I believe that it is either impossible or impractical 
    > to port BLCR to chips older than UltraSPARC. (I am willing to be 
    > proven wrong, but am strongly discouraging you from trying for older 
    > cpus as I think success is unlikely) 
    
    The ldstub/stb is exactly the spinlock approach I cautioned against.  We 
    use these atomics in signal handlers and if a given thread receives a 
    signal between the ldstub and the stb, then the use of the atomics in 
    the signal handler will deadlock.  Worse, since there is ONE lock byte 
    for the entire library, use of atomics in signal handlers may deadlock 
    even if accessing a different atomic counter than the code it interrupted.
    
    Additionally, if we happen to link two versions of the library code 
    (e.g. via dlopen() and LD_PRELOAD) it is possible (I think) that two 
    distinct copies of the "lock" variable would exist.  In that case the 
    code in the two libraries would not be atomic with respect to each other.
    
    I don't think you should get hung up on the atomics YET.  If you can do 
    your development on UltraSPARC, then I recommend you use the CAS-based 
    code below.  If not, you can proceed with the ldstub-based code FOR NOW, 
    and we can try to deal LATER with the (rare) deadlocks by adding to BLCR 
    a kernel-level helper to perform CAS for you (in the kernel we can 
    eliminate the equivalent of signals by temporarily blocking interrupts).
    
    CR_INLINE int
    cri_cmp_swap(cri_atomic_t *p, unsigned int oldval, unsigned int newval)
    {
        __asm__ __volatile__ (
            /* if (*p == oldval) SWAP(*p,newval); else newval = *p; */
            "cas      [%3],%2,%0"
            : "+r"(newval), "=m"(*p)
            : "r"(oldval), "r"(p), "m"(*p) );
        return (int)(newval == oldval);
    }
    
    
    If you do keep your ldstub-based add-fetch, you can remove "oldval" from 
    it (the declaration and the "oldval = *p"), since they don't do 
    anything.  Other than that and the deadlock problem, your add-fetch 
    looks correct.
    
    Just to be 100% proper about this code I've included here:
    Signed-off-by: Paul H. Hargrove <PHHargrove_at_lbl_dot_gov>
    
    -Paul
    
    Vincentius Robby wrote:
    > Hello Paul,
    >
    > This is my current code for compare_and_swap:
    > CR_INLINE unsigned int
    > cri_cmp_swap(cri_atomic_t *p, unsigned int oldval, unsigned int newval)
    > {
    >   register unsigned char ret;
    >   int tmp;
    >   static unsigned char lock;
    >   __asm__ __volatile__("1:      ldstub  [%1], %0\n\t"
    >                "        cmp     %0, 0\n\t"
    >                        "        bne     1b\n\t"
    >                        "         nop"
    >                        : "=&r" (tmp)
    >                        : "r" (&lock)
    >                        : "memory");
    >   if (*p != oldval)
    >     ret = 0;
    >   else {
    >     *p = newval;
    >     ret = 1;
    >   }
    >   __asm__ __volatile__("stb     %%g0, [%0]"
    >                        : /* no outputs */
    >                        : "r" (&lock)
    >                        : "memory");
    >
    >   return ret;
    > }
    >
    > The ldstub seems to be used for memory locks. The code that includes 
    > the cas instructions seems effective, although it may not be available 
    > for V8.
    [snip]
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
    

  • Next message: Vincentius Robby: "Re: sparc implementation"