irix-657m-src/irix/kern/os/ksync/README


Locks and Synchronization
=========================

A few terms:

The terms "locks" and "mutual exclusion" are used pretty
interchangeable in this document.  A mutual exclusion lock is used to
protect data -- to provide a "lock" on or provide exclusive right to
some data.  A routine which has acquired a mutual exclusion lock "owns"
it or "holds" the lock.

The term "synchronization" here usually refers to the act of waiting
for an event (state change), not to mutual exclusion.  The thread of
execution that uses a synchronizing object to wait for an event does
not "own" or "hold" that object, though it may be queued on the object.

The term "thread of execution" means a register set and program counter.
The thread could represent a user process, a system daemon, anonymous
system services or an interrupt handler.


Attribution, blame and kudos:

A lot of these routines have mutated over the years, often for the
better, mostly for the bigger.

Engineers with special need have sometimes added clever new locking
primitives, especially (lately) in the atomic_ops category.  Most of
these are very useful, but some are redundant or not always generalized
as well as they might be (I have sinned thus, too), or unambiguous
w.r.t. return values.

Please at least check with the Lock Police (Mike Thompson, a.k.a.
yohn@engr) before adding new locking interfaces.  Maybe what you
want to do can be done already.

We're available to answer questions or take suggestions about this
document.  Note that Bob English has made some major changes to the
mutex, sync-variable and semaphore code in ficus, primarily dealing
with priority inheritance and scheduling decisions -- he is a good
contact for questions in these areas.


-mike


Locks and Synchronization Packages
==================================

There are debugging versions of most the lock and synchronization
primitives which provide sanity checking and metering.  They are split
into two packages, one of which implements the sleeping locks and
synchronization routines, and one of which implements the spinning
locks.

When the metered packages are installed, one can get information about
specific lock and synchronization objects through the idbg(1M) utility,
or with the kernel debugger.

The metered spinning locks package is "dhardlocks", the non-metered
version is "hardlocks".  Choose one of them in your system.gen file
(but note that neither of these exist for single-processor systems,
for reasons that should become apparent).

The metered sleeping locks/synchronizers package is "ksync_metered";
the non-metered version is "ksync".  Include one, exclude the other.


Spinning and Sleeping Locks
===========================

There are a variety of lock types and primitives supported in the IRIX
kernel, with varying semantics, but they can be divided into two basic
classes: locks which can sleep, and those which are guaranteed not to
sleep.

Sleeping locks, as the name implies, will put the calling thread of
execution to sleep if the lock isn't immediately available; spinning
locks will spin in the processor until the lock is available.

How do you choose which type of lock to use?

First, if the lock is to be (or might be) acquired by an interrupt
handler, there is (currently) no choice:  the lock must be a spinlock
-- interrupts currently execute without the necessary context which is
needed to be put to sleep.  (N.B. This rule may change in a future
release.)

If the lock can only be acquired only by "top-level" kernel routines
(never by interrupt handlers), a sleeping lock is usually the correct
choice.  The exception is when the lock is only and always held for a
very short duration, such as to bump a counter and set a couple of flag
bits.

Note that (some) interrupts are disabled as a side-effect of holding a
spinlock, for the following reason:

Say a "top-level" kernel routine wants to use a spinlock to coordinate
with an interrupt service routine.  If top-level routine successfully
acquired the spinlock without also disabling the interrupt with which
it is coordinating, it would be possible for that interrupt to happen
on the same CPU.  The interrupt routine could also try to acquire the
spinlock -- and it would spin forever since it has preempted and pinned
down the top-level routine which currently holds the lock.  You can
construct a similar scenario with two interrupt handlers at the same
interrupt level, if one of the handlers does not raise spl to block out
the other.


Spinning Locks
==============

There are, basically, two varieties of mutual-exclusion spinlocks,
generic spinlocks and bitlocks.  The generic spinlock uses the lock_t
data type, while bitlocks are built on unsigned integers or longs.

Spinlocks and bitlocks have similar interfaces, but while the lock_t
must be considered an "opaque" data type, the bitlock routines use only
the lock bit, specified in the lock and unlocks calls, to acquire and
release the bitlock -- the other bits in the lock-word are available to
be used by the client.

	void	spinlock_init(lock_t *lp, , char *name);
	void	init_spinlock(lock_t *lp, , char *name, long sequence);
	void	init_bitlock(uint_t *lp, uint_t lockbit,
			     char *name, long sequence);
	void	init_64bitlock(__uint64_t *lp, __uint64_t lockbit,
			       char *name, long sequence);

	void	spinlock_destroy(lock_t *lp);
	void	destroy_bitlock(uint *lp);
	void	destroy_64bitlock(__uint64_t *lp);

The 'name' and 'sequence' arguments above are only used when debugging
spinlocks are installed.  The name is stored in the lock debug structure;
the 'sequence', if not METER_NO_SEQ, is used to construct a numeric,
ascii suffix:

	init_spinlock(&lock, "MyLock", 12);

will generate a null-terminated ascii debug tag of "MyLock00012"
(number of in-filled 0's subject to change);

	init_spinlock(&lock, "MyLock", METER_NO_SEQ);

will tag the lock with "MyLock".

	int s = mutex_spinlock(lock_t *lp);
	int s = mutex_bitlock(uint_t *lp, uint_t bit);
	int s = mutex_64bitlock(__uint64_t *lp, __uint64_t bit);

	int s = mutex_spintrylock(lock_t *lp);
	int s = mutex_bittrylock(uint_t *lp, uint_t bit);
	int s = mutex_64bittrylock(__uint64_t *lp, __uint64_t bit);

	void mutex_spinunlock(lock_t *lp, int s);
	void mutex_bitunlock(uint_t *lp, uint_t bit, int s);
	void mutex_64bitunlock(__uint64_t *lp, __uint64_t bit, int s);

The lock routines return a "cookie" indicating the interrupt level
before the locking operation took place.  This cookie should be
passed to the corresponding unlock routine, which restores that
interrupt level.

Note that the the interrupt priority level is set to splhi when the
spin locks are acquired; if a priority other than splhi is required,
alternate interfaces exist:

	int s = mutex_spinlock_spl(lock_t *lp, splfunc_t func);
	int s = mutex_bitlock_spl(uint_t *lp, uint_t bit, splfunc_t func);

If a priority less than splhi is required, but the spinlock is to be
held for a very short duration, it is suggested that the non-_spl lock
routines (those that default to splhi) be used anyway -- the acquire
routines are faster, and blocking out higher interrupts for a very
short time is probably more ``cost effective'', in that there's less
chance that the caller will get interrupted while holding the lock.


Debugger Hooks
==============

From idbg(), all these locks can be examined as "lock ADDR" where
ADDR is the address of the lock.  The debugger knows the type of
lock based on its address and the information stored in its debug
structure when the lock was initialized.  (It is mostly for this
reason that the destroy functions should be called if a spinning
lock is being discarded -- the spinlocks metering code needs to
deallocate its metering information.)


Nested Spinlocks
================

There are two actions taken by the standard spinning lock primitives:

1) They raise the interrupt level on the current processor so that an
interrupt handler which might want to acquire the same spinning lock
isn't allowed to trigger on that cpu (remember that this could cause
a deadlock).

2) They "acquire" the lock by setting bits in memory -- after possibly
waiting for the bits to clear (using load-link/store-conditional
assembler instructions).

Note that on single-processor systems only the first task needs to be
(and is) performed.  This is true because not only does raising the
interrupt level stop interrupt handlers from running, but the system
scheduler won't reschedule another "top-level" routine on that
processor while the interrupt level is raised.

Note also that in the case of nested spinlock calls, the first action
is gratuitous -- if the interrupt level is already sufficiently high
because a spinlock is already being held, there is no reason to raise
it again.

For this reason, "nested bitlock" routines are available which perform
only action 2) above (and, as one might suspect, do nothing on single-
processor systems).  They may only be called when another spinlock is
already held, either at the default IPL level (splhi), or above.

	extern void	nested_spinlock(lock_t *lp);
	extern int	nested_spintrylock(lock_t *lp);
	extern void	nested_spinunlock(lock_t *lp);

	extern void	nested_bitlock(uint_t *lp, uint_t lockbit);
	extern int	nested_bittrylock(uint_t *lp, uint_t lockbit);
	extern void	nested_bitunlock(uint_t *lp, uint_t lockbit);

	extern void	nested_64bitlock(__uint64_t *lp, __uint64_t lockbit);
	extern int	nested_64bittrylock(__uint64_t *lp, __uint64_t lockbit);
	extern void	nested_64bitunlock(__uint64_t *lp, __uint64_t lockbit);

As one might suspect, these routines do nothing on single-processor systems
(and are actually null macros on these systems -- see "sys/sema.h").

The return values for the *trylock functions return a non-zero value if
the lock was acquired, 0 otherwise (and always success on
single-processor systems).


A Few Spinlock Rules
====================

A thread of execution that holds a spinlock may not go to sleep.

The system ensures that a spinlock holder doesn't get preempted while a
spinlock is held.  (A sufficiently high interrupt _may_ trigger while a
thread of execution holds a spinlock, and the interrupt handler run "on
top of" the thread, but the original thread continues to execute after
the interrupt handler has finished.)

And the owner of the spinlock cannot voluntarily go to sleep or switch
to a different processor.  The metered spinlock package will
assert-botch if a thread unlocks a spinlock while running on a
different processor than the one on which it acquired the lock.


If more than one spinlock are to be acquired at once, the locks must
always be acquired in the same order -- but there is no programming
or runtime assistance available to tell the programmer that she has
violated this rule, except that the system could get caught in a
spinning (AB/BA) deadlock).

If the second-in-order lock has already been acquired, one could use a
conditional-lock routine to try and acquire the first-in-order lock --
but if the conditional lock attempt fails, there's no choice but to
unlock the second-in-order, relock both locks in the correct order, and
recheck relevant state.

Also, while it is not important that spinlocks must be released in any
particular order when more than one has been acquired, it is very
important that the cookies returned by the lock routines be passed to
the unlock routines in reverse (LIFO) order.  If nested spinlocks are
used and only one cookie is received, the cookie must be passed back on
the last unlock call.

Let me repeat this in another way: the cookies are really independent
of the individual spinlocks -- the spinlocks can be unlocked in any
order as long as: 1) the cookies are returned in last-in-first-out
order; and 2) the first cookie received is returned with the last
unlock call (or, possibly with an splx() call after releasing the last
lock).


Single-Processor Spin-Locking Strategies
========================================

In general, locking is necessary any time data needs to be protected
against other threads of execution.

On systems prior to ficus/kudzu (that is, through 6.2 and bonzai), the
system provided certain non-preemption guarantees so that knowledgeable
programmers could avoid having to take locks in certain circumstances.

Two of these involve only single-processor systems.

The first is when only top-level routines manipulate the particular
data.  On a single-processor system (RUNNING PRE-FICUS SYSTEMS ONLY!!!)
there's no need to hold a lock when only top-level routines manipulate
the data -- because pre-ficus system would not preempt low-priority
threads of execution with higher-priority threads.

All of the "nested_*" spinlock routines named above are also available
in pre-ficus system in the form of "mp_*" routines (e.g.: mp_spinlock,
mp_spintrylock, mp_spinlock).  These routines function exactly as the
nested versions, except that the metering (debug) package doesn't
assert (demand) that the interrupt priority level is at least splhi at
the time of the lock calls -- as it _does_ for the "nested_" routines.

The second case is similar -- when only interrupt routines manipulate
the data, and on single-processor systems there's no need to protect
anything (the interrupt handler runs with interrupts at that level
disabled).

Note Well: the "mp_" routines have been removed in the ficus/kudzu
trees -- these systems don't guarantee non-preemption of either
top-level or interrupt-level threads of execution.


Sleeping Mutual Exclusion Locks
===============================

There are (currently) three varieties of sleeping mutual-exclusion
locks.  By "sleeping lock", we mean that the lock routines might queue
the caller and switch contexts to another thread of execution if the
lock was not available at the time of the acquisition request.


Mutex Sleep Locks
=================

Strict mutual exclusion locks are provided with mutex_t objects.
The only operations allowed are lock, trylock, and unlock:

	void	mutex_lock(mutex_t *mp, int flags);
	int	mutex_trylock(mutex_t *mp);
	void	mutex_unlock(mutex_t *mp);


These locks can only be used if they are unlocked by the same thread
of execution that locked them.  None of these routines can be called
from interrupt handlers: mutex_lock cannot be called because it could
sleep and there is no context in which to sleep; mutex_trylock cannot
be called because this package needs to assign an "owner" context to
an aquired lock, and the interrupt routine has no context; and
mutex_unlock cannot be called because these locks can only be unlocked
by the same "owner" that locked them, and the interrupt routine
cannot be the "owner."

Mutexes are not breakable by signals.  They also implement priority
inheritance:  if the caller must sleep waiting for the mutex, and the
current mutex owner has a less favorable priority than the caller, the
mutex owner will inherit the caller's priority until the owner releases
the lock.

Routines which can't follow the mutex_lock/mutex_unlock rules can use
the semaphore or mrlock interfaces (explained later).

There are various initialization and debugging routines associated with
mutexes:

	void	mutex_init(mutex_t *mp, int type, char *name);
	void	init_mutex(mutex_t *mp, int type, char *name, long sequence);
	void	mutex_destroy(mutex_t *mp);
	mutex_t	*mutex_alloc(int type, int flags, char *name);
	mutex_t	*mutex_dealloc(mutex_t *mp);

Currently, only mutex type "MUTEX_DEFAULT" is supported.

The init routines initialize already-allocated mutexes; the alloc
routine allocates and initializes a mutex -- the only interesting flag
value is KM_NOSLEEP (sys/kmem.h).  The name field can be null for any
of the init/alloc routines -- they is only used by the metering (debug)
package.

The destroy routine should be called if the mutex is decommissioned so
that the metering package can deallocate related debugging data
structures.

The following routines return non-zero if a) any thread currently holds
the mutex, or b) the calling thread owns the mutex:

	int	mutex_owned(mutex_t *mp)		(a)
	int	mutex_mine(mutex_t *mp)			(b)


Debugger Hooks
==============

From idbg(): "mutex ADDR"


Semaphore Sleep Locks
=====================

If a sleeping mutual exclusion lock is needed, but the strict mutex_t
rules cannot be followed, semaphores can be used as simple (sleeping)
mutual exclusion locks.  Semaphores can be locked by one thread of
execution and unlocked by another.  Interrupt handlers can
conditionally-acquire locks and, of course, unlock them (as long as the
interrupt handler isn't running at splprof!).

Semaphores don't implement priority inheritance, and can be broken,
depending on arguments passed in the lock call.

	void	initnsema_mutex(sema_t *sp, char *name);
	int	psema(sema_t *sp, int flags);
	int	cpsema(sema_t *sp);
	int	vsema(sema_t *sp);
	void	freesema(sema_t *sp);

The init routine initializes the counting semaphore to be used as a
mutual exclusion lock (other uses discussed below, under
"Synchronication").  The free routines returns any kernel resources
associated with the semaphore.

The psema routine acquires the semaphore; cpsema conditionally acquires
it; vsema releases the semaphore.

The bits in the flags word that correspond to PMASK (sys/param.h)
determine whether the psema call is breakable by a signal.

If those bits are greater than PZERO, the lock attempt will fail if
there is a signal pending at the time of the lock call; and if the
caller must sleep waiting to acqire the semaphore, and a signal
arrives, the signal will prematurely wake the caller.

If the PLTWAIT bit is set (ficus and beyond), the SLTWAIT bit in the
process structure will be set and the process wil not be considered for
purposes of load average computation -- PLTWAIT indicates that the
process is entering a long term wait.

In releases prior to ficus, the PMASK bits determines the priority that
the caller will run after it wakes _if_ it has to sleep waiting for the
semaphore; as of ficus, the scheduler uses only the thread's native
priority (and whatever scheduler earnings it has acquired) to
determine scheduling order.

Also in released prior to ficus, if the semaphore operation is broken
and PCATCH is _not_ set, the psema call will longjmp instead of
return to the caller.


Important Note:

It is really bad programming practise to call psema at a breakable
priority when the semaphore is being used as a mutual exclusion lock
(as opposed to a synchronizing object); the caller would have to check
the return value and, on error, retry the semaphore operation.


So: when using a semaphore for mutual exclusion, always use:

	(void) psema(&sema, PZERO);
	...
	(void) vsema(&sema);


Note that semaphores are stateful, so if vsema is called before psema,
the psema will return immediately.  This is useful when semaphores
are being used as synchronization objects (discussed later), but care
must be taken when using them as mutual exclusion objects: either have
only the psema caller "release" the lock (via vsema) or be very careful
that there is only one vsema for (and after) each psema.

An example of semaphores used as mutual exclusion locks: file system
buffers are acquired by the read or write caller (or by a system
daemon), but if a request is asyncronous (or a read-ahead call), the
buffer is marked ASYNC, and the interrupt handler releases the
semaphore after it has finished the disk transaction.

Debugger Hooks
==============

From idbg(): "sema ADDR"


Multi-Access/Single-Update Sleeping Locks
=========================================

Irix supports a version of multi-access/single-update sleeping locks
(also known as multi-reader locks):

	void	mrlock(mrlock_t *mrp, int type, int flags);
	int	cmrlock(mrlock_t *mrp, int type);
	void	mrunlock(mrlock_t *mrp);
	int	mrtrypromote(mrlock_t *mrp);
	void	mrdemote(mrlock_t *mrp);


The type is either MR_ACCESS or MR_UPDATE, and flags are those that
would be passed to psema.  As one might imagine, mrlock allows multiple
simultaneous accessers or one updater to hold a lock; callers will
sleep until the lock can be acquired in the desired mode.

A thread of execution which already holds a lock for access mode can
acquire the lock again (double-trip) in access mode; but an attempt to
acquire access mode when the caller already holds update mode, or to
acquire update mode when the caller alreads holds any mode, will hang.

As of ficus, there are also fast-path versions:

	void	mraccess(mrlock_t *mrp, int flags);
	void	mrupdate(mrlock_t *mrp, int flags);
	int	mrtryaccess(mrlock_t *mrp);
	int	mrtryupdate(mrlock_t *mrp);
	void	mraccunlock(mrlock_t *mrp);


The mrlocks don't provide priority inheritance and aren't interruptible
(breakable) by a signal.

Debugger Hooks
==============

From idbg(): "mrlock ADDR"


SYNCHRONIZATION
===============

Syncronization Variables
========================

Synchronization variables (sync variables) are used to wait for a
condition or event.

To initialize or destroy a sync variable:

	void	sv_init(sv_t *svp, int type, int name);
	void	init_sv(sv_t *svp, int type, int name, long sequence);
	sv_t	*sv_alloc(int type, int vmflags, char *name);
	void	sv_destroy(sv_t *svp);
	void	sv_dealloc(sv_t *svp);

By default (type SV_DEFAULT or SV_FIFO) callers are woken in first-in/
first-out order; SV_LIFO forces last-in/first-out.  The name and
sequence arguments are used only as debug/metering tags.

To wait for an event to occur, a thread can use one of several routines:

	int	sv_wait_sig(sv_t *svp, int flags, void *lockp, int rv);
	void	sv_wait(sv_t *svp, int flags, void *lockp, int rv);

	int	sv_sema_wait_sig(sv_t *svp, int flags, sema_t *sp);
	void	sv_sema_wait(sv_t *svp, int flags, sema_t *sp);

	int	sv_mrlock_wait_sig(sv_t *svp, int flags, mrlock_t *mrp);
	void	sv_mrlock_wait(sv_t *svp, int flags, mrlock_t *mrp);

	int	sv_bitlock_wait_sig(sv_t *svp, int flags,
			uint_t *lock, uint_t bit, int rv);
	void	sv_bitlock_wait(sv_t *svp, int flags,
			uint_t *lock, uint_t bit, int rv);

Sync-variables (a.k.a. condition-variables) carry no state, so the
caller must use locks to protect the state bits that are involved in
the decision to wait.

The first two arguments to the wait routines are always the address of
the sync variable and flags (explained later).

The remaining arguments refer to the mutual exclusion locks the callers
are using to protect the state or event data.  The address of a locked
mutual exclusion object is always passed, along with lock-dependent
"cookies".

For sv_wait and sv_wait_sig, either a spinning lock or sleeping mutex
lock address can be passed; if it is a spinning lock, the cookie
returned by the lock call must be passed; if it is a mutex (sleep)
lock, 0 must be passed.

The "_sig" routines can be broken (the caller awakened early) by signals,
just as with semaphore operations.

For all the wait calls, the caller is placed on the sync variable sleep
queue and the lock (whose address is passed as an argument) is
released; it is the caller's responsibility to reacquire the lock upon
return from the wait call.


The following routines wake up a) at most one, or b) all threads
waiting on the sync variable, and return the number of threads woken:

	int sv_signal(sv_t *svp);	(a)
	int sv_broadcast(sv_t *svp);	(b)


An example:

	rv = mutex_spinlock(&obj->o_lock);
	while (!(obj->o_flags & READY)) {
		obj->o_flags |= WAITING;
		sv_wait(&obj->o_wait, PZERO, &obj->o_lock, rv);
		mutex_spinlock(&obj->o_lock);
	}
	mutex_spinunlock(&obj->o_lock, rv);

	[] [] [] [] []

	rv = mutex_spinlock(&obj->o_lock);
	obj->o_flags |= READY;
	if (obj->o_flags & WAITING) {
		obj->o_flags &= ~WAITING;
		sv_broadcast(&obj->o_wait);
		/*
		 * ... or sv_signal(&obj->o_wait) if only one
		 * other thread of execution could sleep on this.
		 */
	}
	mutex_spinunlock(&obj->o_lock, rv);

In the above example, the caller of sv_wait will only return after
another thread calls sv_signal or sv_broadcast on the same sync
variable.


There are only two flags fields of any interest.  One is PNOSTOP,
which, for breakable calls, directs the sync-wait routine to return
early if a signal is pending, but not to actually field the signal.
This bit is also recognized by the semaphore waiting code.

The other bits indicate which kernel timer to use to account for the
caller's sleep.  A non-default timer is specified thusly:

	flags |= TIMER_SHIFT(AS_PHYSIO_WAIT);

where AS_PHYSIO_WAIT is a timer defined in sys/timers.h.  The timers
are used only for statistics gathering and don't affect any scheduling
decisions (these bits are also recognized by psema).


Debugger Hooks
==============

From idbg(): "sv ADDR"


Semaphores as Synchronizers
===========================

A semaphore can be used as a simple sync variable.  It should be
initialized as a syncronizer, not a mutual exclusion semaphore:

	void	initnsema_synch(sema_t *sp, char *name);
or,	void	initnsema(sema_t *sp, 0, name);
or,	void	initsema(sema_t *sp, 0);

	int	psema(sema_t *sp, int flags);
	int	cpsema(sema_t *sp);
	int	vsema(sema_t *sp);

As discussed earlier (in the Mutual Exclusion section), the flags
argument to psema determines whether the call can be broken by a
signal.  If the PMASK bits are > PZERO, the psema call will return
early if the caller gets a signal -- AND IN THIS CASE WILL LONGJMP if
not passed PCATCH!!! -- pre-ficus systems only).

Further, semaphores are stateful, so if vsema is called before the
psema call, the psema will return immediately.


Atomic Operators
================

There are a number of simple routines available which can be used to
manipulate data, without acquiring mutual exclusion locks, in such a
way the the data is always consistent.

The routines all use the load-link/store-conditional or
load-link-double/store-conditional-double machine instructions, and
as such can only operate on 32-bit or 64-bit words.

To understand why an atomic operator is needed to, say, increment an
integer counter, we need to understand what happens in a simple
memory operation.  To effect:

	extern int prince;
	prince++;

The compiler must generate a load instruction to put the value of (the
memory location referenced by) "prince" in a register, add one to the
register, then store it in (the memory location still known as) "prince".

But what if an interrupt occurred between the load and the store, and
the interrupt handler also incremented "prince", or if another thread
of execution was running the same code, incrementing "prince" at the
same time?  One of the increments would be lost.

The same is true of bit operations, such as:

	extern int zoobie;

	zoobie |= ZOOBIE_DONE;
	zoobie &= ~ZOOBIE_WAIT;

Most of the atomic operators are defined in "sys/atomic_ops.h" and
implemented in "ml/atomic_ops.h".

And note that some of these operations are in-line functions in the
Mongoose compiler!


Atomic Increment / Decrement
============================

	extern int atomicAddInt(int *, int);
	extern uint atomicAddUint(uint *, uint);
	extern long atomicAddLong(long *, long);
	extern unsigned long atomicAddUlong(unsigned long *, unsigned long);
	extern int64_t atomicAddInt64(int64_t *, int64_t);
	extern uint64_t atomicAddUint64(uint64_t *, uint64_t);

Each of the add routines take a pointer to the word and the value
to be added (the values to be 'added' can be negative, of course,
for the signed-type routines).


Setting / Clearing Bits
=======================

The set- or clear-bit are similar:  the first argument is the address
of the word, and the second is the bits in the word to set or clear.

	extern long atomicSetLong(long *, long);
	extern long atomicClearLong(long *, long);
	extern int atomicSetInt(int *, int);
	extern int atomicClearInt(int *, int);
	extern uint atomicSetUint(uint *, uint);
	extern uint atomicClearUint(uint *, uint);
	extern unsigned long atomicSetUlong(unsigned long *, unsigned long);
	extern unsigned long atomicClearUlong(unsigned long *, unsigned long);


There are also bit-set and bit-clear operations that interoperate with
the mutex_bitlock routines.  Say you are using a mutex_bitlock to
protect, among other things, the bits in the word in which the bitlock
resides.  If in some cases you only wanted to set or clear some of the
bits in that word, you would _always_ have to use the atomicSet/atomicClear
routines, or, you could use mutex_bitlock routines for some of them and
mutex_bitlock-compatible bit set/clear routines otherwise:

	extern uint_t	bitlock_clr(uint_t bits*, uint_t lockbit, uint_t clr);
	extern uint_t	bitlock_set(uint_t bits*, uint_t lockbit, uint_t set);

The arguments to the bitlock_{clr,set} routines are: the address of the
memory word containing the bits; the lock bit used by mutex_bitlock;
and the bits to clear or set.  These routines work much as the
atomic{Set,Clear} routines, except that they won't update the target
word while the lockbit is set.


Test_and_Set / Compare_and_Swap
===============================

To arbitrarily set the value of a word, and return the previous value:

	extern int	test_and_set_int(int * loc, int new);
	extern long	test_and_set_long(long * loc, long new);
	extern void *	test_and_set_long(long * loc, long new);

To set the value of a word, but only if it matches the passed value:

	extern int	compare_and_swap_int(int *, int old, int new);
	extern int	compare_and_swap_long(long *, long old, long new);
	extern int	compare_and_swap_ptr(void **, void *old, void *new);