wernermisc/m1/patches/rtems/coremsgsubmit-race.patch

This issue is under review:
https://www.rtems.org/bugzilla/show_bug.cgi?id=1961

If it's permissible to call rtems_message_queue_send from an
interrupt, then there is at least one race condition in the core
message subsystem.

This created the MIDI/mouse hang we love so much on M1.

The problem is as follows: RTEMS queues use pre-allocated message
buffers that are kept on an "inactive" (free) list. When enqueuing
a message, a buffer is first removed from the inactive list, data
it copied to it, and it is then added to the pending list.

The reverse happens when dequeuing. Besides these two queues, there
is also a counter called number_of_pending_messages keeping track,
as the name suggests, of the number of pending messages. It is
updated atomically together with changes to the pending buffers
list.

From the above it is clear that the counter will be out of sync with
the inactive list between the beginning and the end of an enqueue or
dequeue operation.

In order to minimize interrupt latency, RTEMS disables interrupts
only when adding and removing buffers from lists, but not throughout
the whole enqueuing/dequeuing operation. Instead, it disables the
scheduler during these operations, but this doesn't prevent
interrupts.

This means that the inconsistency between number_of_pending_messages
and the inactive list can be observed from an interrupt handler if
enqueuing or dequeuing is in progress.

_CORE_message_queue_Submit checks whether there is still room in the
queue by reading number_of_pending_messages. If there is room, it
then calls _CORE_message_queue_Allocate_message_buffer to obtain a
free buffer.

Given that number_of_pending_messages and the list of inactive
buffers can disagree, e.g., if _CORE_message_queue_Seize or another
_CORE_message_queue_Submit is executing concurrently,
_CORE_message_queue_Allocate_message_buffer may fail to obtain a
free buffer despite the prior test.

_CORE_message_queue_Allocate_message_buffer can detect a lack of
free buffers and indicates it by returning a NULL pointer. Checking
whether NULL has been returned instead of a buffer is optional and
depends on RTEMS_DEBUG.

If no check is performed, _CORE_message_queue_Submit will then try
to use the buffer. In the absence of hardware detecting the
de-referencing of NULL pointers, the wounded system will limp on a
little further until, at least in the case of M1, it finally hangs
somewhere.

The patch below avoids the problem in the scenario described above
by not using number_of_pending_messages as an indicator of whether
free buffers are available, but by simply trying to get a buffer,
and handling the result of failure.

This is similar to how _CORE_message_queue_Seize works.

Another possibility would be to make testing of the_message no
longer optional. But then, there would basically be two tests for
the same condition, which is ugly.

- Werner

Index: rtems/cpukit/score/src/coremsgsubmit.c
===================================================================
--- rtems.orig/cpukit/score/src/coremsgsubmit.c	2011-11-12 09:15:12.000000000 -0300
+++ rtems/cpukit/score/src/coremsgsubmit.c	2011-11-12 09:15:17.000000000 -0300
@@ -101,21 +101,9 @@
    *  No one waiting on the message queue at this time, so attempt to
    *  queue the message up for a future receive.
    */
-  if ( the_message_queue->number_of_pending_messages <
-       the_message_queue->maximum_pending_messages ) {
-
-    the_message =
-        _CORE_message_queue_Allocate_message_buffer( the_message_queue );
-
-    #if defined(RTEMS_DEBUG)
-      /*
-       *  NOTE: If the system is consistent, this error should never occur.
-       */
-
-      if ( !the_message )
-        return CORE_MESSAGE_QUEUE_STATUS_UNSATISFIED;
-     #endif
-
+  the_message =
+      _CORE_message_queue_Allocate_message_buffer( the_message_queue );
+  if ( the_message ) {
     _CORE_message_queue_Copy_buffer(
       buffer,
       the_message->Contents.buffer,