-----------------------------------------------
Field-Accurate Video Deck Control and Emulation
OS Requirements and Solutions
Rev 1.1
Chris Pirazzi
-----------------------------------------------

Just the simplified basics that OS folks need to know in order to
understand the requirements for video deck control.

Then the details of the OS requirements and solutions we have so far.


REAL-TIME REQUIREMENTS
----------------------

First and foremost: all the time requirements we give in this document
are hard, guaranteed requirements.  When we say "X must happen M
milliseconds after Y" we mean all the time, 100% of the time,
guaranteed, and supported.  We are not talking about best-case or
average-case performance.  We are giving bounds on worst-case
performance.

For video and decks, violating the timing requirements below always
results in failures which are just as serious as SCSI parity errors or
unrecoverable memory ECC errors.  Some of these failures will
permanently destroy customer video material on videotape.  All of
these failures will cause video customers to do what they would do
with any damaged video equipment: they will return the machine to SGI
and buy another brand.

Therefore, in order to say we support deck control and emulation as
described below, SGI must tell the customer precisely which SGI
configurations support the guarantees, and then SGI must commit the
engineering resources to deliver those guarantees on the target
configurations and fix the bug if the machine fails to deliver the
guarantees.


VIDEO
-----

Video enters and leaves an SGI machine in an industry-standard analog
or digital signal format.  Video is divided up into video fields of
equal duration.  You can think of a field as one image, roughly like
one frame of a cinema film.  There is one video field every 20ms or
16.68333...ms, depending on the flavor of video.  Each field has a
well-defined starting point in time.  All video equipment (decks, SGI
machines, ...) at a customer site is connected to a common per-field
heartbeat signal called "house sync;" video fields always start on one
of these heartbeats.
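
As a quick check of the two field periods quoted above, here is a
minimal Python sketch.  It assumes the two flavors are 625-line video
at exactly 50 fields per second and 525-line video at 60000/1001
fields per second.  Those rates and all names below are assumptions of
this illustration, not something stated in the text.

```python
from fractions import Fraction

# Assumed field rates (fields per second) for the two video flavors.
FIELD_RATE_HZ = {
    "625-line": Fraction(50),
    "525-line": Fraction(60000, 1001),
}

def field_period_ms(flavor):
    """Duration of one field in milliseconds, as an exact fraction."""
    return Fraction(1000) / FIELD_RATE_HZ[flavor]

for flavor in FIELD_RATE_HZ:
    print(flavor, float(field_period_ms(flavor)), "ms")
```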


THE DECK CONTROL PROTOCOL
-------------------------

Video decks (also called Video Tape Recorders, VTRs, VCRs, ...) we are
concerned with have an RS-422 serial port.  A controller can send
a command to the deck over the serial port, like:

	- start rolling the tape now
	- stop rolling the tape now
	- start recording now
	- stop recording now
	- tell me which field you are playing or recording now
          (the fields on tape are numbered sequentially)

using the industry-standard Sony 9-Pin protocol (always 38400 baud,
odd parity, 8 bits, 1 stop bit).  The deck always sends one response
(usually ACK or NAK, but sometimes information) to each command from
the controller.  The controller can then send another command.  The
deck never initiates; it only transmits on the serial line in response
to a command from the controller.  Commands and responses range from 2
bytes to 18 bytes each.
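
To get a feel for how long commands and responses occupy the wire,
here is a small sketch.  It assumes each byte is framed with one start
bit in addition to the 8 data bits, parity bit, and stop bit listed
above (11 bit times per byte) and that there is no idle time between
bytes.  The function name is hypothetical.

```python
BAUD = 38400
BITS_PER_BYTE = 1 + 8 + 1 + 1  # start + data + odd parity + stop (assumed framing)

def wire_time_ms(num_bytes):
    """Time on the serial line for num_bytes back-to-back bytes, in ms."""
    return num_bytes * BITS_PER_BYTE * 1000.0 / BAUD

# Shortest and longest command or response mentioned in the text.
print(wire_time_ms(2), wire_time_ms(18))
```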

A serial line is idling when it is in marking state between the
trailing edge of a stop bit and the leading edge of a start bit.  The
serial line may not idle for more than 10ms between any two adjacent
bytes of a command or response.

The start of a command or response is the leading edge of the start
bit of its first byte on the serial line.  The end of a command or
response is the trailing edge of the stop bit of its last byte on the
serial line.

The deck must start a response 0 to 9 milliseconds after the end of
any command from the controller.  The deck cannot predict what
commands the controller will send.

The controller must be able to pair up any response with the
corresponding command.

One of the commands, Status Sense, causes the deck to indicate whether
the user has overridden the controller by pushing buttons on the deck's
front panel, whether the deck is servolocked (rolling forward at a
stable, 1x speed), and whether any error conditions (end of tape,
mechanical failure, ...) have arisen on the deck.  The controller must
send a Status Sense command at least every 300ms.  The delay between a
change of status on the deck and the start of the controller's command
in reaction to this change can be at most 300ms.

Certain commands can behave field-accurately.  To get the field
accuracy, the deck must be servolocked, the controller must start
field-accurate-capable commands between D1 ms and D2 ms after the
start of a field, and the command and its response must end before the
start of the next field.  D1 and D2 depend on the deck.  D2-D1 >= 2ms.
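
The three field-accuracy conditions can be written down as a simple
predicate.  This is only a sketch: the function and parameter names
are hypothetical, all times are in milliseconds on one common clock,
and treating the D1..D2 window as inclusive at both ends is an
assumption.

```python
def is_field_accurate(field_start_ms, cmd_start_ms, response_end_ms,
                      field_period_ms, d1_ms, d2_ms, servolocked=True):
    """True if a command/response pair meets the field-accuracy conditions."""
    offset = cmd_start_ms - field_start_ms
    # The command must start between D1 and D2 ms after the start of the field.
    starts_in_window = d1_ms <= offset <= d2_ms
    # The command and its response must end before the next field starts.
    ends_in_field = response_end_ms < field_start_ms + field_period_ms
    return servolocked and starts_in_window and ends_in_field
```

For example, with hypothetical values D1 = 4 and D2 = 6 and a 20ms
field, a command starting 5ms into the field whose response ends at
12ms qualifies, while one starting at 7ms does not.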

Current Time Sense is a field-accurate-capable command. A
field-accurate Current Time Sense command causes the deck to respond
with the number of the field that is currently playing out its output
jack or recording at its input jack.  Certain state-changing commands,
such as record on/off, are field-accurate-capable.  A field-accurate
state-changing command will cause the deck to change its state after a
fixed, guaranteed number of fields E.  E depends on the deck and the
command.  The controller must have a field-accurate Current Time Sense
response in order to compute the correct field in which to send a
field-accurate state-changing command.  Assuming that the field
accuracy conditions are maintained throughout, the delay between the
end of a Current Time Sense response and the earliest time at which a
controller is ready to issue a state-changing command can be at most
300ms.
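
A sketch of how a controller might turn a Current Time Sense response
into a send time for a state-changing command follows.  It assumes
that a command sent during tape field N takes effect at field N + E
and that tape field numbers advance by one per video field while the
deck is servolocked.  The function name is hypothetical.

```python
def fields_until_send(current_field, target_field, e_fields):
    """Number of fields to wait before sending a state-changing command.

    current_field: field number from a field-accurate Current Time Sense
    target_field:  tape field on which the state change should take effect
    e_fields:      the deck's fixed latency E for this command, in fields
    Returns None if it is already too late to hit target_field.
    """
    # Assumption: a command sent during tape field N takes effect at N + E.
    send_field = target_field - e_fields
    wait = send_field - current_field
    return wait if wait >= 0 else None
```

With a hypothetical E of 6 fields, a controller that reads field 1000
and wants recording to start on field 1020 would send the command 14
fields later.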

A note about the two 300ms latency figures above: 300ms is a very,
very bad latency.  It is the cutoff point after which a customer will
declare the controller broken and send it back (hence it being a
requirement).  Seeing as dedicated 422 controllers deliver latencies
more on the order of 50ms, it will be extremely embarrassing if SGI
cannot ship a machine that offers a guarantee less than 300ms.  We'll
examine below how these latency figures translate into IRIX user
thread scheduling requirements.

Current industry practice adds a few more constraints not found in the
Sony spec.  Many decks will ignore commands sent during a particular
(525-2*243)/525==7.42...% or (625-288*2)/625==7.84% of the video field
(depending on the flavor of video).  This means the controller must
start all commands, not just field-accurate commands, between a
deck-specific D1ms and D2ms after the start of field (D2-D1 >= 2ms).
Many controllers assume that they can send at least two 3-byte
commands and receive at least two 3-byte responses in one video field
time.  This tightens the constraint on command and response idle time
and deck response time.  Assuming no idle time at all during commands
and responses, this means that the sum of the deck's response time to
both commands cannot exceed 12ms.  This is calculated based on the
shortest field time (16.68333...ms), minus the 7.42...% above, minus
two 3-character command times, minus two 3-character response times.
We make a simplifying assumption and say the deck must respond to each
command within 6ms.
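
The 12ms figure can be reproduced with the arithmetic below.  The one
assumption not stated in the text is the framing of 11 bit times per
byte (start, 8 data, parity, stop); variable names are illustrative.

```python
FIELD_MS = 1001.0 / 60.0                    # shortest field time, 16.68333... ms
IGNORED_FRACTION = (525 - 2 * 243) / 525    # the 7.42...% during which decks ignore commands
BYTE_MS = 11 * 1000.0 / 38400               # assumed 11 bit times per byte at 38400 baud

usable_ms = FIELD_MS * (1 - IGNORED_FRACTION)
wire_ms = (2 * 3 + 2 * 3) * BYTE_MS         # two 3-byte commands + two 3-byte responses
response_budget_ms = usable_ms - wire_ms    # total deck response time for both commands

print(round(response_budget_ms, 2), round(response_budget_ms / 2, 2))
```

The total comes out just over 12ms, which is why 6ms per command is a
reasonable simplification.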


VIDEO DECK CONTROL ON SGI: PROBLEMS
-----------------------------------

Video deck control is when an SGI machine is the controller.

This is the most common configuration in animation houses and smaller
video production setups, where the SGI machine is the customer's main
control console for all devices.  Typically, the SGI's video output
connects to the deck's video input, and the SGI's video input
connects to the deck's video output.

As stated above, the controller needs to send serial commands
precisely relative to video field boundaries, and measure when serial
responses come relative to those boundaries.  Since the controller
(the SGI machine) in this case is also doing video I/O, there is one
more requirement.  When it brings a field from a video wire into
memory, the SGI must be able to match up that field with the serial
commands and responses that coincided with the video field over the
serial wire.  When it prepares to output an in-memory video field over
a video wire, the SGI must be able to match up that field with the
commands that the SGI will transmit at the same time over the serial
wire.

A solution to deck control must meet all of the above requirements.

The simplest solution to deck control would be this:

/* see field-accurate commands above for the definition of D1 and D2 */
while (1)
{
  rightnow = the time right now;
  t = ask the Video Library what time the next field starts;
  /* we want to wake up at t+D1 */
  /* INVARIANT: the difference between rightnow and the time which 
   *            nanosleep will use to determine our user thread's actual
   *            wakeup time is less than:
   *              D2-serial_transmit_latency-length_of_code_path_below-D1. 
   */
  nanosleep(t+D1 - rightnow);

  /* INVARIANT: the current time is >= t+D1 */

  if (we sent a command on the last field)
    {
      the deck will have sent us a response by now;
      /* INVARIANT: our user thread can now read all the bytes 
       *            of one serial response 
       */
      read one complete Sony protocol response from the deck;
    }
  
  if (we're receiving video from the deck)
    {
      based on the most recent Sony protocol responses,
        compute the field number of the video image from the last field;

      /* INVARIANT: the video image from the last field is available */
      grab the video image from the last field;
      store the image and its field number somewhere;
    } 
  else /* we're sending video to the deck */
    {
      based on the most recent Sony protocol responses,
        decide which video image to send out the video port next field;
      send that image to the Video Library;
      /* INVARIANT: the Video Library will begin to output that field
       *            once the current one is done
       */
    }
  
  compute what command to send out the serial port in this field;
  send that command out the serial port;
  /* INVARIANT: the bytes will start coming out the serial port
   *            at the latest in serial_transmit_latency
   */

  /* INVARIANT: the current time is <= t+D2-serial_transmit_latency */
}

Unfortunately, none of the invariants above are supported on any SP
SGI platform, and some of them are not even supported on MP platforms.

For the above code to work, we would need a guarantee that our code
executes between time t+D1 and t+D2-serial_transmit_latency.
Currently, SGI offers no lower bound on the amount of time a user
thread will be running during any interval whatsoever on any SP SGI
system, even if it is the highest priority thread.  We would need a
guarantee that we can run for long enough to execute the code path
above in a window of time (of length D2-D1-serial_transmit_latency)
which could be as small as 2ms-serial_transmit_latency.

The percent of that window during which we need to execute is tiny.
The code path above involves little more than poking a couple of
hardware registers and accessing maybe 200 cache lines.  SGI needs to
provide enough systems information (cycle counting utilities,
worst-case cache and TLB numbers) so that a developer can compute an
upper bound on their code's cycle requirement.  However, CPU
throughput is not likely to be an issue in practice.  The tough issue
is likely to be user thread scheduling latency (i.e., getting that tiny
bit of code to run at all).

Say the user thread scheduling problems are solved, as they are on MP
systems with REACT/Pro or some of the MP kudzu systems.  This is still
not enough:

- Currently, SGI offers no upper bound on the amount of time between
when a user thread sends a byte to the serial port and when that byte
will actually go out the serial jack.  The code above refers to this
as serial_transmit_latency.  We would need a guaranteed upper bound on
serial_transmit_latency which is less than

  D2-length_of_code_path_above-D1

Since length_of_code_path_above is a small fraction of D2-D1, this
should be just under 2ms in the worst case.

- Currently, SGI offers no upper bound on the amount of time between
when a byte arrives at the serial jack and when we can read that byte
from a user thread.  We would need a guaranteed upper bound on this
serial receive latency which is less than the worst-case amount of
time between the end of the deck's response and the point D1 ms into
the next field.  The worst-case command and response take 15.01...ms
so this is at least 1.66...ms for the shortest field time.

Both of these I/O latencies are easily achievable in all current SGI
serial hardware designs.  However, SGI needs to support some software
interface which guarantees these latency bounds.


VIDEO DECK CONTROL ON SGI: TSERIALIO SOLUTION (?)
-------------------------------------------------

We needed to support deck control on O2.  The cleanest, simplest, and
cheapest way would have been the method described above.  Our pleas to
support these guarantees in IRIX 6.3 were unsuccessful.

So we were forced to develop tserialio, a hack which relied on the
following observations:

- Since serial commands have to be sent D1 to D2 ms after the start of
a field, and D2-D1 can be as small as 2ms, then this means we need to
be able to schedule serial bytes relative to video fields with plus or
minus one millisecond accuracy.

- Since our serial hardware does not support timestamping or
scheduling (nor should it!), some piece of software has to run during
those crucial milliseconds to do the serial RX and TX.

- The only event on IRIX 6.3 which is guaranteed to occur every
millisecond is the nasty kernel profiler tick.  It would be
unacceptably burdensome to hang deck control code off the profiler
tick.  Is it really necessary to put all the deck control code there?

- No.  Accuracy is not the same as latency.  The deck control code
needs to timestamp RX bytes and schedule TX bytes accurately, but its
maximum latency---the maximum time it must take to react to an
incoming serial signal by producing a corresponding outgoing serial
signal---is more like 300ms.  As explained above, 300ms is the cutoff
latency where the customer declares the machine broken and sends it
back.  We should really offer a guarantee much less than 300ms, but
for this document 300ms is the hard requirement.

The accuracy comes from tserialio.  It is a very simple driver and
user-mode library which gives a user thread a way to schedule serial
bytes for transmission out the serial jack in the future, and measure
the time at which bytes from the past arrived at the serial jack.  The
scheduling and timestamping are accurate to plus or minus one
millisecond relative to the start of video fields.  The tserialio
driver (a serialio upper layer) hangs off the profiler tick doing
serial RX and TX.  It accounts for hardware and kernel software I/O
latencies, so the user can think in terms of times at the serial jack.

An application that does deck control using tserialio looks more like this:

while (1)
{
  nanosleep(any amount of time less than 300ms);

  /* INVARIANT: at most 300ms have transpired since we last emptied out
   *            the tserialio input port.
   */
  for(each response that is waiting from the deck)
    {
      get the response and its starting time from tserialio;
      map that starting time to a particular field;
      interpret the response in the context of that field;
    }

  /* INVARIANT: at most 300ms have transpired since we last enqueued
   *            300ms worth of commands on the tserialio output port.
   */
  use tserialio to send enough serial commands so that we've got 300ms 
    worth of serial commands buffered up.  tell tserialio to send each
    command at the start of field + D1 ms;

  if (we're receiving video from the deck)
    {
      /* INVARIANT: at most 300ms have transpired since we last emptied out
       *            the Video Library input port.
       */
      for(each video image that is waiting from the deck)
        {
          get the video image and its starting time from the Video Library;
          map that starting time to a particular Sony protocol response;
          based on the corresponding Sony protocol response,
            compute the field number of that video image;
          store the image and its field number somewhere;
        }
    } 
  else /* we're sending video to the deck */
    {
      based on the most recent Sony protocol responses,
        decide which video images to send out the video port next field;
      /* INVARIANT: at most 300ms have transpired since we last enqueued
       *            300ms worth of images on the Video Library output port.
       */
      use the Video Library to send enough images so we have 300ms buffered up;
    }
}

As you can see, it is much more complex.  Each time around the loop,
the code has to schedule up to 300ms of commands in the future, and
deal with up to 300ms worth of responses from the past.
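
To size the buffering, the application needs to know how many fields
fit in its worst-case wakeup interval.  A minimal sketch, with a
hypothetical function name and the two field periods from the VIDEO
section:

```python
import math

def fields_to_buffer(max_wakeup_gap_ms, field_period_ms):
    """Fields of commands (or images) to keep queued to cover the gap."""
    return math.ceil(max_wakeup_gap_ms / field_period_ms)

print(fields_to_buffer(300, 20.0))          # 20ms fields
print(fields_to_buffer(300, 1001 / 60))     # 16.68333...ms fields
```

A real application would presumably add some margin on top of this
count; how much is a design choice the text does not specify.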

You may wonder how the application can relate the input and output
times of video fields and serial signals.  The tserialio library and
driver know nothing whatsoever about video.  Both tserialio and the
Video Library place all incoming and outgoing data on the common UST
timeline.  UST is a systemwide, unadjusted timebase with microsecond
resolution.  The application asks the Video Library to tell it the UST
of each video field, and then it tells tserialio to send serial bytes
at that UST plus D1 ms.  The application receives responses from
tserialio stamped with UST, and it uses the video field USTs from the
Video Library to figure out which field that response came from.
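
The UST bookkeeping can be sketched as follows.  This is not the
tserialio or Video Library API; the function names are hypothetical,
and timestamps are plain integers in microseconds, which is an
assumption made for the illustration.

```python
from bisect import bisect_right

def field_index_for_ust(field_start_usts, ust):
    """Index of the field whose start is the latest one at or before `ust`.

    field_start_usts must be sorted ascending.  Returns None if `ust`
    precedes every known field start.
    """
    i = bisect_right(field_start_usts, ust) - 1
    return i if i >= 0 else None

def command_send_ust(field_start_ust, d1_us):
    """UST at which a command should start: start of field plus D1."""
    return field_start_ust + d1_us
```

For example, with field starts at 0, 20000, and 40000 microseconds, a
response stamped 25000 belongs to the field at index 1, and with a
hypothetical D1 of 4000 microseconds the command for that field would
be scheduled at UST 24000.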

This deck control "solution" ships with Adobe Premiere on O2.  It
works most of the time.

But it is still not a solution: we've changed the latency requirement
from 2ms to 300ms, but as we said before, SGI neither guarantees nor
supports any particular latency!

Even before we go and worry about millisecond scheduling of a user
thread, it would help the situation if SGI would pick and publish some
easily manageable user thread scheduling latency number which we can
guarantee and support today on each system, so that developers inside
and outside SGI know how far ahead they need to buffer in their deck
control applications.  That number should be at most 300ms.

At the moment, we code in peril and developers are turned away by our
failure to guarantee any number in particular.


VIDEO DECK EMULATION ON SGI
---------------------------

So, you might think that the OS is out of the woods in terms of
delivering low latencies.  Unfortunately, you have to also consider
the other half of the problem:

Video deck emulation is when an SGI machine is the deck instead of the
controller.

This is the most common configuration in video production studios,
where the customer controls tens or hundreds of decks (only some of
which are actually SGI machines) simultaneously from a large,
custom-made physical console.  The console also controls video
switching networks and video signal processing boxes.  The console
instructs the decks to play and record, and switches their video
signals to each other and to and from the effects units.

Now that the SGI machine is the deck, it must respond to commands from
the controller with a corresponding response within 6ms.  The deck cannot
predict what commands the controller will send it.  There is simply no
way of getting around it: in order to do deck emulation, a user thread
must be able to wake up, read a serial command, and transmit a serial
response, such that the time from the end of the command on the serial
line to the beginning of the response on the serial line does not
exceed 6ms.  This requires both user thread scheduling latency and
I/O latency guarantees.
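
One way to see how the guarantees combine is a simple additive budget.
The sketch below assumes the worst-case delays simply add up, which is
a simplification; the function name, the breakdown into four terms,
and the example numbers are all hypothetical.

```python
RESPONSE_LIMIT_MS = 6.0  # end of command to start of response, from the text

def response_budget_ms(rx_latency_ms, sched_latency_ms, handler_ms, tx_latency_ms):
    """Slack left after the worst-case delays; negative means the limit is missed."""
    used = rx_latency_ms + sched_latency_ms + handler_ms + tx_latency_ms
    return RESPONSE_LIMIT_MS - used

# Example: 1ms receive, 3ms scheduling, 0.5ms of code, 1ms transmit.
print(response_budget_ms(1.0, 3.0, 0.5, 1.0))
```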

Since the SGI machine is also doing video I/O, it shares the
requirements of matching up video fields with serial commands and
responses, described above under video deck control.  For example, the
deck must be able to match a video field up with a Current Time Sense
command from the controller field-accurately, so that it can produce a
field-accurate response.

Since the controller will be sending state-changing commands to the
SGI machine (the deck), it must be able to react to these commands on
a precise field.  The deck gets to choose the latency E in fields
which it takes to execute a given state-changing command.  This
latency can be as high as 10 fields (around 170ms or 200ms, depending
on video flavor).  So as long as we can guarantee user thread
scheduling on a finer granularity than 170ms, then the state-changing
problem is an accuracy problem.  It could be solved either by user
thread scheduling latency and I/O latency guarantees, or some kernel
hack to the video driver like tserialio was to the serial driver.
Since we need 6ms user thread scheduling guarantees for the serial
case above, we should only need an additional video I/O latency
guarantee to achieve proper behavior for state-changing commands.


SUMMARY: CONTROL
----------------

SGI does not currently offer the guarantees it takes to do correct
deck control on any currently shipping platform, but in some cases
this would be easy.  To support deck control, a platform must follow
one of these two recipes:

RECIPE A:

- support tserialio, and
- guarantee that a piece of code in a user thread can run at least once 
  within each 300ms interval.

or

RECIPE B:

- guarantee that a piece of code in a user thread can run once 
  within a 2ms-serial_transmit_latency interval relative to the start 
  of each video field, and
- guarantee a serial_transmit_latency of at most 1ms, and
- guarantee a serial receive latency of at most 1ms.

Many developers wish to run on more than one SGI platform, and for the
most part will be unwilling to rewrite their code to do this.  These
developers would choose whatever recipe was available on most platforms.

It would be greatly preferable to make recipe B available on all
platforms.  It yields less OS code to test and less developer code to
write.

Recipe A may be easier to port in the short term, though in the long
term it will have a higher continuing support cost than recipe B.
Users access tserialio through a user-mode shared library, not by
talking to a UNIX device node.  On systems where we guarantee that a
user thread can run once in every:

  1ms - serial_receive_latency - serial_transmit_latency

interval, we can easily implement tserialio in user-mode only and need
no kernel work.  Notice that these requirements are stricter than those
for recipe B, so such a platform would also be able to support recipe B.

Here is what we have now:

Indigo, Indy, Indigo2: neither
O2:                    recipe A minus the 300ms scheduling guarantee
Octane:                neither
Challenge/Onyx:        recipe B minus both serial latency guarantees
Origin/Onyx2:          recipe B minus both serial latency guarantees


SUMMARY: EMULATION
------------------

SGI does not currently offer the guarantees it takes to do correct
deck emulation on any currently shipping platform.  To support deck
emulation, a platform must follow this recipe:

RECIPE:

- guarantee that a piece of code in a user thread can wake up, read a
serial command, and transmit a serial response, such that the time
from the end of the command on the serial line to the beginning of the
response on the serial line does not exceed 6ms.  Many mixes of user
thread scheduling guarantees and serial I/O latency guarantees could
achieve this.

This recipe assumes the user thread can use these guarantees to
execute state-changing commands from the controller in a timely
manner.  In reality there may be some extra VL work (video I/O latency
work) to make this happen.

Here is what we have now:

Indigo, Indy, Indigo2: neither scheduling nor I/O latency guarantees
O2:                    neither scheduling nor I/O latency guarantees
Octane:                neither scheduling nor I/O latency guarantees
Challenge/Onyx:        scheduling, but no I/O latency guarantees
Origin/Onyx2:          scheduling, but no I/O latency guarantees