-----------------------------------------------
Field-Accurate Video Deck Control and Emulation
OS Requirements and Solutions
Rev 1.1
Chris Pirazzi
-----------------------------------------------

Just the simplified basics that OS folks need to know in order to
understand the requirements for video deck control. Then the details
of the OS requirements and solutions we have so far.

REAL-TIME REQUIREMENTS
----------------------

First and foremost: all the time requirements we give in this document
are hard, guaranteed requirements. When we say "X must happen M
milliseconds after Y" we mean all the time, 100% of the time,
guaranteed, and supported. We are not talking about best-case or
average-case performance. We are giving bounds on worst-case
performance.

For video and decks, violating the timing requirements below always
results in failures which are just as serious as SCSI parity errors or
unrecoverable memory ECC errors. Some of these failures will
permanently destroy customer video material on videotape. All of these
failures will cause video customers to do what they would do with any
damaged video equipment: they will return the machine to SGI and buy
another brand.

Therefore, in order to say we support deck control and emulation as
described below, SGI must tell the customer precisely which SGI
configurations support the guarantees, and then SGI must commit the
engineering resources to deliver those guarantees on the target
configurations and fix the bug if the machine fails to deliver the
guarantees.

VIDEO
-----

Video enters and leaves an SGI machine in an industry-standard analog
or digital signal format.

Video is divided up into video fields of equal duration. You can think
of a field as one image, roughly like one frame of a cinema film.
There is one video field every 20ms or 16.68333...ms, depending on the
flavor of video.

Each field has a well-defined starting point in time. All video
equipment (decks, SGI machines, ...)
at a customer site is connected to a common per-field heartbeat signal
called "house sync"; video fields always start on one of these
heartbeats.

THE DECK CONTROL PROTOCOL
-------------------------

The video decks (also called Video Tape Recorders, VTRs, VCRs, ...) we
are concerned with have an RS-422 serial port. A controller can send a
command to the deck over the serial port, like:

- start rolling the tape now
- stop rolling the tape now
- start recording now
- stop recording now
- tell me which field you are playing or recording now
  (the fields on tape are numbered sequentially)

using the industry-standard Sony 9-Pin protocol (always 38400 baud, odd
parity, 8 bits, 1 stop bit).

The deck always sends one response (usually ACK or NAK, but sometimes
information) to each command from the controller. The controller can
then send another command. The deck never initiates; it only transmits
on the serial line in response to a command from the controller.

Commands and responses range from 2 bytes to 18 bytes each.

A serial line is idling when it is in marking state between the
trailing edge of a stop bit and the leading edge of a start bit. The
serial line may not idle for more than 10ms between any two adjacent
bytes of a command or response.

The start of a command or response is the leading edge of the start bit
of its first byte on the serial line. The end of a command or response
is the trailing edge of the stop bit of its last byte on the serial
line.

The deck must start a response 0 to 9 milliseconds after the end of any
command from the controller.

The deck cannot predict what commands the controller will send. The
controller must be able to pair up any response with the corresponding
command.

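
To get a feel for the serial numbers above, here is a small sketch that
computes how long one byte and one whole message occupy the line. The
line settings (38400 baud, 8 data bits, odd parity, 1 stop bit) and the
2 to 18 byte and 10ms idle limits come from the text; the function
names are made up for this illustration and are not part of any SGI or
Sony interface.

```python
# Sketch only: line parameters come from the protocol description above;
# the function names are hypothetical.

BAUD = 38400
# 1 start bit + 8 data bits + 1 parity bit + 1 stop bit
BITS_PER_BYTE = 1 + 8 + 1 + 1


def byte_time_ms() -> float:
    """Time one byte occupies the serial line, in milliseconds."""
    return 1000.0 * BITS_PER_BYTE / BAUD


def message_time_ms(num_bytes: int, idle_ms_per_gap: float = 0.0) -> float:
    """Duration of a command or response, first start bit to last stop bit.

    idle_ms_per_gap is the marking time between adjacent bytes, which the
    protocol limits to 10ms.
    """
    if not 2 <= num_bytes <= 18:
        raise ValueError("commands and responses are 2 to 18 bytes long")
    if not 0.0 <= idle_ms_per_gap <= 10.0:
        raise ValueError("line may not idle more than 10ms between bytes")
    return num_bytes * byte_time_ms() + (num_bytes - 1) * idle_ms_per_gap


if __name__ == "__main__":
    print(f"one byte:               {byte_time_ms():.4f} ms")
    print(f"3-byte message, no idle:  {message_time_ms(3):.4f} ms")
    print(f"18-byte message, no idle: {message_time_ms(18):.4f} ms")
```

With no idle time, a byte takes a little under 0.29ms, so even the
longest 18-byte message fits in roughly 5.2ms. The idle time between
bytes and the deck's response time are what consume the rest of a
field.
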
One of the commands, Status Sense, causes the deck to indicate whether
the user has overridden the controller by pushing buttons on the deck's
front panel, whether the deck is servolocked (rolling forward at a
stable, 1x speed), and whether any error conditions (end of tape,
mechanical failure, ...) have arisen on the deck. The controller must
send a Status Sense command at least every 300ms. The delay between a
change of status on the deck and the start of the controller's command
in reaction to this change can be at most 300ms.

Certain commands can behave field-accurately. To get the field
accuracy, the deck must be servolocked, the controller must start
field-accurate-capable commands between D1 ms and D2 ms after the start
of a field, and the command and its response must end before the start
of the next field. D1 and D2 depend on the deck. D2-D1 >= 2ms.

Current Time Sense is a field-accurate-capable command. A
field-accurate Current Time Sense command causes the deck to respond
with the number of the field that is currently playing out its output
jack or recording at its input jack.

Certain state-changing commands, such as record on/off, are
field-accurate-capable. A field-accurate state-changing command will
cause the deck to change its state after a fixed, guaranteed number of
fields E. E depends on the deck and the command. The controller must
have a field-accurate Current Time Sense response in order to compute
the correct field in which to send a field-accurate state-changing
command. Assuming that the field accuracy conditions are maintained
throughout, the delay between the end of a Current Time Sense response
and the earliest time at which a controller is ready to issue a
state-changing command can be at most 300ms.

A note about the two 300ms latency figures above: 300ms is a very, very
bad latency. It is the cutoff point after which a customer will declare
the controller broken and send it back (hence it being a requirement).
Seeing as dedicated 422 controllers deliver latencies more on the order
of 50ms, it will be extremely embarrassing if SGI cannot ship a machine
that offers a guarantee less than 300ms. We'll examine below how these
latency figures translate into IRIX user thread scheduling
requirements.

Current industry practice adds a few more constraints not found in the
Sony spec.

Many decks will ignore commands sent during a particular
(525-2*243)/525==7.42...% or (625-288*2)/625==7.84% of the video field
(depending on the flavor of video). This means the controller must
start all commands, not just field-accurate commands, between a
deck-specific D1 ms and D2 ms after the start of a field
(D2-D1 >= 2ms).

Many controllers assume that they can send at least two 3-byte commands
and receive at least two 3-byte responses in one video field time.
This tightens the constraint on command and response idle time and deck
response time. Assuming no idle time at all during commands and
responses, this means that the sum of the deck's response time to both
commands cannot exceed 12ms. This is calculated based on the shortest
field time (16.68333...ms), minus the 7.42...% above, minus two
3-character command times, minus two 3-character response times. We
make a simplifying assumption and say the deck must respond to each
command within 6ms.

VIDEO DECK CONTROL ON SGI: PROBLEMS
-----------------------------------

Video deck control is when an SGI machine is the controller. This is
the most common configuration in animation houses and smaller video
production setups, where the SGI machine is the customer's main control
console for all devices. Typically, the SGI's video output connects to
the deck's video input, and the SGI's video input connects to the
deck's video output.

As stated above, the controller needs to send serial commands precisely
relative to video field boundaries, and measure when serial responses
come relative to those boundaries.

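
As an aside, the 12ms response budget quoted under current industry
practice above can be reproduced with a few lines of arithmetic. The
sketch below assumes 11 bit times per byte (start, 8 data, parity,
stop) and takes 16.68333...ms to be exactly 1001/60 ms; the function
and parameter names are invented for this illustration.

```python
# Sketch only: reproduces the response-budget arithmetic from the text.
# Names are hypothetical; figures come from the paragraphs above.

BAUD = 38400
BITS_PER_BYTE = 11  # start + 8 data + odd parity + stop


def response_budget_ms(field_ms: float, lines_total: int, lines_active: int,
                       commands: int = 2, bytes_each: int = 3) -> float:
    """Time left in one field for all of the deck's response delays.

    Assumes no idle time inside any command or response, and that each
    command is answered by a response of the same length.
    """
    # Part of the field during which many decks ignore commands.
    ignored_fraction = (lines_total - 2 * lines_active) / lines_total
    usable_ms = field_ms * (1.0 - ignored_fraction)
    byte_ms = 1000.0 * BITS_PER_BYTE / BAUD
    # Wire time for every command plus every response.
    wire_ms = 2 * commands * bytes_each * byte_ms
    return usable_ms - wire_ms


if __name__ == "__main__":
    short_field = response_budget_ms(1001 / 60, 525, 243)
    long_field = response_budget_ms(20.0, 625, 288)
    print(f"16.68ms fields: {short_field:.4f} ms total, "
          f"{short_field / 2:.4f} ms per command")
    print(f"20ms fields:    {long_field:.4f} ms total, "
          f"{long_field / 2:.4f} ms per command")
```

For the shorter field this comes out at just over 12ms in total, which
is where the simplified 6ms-per-command figure comes from. The longer
field leaves a little more room, so the shorter field is the binding
case.
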
Since the controller (the SGI machine) in this case is also doing video
I/O, there is one more requirement. When it brings a field from a
video wire into memory, the SGI must be able to match up that field
with the serial commands and responses that coincided with the video
field over the serial wire. When it prepares to output an in-memory
video field over a video wire, the SGI must be able to match up that
field with the commands that the SGI will transmit at the same time
over the serial wire.

A solution to deck control must meet all of the above requirements.

The simplest solution to deck control would be this:

    /* see field-accurate commands above for the definition of D1 and D2 */
    while (1) {
        rightnow = the time right now;
        t = ask the Video Library what time the next field starts;

        /* we want to wake up at t+D1 */

        /* INVARIANT: the difference between rightnow and the time which
         * nanosleep will use to determine our user thread's actual
         * wakeup time is less than:
         *   D2-serial_transmit_latency-length_of_code_path_below-D1.
         */
        nanosleep(t+D1 - rightnow);

        /* INVARIANT: the current time is >= t+D1 */

        if (we sent a command on the last field) {
            the deck will have sent us a response by now;

            /* INVARIANT: our user thread can now read all the bytes
             * of one serial response */
            read one complete Sony protocol response from the deck;
        }

        if (we're receiving video from the deck) {
            based on the most recent Sony protocol responses, compute
            the field number of the video image from the last field;

            /* INVARIANT: the video image from the last field is
             * available */
            grab the video image from the last field;
            store the image and its field number somewhere;
        } else { /* we're sending video to the deck */
            based on the most recent Sony protocol responses, decide
            which video image to send out the video port next field;
            send that image to the Video Library;

            /* INVARIANT: the Video Library will begin to output that
             * field once the current one is done */
        }

        compute what command to send out the serial port in this field;
        send that command out the serial port;

        /* INVARIANT: the bytes will start coming out the serial port
         * at the latest in serial_transmit_latency */

        /* INVARIANT: the current time is
         * <= t+D2-serial_transmit_latency */
    }

Unfortunately, none of the invariants above are supported on any SP SGI
platform, and some of them are not even supported on MP platforms.

For the above code to work, we would need a guarantee that our code
executes between time t+D1 and t+D2-serial_transmit_latency.
Currently, SGI offers no lower bound on the amount of time a user
thread will be running during any interval whatsoever on any SP SGI
system, even if it is the highest priority thread. We would need a
guarantee that we can run for long enough to execute the code path
above in a window of time (of length D2-D1-serial_transmit_latency)
which could be as small as 2ms-serial_transmit_latency. The percent of
that window during which we need to execute is tiny.
The code path above involves little more than poking a couple of
hardware registers and accessing maybe 200 cache lines. SGI needs to
provide enough systems information (cycle counting utilities,
worst-case cache and TLB numbers) so that a developer can compute an
upper bound on their code's cycle requirement. However, CPU throughput
is not likely to be an issue in practice. The tough issue is likely to
be user thread scheduling latency (i.e., getting that tiny bit of code
to run at all).

Say the user thread scheduling problems are solved, as they are on MP
systems with REACT/Pro or some of the MP kudzu systems. This is still
not enough:

- Currently, SGI offers no upper bound on the amount of time between
  when a user thread sends a byte to the serial port and when that byte
  will actually go out the serial jack. The code above refers to this
  as serial_transmit_latency. We would need a guaranteed upper bound
  on serial_transmit_latency which is less than

      D2-length_of_code_path_above-D1

  Since length_of_code_path_above is a small fraction of D2-D1, this
  should be just under 2ms in the worst case.

- Currently, SGI offers no upper bound on the amount of time between
  when a byte arrives at the serial jack and when we can read that byte
  from a user thread. We would need a guaranteed upper bound on this
  serial receive latency which is less than the worst-case amount of
  time between the end of the deck's response and the point D1 ms into
  the next field. The worst-case command and response take 15.01...ms
  so this is at least 1.66...ms for the shortest field time.

Both of these I/O latencies are easily achievable in all current SGI
serial hardware designs. However, SGI needs to support some software
interface which guarantees these latency bounds.

VIDEO DECK CONTROL ON SGI: TSERIALIO SOLUTION (?)
-------------------------------------------------

We needed to support deck control on O2. The cleanest, simplest, and
cheapest way would have been the method described above.
Our pleas to support these guarantees in IRIX 6.3 were unsuccessful.
So we were forced to develop tserialio, a hack which relied on the
following observations:

- Since serial commands have to be sent D1 to D2 ms after the start of
  a field, and D2-D1 can be as small as 2ms, this means we need to be
  able to schedule serial bytes relative to video fields with plus or
  minus one millisecond accuracy.

- Since our serial hardware does not support timestamping or scheduling
  (nor should it!), some piece of software has to run during those
  crucial milliseconds to do the serial RX and TX.

- The only event on IRIX 6.3 which is guaranteed to occur every
  millisecond is the nasty kernel profiler tick. It would be
  unacceptably burdensome to hang deck control code off the profiler
  tick. Is it really necessary to put all the deck control code there?

- No. Accuracy is not the same as latency. The deck control code
  needs to timestamp RX bytes and schedule TX bytes accurately, but its
  maximum latency (the maximum time it may take to react to an incoming
  serial signal by producing a corresponding outgoing serial signal) is
  more like 300ms. As explained above, 300ms is the cutoff latency
  where the customer declares the machine broken and sends it back. We
  should really offer a guarantee much less than 300ms, but for this
  document 300ms is the hard requirement.

The accuracy comes from tserialio. It is a very simple driver and
user-mode library which gives a user thread a way to schedule serial
bytes for transmission out the serial jack in the future, and measure
the time at which bytes from the past arrived at the serial jack. The
measuring and timestamping is accurate to plus or minus one millisecond
relative to the start of video fields.

The tserialio driver (a serialio upper layer) hangs off the profiler
tick doing serial RX and TX. It accounts for hardware and kernel
software I/O latencies, so the user can think in terms of times at the
serial jack.

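
To make the accuracy-versus-latency distinction concrete, here is a toy
model of a transmit queue that is serviced once per millisecond tick.
It is only a sketch of the idea: the class and method names are
hypothetical and are not the tserialio API, and the model ignores the
hardware and kernel I/O latencies that the real driver compensates for.

```python
# Toy model, not the tserialio API: a transmit queue drained by a 1ms tick.
import heapq

TICK_US = 1000  # assumed tick period, 1ms like the profiler tick


class TickScheduledPort:
    """Bytes are queued with a due time and sent on the first tick at or
    after that time."""

    def __init__(self) -> None:
        self._pending = []  # heap of (due_us, sequence, byte)
        self._seq = 0
        self.sent = []      # list of (actual_us, due_us, byte)

    def enqueue(self, due_us: int, byte: int) -> None:
        """Called by the application, possibly long before due_us."""
        heapq.heappush(self._pending, (due_us, self._seq, byte))
        self._seq += 1

    def tick(self, now_us: int) -> None:
        """Called once per tick; sends everything that has come due."""
        while self._pending and self._pending[0][0] <= now_us:
            due_us, _, byte = heapq.heappop(self._pending)
            self.sent.append((now_us, due_us, byte))


def run(port: TickScheduledPort, start_us: int, end_us: int) -> None:
    """Drive the port with one tick every TICK_US microseconds."""
    for now_us in range(start_us, end_us, TICK_US):
        port.tick(now_us)


if __name__ == "__main__":
    port = TickScheduledPort()
    # The application wakes up rarely and queues bytes well in advance.
    port.enqueue(5300, 0x20)
    port.enqueue(22000, 0x01)
    run(port, 0, 30000)
    for actual_us, due_us, byte in port.sent:
        print(f"byte {byte:#04x}: due {due_us}us, sent {actual_us}us")
```

In this model each byte leaves within one tick of its due time no matter
how early the application queued it. The application only has to run
often enough to keep the queue from running dry, which is the 300ms
figure rather than the 2ms one.
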
An application that does deck control using tserialio looks more like
this:

    while (1) {
        nanosleep(any amount of time less than 300ms);

        /* INVARIANT: at most 300ms have transpired since we last
         * emptied out the tserialio input port. */
        for (each response that is waiting from the deck) {
            get the response and its starting time from tserialio;
            map that starting time to a particular field;
            interpret the response in the context of that field;
        }

        /* INVARIANT: at most 300ms have transpired since we last
         * enqueued 300ms worth of commands on the tserialio output
         * port. */
        use tserialio to send enough serial commands so that we've got
        300ms worth of serial commands buffered up;
        tell tserialio to send each command at the start of field + D1 ms;

        if (we're receiving video from the deck) {
            /* INVARIANT: at most 300ms have transpired since we last
             * emptied out the Video Library input port. */
            for (each video image that is waiting from the deck) {
                get the video image and its starting time from the
                Video Library;
                map that starting time to a particular Sony protocol
                response;
                based on the corresponding Sony protocol response,
                compute the field number of that video image;
                store the image and its field number somewhere;
            }
        } else { /* we're sending video to the deck */
            based on the most recent Sony protocol responses, decide
            which video images to send out the video port next field;

            /* INVARIANT: at most 300ms have transpired since we last
             * enqueued 300ms worth of images on the Video Library
             * output port. */
            use the Video Library to send enough images so we have
            300ms buffered up;
        }
    }

As you can see, it is much more complex. Each time around the loop,
the code has to schedule up to 300ms of commands in the future, and
deal with up to 300ms worth of responses from the past.

You may wonder how the application can relate the input and output
times of video fields and serial signals. The tserialio library and
driver know nothing whatsoever about video.
Both tserialio and the Video Library place all incoming and outgoing
data on the common UST timeline. UST is a systemwide, unadjusted
timebase with microsecond resolution. The application asks the Video
Library to tell it the UST of each video field, and then it tells
tserialio to send serial bytes at that UST plus D1 ms. The application
receives responses from tserialio stamped with UST, and it uses the
video field USTs from the Video Library to figure out which field that
response came from.

This deck control "solution" ships with Adobe Premiere on O2. It works
most of the time. But it is still not a solution: we've changed the
latency requirement from 2ms to 300ms, but as we said before, SGI
neither guarantees nor supports any particular latency!

Even before we go and worry about millisecond scheduling of a user
thread, it would help the situation if SGI would pick and publish some
easily manageable user thread scheduling latency number which we can
guarantee and support today on each system, so that developers inside
and outside SGI know how far ahead they need to buffer in their deck
control applications. That number should be at most 300ms. At the
moment, we code in peril and developers are turned away by our failure
to guarantee any number in particular.

VIDEO DECK EMULATION ON SGI
---------------------------

So, you might think that the OS is out of the woods in terms of
delivering low latencies. Unfortunately, you have to also consider the
other half of the problem:

Video deck emulation is when an SGI machine is the deck instead of the
controller. This is the most common configuration in video production
studios, where the customer controls tens or hundreds of decks (only
some of which are actually SGI machines) simultaneously from a large,
custom-made physical console. The console also controls video
switching networks and video signal processing boxes.
The console instructs the decks to play and record, and switches their
video signals to each other and to and from the effects units.

Now that the SGI machine is the deck, it must respond to each command
from the controller with a corresponding response within 6ms. The deck
cannot predict what commands the controller will send it. There is
simply no way of getting around it: in order to do deck emulation, a
user thread must be able to wake up, read a serial command, and
transmit a serial response, such that the time from the end of the
command on the serial line to the beginning of the response on the
serial line does not exceed 6ms. This requires both user thread
scheduling latency and I/O latency guarantees.

Since the SGI machine is also doing video I/O, it shares the
requirements of matching up video fields with serial commands and
responses, described above under video deck control. For example, the
deck must be able to match a video field up with a Current Time Sense
command from the controller field-accurately, so that it can produce a
field-accurate response.

Since the controller will be sending state-changing commands to the SGI
machine (the deck), it must be able to react to these commands on a
precise field. The deck gets to choose the latency E in fields which
it takes to execute a given state-changing command. This latency can
be as high as 10 fields (around 170ms or 200ms, depending on video
flavor). So as long as we can guarantee user thread scheduling on a
finer granularity than 170ms, then the state-changing problem is an
accuracy problem. It could be solved either by user thread scheduling
latency and I/O latency guarantees, or some kernel hack to the video
driver like tserialio was to the serial driver. Since we need 6ms user
thread scheduling guarantees for the serial case above, we should only
need an additional video I/O latency guarantee to achieve proper
behavior for state-changing commands.

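
The two emulation constraints above can be written down as simple
checks. This is a sketch under the figures given in the text (a 6ms
response deadline and a deck-chosen E of up to 10 fields); the function
names are hypothetical and do not correspond to any existing library.

```python
# Sketch only: encodes the emulation timing rules from the text above.
# Function names are hypothetical.

RESPONSE_DEADLINE_MS = 6.0  # command end to response start, from the text


def response_on_time(command_end_ms: float, response_start_ms: float) -> bool:
    """True if the response starts within the deadline after the command
    ends (both times measured at the serial line)."""
    delay_ms = response_start_ms - command_end_ms
    return 0.0 <= delay_ms <= RESPONSE_DEADLINE_MS


def state_change_field(command_field: int, e_fields: int) -> int:
    """Field on which a field-accurate state change must take effect."""
    return command_field + e_fields


def state_change_slack_ms(e_fields: int, field_ms: float) -> float:
    """Time between the command's field and the field where the state
    change takes effect."""
    return e_fields * field_ms


if __name__ == "__main__":
    print(response_on_time(100.0, 104.5))        # within the deadline
    print(response_on_time(100.0, 107.0))        # too late
    print(state_change_field(1000, 10))          # takes effect 10 fields on
    print(round(state_change_slack_ms(10, 1001 / 60), 1))
    print(round(state_change_slack_ms(10, 20.0), 1))
```

The contrast is the point: the response deadline is a few milliseconds
and cannot be buffered away, while a state change with E = 10 leaves
well over 100ms to arrange for the video to switch on the right field.
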
SUMMARY: CONTROL
----------------

SGI does not currently offer the guarantees it takes to do correct deck
control on any currently shipping platform, but in some cases this
would be easy.

To support deck control, a platform must follow one of these two
recipes:

RECIPE A:
- support tserialio, and
- guarantee that a piece of code in a user thread can run at least once
  within each 300ms interval.

or

RECIPE B:
- guarantee that a piece of code in a user thread can run once within a
  2ms-serial_transmit_latency interval relative to the start of each
  video field, and
- guarantee a serial_transmit_latency of at most 1ms, and
- guarantee a serial receive latency of at most 1ms.

Many developers wish to run on more than one SGI platform, and for the
most part will be unwilling to rewrite their code to do this. These
developers would choose whatever recipe was available on most
platforms.

It would be greatly preferable to make recipe B available on all
platforms. It yields less OS code to test and less developer code to
write. Recipe A may be easier to port in the short term, though in the
long term it will have a higher continuing support cost than recipe B.

Users access tserialio through a user-mode shared library, not by
talking to a UNIX device node. On systems where we guarantee that a
user thread can run once in every:

    1ms - serial_receive_latency - serial_transmit_latency

interval, we can easily implement tserialio in user-mode only and need
no kernel work. Notice that these requirements are stricter than those
for recipe B, so such a platform would also be able to support recipe
B.

Here is what we have now:

Indigo, Indy, Indigo2:  neither
O2:                     recipe A minus the 300ms scheduling guarantee
Octane:                 neither
Challenge/Onyx:         recipe B minus both serial latency guarantees
Origin/Onyx2:           recipe B minus both serial latency guarantees

SUMMARY: EMULATION
------------------

SGI does not currently offer the guarantees it takes to do correct deck
emulation on any currently shipping platform.

To support deck emulation, a platform must follow this recipe:

RECIPE:
- guarantee that a piece of code in a user thread can wake up, read a
  serial command, and transmit a serial response, such that the time
  from the end of the command on the serial line to the beginning of
  the response on the serial line does not exceed 6ms. Many mixes of
  user thread scheduling guarantees and serial I/O latency guarantees
  could achieve this.

This recipe assumes the user thread can use these guarantees to execute
state-changing commands from the controller in a timely manner. In
reality there may be some extra VL work (video I/O latency work) to
make this happen.

Here is what we have now:

Indigo, Indy, Indigo2:  neither scheduling nor I/O latency guarantees
O2:                     neither scheduling nor I/O latency guarantees
Octane:                 neither scheduling nor I/O latency guarantees
Challenge/Onyx:         scheduling, but no I/O latency guarantees
Origin/Onyx2:           scheduling, but no I/O latency guarantees