	Proposed TFP Diagnostic Error Message Format
        ================================================

If a diagnostic fails, a message of the following format should be
displayed:

ERROR abbccc: <Diagnostic name> FAILED, (slot xx, adap yy, dev zz), ssssss
+      <Human readable subdiagnostic message>
+      HARDWARE STATE:
+          <Pertinent hardware registers>
+      Possible additional information
+

The first two lines will always be printed, but the rest of the message
depends on the verbosity level set at either the PROM or IDE level.

The "HARDWARE STATE:" part is not mandatory.  We may want to give a longer
descriptive message such as "Wrote 0xaaaaaaaa to 0xa0100000, read back
0xaaaaaaba - Failure in SIMM 23", but everything below the first two
lines is test specific.  Functions to print out error register state will
be provided as described in Bhanu's prototypical IDE test.

The code for diagnostic hints is abbccc, a 24-bit hexadecimal number, where 'a'
is the type of board being tested.  'bb' is the "major component diag,"
which is basically the part of the board being tested (e.g. Secondary
cache, SCSI, VME).  Finally, ccc is a number specifying the failure mode
of the component.  A cache test might use different ccc values for stuck
bits, stuck address lines, writeback logic failures, etc.  4096 values
should be enough for just about any component.  That is,

	abbccc
        ^ ^  ^
        | |  |
        | |  +------ ccc = A number unique to this failure mode, i.e., a
        | |                given test may have several of these numbers
        | |                for different points of failure.
        | |                There can be 4096 per subsystem.
        | +---------- bb = The subsystem being tested.  For each board type, 
        |                  there can be 256 different subsystems.  Examples
        |                  include the secondary cache subsystem of the cpu
        |                  board, and the EPC chip on an IO board.
        +------------- a = The kind of board (e.g. IP21, IO4, MC3).  Sixteen
			   are possible for future expansion.
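
The field layout above could be captured in a few helper macros and
functions.  The following is a minimal sketch only: the HINT_* names and
the make_hint()/hint_*() helpers are hypothetical and not part of this
proposal, and plain unsigned int is used where the interface uses uint.

```c
#include <assert.h>

/* Bit positions of the fields in a 24-bit hint "abbccc" (illustrative names). */
#define HINT_BOARD_SHIFT	20	/* a:   top 4 bits     */
#define HINT_SUBSYS_SHIFT	12	/* bb:  middle 8 bits  */
#define HINT_BOARD_MASK		0xfu	/* 16 board types      */
#define HINT_SUBSYS_MASK	0xffu	/* 256 subsystems      */
#define HINT_FAIL_MASK		0xfffu	/* 4096 failure modes  */

/* Build a hint from its three fields. */
unsigned int make_hint(unsigned int board, unsigned int subsys,
		       unsigned int fail)
{
	assert(board <= HINT_BOARD_MASK);
	assert(subsys <= HINT_SUBSYS_MASK);
	assert(fail <= HINT_FAIL_MASK);
	return (board << HINT_BOARD_SHIFT) | (subsys << HINT_SUBSYS_SHIFT) | fail;
}

/* Extract the individual fields again. */
unsigned int hint_board(unsigned int hint)
{
	return (hint >> HINT_BOARD_SHIFT) & HINT_BOARD_MASK;
}

unsigned int hint_subsys(unsigned int hint)
{
	return (hint >> HINT_SUBSYS_SHIFT) & HINT_SUBSYS_MASK;
}

unsigned int hint_failure(unsigned int hint)
{
	return hint & HINT_FAIL_MASK;
}

/* Board and subsystem together: the hint without its last three digits. */
unsigned int hint_subsystem_key(unsigned int hint)
{
	return hint & 0xfff000u;
}
```

With these helpers, a hint of 3061A1 breaks down into board 3,
subsystem 06, and failure mode 1A1.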

The advantage of these hints is that the test designer, manufacturing 
personnel and SEs can associate additional information with each type
of failure to facilitate diagnosis (with less board swapping).  It's important
to refer to these as diagnostic hints rather than diagnostic numbers for those
customers unfamiliar with numeric bases other than ten.

The physical location information printed would be based on the first three
digits of the diag hint.  For example, a memory test might print the
slot, bank, and SIMM number whereas a VME test might print the slot, adapter,
and VME slot number.

The first two lines will be automatically generated by a function called
error_msg() which prints out the message and creates an entry in the
diagnostic status log.  The "hardware state" is printed by the test function
itself along with any other information the test designer deems relevant.
This printing should go through a function such as log_printf() that either
prints and logs the text or merely logs it, depending on the current verbosity
level.  A test running in continuous operation may also reset the verbosity
level after the first failure.  Diag_header might be set up to pause
after errors if an NVRAM variable is set.  This would probably be better
done by some other function, but that wouldn't be as universal.

The diag hint values should be in a single header file, but could be separated
into POD, IDE, etc.  The diag hints must be consistent, and must not be reused.

make_log(log_t *log_buf, uint log_size, catalog_t *log_catalog)
       Initializes a log capable of holding log_size characters.
       The log_buf parameter points to a buffer large enough to hold the
       number of characters specified.  The log_catalog parameter
       specifies the message catalog which should be used when looking up
       the strings for error hints.

void clear_log(log_t*)
       Erases the log.

void display_log(log_t*)
       Prints a list containing all of the diagnostic messages which
       have been stored in the log.  This includes the parts that were
       not printed due to the verbosity level.

void log_printf(log_t*, char *, /* args */)
       Prints and logs the format and arguments provided (like printf)
       depending on the verbosity level.

void error_msg(log_t* log, uint hint, void *test_spec, ulong *sn)
       Prints and logs the header described above.  The test specific
       structure will most often be an array of integers specifying the
       physical location of the device reporting the error, but it could
       be expanded to contain other information.  The formatting of this
       information in the message is determined by the board and subsystem
       fields of the hint.
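
The print-versus-log behavior described for log_printf could be sketched
roughly as follows.  This is an illustration under stated assumptions,
not the proposed implementation: the sketch_vlog structure, its fixed
256-character buffer, and the verbose field (standing in for the verbosity
level) are all invented for the example.  Text is always appended to the
log and is echoed to the console only when verbose is set.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define SKETCH_VLOG_CHARS 256		/* arbitrary size for the example */

struct sketch_vlog {
	int		verbose;	/* stand-in for the verbosity level */
	unsigned int	used;		/* characters stored so far */
	char		text[SKETCH_VLOG_CHARS];	/* kept NUL-terminated */
};

void sketch_vlog_init(struct sketch_vlog *log, int verbose)
{
	assert(log != NULL);
	log->verbose = verbose;
	log->used = 0;
	log->text[0] = '\0';
}

/* Returns 1 if the text was printed and logged, 0 if it was only logged. */
int sketch_vlog_printf(struct sketch_vlog *log, const char *fmt, ...)
{
	va_list ap;
	unsigned int room;
	int n;

	assert(log != NULL && fmt != NULL);
	room = SKETCH_VLOG_CHARS - log->used;	/* includes the NUL */

	va_start(ap, fmt);
	n = vsnprintf(log->text + log->used, room, fmt, ap);
	va_end(ap);

	if (n < 0) {				/* formatting error */
		log->text[log->used] = '\0';	/* leave the log unchanged */
		return 0;
	}
	if ((unsigned int)n >= room)		/* output was truncated */
		n = (int)(room - 1);

	if (log->verbose)			/* echo only the new text */
		fputs(log->text + log->used, stdout);
	log->used += (unsigned int)n;
	return log->verbose ? 1 : 0;
}
```

A display_log equivalent would then simply print text from the start,
which is how the parts suppressed by the verbosity level remain available.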

Example
-------

A sample memory test might use code something like this:

	int physloc[3];

/* Initialization */

	physloc[0] = slot;
	physloc[1] = bank;

/* In the middle of the test */

	if (read_val != expected_val) {
		physloc[2] = calc_simm(address, slot, bank);
		error_msg(global_log, MY_HINT, physloc, sequence_num);
		dump_mem_registers(global_log);
		return 1;
	}

	
Resulting in something like this output in verbose mode:

ERROR 3061A1: sample_test FAILED, (slot 5, bank 2, simm 4), 15326
+	Memory address test read back bad data
+	HARDWARE STATE:
+		MC3 slot 5:
+			Ebus error: 4, Errhi: 0, Errlo: 2000

or this in terse mode:

ERROR 3061A1: sample_test FAILED, (slot 5, bank 2, simm 4), 15326
+	Memory address test read back bad data

Note that all error lines begin with either "ERROR" or "+", so one
can readily grep through output for error messages.

The actual function name ("sample_test" here) and descriptive message
would be associated with the hint code in the catalog.  The format
for printing out the physical location would be in a table based on the
subsystem (the hint without the last three digits).
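
One way the two lookups just described might fit together is sketched
below.  The sketch_ names and table contents are assumptions made for
illustration (the single entry reuses the values from the sample output),
and the location table holds a printf-style format for three integers
rather than anything more general.  The header text follows the message
format given at the top of this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct sketch_hint {			/* hint code -> name and message */
	unsigned int	hint_num;
	const char	*funcname;
	const char	*msg;
};

struct sketch_locfmt {			/* subsystem -> location format */
	unsigned int	subsystem;	/* hint_num & 0xfff000 */
	const char	*fmt;		/* expects three ints */
};

static const struct sketch_hint sketch_hints[] = {
	{ 0x3061A1, "sample_test", "Memory address test read back bad data" },
};

static const struct sketch_locfmt sketch_locfmts[] = {
	{ 0x306000, "(slot %d, bank %d, simm %d)" },
};

/* Linear search of the hint table; NULL if the hint is unknown. */
const struct sketch_hint *sketch_find_hint(unsigned int hint_num)
{
	size_t i;

	for (i = 0; i < sizeof sketch_hints / sizeof sketch_hints[0]; i++)
		if (sketch_hints[i].hint_num == hint_num)
			return &sketch_hints[i];
	return NULL;
}

/* Look up the location format using only the board and subsystem digits. */
const char *sketch_find_locfmt(unsigned int hint_num)
{
	size_t i;

	for (i = 0; i < sizeof sketch_locfmts / sizeof sketch_locfmts[0]; i++)
		if (sketch_locfmts[i].subsystem == (hint_num & 0xfff000u))
			return sketch_locfmts[i].fmt;
	return NULL;
}

/* Format the two mandatory lines into buf.  Returns the length that
 * snprintf reports, or -1 if either lookup fails. */
int sketch_format_header(char *buf, size_t len, unsigned int hint_num,
			 const int loc[3], unsigned long sn)
{
	const struct sketch_hint *h = sketch_find_hint(hint_num);
	const char *fmt = sketch_find_locfmt(hint_num);
	char where[64];

	assert(buf != NULL && loc != NULL);
	if (h == NULL || fmt == NULL)
		return -1;
	snprintf(where, sizeof where, fmt, loc[0], loc[1], loc[2]);
	return snprintf(buf, len, "ERROR %06X: %s FAILED, %s, %lu\n+\t%s\n",
			hint_num, h->funcname, where, sn);
}
```

Keying the location table on the subsystem alone means every failure mode
of a given subsystem shares one location format, so adding a new ccc value
only requires a new entry in the hint table.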

The actual HARDWARE STATE message format would be determined by the
library routines.  Some of this information is available in the prototypical
IDE test.  I am not attempting to specify it here.

NVRAM support:
-------------

       verbose --   Boolean flag indicating whether to go beyond the initial
		    two lines of diagnostic message.
       diaglevel -- Controls whether we do minimal or maximal diags.
       nonstop   -- Boolean flag indicating whether we should stop on
                    memory bank/cpu errors which can be configured out.
       diagprompt-- Boolean flag indicating whether we should prompt
                    the user to press return after a diag fails.

Virtual DIP switch requirements:
-------------------------------

       auto-POD  -- If set, drops us into POD asap.
       no-diags  -- If set, diags aren't run (we want to conceal the
                       existence of this one from our users if possible).
       no-clear  -- Don't clear memory.

For the terminally curious: 
---------------------------

Something like these structure definitions would be used:

/* The table of hint numbers mapped to messages contains entries of this form */
typedef struct _hint {
	uint		hint_num;	/* Full 24-bit hint code */
	char		*hint_funcname;	/* Name of the diagnostic */
	char		*hint_msg;	/* Human readable message */
} hint_t;

/* The table of subsystem names mapped to physical location printing functions
 * looks like this:
 */
typedef struct _catfunc {
	uint		subsystem;		/* hint_num & 0xfff000 */
	void		(*printfn)(void *);	/* pointer to print function */
} catfunc_t;

/* Each catalog itself looks like this */
typedef struct _catalog {
	catfunc_t	*cat_fntable;	/* Pointer to a table of printing
					   functions */
	hint_t 		*hints;		/* Pointer to the table of hints */
} catalog_t;

/* This is the structure of an error log */
typedef struct _log {
       uint		log_size;		/* Size of entries array */
       uint		log_curptr;		/* The next line to use */
       catalog_t	*log_catalog;		/* Message catalog */
       uint		log_flags;		/* Additional information */
       char		entries[1];		/* Array imm. following */
} log_t;
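
Because entries is declared with a single element and the rest of the
array immediately follows the structure, the buffer handed to make_log has
to be larger than sizeof(log_t).  A rough sketch of the size calculation
and initialization follows.  It is not the proposed implementation:
sketch_log_t mirrors log_t with the catalog pointer reduced to void *, and
SKETCH_LOG_BYTES, sketch_make_log, and sketch_log_putc are hypothetical
names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sketch_log {
	unsigned int	log_size;	/* Size of entries array */
	unsigned int	log_curptr;	/* Next position to use */
	void		*log_catalog;	/* Message catalog (opaque here) */
	unsigned int	log_flags;	/* Additional information */
	char		entries[1];	/* Array imm. following */
} sketch_log_t;

/* Bytes the caller must provide for a log holding n characters.
 * entries[1] is the older "struct hack"; C99 code would normally use a
 * flexible array member (char entries[];) instead. */
#define SKETCH_LOG_BYTES(n)	(offsetof(sketch_log_t, entries) + (size_t)(n))

/* Initialize a log in caller-supplied storage of SKETCH_LOG_BYTES(log_size). */
void sketch_make_log(sketch_log_t *log_buf, unsigned int log_size,
		     void *catalog)
{
	assert(log_buf != NULL && log_size > 0);
	log_buf->log_size = log_size;
	log_buf->log_curptr = 0;
	log_buf->log_catalog = catalog;
	log_buf->log_flags = 0;
	log_buf->entries[0] = '\0';
}

/* Append one character.  Returns 0 on success, -1 if the log is full. */
int sketch_log_putc(sketch_log_t *log, char c)
{
	assert(log != NULL);
	if (log->log_curptr >= log->log_size)
		return -1;
	log->entries[log->log_curptr++] = c;
	return 0;
}
```

The storage can come from a static buffer or from wherever memory is
available at that stage, as long as it is suitably aligned for the
structure.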


