irix-657m-src/stand/arcs/IP19prom/docs/userguide.txt

--------------------------------------------------------------------------
NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE

This document is obsolete!  Please refer to the FrameMaker or PostScript
files in this same directory!

NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE
--------------------------------------------------------------------------


		Everest IP19 PROM release 7 notes
		---------------------------------

Version 1.4

	Please send comments to Steve Whitney, stever@wpd, x1525

-----------------------------------------------------------------------------

Contents:

	* Summary of the Power-on Process

	* The System Controller Debug Switches (contains new debug switches)

	* POD mode (contains two new rev 7 prom commands)

	* Niblet

	* PROM LEDs (NOW IN BINARY! plus some changes)

	* PROM Diagnostic Messages (new messages and their diagnostic codes
				    have been added)

	* PROM System Controller Messages (now contains error messages)

	* Hints

-----------------------------------------------------------------------------

MAJOR CHANGES

	NEW IN REV 6:

	Plugging this prom in and loading an appropriate flash rom and
	kernel will move your serial console from the leftmost DB-9
	connector on the filter board to the rightmost!  Be sure to use
	an up-to-date kernel and IO4 PROM version 0.93 or higher.

	A workaround for a new A3 bug is enabled in this prom.

	Memory configuration handles broken banks now.  If a bank fails
	to respond after it has been configured in, we reconfigure memory
	without it.

	The system controller now displays much more helpful messages for
	error conditions.

	Three new debug switches are available.  (See the System Controller
	section).

	Processor type is stored correctly by the prom.

	NEW IN REV 7:

	DISABLED PROCESSORS NOW FLASH ALL 6 LEDS

	Completely new memory configuration algorithm handles three-board
	situations properly and interleaves different SIMM types of the
	same size together.

	Fixed the "NO_DIAGS won't boot Unix" bug.

	Fixed the "fail to detect broken scaches" bug.

	Fixed the ECC disabling bug.

	Removed the A chip rev 3 workaround.

	Hard disable bad CPUs unless the NO_DIAGS debug switch is set.

	Handle more bytes of debug switch info from the system controller.

	Fixed a bug in flash_leds.

	Fixed bus tag testing for four megabyte secondary caches and improved
	bus tag failure diagnostic information.

	Fixed a manufacturing mode debug switch bug.

	NEW IN REV 10:

	Rearbitration when the master CPU fails now works!

	There are now basic IO diagnostics.

	Cool new scrolling messages on failures.

-----------------------------------------------------------------------------

			Power-on Summary
			----------------

In this document, I will present some basic user's information about the
IP19 prom as well as some insight as to what's going on under the hood.
We'll start with a brief description of the events that transpire after
you turn on the keyswitch or press reset.

At this point, the only way you can tell what a processor is doing is via the
six LEDs it has on the edge of the processor card.  Each processor "slice"
of the IP19 board has its own independently-controlled LEDs which represent
its status.  At reset, all LEDs are on.  After that, the CPU sets them to
various other values.

			Per Processor Tests
			-------------------

First, each processor goes through some really basic processor initialization:
clearing registers, cache tags, etc.  After that, it begins to run very simple
power-on diagnostics to check out its local ASICs (CC chip, A chip).  If any
of these tests fail in a detectable way (i.e. not a hang), the processor will
flash a pattern on its LEDs.

After testing its ASICs, the processor attempts to access the Everest bus
by sending interrupts to itself.  Even looped-back interrupts go out to the
bus.  Again, failure results in flashing LEDs.

After we have checked out our local ASICs and the interrupt channels, we
have everything necessary to arbitrate for bootmaster.  Only one processor
can initialize the shared system resources such as memory and IO boards so
we elect a "boot master."

Boot master selection is based on a time delay scaled by the number of the
slot a processor is in and also by its slice.  It works in such a way that
the active processor in the lowest numbered slice of the lowest numbered
slot becomes master.  For example, if there are processors in slot 2,
slices 1 and 3, and slot 4, slices 0, 1, and 2, the processor in slot 2,
slice 1 becomes boot master.

If the master processor reads the NVRAM hardware inventory information and
discovers that it is supposed to be disabled, it abdicates, allowing the next
eligible processor to become master.  If the master is the last processor
active, it will not abdicate unless it fails diagnostics.
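
The selection rule above can be sketched in C.  This is only an
illustration of the outcome, not the PROM's code: the real arbitration
uses time delays rather than a table scan, and the struct, field names,
and pick_boot_master() are hypothetical.  The handling of the case where
every CPU is marked disabled is my reading of the abdication rule.

```c
#include <stddef.h>

/* Hypothetical description of one CPU slice; not a PROM data structure. */
struct cpu_slice {
    int slot;      /* EBus slot of the IP19 board */
    int slice;     /* slice number on that board */
    int active;    /* nonzero if the CPU reached arbitration */
    int disabled;  /* nonzero if NVRAM says the CPU should be disabled */
};

/* Nonzero if a wins arbitration over b: lower slot first, then lower slice. */
static int wins(const struct cpu_slice *a, const struct cpu_slice *b)
{
    if (a->slot != b->slot)
        return a->slot < b->slot;
    return a->slice < b->slice;
}

/*
 * Return the index of the expected boot master, or -1 if no CPU is active.
 * CPUs marked disabled abdicate in turn.  If every active CPU is marked
 * disabled, the last one left in arbitration order keeps the job.
 */
int pick_boot_master(const struct cpu_slice *cpus, size_t n)
{
    int master = -1;   /* first enabled CPU in arbitration order */
    int last = -1;     /* last active CPU in arbitration order */

    for (size_t i = 0; i < n; i++) {
        if (!cpus[i].active)
            continue;
        if (last < 0 || wins(&cpus[last], &cpus[i]))
            last = (int)i;
        if (!cpus[i].disabled &&
            (master < 0 || wins(&cpus[i], &cpus[master])))
            master = (int)i;
    }
    return master >= 0 ? master : last;
}
```

With the example configuration from the text (slot 2, slices 1 and 3, and
slot 4, slices 0, 1, and 2), the function picks the entry for slot 2,
slice 1.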

			Boot Master Code
			----------------

Next the boot master broadcasts an interrupt to all of the other processors
telling them that they are slave processors (the slave code is described later).
The master can then go on to configure the system.  The master first
initiates a protocol with the system controller to let it know which CPU is
master.  The system controller can only communicate with one processor at a
time so we cannot send any messages to it until it knows where the master is.

Now, we test the primary data cache and use it as a stack to run C code.
Without a stack, we can only use the processor's registers for data storage,
so nested procedure calls and manipulation of complicated data (e.g. memory
configuration) are very difficult.  A failure here causes boot master
rearbitration.

The first global resource the master must configure is the system console,
one of the serial ports on the IO4 board.  In order to do this, it must
choose a master IO4 board.  The master IO4 is always the IO board in the
highest numbered slot (unless the "use second IO4" switch is set).  We run
the IA/ID chip tests, check this board for a working EPC chip, check the
integrity of the NVRAM (to get the baud rate and enable/disable information),
initialize the console there, and print our header message.  The IP19 prom
is smart enough to re-enable the last CPU if all are disabled or re-enable
all memory banks if fewer than 32 megabytes of RAM are enabled.

Next we initialize the "evconfig" structure.  Evconfig contains information
about what's in each EBus slot, the state of these things, and whether or
not they are enabled.  A user may set an NVRAM variable which will disable
one or more system resources (currently a memory bank or processor).
This can be done from POD or the IO4 prom command monitor via the enable and
disable commands.  The POD versions only change the RAM settings while the
IO4 prom versions actually set the NVRAM.

After locating and configuring the master IO4 board, we're ready to set up
memory.  The prom records the amount of memory in each bank of all available
memory boards in the system and performs a memory configuration algorithm.
It groups together banks of each size and attempts to interleave the
memory to improve performance.  In order to make performance uniform
across all system memory, the prom uses the highest interleave factor
that will allow it to configure all banks uniformly.

There is a poorly tested NVRAM variable called "fastmem" which tells the prom
to try to interleave memory "optimally" rather than uniformly.  This may or
may not improve performance, but if it does, it also makes performance less
stable.  Multiple runs of the same code will be more likely to run at
different speeds.  "Fastmem" can be set from the IO4 prom command monitor.

Before configuring memory, the prom runs the memory board's built-in self-test,
a.k.a. BIST, on all boards.  Banks that fail BIST are not included in the
configuration.  BIST has the side-effect of zeroing memory and storing good ECC
bits in it.  On older revisions of the MA chip (rev. 0), it is necessary to
run the BIST, let it fail, reset the machine, and decline to run BIST.  The
prom only presents the "BIST?" prompt on such machines.

After configuring memory, the PROM runs several memory tests on the
configured RAM.  Upon a memory failure, the PROM reconfigures memory
without the affected banks.  This continues until all tests pass or there's
no memory left to configure out.  If all memory is configured out in
this way, there may be a problem with the master CPU.  Try again with the
first CPU (or board) disabled.  Use the fdisable/fenable commands in POD
mode, since you won't be able to get to the IO4 prom.
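
The test-and-reconfigure loop described above might look roughly like the
following.  This is a sketch under assumptions: banks are modeled as bits
in a mask, and the callback type, configure_until_clean(), and
example_bad_banks() are hypothetical names, not PROM routines.

```c
#include <stdint.h>

/*
 * Hypothetical callback: tests the banks set in `enabled` and returns a
 * mask of the banks that failed (0 means every test passed).
 */
typedef uint32_t (*mem_test_fn)(uint32_t enabled);

/*
 * Repeatedly test memory, dropping failed banks, until the tests pass or
 * no banks remain.  Returns the final mask of usable banks; 0 means all
 * memory was configured out.
 */
uint32_t configure_until_clean(uint32_t enabled, mem_test_fn run_tests)
{
    while (enabled != 0) {
        uint32_t failed = run_tests(enabled) & enabled;
        if (failed == 0)
            break;              /* all tests passed */
        enabled &= ~failed;     /* reconfigure without the bad banks */
    }
    return enabled;
}

/* Stand-in test for demonstration only: banks 0 and 2 always fail. */
uint32_t example_bad_banks(uint32_t enabled)
{
    return enabled & 0x5u;
}
```

A result of zero corresponds to the "all memory configured out" case,
where the master CPU itself becomes the suspect.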

Next, we test and initialize the CC chip's "bus tags," and write the
evconfig structure out to memory.  At this point, we move our stack into
uncached memory and continue testing the system.

Now that the stack is no longer in the data cache, we can test the caches.
Current IP19 proms run a single secondary cache test that tests the
scache as a one megabyte RAM.  Future IO4 proms will test the tags
independently and will test write-backs.  Future IO4 proms will also test
the i-cache.

Finally, we're ready to check on the slave processors.  We wait for
them to finish their testing (or for a timeout) and display the results
of the testing.   If the processors have not stored a result value, we
assume that they cannot access memory.  If they hang in a particular test, we
assume that that test failed.

Processors that fail diagnostics are disabled, and we go on to load the
IO4 prom.

			SLAVE CODE
			----------

Until now, we have only discussed the role of the master processor.  The
slave processors enter a loop where they wait for instructions from the
master processor.  The first instruction they are given is a request to
update their entries in evconfig.  The slaves store their processor type,
cache size, speed, etc.  Next they start their power-on diagnostics updating
their diagnostic result value, or "diagval," as they go.  The slaves
currently run the same set of tests as the master: data cache, secondary
cache, and bus tag tests.  Future versions will run the same set of tests
mentioned for the master.

Slaves can also be sent interrupts to launch them into a given piece of
code.  This technique is used by the "niblet" tests (mentioned below) as
well as by the IO4 prom to prepare slaves to run Unix.

			FATAL FAILURES
			--------------

Upon fatal failures such as no enabled memory or an IA chip test failure,
the PROM scrolls a descriptive message across the system controller LCD
display and displays a "diagnostic code."  These codes can be used to
diagnose the problem in more depth than the message alone.  Upon IO failures,
the CPU goes into POD mode on the CC UART.  On memory failures, the master
CPU goes into POD mode.  Pressing Enter on the serial console clears the
scrolling message and allows the user to type commands.

The diagnostic codes are listed below.

----------------------------------------------------------------------------

			The Debug Switches
			------------------

The system controller has a mode, accessible only from "manager mode,"
which allows the user to set a number of "virtual dipswitches."
As of the December 17th release of the system controller firmware,
there are sixteen debug switches instead of eight.  The "lab settings"
are the leftmost eight, and the "software" settings are the rightmost.
The switches are numbered from the right starting from zero as shown:


		f e d c b a 9 8 7 6 5 4 3 2 1 0

The software switches work as follows:

7
The "Manu-mode" switch is used to send all IP19prom console output to
the external UART on the system controller.  This mode will eventually
be used in manufacturing to debug systems which can't reach the IO4 UART.
In current PROMs, this switch also forces the POD mode switch since the
IO4 prom doesn't use the system controller UART.

6
The "No boot master arbitration" switch keeps the system controller from
selecting a processor with which to communicate.  This makes it possible
to communicate with individual processors via the "CC UART" connectors on
the edge of the IP19 boards.

5
The "POD mode" switch forces the IP19 prom to stop initialization just
before it would have loaded the IO4 prom and jump to POD mode instead.
This is useful on a system with a bad IO4 prom or a bad IO board.

4
The "No diagnostics" switch prevents the system from running power-on
diagnostics.  This switch should only be used by software developers
who are constantly bringing systems up and down.  Otherwise, it can
mask failures and cause system damage.

3
In PROM release 6, this becomes the "use defaults" switch.  This switch
will override the console baudrate setting and use 9600 instead.  It may
also override certain other settings.  (Not implemented in PROM revision 4.)

2
In PROM release 6, this becomes the "don't clear memory" switch.  It's
useful for debugging things like the machines that won't take an NMI.

1
In PROM release 6, this becomes the "boot from second IO4" switch.  If you
have a machine with a bad flash ROM in the highest slot, simply move the
console connection, flip this switch, and boot from the next IO4.

0
In PROM release 7, this becomes the debug switch.  So far, it is used only
by the IO4 prom.
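
For readers who think in bit masks, the software switches can be written
down as follows.  This is a sketch only: it assumes the sixteen switches
arrive packed in a 16-bit word with switch n at bit n, and the macro and
function names are made up for illustration (they are not the PROM's own
definitions).

```c
#include <stdint.h>

/* Hypothetical names; bit positions follow the switch numbers above. */
#define DBG_SW_DEBUG        (1u << 0)   /* debug (IO4 prom only so far) */
#define DBG_SW_SECOND_IO4   (1u << 1)   /* boot from second IO4 */
#define DBG_SW_NO_MEMCLEAR  (1u << 2)   /* don't clear memory */
#define DBG_SW_DEFAULTS     (1u << 3)   /* use defaults (9600 baud) */
#define DBG_SW_NO_DIAGS     (1u << 4)   /* no diagnostics */
#define DBG_SW_POD_MODE     (1u << 5)   /* POD mode */
#define DBG_SW_NO_BMARB     (1u << 6)   /* no boot master arbitration */
#define DBG_SW_MANUMODE     (1u << 7)   /* manu-mode */

/*
 * Nonzero if the PROM should stop in POD mode: either the POD mode switch
 * is set, or manu-mode is set (which forces POD mode in current PROMs).
 */
int wants_pod_mode(uint16_t switches)
{
    return (switches & (DBG_SW_POD_MODE | DBG_SW_MANUMODE)) != 0;
}

/* The lab settings are the leftmost eight switches (8 through f). */
uint8_t lab_switches(uint16_t switches)
{
    return (uint8_t)(switches >> 8);
}
```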

-----------------------------------------------------------------------------

			    POD Mode

During bring-up and in the event of unexpected exceptions or diagnostic
failures, the PROM can drop into a special command interpreter.  This
interface, known as the Power-On Diagnostics mode, or POD mode, provides a
simple interface through which a user can examine and modify the state of
the machine.  The commands provided by POD mode are listed below.  All
numerical inputs should be entered in hex and need not be prefixed with '0x'.

The POD mode prompt is "POD xx/yy>" where xx is the slot number of the
current processor, and yy is its "slice" on the IP19 board.

Commands with an asterisk (*) are new to release 6 or 7.

        wb ADDRESS VALUE
        wh ADDRESS VALUE
        ww ADDRESS VALUE
        wd ADDRESS VALUE -- Write the value into a byte, half, word,
                            or doubleword at the given address.
                            Currently the values written must be 32-bit or
                            smaller values.
        db ADDRESS
        dh ADDRESS
        dw ADDRESS
        dd ADDRESS       -- Display the contents of the byte, halfword, word,
                            or doubleword at the given address.

			NOTE: The display and write memory commands continue
				to the next address by default.  To quit,
				type a "q" and return instead of just
				return.  Typing a period causes the read
				or write to march on through memory on its
				own.

        wr REG VALUE     -- Write the given value into the register specified.
	dr REG		 -- Display the value in the specified register.

				Register names for read/write include:

				  sp: stack pointer
				  sr: r4000 status register
				  cause: r4000 cause register
				  epc: Exception program counter
				  eepc: error exception program counter
				  config: r4000 config register
				  wh: watchhi register
				  wl: watchlo register

				Registers that can only be displayed:

				  all: all r4000 general purpose registers
					and selected coprocessor 0 registers.
				  rX: where 0 <= X <= 31

				Please note that some of the general purpose
				registers are not saved correctly in the
				current version of the IP19prom.

        dc SLOT REG      -- Displays the value of the specified Everest
                            configuration register.
	wc SLOT REG VAL	 -- Writes the value to the specified Everest config
			    register.
	j ADDRESS	 -- Jumps to the specified address
	j1 ADDRESS PARM	 -- Jumps to the address passing the parameter supplied.
	j2 ADDRESS...	 -- Jumps to the address passing two parameters.
        info             -- Displays the slot and processor number of the
                            processor and prints out a description of the
                            system configuration (as provided by SenseConfig).
        reset            -- Reset the system.
        sload            -- Download Motorola S-record 3 code through the
			    serial port.
	srun		 -- Like sload but it runs too.
        sloop (COMMAND)  -- Performs a 'scope loop of the following single
			    command.  Sloop runs the specified command until a
                            key is pressed.
        loop TIMES (COMMAND)
                         -- Performs a nonzero number of iterations of COMMAND,
			    which can be any legal command line (semicolon
			    separated).
*       mem START END    -- Performs a memory test starting with address START.
			    END is the first address not tested.  Now if
			    you specify an address with the high bit unset,
			    POD ors in 0xa0000000.  No more TLB misses.
	scache ITER DE	 -- Performs ITER iterations of a basic secondary
			    cache test with the r4000's DE bit set to the
			    value provided.
	dmc SLOT	 -- Displays the memory board configuration for
			    the board in the specified slot
	dio SLOT	 -- Displays the IO4 board configuration for
			    the board in the specified slot
*	devc SLOT | all	 -- Display the "evconfig" structure entry for this slot
			    or all slots.  This structure contains what the
			    prom believes to be in that slot and its current
			    status.  Now, it also displays total memory
			    and the current debug switch settings as well
			    as strings explaining "diagvals" and prom
			    revision numbers.
	disable SLOT UNIT
			 -- Disables UNIT of the board in SLOT (see enable).
*	fdisable SLOT UNIT
			 -- Forcibly disable a unit.  For CPUs, this means
			    writing the A chip enable register.  For memory
			    and IO adapters, it means removing the unit from
			    the evconfig structure.
	enable SLOT UNIT -- Enable UNIT of the board in SLOT specified.
			    This command changes the "enable" field of the
			    evconfig structure for the chosen unit.
			    In future prom revisions, this will probably
			    change the value in NVRAM.
*	fenable SLOT UNIT
			 -- Forcibly enable a unit.  For CPUs, this means
			    writing the A chip enable register.  For memory
			    and IO adapters, it means forcing the correct
			    value to be stored in the evconfig structure.
	reconf		 -- Reconfigures memory using the currently enabled
			    banks.
	bist		 -- Runs the memory Built-In Self-Test.
*	ecc		 -- Decode the information in the Cache_error
			    register after a cache error exception
			    (This works again in rev 6.)
	si SLOT CPU LVL  -- Send a level LVL interrupt to the processor in
			    the specified slot and slice.
	td PARM		 -- Displays the specified TLB entry.  "lo" and "hi"
			    display the low and high halves of the TLB
			    to make the output fit on a 24-line terminal.
			    "all" displays the entire TLB.
	clear		 -- Clears memory and CC chip error registers.
			    CC chip errors are printed after each prompt
			    until cleared.
*	decode		 -- Displays the memory slot and bank number a given
			    physical address belongs to.  (Can now accept
			    up to just under 4 gigabyte addresses!)
	walk LO HI CONT	 -- Walks a bit through every word of the address
			    range specified with HI being the first address
			    not tested.  CONT indicates whether to continue
			    after failures (1 = continue, 0 = stop on errors).
	slave		 -- Causes this processor to enter slave mode.
	wx BLOC OFF VAL  -- Write VAL to the address created by adding the
			    value of OFF to the value of BLOC multiplied by 256.
			    This command uses R4000 64-bit addressing to
			    allow uncached access to all of memory.
	dx BLOC OFF	 -- Prints the value contained in the address created
			    by combining BLOC and OFF as above.
	io		 -- Loads and executes the IO4 prom in the master IO4
			    board.
	why		 -- Prints a string explaining why we entered POD mode.
	niblet SET	 -- Run the specified set of Niblet tests (see below).
			    The usual test sets are numbered 0-9.
	gm		 -- Go to memory mode.  This moves the stack into
			    cached memory instead of an "isolated."  Niblet
			    requires you to execute this command before
			    running it.  This command changes the prompt to
			    "Mem xx/xx>"
	select SLICE	 -- When the system is in "manu-mode," all processors
			    on the board selected by the system controller
			    receive any input intended for the selected
			    processor.  This is due to a design limitation
			    of the IP19.  This results in four processors
			    executing any command intended for just one.
			    POD handles this by providing the select command.
			    Select allows the user to select a "slice" which
			    will be able to answer commands.  All other CPUs
			    on the board will be temporarily disabled until
			    the next select.  Selecting slice ff disables
			    selection and allows all CPUs to respond to
			    input.
 	?                -- Displays the list of commands.
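
The address arithmetic behind the "mem" and "wx"/"dx" commands can be
sketched as below.  The function names are hypothetical, and the sketch
covers only the arithmetic stated in the command descriptions.  How the
PROM turns the wx/dx result into a 64-bit uncached address is not
described here, so that step is left out.

```c
#include <stdint.h>

/*
 * mem: if the high bit of a 32-bit address is unset, OR in 0xa0000000.
 * For addresses below 0x20000000 this lands in the R4000's unmapped,
 * uncached kseg1 segment, so the access cannot take a TLB miss.
 */
uint32_t pod_mem_addr(uint32_t addr)
{
    if ((addr & 0x80000000u) == 0)
        addr |= 0xa0000000u;
    return addr;
}

/* wx/dx: the target is formed as BLOC multiplied by 256, plus OFF. */
uint64_t pod_wx_offset(uint64_t bloc, uint64_t off)
{
    return bloc * 256u + off;
}
```

For example, "mem 1000 2000" would test from 0xa0001000 up to (but not
including) 0xa0002000 under this reading.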


------------------------------------------------------------------------------

				Niblet
				------

Niblet is a very small, symmetric multiprocessing kernel with separate
virtual address spaces for its processes.  It was originally intended
as a verification tool, but we have found it useful for testing new
boards as well.  Eventually, it will be called automatically from the
IO4 prom, but it is also available from the POD prompt in the IP19 prom.

NOTE:
        Niblet may not run as intended if the various processors on
the system are running different versions of the IP19 prom.  You're
okay if the processors launch successfully.

The various tests available from the "niblet n" command are really
combinations of niblet tests.  That's why Niblet reports "Supertest passed"
and "Supertest FAILED."  A list of the basic Niblet tests and a table
of which tests are contained in each supertest follows.

Basic Niblet Tests:
-------------------

INVALID:
        Invalidates random TLB entries to cause more varied interactions.
COUNTER:
        Just runs until a certain instruction count is reached and passes.
        The count is proportional to the niblet process ID.
MPMON:
        Test monotonicity of Everest reads and writes.
MPINTADD:
        Two processors add values to a common variable, hit a barrier,
        and check the final sum.
MPINTADD_4:
        Four processor version of MPINTADD.
MPSLOCK:
        A software locking protocol test.
MPHLOCK:
        Tests load-linked and store-conditional by grabbing a lock, storing
        a process ID into a protected location, waiting for a delay to
        expire, and checking that the correct process ID is still there.
        Multiple processors try this so a failure should result in a CPU
        reading the wrong PID.
MEMTEST:
        Tests a range of memory by writing a value based on a process ID
        to a range of memory and then checking it.  This version's range
        is small enough to fit in a secondary cache.
BIGMEM:
        Same as above but the set is larger than one megabyte.
PRINTTEST:
        Tests Niblet context-switching.  Runs very quickly.  Mostly a sanity
        test.
BIGINTADD_4:
        Same as MPINTADD_4 but runs for many, many iterations.
BIGROVE:
        A roving producer-consumer test that runs for many, many iterations.
BIGHLOCK:
        Same as MPHLOCK but runs for many, many iterations.

Niblet "Supertests":
	(Only tests 0-9 are useful without a connection to the system controller
	UART.)

niblet 0:
        Runs one copy of the "INVALID" process.  Should always pass almost
        immediately.

niblet 1:
        Runs {INVALID, COUNTER, COUNTER}.  Takes some time.  One process will
        finish in about half the time that the other two take.

niblet 2:
        Runs {MPMON, MPMON}.  Takes disproportionately longer on a
        single processor than on an MP machine.

niblet 3:
        Runs {MPINTADD, INVALID, MPINTADD}.  Takes disproportionately longer
        on a single processor than on an MP machine.

niblet 4:
        Runs {MPSLOCK, MPSLOCK, INVALID}.

niblet 5:
        Runs {MPROVE, MPSLOCK, MPROVE, MPSLOCK, INVALID}.

niblet 6:
        Runs {MPSLOCK, MPMON, INVALID, MPSLOCK, MPMON}.  Takes
        disproportionately longer on a single processor than on an MP machine.

niblet 7:
        Runs {MPROVE, MPROVE}.

niblet 8:
        Runs {INVALID, MPMON, MPMON, MPROVE, MPROVE, MPROVE, MPINTADD,
        MPINTADD, MPHLOCK, MPHLOCK} for a total of 10 processes.

niblet 9:
        Runs {MPINTADD_4, MPINTADD_4, MPINTADD_4, MPINTADD_4, INVALID,
        MPROVE, MPROVE, MPROVE, MPHLOCK, MPHLOCK, MPSLOCK, MPSLOCK} for
        a total of 12 processes.

niblet a:
        Runs {MEMTEST, MEMTEST, MEMTEST, MEMTEST, MEMTEST}.  This test is
        designed as an overnight test.  It will take hours to complete.

niblet b:
        Runs {BIGMEM, BIGMEM, BIGMEM, INVALID, INVALID, INVALID} for a total
        of 6 processes.  It, too, takes hours to complete the memory tests,
        but the supertest will never complete since there are three
        INVALID processes.  They exit when they are the last process on
        the system.

niblet c:
        Runs {PRINTTEST,  PRINTTEST, PRINTTEST, PRINTTEST,
        PRINTTEST,  PRINTTEST, PRINTTEST, PRINTTEST,
        PRINTTEST,  PRINTTEST, PRINTTEST, PRINTTEST,
        PRINTTEST,  PRINTTEST}  This is really a Niblet sanity test
	(as is niblet 0).

niblet d:
        This is the big MP stress test.  It runs {BIGINTADD_4, BIGINTADD_4,
        BIGINTADD_4, BIGINTADD_4, INVALID, BIGROVE, BIGROVE, BIGROVE,
        BIGHLOCK, BIGHLOCK, BIGMEM, BIGMEM, BIGMEM, INVALID}.  This
        test runs 14 processes for a number of hours.  It's intended as
        an overnight (or other long period of time) MP stress test.

                        Niblet Fundamentals
			-------------------

NOTE: Niblet displays all of its output (with the exception of the final
result) to the CC UART so it's only visible in "manu-mode" or "no boot master
arbitration" mode.


Number of CPUS to include:

        Niblet attempts to run its tests on all processors that were
present when the PROM set up the machine.  That means that if a processor
has been forced into POD mode by pressing control-P, that processor will
be included in Niblet's processor count and niblet will never pass its
first barrier.  The timeout code hasn't been implemented yet so this
results in a hung system.  A processor can be forced back into slave mode
by typing the POD "slave" command.

	Niblet is limited to 15 CPUs at a time.  The user can control which
CPUs run niblet with the enable and disable commands.

Scheduling and process migration:

        As long as there are more processes than processors, Niblet
processes will migrate.  This is the reason that there are three copies of
INVALID in "niblet b."  As long as that test is run on fewer than six
processors, tests will migrate eventually.  The timing has to be right, though.
On fewer processors, tests will migrate more often.

        If there are ever more processors than processes, one or more
processors will go into a loop waiting for the supertest to complete.  You
can tell that processors are in this state because they will print,
"No processes left to run - twiddling."

Test completion:

        Since Niblet is intended to run with one UART per processor, it
only prints failure messages to the processor on which a test fails.  The
processor hosting the failing process will print all pertinent information
and then send an interrupt to the other processors.  This means that the
other processors will only say, "Niblet FAILED on an interrupt."  The
real cause of the failure is available on the processor where it happened.
This is particularly important with a Niblet failure due to a nonzero
ERTOIP register since it can only be read by the processor on which the
error occurred.  That processor will print, "ERTOIP is nonzero!
(ERTOIP, CAUSE, EPC)" followed by the values of ERTOIP, CAUSE, and EPC.

        The master processor will always complete with a message of the
form, "Supertest PASSED/FAILED" followed by "Niblet Complete."

        None of the 13 Niblet tests in the IP19 prom should ever print
a "Supertest FAILED." message under normal circumstances.

	NOTE: Running a test in manufacturing mode yields more information
as processors print to their local UARTs.  In "manu-mode" you can selectively
watch CPUs.

-----------------------------------------------------------------------------

		PROM LEDS and What They Mean
		----------------------------

The values that follow are for the PROM LEDs that I mentioned above.
They are guaranteed to be valid for IP19 PROM release 1 (12/15/92), and
they will most likely remain so.

If you see a constant value displayed on the LEDs, convert the binary
into a decimal number and look it up in the following list under PLED_xxx.
Flashing values should appear under FLED_xxx.

There are a couple of other modes besides the constant and
single-flashing-value modes.  When a processor is in the IP19 slave loop,
it cycles between 9 and 6.  The master processor in POD mode cycles between
1 and 2 when it's using the UART on the CC chip.  On the EPC UART, it displays
a constant value.

Slave mode (five vertical slices are shown.  The topmost LED is most
significant):

	0	0	0	0	0
	0	0	0	0	0
	1	0	1	0	1	etc.
	0	1	0	1	0
	0	1	0	1	0
	1	0	1	0	1

		    Time ->

Master mode on CC UART:

	0	0	0	0	0
	0	0	0	0	0
	0	0	0	0	0	etc.
	0	0	0	0	0
	0	1	0	1	0
	1	0	1	0	1

		    Time ->
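
Converting between the six LEDs and the decimal values is mechanical.  The
helpers below are a small sketch (the function names are made up, not PROM
routines) that treats the topmost LED as the most significant bit, as in
the diagrams above.

```c
/*
 * Format a six-bit LED value as text, topmost (most significant) LED
 * first.  `out` must have room for 7 bytes including the terminator.
 */
void led_to_bits(unsigned value, char out[7])
{
    for (int i = 0; i < 6; i++)
        out[i] = (value & (1u << (5 - i))) ? '1' : '0';
    out[6] = '\0';
}

/* Convert six LED states read top to bottom, e.g. "001001", to a value. */
unsigned bits_to_led(const char *bits)
{
    unsigned value = 0;
    for (int i = 0; i < 6 && bits[i] != '\0'; i++)
        value = (value << 1) | (unsigned)(bits[i] == '1');
    return value;
}
```

Reading the slave-mode columns top to bottom gives "001001" and "000110",
which bits_to_led() turns into 9 and 6, matching the cycle described
above.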

The following comes straight from a PROM header file so it's somewhat
raw.  Note that the most significant bit of an LED value is the top LED.

#define PLED_CLEARTAGS                  1	(000001)
/* Clearing the primary data cache tags */

#define PLED_CKCCLOCAL                  2	(000010)
/* Testing CC chip local registers */

#define PLED_CCLFAILED_INITUART         3	(000011)
/* Failed the local test but trying to initialize the UART anyway */

#define PLED_CCINIT1                    4	(000100)
/* Initializing the CC chip local registers */

#define PLED_CKCCCONFIG                 5	(000101)
/* Testing the CC chip config registers (requires a usable bus to pass) */
/* NOTE: Hanging in this test usually means that the bus clock has failed.
 *      Check the oscillator.
 */

#define PLED_CCCFAILED_INITUART         6	(000110)
/* Failed the config reg test but trying to initialize the UART anyway */

#define PLED_NOCLOCK_INITUART           7	(000111)
/* CC clock isn't running; initializing the UART anyway */

#define PLED_CCINIT2                    8	(001000)
/* Initializing the CC chip config registers */

#define PLED_UARTINIT                   9	(001001)
/* Initializing the CC chip UART */
/* NOTE: Hanging in this test usually means that the UART clock is bad.
 *      Check the connection to the system controller.
 */

#define PLED_CCUARTDONE                 10	(001010)
/* Finished initializing the CC chip UART */

#define PLED_CKACHIP                    11	(001011)
/* Testing the A chip registers */

#define PLED_AINIT                      12	(001100)
/* Initializing the A chip */

#define PLED_CKEBUS1                    13	(001101)
/* Checking the EBus with interrupts. */

#define PLED_SCINIT                     14	(001110)
/* Initializing the system controller */

#define PLED_BMARB                      15	(001111)
/* Arbitrating for a bootmaster */

#define PLED_BMASTER                    16	(010000)
/* This processor is the bootmaster */

#define PLED_CKEBUS2                    17	(010001)
/* In second EBus test.  Run only by the master */

#define PLED_POD                        18	(010010)
/* Setting up this CPU slice for POD mode */

#define PLED_PODLOOP                    19	(010011)
/* Entering POD loop */

#define PLED_CKPDCACHE1                 20	(010100)
/* Checking the primary data cache */

#define PLED_MAKESTACK                  21	(010101)
/* Creating a stack in the primary data cache */

#define PLED_MAIN                       22	(010110)
/* Jumping into C code - calling main() */

#define PLED_CKIAID                     23	(010111)
/* Check IA and ID chips on master IO4 */

#define PLED_CKEPC                      24	(011000)
/* Check EPC chip on master IO4 */

#define PLED_IO4INIT                    25	(011001)
/* Initializing the IO4 prom */

#define PLED_NVRAM                      26	(011010)
/* Getting NVRAM variables */

#define PLED_FINDCONS                   27	(011011)
/* Checking the path to the EPC chip which will contain the console UART */

#define PLED_CKCONS                     28	(011100)
/* Testing the console UART */

#define PLED_CONSINIT                   29	(011101)
/* Setting up the console UART */

#define PLED_CONFIGCPUS                 30	(011110)
/* Configuring out CPUs that are disabled */

#define PLED_CKRAWMEM                   31	(011111)
/* Checking out raw memory (running BIST) */

#define PLED_CONFIGMEM                  32	(100000)
/* Configuring memory */

#define PLED_CKMEM                      33	(100001)
/* Checking configured memory */

#define PLED_LOADPROM                   34	(100010)
/* Loading IO4 prom */

#define PLED_CKSCACHE1                  35	(100011)
/* First pass of secondary cache testing - test the scache like a RAM */

#define PLED_CKPICACHE                  36	(100100)
/* Check the primary instruction cache */

#define PLED_CKPDCACHE2                 37	(100101)
/* Check the primary data cache writeback mechanism */

#define PLED_CKSCACHE2                  38	(100110)
/* Check the secondary cache writeback mechanism */

#define PLED_CKBT                       39	(100111)
/* Check the bus tags */

#define PLED_BTINIT                     40	(101000)
/* Clear the bus tags */

#define PLED_CKPROM                     41	(101001)
/* Checksum the IO prom */

#define PLED_INSLAVE                    42	(101010)
/* This CPU is entering slave mode */

#define PLED_PROMJUMP                   43	(101011)
/* Jumping to the IO prom */

#define PLED_SLAVEJUMP                  44	(101100)
/* A slave is jumping to the IO4 PROM slave code */

#define PLED_NMIJUMP			45	(101101)
/* This CPU has jumped into the kernel's NMI handling code. */

/*
 * Failure mode LED values.  If the Power-On Diagnostics
 * find an unrecoverable problem with the hardware,
 * they will call the flash leds routine with one of
 * the following values as an argument.  One PLED setting,
 * PLED_WRCONFIG (59), is hiding down here because of an error made earlier.
 */

#define FLED_CANTSEEMEM                 46	(101110)
/* Flashed by slave processors if they take an exception while trying to
 * write their evconfig entries.  Often means the processor's getting D-chip
 * parity errors.
 */

#define FLED_NOUARTCLK                  47	(101111)
/* The CC UART clock is not running.  No system controller access is possible.
 */

#define FLED_IMPOSSIBLE1                48	(110000)
/* We fell through one of the supposedly unreturning subroutines.
 * Really shouldn't be possible.
 */

#define FLED_DEADCOP1                   49	(110001)
/* Coprocessor 1 is dead - not seeing this doesn't mean it works. */

#define FLED_CCCLOCK                    50	(110010)
/* The CC clock isn't running */

#define FLED_CCLOCAL                    51	(110011)
/* Failed CC local register tests */

#define FLED_CCCONFIG                   52	(110100)
/* Failed CC config register tests */

#define FLED_ACHIP                      53	(110101)
/* Failed A Chip register tests */

#define FLED_BROKEWB                    54	(110110)
/* By the time this CPU had arrived at the bootmaster arbitration barrier,
 * the rendezvous time had passed.  This implies that a CPU is running too
 * slowly, the ratio of bus clock to CPU clock rate is too high, or a bit
 * in the CC clock is stuck on.
 */

#define FLED_BADDCACHE                  55	(110111)
/* This CPU's primary data cache test failed */

#define FLED_BADIO4                     56	(111000)
/* The IO4 board is bad - can't get to the console. */

/* Exception failure mode values */
#define FLED_UTLBMISS                   57	(111001)
/* Took a TLB Refill exception */

#define FLED_XTLBMISS                   58	(111010)
/* Took an extended TLB Refill exception */

#define PLED_WRCONFIG                   59	(111011)
/* Writing evconfig structure:
 *      The master CPU writes the whole array
 *      The slaves only write their own entries.
 */

#define FLED_GENERAL                    60	(111100)
/* Took a general exception */

#define FLED_NOTIMPL                    61	(111101)
/* Took an unimplemented exception */

#define FLED_ECC                        62	(111110)
/* Took a cache error exception */

#define FLED_DISABLED			63	(111111)
/* Disabled processors will flash all of their LEDs */
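
The binary patterns in parentheses above are simply the six-bit form of
each value, most significant bit first.  The following is a minimal
sketch of that conversion, not code from the PROM.  The helper names
led_pattern and led_is_failure are hypothetical; the failure range
(46 through 63, excluding 59 for PLED_WRCONFIG) is read directly off the
list above.

```c
#include <assert.h>

#define LED_COUNT 6   /* six LEDs per processor, per the listing above */

/* Hypothetical helper: render a PLED/FLED value as the six-character
 * binary pattern shown in parentheses in the listing.
 * out must have room for LED_COUNT + 1 characters.
 */
static void led_pattern(unsigned value, char out[LED_COUNT + 1])
{
    int i;

    assert(value < (1u << LED_COUNT));
    for (i = 0; i < LED_COUNT; i++) {
        /* Most significant bit first, matching the listing. */
        unsigned bit = (value >> (LED_COUNT - 1 - i)) & 1u;
        out[i] = bit ? '1' : '0';
    }
    out[LED_COUNT] = '\0';
}

/* Hypothetical helper: values 46..63 are failure (FLED) codes, except
 * 59, which the listing assigns to PLED_WRCONFIG.
 */
static int led_is_failure(unsigned value)
{
    return value >= 46 && value <= 63 && value != 59;
}
```

For example, led_pattern(45, buf) fills buf with "101101", the pattern
listed for PLED_NMIJUMP.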

-----------------------------------------------------------------------------

		PROM Diagnostic Messages
		------------------------

These messages can be printed as a result of prom diagnostics and as
reasons for entering POD mode.  The numbers on the left are "diagnostic"
codes which are displayed on the LCD panel.


CODE  MEANING

Success:
000    Device passed diagnostics.

Cache tests:
001    Failed dcache1 data test.
002    Failed dcache1 addr test.
003    Failed scache1 data test.
004    Failed scache1 addr test.
005    Failed icache data test.
006    Failed icache addr test.
007    Dcache test hung.
008    Scache test hung.
009    Icache test hung.

Memory tests:
040    Memory built-in self-test failed.
041    No working memory was found.
042    Memory address line test failed.
043    Memory data line test failed.
044    Bank failed configured memory test.
045    Slave hung writing to memory.
046    Bank disabled due to downrev MA chip.
047    A bus error occurred during MC3 config.
048    A bus error occurred during MC3 testing.
049    PROM attempted to disable the same bank twice.
050    Not enough memory to load the IO4 PROM.
051    No memory boards were recognized.
052    Bank forcibly re-enabled by the PROM.

Ebus tests:
060    CPU doesn't get interrupts from CC.
061    Group interrupt test failed.
062    Lost a loopback interrupt.
063    Bit in HPIL register stuck.

IO4 tests:
070    No working IO4 is present.
071    Bad checksum on IO4 PROM.
072    Bad entry point in IO4 PROM.
073    IO4 PROM claims to be too long.
074    Bad entry point in IO4 PROM.
075    Bad magic number in IO4 PROM.
078    Bus error while downloading IO4 PROM.
079    No EPC chip found on master IO4.
080    Bus error while configuring IO4.
081    Bus error during IA register test.
082    Bus error during IA PIO test.
083    IA chip register test failed.
084    Wrong error reported for bad PIO.
085    IA error didn't generate interrupt.
086    IA error generated wrong interrupt.
087    EPC register test failed.
088    Bus error on map RAM rd/wr test.
089    Bus error on map RAM address test.
090    Bus error on map RAM walking 1 test.
091    Bus error during map RAM testing.
092    Map RAM read/write test failed.
093    Map RAM address test failed.
094    Map RAM walking 1 test failed.
095    EPC UART loopback test failed.

IP19 tests:
120    CPU can't access memory.
123    CC bus tag data test failed.
124    CC bus tag addr test failed.
125    CPU forcibly re-enabled by the PROM.

Miscellaneous:
240    CPU writing configuration info.
246    CPU testing dcache.
247    CPU testing icache.
248    CPU testing scache.
249    CPU initializing caches.
250    CPU returning from master's code.
251    Unexpected exception.
252    A nonmaskable interrupt occurred.
253    POD mode switch set or POD key pressed.
253    Unspecified diagnostic failure.
254    Diagnostic value unset.
255    Device not present.
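
When reading codes off the LCD panel, it can help to map a code to the
group heading it falls under.  The sketch below does this with a small
range table.  It is an illustration only: the names diag_range,
diag_ranges, and diag_group are hypothetical, and each range is just the
lowest and highest code listed under that heading above, so unlisted
codes inside a range (076, for instance) still report that group.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table: bounds copied from the code list above. */
struct diag_range {
    int first;
    int last;
    const char *group;
};

static const struct diag_range diag_ranges[] = {
    {   0,   0, "Success" },
    {   1,   9, "Cache tests" },
    {  40,  52, "Memory tests" },
    {  60,  63, "Ebus tests" },
    {  70,  95, "IO4 tests" },
    { 120, 125, "IP19 tests" },
    { 240, 255, "Miscellaneous" },
};

/* Return the group heading for a diagnostic code, or "Unknown" if the
 * code falls in a gap between the listed groups.
 */
static const char *diag_group(int code)
{
    size_t i;

    assert(code >= 0 && code <= 255);
    for (i = 0; i < sizeof diag_ranges / sizeof diag_ranges[0]; i++) {
        if (code >= diag_ranges[i].first && code <= diag_ranges[i].last)
            return diag_ranges[i].group;
    }
    return "Unknown";
}
```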

The following messages appear on the system controller display when
diagnostics fail or as status:

CODE  System Controller Short Message

003   SCACHE FAILED!
004   SCACHE FAILED!
001   DCACHE FAILED!
002   DCACHE FAILED!
005   ICACHE FAILED!
006   ICACHE FAILED!
040   MC3 CONFIG FAILED!
041   NO GOOD MEMORY FOUND
042   MC3 CONFIG FAILED!
043   MC3 CONFIG FAILED!
044   MC3 READBACK ERROR!
047   MC3 CONFIG FAILED!
048   MC3 CONFIG FAILED!
049   MC3 CONFIG FAILED!
050   INSUFFICIENT MEMORY!
051   NO MEM BOARDS FOUND!
070   NO IO BOARDS FOUND!
071   IO4PROM FAILED!
072   IO4PROM FAILED!
073   IO4PROM FAILED!
074   IO4PROM FAILED!
075   IO4PROM FAILED!
078   IO4PROM FAILED!
079   NO EPC CHIP FOUND!
080   IO4 CONFIG FAILED!
081   MASTER IO4 FAILED!
082   MASTER IO4 FAILED!
083   MASTER IO4 FAILED!
084   MASTER IO4 FAILED!
085   MASTER IO4 FAILED!
086   MASTER IO4 FAILED!
088   MASTER IO4 FAILED!
089   MASTER IO4 FAILED!
090   MASTER IO4 FAILED!
091   MASTER IO4 FAILED!
092   MASTER IO4 FAILED!
093   MASTER IO4 FAILED!
094   MASTER IO4 FAILED!
087   EPC CHIP FAILED!
095   EPC UART FAILED!
123   BUS TAGS FAILED!
124   BUS TAGS FAILED!
250   Reentering POD mode
251   PROM EXCEPTION!
252   PROM NMI HANDLER
253   CPU in POD mode.

These are the long, scrolling messages:

CODE  System Controller Long Message

040   Memory board configuration has failed.  Cannot load IO PROM.
041   All memory banks had to be disabled due to test failures.
042   The address line self-test failed.  Cannot continue.
043   Memory board configuration has failed.  Cannot load IO PROM.
044   Memory board configuration has failed.  Cannot load IO PROM.
047   Memory board configuration has failed.  Cannot load IO PROM.
048   Memory board configuration has failed.  Cannot load IO PROM.
049   The PROM was unable to disable failing memory banks.
050   You must have at least 32 megabytes of working memory to load the IO PROM.
051   The IP19 PROM did not recognize any memory boards in the system.
070   The IP19 PROM did not recognize any IO4 boards in the system.
071   Diagnostics detected a problem with your IO4 PROM.
072   Diagnostics detected a problem with your IO4 PROM.
073   Diagnostics detected a problem with your IO4 PROM.
074   Diagnostics detected a problem with your IO4 PROM.
075   Diagnostics detected a problem with your IO4 PROM.
078   An exception occurred while downloading the IO4 PROM to memory.
079   There must be an EPC chip on the IO board in the highest-numbered slot.
080   An exception occurred while configuring an IO board.
081   The IA chip on the master IO4 board has failed diagnostics.
082   The IA chip on the master IO4 board has failed diagnostics.
083   The IA chip on the master IO4 board has failed diagnostics.
084   The IA chip on the master IO4 board has failed diagnostics.
085   The IA chip on the master IO4 board has failed diagnostics.
086   The IA chip on the master IO4 board has failed diagnostics.
088   The IA chip on the master IO4 board has failed diagnostics.
089   The IA chip on the master IO4 board has failed diagnostics.
090   The IA chip on the master IO4 board has failed diagnostics.
091   The IA chip on the master IO4 board has failed diagnostics.
092   The IA chip on the master IO4 board has failed diagnostics.
093   The IA chip on the master IO4 board has failed diagnostics.
094   The IA chip on the master IO4 board has failed diagnostics.
087   The EPC chip on the master IO4 board has failed diagnostics.
251   The PROM code took an unexpected exception.
252   The PROM received a nonmaskable interrupt.

-----------------------------------------------------------------------------

IP19 PROM SYSTEM CONTROLLER STANDARD MESSAGES:

Starting System...
  Displayed once bootmaster arbitration has completed.  Indicates that the
  master processor has started up correctly and is capable of communicating
  with the system controller.

EBUS diags 2..
  Displayed immediately before we run the secondary EBUS diagnostics.  The
  secondary EBUS diagnostics stress the interrupt logic and the EBUS.

PD Cache test..
  Displayed immediately before we run the primary data cache test.

Building stack..
  Displayed before we attempt to set up the cache as the stack.  If this
  is the last message displayed, there is probably something wrong with
  the master processor.

Jumping to MAIN
  Displayed before we switch into the C main subroutine.

Initing Config Info
  Displayed before we attempt to do initial hardware probing and set up
  the Everest configuration information data structure.  In this phase,
  we simply read out the SYSCONFIG register and set the evconfig fields
  to rational default values.

Setting timeouts..
  Displayed before we attempt to write to the various board timeout registers.
  Everest requires that all of the boards be initialized with consistent
  timeout values, and that these timeout values be written before we actually
  do reads or writes to the boards (we're safe so far because we have only
  touched configuration registers; this will change when we start talking to
  IO4 devices).

Initing master IO4..
  Displayed before we attempt to do basic initialization for all of the
  IO4's in the system.  Basic initialization consists of writing the
  large and small window registers, setting the endianness, setting up
  error interrupts, clearing the IBUS and EBUS error registers, and
  examining the IO adapters.

Initing EPC...
  Displayed before we do the first writes to the master EPC.  This routine
  clears the EPC error registers and takes all EPC devices out of reset.

Initing EPC UART
  Displayed when we first enter the UART configuration code.

Initing UART Chan B
  Displayed before we begin initializing UART chan B's control registers.

Initing UART Chan A
  Displayed before we begin initializing UART chan A's control registers.

Reading inventory..
  Displayed before we attempt to read the system inventory out of the IO4
  NVRAM.  If the inventory is invalid or we can't read it for some reason,
  we initialize the inventory fields with appropriate default values.

Running BIST..
  Displayed before we run the memory hardware's built-in self test.

Configuring memory..
  Displayed before we actually configure the banks into a legitimate
  state.

Testing memory..
  Displayed before we start executing the memory post-configuration tests.
  These tests simply check that memory was configured correctly.

Testing Bus Tags..
  Checks and initializes the CC bus tags, which are used by the CC chip
  to determine whether it should pass a coherency transaction on to a
  particular processor.

Writing CFGINFO..
  Displayed before we try writing the Everest configuration information
  into main memory.

Initing MPCONF blk..
  Displayed before we initialize the Everest MP configuration blocks
  for all of the processors.

Testing S Cache...
  Displayed before we begin testing the secondary cache on all of the
  processors.

S Cache passed.
  Secondary cache test passed.

Checking slaves...
  Displayed when we check each slave processor to determine whether it
  is alive and whether it passed its diagnostics.

Loading IO4 PROM..
  Displayed when we download the IO4 PROM from the IO4 flash proms into
  main memory.

-----------------------------------------------------------------------------

MISCELLANEOUS HINTS:

	If a CPU hangs flashing its LEDs, it will still accept a
control-p (^p) character from its CC UART and go into POD mode.  To do
this, you must either connect to it through the system controller or
directly, via the four-pin IP19 connector (with "no boot master arbitration"
switched on in the system controller).

	Processors in the "slave" loop displaying a repeating pattern of
four LEDs with two on at a time can also be interrupted with a control-p
on their CC UART.  They will then attempt to enter POD mode.  Of course,
they may be too broken to do this, in which case, you'll see a different
failure LED value.

	The System controller displays the state of the various processors
on its display.  The characters associated with the processors are as
follows:

		'B' = processor is bootmaster.
		'+' = processor is operational.
		' ' = processor is not present or seriously broken.
		'X' = processor fails diagnostics.
		'D' = processor is disabled in NVRAM.
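
	The character legend above can be written as a small lookup.  This
is a sketch only: cpu_state_name and cpu_usable_count are hypothetical
names, and treating the display as a plain string with one character per
processor is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: translate one status character from the system
 * controller display into the meaning given in the legend above.
 */
static const char *cpu_state_name(char c)
{
    switch (c) {
    case 'B': return "bootmaster";
    case '+': return "operational";
    case ' ': return "not present or seriously broken";
    case 'X': return "fails diagnostics";
    case 'D': return "disabled in NVRAM";
    default:  return "unknown";
    }
}

/* Hypothetical helper: count the processors shown as usable, that is,
 * the bootmaster ('B') plus every operational ('+') processor.
 */
static size_t cpu_usable_count(const char *status)
{
    size_t n = 0;

    assert(status != NULL);
    for (; *status != '\0'; status++) {
        if (*status == 'B' || *status == '+')
            n++;
    }
    return n;
}
```

	Given the string "B+ XD+", cpu_usable_count returns 3: the
bootmaster and two operational processors.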

	There are some new addresses you can jump to in the IP19 prom to
get certain otherwise difficult effects:

	0xbfc00008:	Restart the PROM.
	0xbfc00010:	Go back to IP19 PROM slave mode.
	0xbfc00018:	Go into POD mode using the CC UART for I/O.
	0xbfc00020:	Go into POD mode using the IO4 UART for input.
	0xbfc00028:	Flash all LEDs and loop endlessly.
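
	The five addresses above are spaced eight bytes apart, starting
at 0xbfc00008.  The sketch below captures that pattern.  The enum and
function names are hypothetical, and PROM_BASE is inferred from the
listed addresses rather than taken from the PROM source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: base implied by the five addresses listed above. */
#define PROM_BASE 0xbfc00000u

/* Hypothetical names for the five entry points, in listed order. */
enum prom_entry {
    PROM_RESTART = 1,   /* 0xbfc00008: restart the PROM */
    PROM_SLAVE,         /* 0xbfc00010: back to slave mode */
    PROM_POD_CC,        /* 0xbfc00018: POD mode on the CC UART */
    PROM_POD_IO4,       /* 0xbfc00020: POD mode on the IO4 UART */
    PROM_FLASH_LEDS     /* 0xbfc00028: flash all LEDs and loop */
};

/* Compute the jump address for an entry point: base + 8 * index. */
static uint32_t prom_entry_addr(enum prom_entry e)
{
    assert(e >= PROM_RESTART && e <= PROM_FLASH_LEDS);
    return PROM_BASE + 8u * (uint32_t)e;
}
```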

	The IO4 prom now has a POD command to get you to POD mode.  It's
no longer necessary to type "goto 0xbfc00020".

EAROM VARIABLES:

	The IP19 prom looks in various locations in EAROM to find
system configuration parameters.  Many of these also affect Unix.
Here are their names and addresses:

#define EV_EBUSRATE0_LOC        0xb9000100      /* EBUS freq (Hz) LSB */
#define EV_EBUSRATE1_LOC        0xb9000108      /* EBUS freq (Hz) byte 1 */
#define EV_EBUSRATE2_LOC        0xb9000110      /* EBUS freq (Hz) byte 2 */
#define EV_EBUSRATE3_LOC        0xb9000118      /* EBUS freq (Hz) MSB */
#define EV_PGBRDEN_LOC          0xb9000120      /* Piggyback Rd Enbl bit */
#define EV_CACHE_SZ_LOC         0xb9000128      /* Size of secondary cache
						 * 0x14 == 1M
						 * 0x16 == 4M
						 */
#define EV_IW_TRIG_LOC          0xb9000130      /* IW_TRIG value */
#define EV_RR_TRIG_LOC          0xb9000138      /* RR_TRIG value */
#define EV_EPROCRATE0_LOC       0xb9000140      /* CPU frequency (Hz) LSB */
#define EV_EPROCRATE1_LOC       0xb9000148      /* CPU frequency (Hz) byte 1 */
#define EV_EPROCRATE2_LOC       0xb9000150      /* CPU frequency (Hz) byte 2 */
#define EV_EPROCRATE3_LOC       0xb9000158      /* CPU frequency (Hz) MSB */
#define EV_RTCFREQ0_LOC         0xb9000160      /* RTC frequency (Hz) LSB */
#define EV_RTCFREQ1_LOC         0xb9000168      /* RTC frequency (Hz) byte 1 */
#define EV_RTCFREQ2_LOC         0xb9000170      /* RTC frequency (Hz) byte 2 */
#define EV_RTCFREQ3_LOC         0xb9000178      /* RTC frequency (Hz) MSB */
#define EV_WCOUNT0_LOC          0xb9000180      /* EAROM Write count LSB */
#define EV_WCOUNT1_LOC          0xb9000188      /* EAROM Write count MSB */
#define EV_ECCENB_LOC           0xb9000190      /* CC chip ECC enable flag */
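
	The multi-byte values above (EBUS, CPU, and RTC frequencies) are
stored one byte per location, least significant byte first, at
addresses eight bytes apart.  The sketch below shows how such a value
could be reassembled.  It is not the PROM's own code: earom_read_fn,
earom_read_u32, EAROM_STRIDE, and fake_earom_read are hypothetical, and
the stored frequency is an arbitrary test value, not a real setting.

```c
#include <assert.h>
#include <stdint.h>

#define EV_EBUSRATE0_LOC 0xb9000100u   /* from the list above */
#define EAROM_STRIDE     8u            /* spacing between listed bytes */

/* Hypothetical accessor type: returns the byte stored at addr. */
typedef uint8_t (*earom_read_fn)(uint32_t addr);

/* Assemble a 32-bit value from four EAROM bytes, LSB first. */
static uint32_t earom_read_u32(uint32_t lsb_addr, earom_read_fn read_byte)
{
    uint32_t value = 0;
    unsigned i;

    for (i = 0; i < 4; i++)
        value |= (uint32_t)read_byte(lsb_addr + i * EAROM_STRIDE) << (8 * i);
    return value;
}

/* Stand-in for the hardware so the sketch can run on any host.
 * Holds 50000000 (0x02faf080), an arbitrary example value.
 */
static uint8_t fake_earom_read(uint32_t addr)
{
    static const uint8_t ebus_rate[4] = { 0x80, 0xf0, 0xfa, 0x02 };
    uint32_t offset = addr - EV_EBUSRATE0_LOC;

    assert(addr >= EV_EBUSRATE0_LOC);
    assert(offset % EAROM_STRIDE == 0 && offset / EAROM_STRIDE < 4);
    return ebus_rate[offset / EAROM_STRIDE];
}
```

	With the stand-in reader, earom_read_u32(EV_EBUSRATE0_LOC,
fake_earom_read) yields 50000000.  On real hardware the reader would be
an uncached byte load from the listed address.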

------------------------------------------------------------------------------