--------------------------------------------------------------------------
NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE
This document is obsolete! Please refer to the FrameMaker or PostScript
files in this same directory!
NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE
--------------------------------------------------------------------------
Everest IP19 PROM release 7 notes
---------------------------------
Version 1.4
Please send comments to Steve Whitney, stever@wpd, x1525
-----------------------------------------------------------------------------
Contents:
* Summary of the Power-on Process
* The System Controller Debug Switches (contains new debug switches)
* POD mode (contains two new rev 7 prom commands)
* Niblet
* PROM LEDs (NOW IN BINARY! plus some changes)
* PROM Diagnostic Messages (new messages and their diagnostic codes
have been added)
* PROM System Controller Messages (now contains error messages)
* Hints
-----------------------------------------------------------------------------
MAJOR CHANGES
NEW IN REV 6:
Plugging this prom in and loading an appropriate flash rom and
kernel will move your serial console from the leftmost DB-9
connector on the filter board to the rightmost! Be sure to use
an up-to-date kernel and IO4 PROM version 0.93 or higher.
A workaround for a new A3 bug is enabled in this prom.
Memory configuration handles broken banks now. If a bank fails
to respond after it has been configured in, we reconfigure memory
without it.
The system controller now displays much more helpful messages for
error conditions.
Three new debug switches are available. (See the System Controller
section).
Processor type is stored correctly by the prom.
NEW IN REV 7:
DISABLED PROCESSORS NOW FLASH ALL 6 LEDS
Completely new memory configuration algorithm handles three
board situations properly and interleaves different SIMM types with the
same size together
Fixed the "NO_DIAGS won't boot Unix" bug.
Fixed the "fail to detect broken scaches" bug
Fixed ECC disabling bug
Removed the A chip rev 3 workaround
Hard disable bad CPUs unless the NO_DIAGS debug switch is set
Handle more bytes of debug switch info from the system controller
Fixed a bug in flash_leds
Fixed bus tag testing for four megabyte secondary caches and improved
bus tag failure diagnostic information.
Fixed a manufacturing mode debug switch bug.
NEW IN REV 10:
Rearbitration when the master CPU fails now works!
There are now basic IO diagnostics.
Cool new scrolling messages on failures.
-----------------------------------------------------------------------------
Power-on Summary
----------------
In this document, I will present some basic user's information about the
IP19 prom as well as some insight as to what's going on under the hood.
We'll start with a brief description of the events that transpire after
you turn on the keyswitch or press reset.
At this point, the only way you can tell what a processor is doing is via the
six LEDs it has on the edge of the processor card. Each processor "slice"
of the IP19 board has its own independently-controlled LEDs which represent
its status. At reset, all LEDs are on. After that, the CPU sets them to
various other values.
Per Processor Tests
-------------------
First, each processor goes through some very basic processor initialization:
clearing registers, cache tags, etc. After that, it begins to run very simple
power-on diagnostics to check out its local ASICs (CC chip, A chip). If any
of these tests fail in a detectable way (i.e. not a hang), the processor will
flash a pattern on its LEDs.
After testing its ASICs, the processor attempts to access the Everest bus
by sending interrupts to itself. Even looped-back interrupts go out to the
bus. Again, failure results in flashing LEDs.
After we have checked out our local ASICs and the interrupt channels, we
have everything necessary to arbitrate for bootmaster. Only one processor
can initialize the shared system resources such as memory and IO boards so
we elect a "boot master."
Boot master selection is based on a time delay scaled by the slot number
a processor is in and also by its slice. It works in such a way that
the active processor in the lowest numbered slice of the lowest numbered
slot becomes master. For example, if there are processors in board 2 -
slices 1 and 3 and board 4 - slices 0, 1, and 2, the processor in slot 2,
slice 1 becomes boot master.
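The outcome of that arbitration can be sketched in a few lines of C. This
is only an illustration: the structure, the function names, and the exact
scale factor are assumptions, since the real PROM uses timed delays rather
than a search over a table.

```c
#include <stddef.h>

/* Hypothetical description of one CPU slice; not a PROM structure. */
struct cpu {
    int slot;    /* EBus slot of the IP19 board */
    int slice;   /* processor position on that board, 0-3 */
    int active;  /* nonzero if the CPU reached arbitration */
};

#define SLICES_PER_BOARD 4   /* four slices per board, as in the text */

/* Lower rank means a shorter delay. The slot dominates and the slice
 * breaks ties within a slot. The scale factor is an assumption. */
static int arbitration_rank(const struct cpu *c)
{
    return c->slot * SLICES_PER_BOARD + c->slice;
}

/* Return the index of the CPU that would become boot master, or -1
 * if no CPU is active. */
int pick_boot_master(const struct cpu *cpus, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!cpus[i].active)
            continue;
        if (best < 0 ||
            arbitration_rank(&cpus[i]) < arbitration_rank(&cpus[best]))
            best = (int)i;
    }
    return best;
}
```

With the boards from the example above (slot 2 slices 1 and 3, slot 4
slices 0, 1, and 2), the function picks slot 2, slice 1. Marking that CPU
inactive moves the choice to slot 2, slice 3.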
If the master processor reads the NVRAM hardware inventory information and
discovers that it is supposed to be disabled, it abdicates, allowing the next
eligible processor to become master. If the master is the last processor
active, it will not abdicate unless it fails diagnostics.
Boot Master Code
----------------
Next the boot master broadcasts an interrupt to all of the other processors
telling them that they are slave processors (the slave code is described later).
The master can then go on to configure the system. The master
initiates a protocol with the system controller to let it know which CPU is
master. The system controller can only communicate with one processor at a
time so we cannot send any messages to it until it knows where the master is.
Now, we test the primary data cache and use it as a stack to run C code.
Without a stack, we can only use the processor's registers for data storage
so nested procedure calls and manipulation of complicated data (e.g. memory
configuration) are very difficult. A failure here causes boot master
rearbitration.
The first global resource the master must configure is the system console,
one of the serial ports on the IO4 board. In order to do this, it must
choose a master IO4 board. The master IO4 is always the IO board in the
highest numbered slot (unless the "use second IO4" switch is set). We run
the IA/ID chip tests, check this board for a working EPC chip, check the
integrity of the NVRAM (to get the baud rate and enable/disable information),
initialize the console there, and print our header message. The IP19 prom
is smart enough to re-enable the last CPU if all are disabled or re-enable
all memory banks if fewer than 32 megabytes of RAM are enabled.
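The master IO4 choice described above can be sketched as follows. The
function name and its arguments are made up for this example, and it
assumes that "second IO4" means the board in the next-highest slot and
that slot numbers are distinct and non-negative.

```c
#include <stddef.h>

/* Return the slot of the master IO4, or -1 if none qualifies.
 * io4_slots lists the slots that hold IO4 boards, in any order.
 * use_second models the "use second IO4" debug switch. */
int pick_master_io4(const int *io4_slots, size_t n, int use_second)
{
    int highest = -1, second = -1;
    for (size_t i = 0; i < n; i++) {
        if (io4_slots[i] > highest) {
            second = highest;          /* old highest becomes runner-up */
            highest = io4_slots[i];
        } else if (io4_slots[i] > second) {
            second = io4_slots[i];
        }
    }
    return use_second ? second : highest;
}
```

For IO4 boards in slots 3, 15, and 5 this returns 15 normally and 5 with
the switch set. With a single IO4 and the switch set it returns -1, which
a caller would have to treat as "no usable console board."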
Next we initialize the "evconfig" structure. Evconfig contains information
about what's in each EBus slot, the state of these things, and whether or
not they are enabled. A user may set an NVRAM variable which will disable
one or more system resources (currently a memory bank or processor).
This can be done from POD or the IO4 prom command monitor via the enable and
disable commands. The POD versions only change the RAM settings while the
IO4 prom versions actually set the NVRAM.
After locating and configuring the master IO4 board, we're ready to set up
memory. The prom records the amount of memory in each bank of all available
memory boards in the system and performs a memory configuration algorithm.
It groups together banks of each size and attempts to interleave the
memory to improve performance. In order to make performance uniform
across all system memory, the prom uses the highest interleave factor
that will allow it to configure all banks uniformly.
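One possible reading of "highest interleave factor that configures all
banks uniformly" is sketched below. This is not the PROM's algorithm. It
assumes that the candidate factors are powers of two up to 8 and that a
factor is usable only when the number of banks of each size is a multiple
of it. Both assumptions are mine.

```c
#include <stddef.h>

#define MAX_FACTOR 8   /* assumed upper bound, not taken from the PROM */

/* bank_mb[i] is the size of bank i in megabytes. Returns the largest
 * assumed factor (8, 4, 2, or 1) that divides the bank count of every
 * size group, so that all groups can be interleaved the same way. */
int uniform_interleave(const unsigned *bank_mb, size_t n)
{
    if (n == 0)
        return 1;
    for (int f = MAX_FACTOR; f > 1; f /= 2) {
        int ok = 1;
        for (size_t i = 0; i < n && ok; i++) {
            size_t same = 0;           /* banks sharing bank i's size */
            for (size_t j = 0; j < n; j++)
                if (bank_mb[j] == bank_mb[i])
                    same++;
            if (same % (size_t)f != 0)
                ok = 0;
        }
        if (ok)
            return f;
    }
    return 1;
}
```

Under these assumptions, four 64 MB banks plus two 16 MB banks give a
factor of 2: the 64 MB group could go four ways, but the 16 MB group
cannot, and the uniform policy settles for what every group can do.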
There is a poorly tested NVRAM variable called "fastmem" which tells the prom
to try to interleave memory "optimally" rather than uniformly. This may or
may not improve performance, but if it does, it also makes performance less
stable. Multiple runs of the same code will be more likely to run at
different speeds. "Fastmem" can be set from the IO4 prom command monitor.
Before configuring memory, the prom runs the memory board's built-in self-test,
a.k.a. BIST, on all boards. Banks that fail BIST are not included in the
configuration. BIST has the side-effect of zeroing memory and storing good ECC
bits in it. On older revisions of the MA chip (rev. 0), it is necessary to
run the BIST, let it fail, reset the machine, and decline to run BIST. The
prom only presents the "BIST?" prompt on such machines.
After configuring memory, the PROM runs several memory tests on the
configured RAM. Upon a memory failure, the PROM reconfigures memory
without the affected banks. This continues until all tests pass or there's
no memory left to configure out. If all memory is configured out in
this way, there may be a problem with the master CPU. Try again with the
first CPU (or board) disabled - see fdisable/fenable commands in POD mode -
you won't be able to get to the IO4 prom.
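The test-and-reconfigure loop can be pictured like this. It is a sketch
under stated assumptions: banks are modelled as bits in a mask, and the
callback type and the stub test routine are hypothetical stand-ins for
the PROM's real memory tests.

```c
#include <stdint.h>

/* Hypothetical callback: runs the memory tests on the banks set in
 * `enabled` and returns a mask of the banks that failed (0 = all ok). */
typedef uint32_t (*mem_test_fn)(uint32_t enabled);

/* Stand-in for real tests: pretends banks 1 and 3 are always bad. */
uint32_t stub_tests(uint32_t enabled)
{
    return enabled & 0x0Au;
}

/* Returns the final mask of usable banks; 0 means nothing survived. */
uint32_t configure_until_clean(uint32_t enabled, mem_test_fn run_tests)
{
    while (enabled != 0) {
        uint32_t failed = run_tests(enabled) & enabled;
        if (failed == 0)
            break;               /* every configured bank passed */
        enabled &= ~failed;      /* reconfigure without the bad banks */
    }
    return enabled;
}
```

Starting from banks 0-3 (mask 0x0F) with the stub above, the loop ends
with banks 0 and 2 (mask 0x05). A result of 0 corresponds to the "all
memory configured out" case, where the master CPU itself is suspect.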
Next, we test and initialize the CC chip's "bus tags," and write the
evconfig structure out to memory. At this point, we move our stack into
uncached memory and continue testing the system.
Now that the stack is no longer in the data cache, we can test the caches.
Current IP19 proms run a single secondary cache test that tests the
scache as a one megabyte RAM. Future IO4 proms will test the tags
independently and will test write-backs. Future IO4 proms will also test
the i-cache.
Finally, we're ready to check on the slave processors. We wait for
them to finish their testing (or for a timeout) and display the results
of the testing. If the processors have not stored a result value, we
assume that they cannot access memory. If they hang in a particular test, we
assume that that test failed.
Processors that fail diagnostics are disabled, and we go on to load the
IO4 prom.
SLAVE CODE
----------
Until now, we have only discussed the role of the master processor. The
slave processors enter a loop where they wait for instructions from the
master processor. The first instruction they are given is a request to
update their entries in evconfig. The slaves store their processor type,
cache size, speed, etc. Next they start their power-on diagnostics updating
their diagnostic result value, or "diagval," as they go. The slaves
currently run the same set of tests as the master: data cache, secondary
cache, and bus tag tests. Future versions will run the same set of tests
mentioned for the master.
Slaves can also be sent interrupts to launch them into a given piece of
code. This technique is used by the "niblet" tests (mentioned below) as
well as by the IO4 prom to prepare slaves to run Unix.
FATAL FAILURES
--------------
Upon fatal failures such as no enabled memory or an IA chip test failure,
the PROM scrolls a descriptive message across the system controller LCD
display and displays a "diagnostic code." These codes can be used to
diagnose the problem in more depth than the message alone. Upon IO failures,
the CPU goes into POD mode on the CC UART. On memory failures, the master
CPU goes into POD mode. Pressing Enter on the serial console clears the
scrolling message and allows the user to type commands.
The diagnostic codes are listed below.
----------------------------------------------------------------------------
The Debug Switches
------------------
The system controller has a mode, accessible only from "manager mode,"
which allows the user to set a number of "virtual dipswitches."
As of the December 17th release of the system controller firmware,
there are sixteen debug switches instead of eight. The "lab settings"
are the leftmost eight, and the "software" settings are the rightmost.
The switches are numbered from the right starting from zero as shown:
f e d c b a 9 8 7 6 5 4 3 2 1 0
The software switches work as follows:
7
The "Manu-mode" switch is used to send all IP19prom console output to
the external UART on the system controller. This mode will eventually
be used in manufacturing to debug systems which can't reach the IO4 UART.
In current PROMS, this switch also forces the POD mode switch since the
IO4 prom doesn't use the system controller UART.
6
The "No boot master arbitration" switch keeps the system controller from
selecting a processor with which to communicate. This makes it possible
to communicate with individual processors via the "CC UART" connectors on
the edge of the IP19 boards.
5
The "POD mode" switch forces the IP19 prom to stop initialization just
before it would have loaded the IO4 prom and jump to POD mode instead.
This is useful on a system with a bad IO4 prom or a bad IO board.
4
The "No diagnostics" switch prevents the system from running power-on
diagnostics. This switch should only be used by software developers
who are constantly bringing systems up and down. Otherwise, it can
mask failures and cause system damage.
3
In PROM release 6, this becomes the "use defaults" switch. This switch
will override the console baudrate setting and use 9600 instead. It may
also override certain other settings. (Not implemented in PROM revision 4.)
2
In PROM release 6, this becomes the "don't clear memory" switch. It's
useful for debugging things like the machines that won't take an NMI.
1
In PROM release 6, this becomes the "boot from second IO4" switch. If you
have a machine with a bad flash ROM in the highest slot, simply move the
console connection, flip this switch, and boot from the next IO4.
0
In PROM release 7, this becomes the debug switch. So far, it is used only
by the IO4 prom.
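For readers who think in bit masks, the software switches can be written
down as follows. The bit positions come from the numbering above. The
macro and function names are invented for this example and are not the
names used in the PROM source.

```c
#include <stdint.h>

/* Software switches, numbered from the right starting at zero. */
#define DS_DEBUG        (1u << 0)   /* debug switch (IO4 prom only) */
#define DS_SECOND_IO4   (1u << 1)   /* boot from second IO4 */
#define DS_NO_MEMCLEAR  (1u << 2)   /* don't clear memory */
#define DS_DEFAULTS     (1u << 3)   /* use defaults (9600 baud console) */
#define DS_NO_DIAGS     (1u << 4)   /* no diagnostics */
#define DS_POD_MODE     (1u << 5)   /* stop in POD mode */
#define DS_NO_ARB       (1u << 6)   /* no boot master arbitration */
#define DS_MANU_MODE    (1u << 7)   /* manu-mode */

/* Rightmost eight switches (0-7). */
uint16_t software_switches(uint16_t all)
{
    return (uint16_t)(all & 0x00ffu);
}

/* Leftmost eight switches (8-f), the "lab settings". */
uint16_t lab_switches(uint16_t all)
{
    return (uint16_t)(all >> 8);
}

/* Manu-mode also forces the POD mode switch in current PROMs. */
int stops_in_pod(uint16_t all)
{
    return (all & (DS_MANU_MODE | DS_POD_MODE)) != 0;
}
```

A switch word of 0x0080 (manu-mode only) therefore stops in POD mode just
as 0x0020 does, while 0x0010 (no diagnostics) does not.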
-----------------------------------------------------------------------------
POD Mode
--------
During bring-up and in the event of unexpected exceptions or diagnostic
failures, the PROM can drop into a special command interpreter. This
interface, known as the Power-On Diagnostics mode, or POD mode, provides a
simple interface through which a user can examine and modify the state of
the machine. The commands provided by POD MODE are listed below. All
numerical inputs should be entered in hex and need not be prefixed with '0x'.
The POD mode prompt is "POD xx/yy>" where xx is the slot number of the
current processor, and yy is its "slice" on the IP19 board.
Commands with an asterisk (*) are new to release 6 or 7.
wb ADDRESS VALUE
wh ADDRESS VALUE
ww ADDRESS VALUE
wd ADDRESS VALUE -- Write the value into a byte, half, word,
or doubleword at the given address.
Currently the values written must be 32-bit or
smaller values.
db ADDRESS
dh ADDRESS
dw ADDRESS
dd ADDRESS -- Display the contents of the byte, halfword, word,
or doubleword at the given address.
NOTE: The display and write memory commands continue
to the next address by default. To quit,
type a "q" and return instead of just
return. Typing a period causes the read
or write to march on through memory on its
own.
wr REG VALUE -- Write the given value into the register specified.
dr REG -- Display the value in the specified register.
Register names for read/write include:
sp: stack pointer
sr: r4000 status register
cause: r4000 cause register
epc: Exception program counter
eepc: error exception program counter
config: r4000 config register
wh: watchhi register
wl: watchlo register
Registers that can only be displayed:
all: all r4000 general purpose registers
and selected coprocessor 0 registers.
rX: where 0 <= X <= 31
Please note that some of the general purpose
registers are not saved correctly in the
current version of the IP19prom.
dc SLOT REG -- Displays the value of the specified Everest
configuration register.
wc SLOT REG VAL -- Writes the value to the specified Everest config
register.
j ADDRESS -- Jumps to the specified address
j1 ADDRESS PARM -- Jumps to the address passing the parameter supplied.
j2 ADDRESS... -- Jumps to the address passing two parameters.
info -- Displays the slot and processor number of the
processor and prints out a description of the
system configuration (as provided by SenseConfig).
reset -- Reset the system.
sload -- Download Motorola S-record 3 code through the
serial port.
srun -- Like sload but it runs too.
sloop (COMMAND) -- Performs a 'scope loop of the following single
command. Sloop runs the specified command until a
key is pressed.
loop TIMES (COMMAND)
-- Performs a nonzero number of iterations of COMMAND,
which can be any legal command line (semicolon
separated).
* mem START END -- Performs a memory test starting with address START.
END is the first address not tested. Now if
you specify an address with the high bit unset,
POD ors in 0xa0000000. No more TLB misses.
scache ITER DE -- Performs ITER iterations of a basic secondary
cache test with the r4000's DE bit set to the
value provided.
dmc SLOT -- Displays the memory board configuration for
the board in the specified slot
dio SLOT -- Displays the IO4 board configuration for
the board in the specified slot
* devc SLOT | all -- Display the "evconfig" structure entry for this slot
or all slots. This structure contains what the
prom believes to be in that slot and its current
status. Now, it also displays total memory
and the current debug switch settings as well
as strings explaining "diagvals" and prom
revision numbers.
disable SLOT UNIT
-- Disables UNIT of the board in SLOT (see enable).
* fdisable SLOT UNIT
-- Forcibly disable a unit. For CPUs, this means
writing the A chip enable register. For memory
and IO adapters, it means removing the unit from
the evconfig structure.
enable SLOT UNIT -- Enable UNIT of the board in SLOT specified.
This command changes the "enable" field of the
evconfig structure for the chosen unit.
In future prom revisions, this will probably
change the value in NVRAM.
* fenable SLOT UNIT
-- Forcibly enable a unit. For CPUs, this means
writing the A chip enable register. For memory
and IO adapters, it means forcing the correct
value to be stored in the evconfig structure.
reconf -- Reconfigures memory using the currently enabled
banks.
bist -- Runs the memory Built-In Self-Test.
* ecc -- Decode the information in the Cache_error
register after a cache error exception
(This works again in rev 6.)
si SLOT CPU LVL -- Send a level LVL interrupt to the processor in
the specified slot and slice.
td PARM -- Displays the specified TLB entry. "lo" and "hi"
display the low and high halves of the TLB
to make the output fit on a 24-line terminal.
"all" displays the entire TLB.
clear -- Clears memory and CC chip error registers.
CC chip errors are printed after each prompt
until cleared.
* decode -- Displays the memory slot and bank number a given
physical address belongs to. (Can now accept
up to just under 4 gigabyte addresses!)
walk LO HI CONT -- Walks a bit through every word of the address
range specified with HI being the first address
not tested. CONT indicates whether to continue
after failures (1 = continue, 0 = stop on errors).
slave -- Causes this processor to enter slave mode.
wx BLOC OFF VAL -- Write VAL to the address created by adding the
value of OFF to the value of BLOC multiplied by 256.
This command uses R4000 64-bit addressing to
allow uncached access to all of memory.
dx -- Prints the value contained in the address created
by combining BLOC and OFF as above.
io -- Loads and executes the IO4 prom in the master IO4
board.
why -- Prints a string explaining why we entered POD mode.
niblet SET -- Run the specified set of Niblet tests (see below).
The usual test sets are numbered 0-9.
gm -- Go to memory mode. This moves the stack into
cached memory instead of an "isolated" cache. Niblet
requires you to execute this command before
running it. This command changes the prompt to
"Mem xx/xx>"
select SLICE -- When the system is in "manu-mode," all processors
on the board selected by the system controller
receive any input intended for the selected
processor. This is due to a design limitation
of the IP19. This results in four processors
executing any command intended for just one.
POD handles this by providing the select command.
Select allows the user to select a "slice" which
will be able to answer commands. All other CPUs
on the board will be temporarily disabled until
the next select. Selecting slice ff disables
selection and allows all CPUs to respond to
input.
? -- Displays the list of commands.
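The address fix-up that the mem command now performs is easy to show in
C. This is a sketch of the rule as stated in the mem description, not the
PROM code. The function name is made up. On the R4000, 0xa0000000 is the
base of kseg1, the unmapped and uncached segment, which is why the TLB
misses go away.

```c
#include <stdint.h>

#define KSEG1_BASE 0xa0000000u   /* value quoted in the mem description */

/* If the high bit of the address is unset, OR in the kseg1 base so the
 * access is unmapped and uncached. Other addresses pass through. */
uint32_t pod_mem_address(uint32_t addr)
{
    if ((addr & 0x80000000u) == 0)
        addr |= KSEG1_BASE;
    return addr;
}
```

So "mem 100000 200000" would test 0xa0100000 up to (but not including)
0xa0200000, while an address such as 0x80001000 is used as typed.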
------------------------------------------------------------------------------
Niblet
------
Niblet is a very small, symmetric multiprocessing kernel with separate
virtual address spaces for its processes. It was originally intended
as a verification tool, but we have found it useful for testing new
boards as well. Eventually, it will be called automatically from the
IO4 prom, but it is also available from the POD prompt in the IP19 prom.
NOTE:
Niblet may not run as intended if the various processors on
the system are running different versions of the IP19 prom. You're
okay if the processors launch successfully.
The various tests available from the "niblet n" command are really
combinations of niblet tests. That's why Niblet reports "Supertest passed"
and "Supertest FAILED." A list of the basic Niblet tests and a table
of which tests are contained in each supertest follows.
Basic Niblet Tests:
-------------------
INVALID:
Invalidates random TLB entries to cause more varied interactions.
COUNTER:
Just runs until a certain instruction count is reached and passes.
The count is proportional to the niblet process ID.
MPMON:
Test monotonicity of Everest reads and writes.
MPINTADD:
Two processors add values to a common variable, hit a barrier,
and check the final sum.
MPINTADD_4:
Four processor version of MPINTADD.
MPSLOCK:
A software locking protocol test.
MPHLOCK:
Tests load-linked and store-conditional by grabbing a lock, storing
a process ID into a protected location, waiting for a delay to
expire, and checking that the correct process ID is still there.
Multiple processors try this so a failure should result in a CPU
reading the wrong PID.
MEMTEST:
Tests a range of memory by writing a value based on a process ID
to a range of memory and then checking it. This version's range
is small enough to fit in a secondary cache.
BIGMEM:
Same as above but the set is larger than one megabyte.
PRINTTEST:
Tests Niblet context-switching. Runs very quickly. Mostly a sanity
test.
BIGINTADD_4:
Same as MPINTADD_4 but runs for many, many iterations.
BIGROVE:
A roving producer-consumer test that runs for many, many iterations.
BIGHLOCK:
Same as MPHLOCK but runs for many, many iterations.
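To make the MPHLOCK idea concrete, here is a rough single-pass sketch. It
is not Niblet code: it substitutes a C11 atomic_flag spin lock for
hand-written load-linked/store-conditional, and the names are invented
for the example.

```c
#include <stdatomic.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;
static volatile int owner_pid;       /* the protected location */

/* One pass of the check: grab the lock, store our PID, wait, and make
 * sure nobody overwrote it. Returns 1 if the PID survived the delay. */
int hlock_pass(int pid, unsigned delay)
{
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
        ;                            /* spin until the lock is free */
    owner_pid = pid;
    for (volatile unsigned i = 0; i < delay; i++)
        ;                            /* hold the lock for a while */
    int ok = (owner_pid == pid);
    atomic_flag_clear_explicit(&lock, memory_order_release);
    return ok;
}
```

If the lock primitive were broken, two processors could be inside the
critical section at once, and one of them would read back the other's
PID. That is the failure MPHLOCK is looking for.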
Niblet "Supertests":
(Only tests 0-9 are useful without a connection to the system controller
UART.)
niblet 0:
Runs one copy of the "INVALID" process. Should always pass almost
immediately.
niblet 1:
Runs {INVALID, COUNTER, COUNTER}. Takes some time. One process will
finish in about half the time that the other two take.
niblet 2:
Runs {MPMON, MPMON}. Takes disproportionately longer on a
single processor than on an MP machine.
niblet 3:
Runs {MPINTADD, INVALID, MPINTADD}. Takes disproportionately longer
on a single processor than on an MP machine.
niblet 4:
Runs {MPSLOCK, MPSLOCK, INVALID}.
niblet 5:
Runs {MPROVE, MPSLOCK, MPROVE, MPSLOCK, INVALID}.
niblet 6:
Runs {MPSLOCK, MPMON, INVALID, MPSLOCK, MPMON}. Takes disproportionately
longer on a single processor than on an MP machine.
niblet 7:
Runs {MPROVE, MPROVE}.
niblet 8:
Runs {INVALID, MPMON, MPMON, MPROVE, MPROVE, MPROVE, MPINTADD,
MPINTADD, MPHLOCK, MPHLOCK} for a total of 10 processes.
niblet 9:
Runs {MPINTADD_4, MPINTADD_4, MPINTADD_4, MPINTADD_4, INVALID,
MPROVE, MPROVE, MPROVE, MPHLOCK, MPHLOCK, MPSLOCK, MPSLOCK} for
a total of 12 processes.
niblet a:
Runs {MEMTEST, MEMTEST, MEMTEST, MEMTEST, MEMTEST}. This test is
designed as an overnight test. It will take hours to complete.
niblet b:
Runs {BIGMEM, BIGMEM, BIGMEM, INVALID, INVALID, INVALID} for a total
of 6 processes. It, too, takes hours to complete the memory tests,
but the supertest will never complete since there are three
INVALID processes. They exit when they are the last process on
the system.
niblet c:
Runs {PRINTTEST, PRINTTEST, PRINTTEST, PRINTTEST,
PRINTTEST, PRINTTEST, PRINTTEST, PRINTTEST,
PRINTTEST, PRINTTEST, PRINTTEST, PRINTTEST,
PRINTTEST, PRINTTEST}. This is really a Niblet sanity test
(as is niblet 0).
niblet d:
This is the big MP stress test. It runs {BIGINTADD_4, BIGINTADD_4,
BIGINTADD_4, BIGINTADD_4, INVALID, BIGROVE, BIGROVE, BIGROVE,
BIGHLOCK, BIGHLOCK, BIGMEM, BIGMEM, BIGMEM, INVALID}. This
test runs 14 processes for a number of hours. It's intended as
an overnight (or other long period of time) MP stress test.
Niblet Fundamentals
-------------------
NOTE: Niblet displays all of its output (with the exception of the final
result) to the CC UART so it's only visible in "manu-mode" or "no boot master
arbitration" mode.
Number of CPUS to include:
Niblet attempts to run its tests on all processors that were
present when the PROM set up the machine. That means that if a processor
has been forced into POD mode by pressing control-P, that processor will
be included in Niblet's processor count and niblet will never pass its
first barrier. The timeout code hasn't been implemented yet so this
results in a hung system. A processor can be forced back into slave mode
by typing the POD "slave" command.
Niblet is limited to 15 CPUs at a time. The user can control which
CPUs run niblet with the enable and disable commands.
Scheduling and process migration:
As long as there are more processes than processors, Niblet
processes will migrate. This is the reason that there are three copies of
INVALID in "niblet b." As long as that test is run on fewer than six
processors, tests will migrate eventually. The timing has to be right, though.
On fewer processors, tests will migrate more often.
If there are ever more processors than processes, one or more
processors will go into a loop waiting for the supertest to complete. You
can tell that processors are in this state because they will print,
"No processes left to run - twiddling."
Test completion:
Since Niblet is intended to run with one UART per processor, it
only prints failure messages to the processor on which a test fails. The
processor hosting the failing process will print all pertinent information
and then send an interrupt to the other processors. This means that the
other processors will only say, "Niblet FAILED on an interrupt." The
real cause of the failure is available on the processor where it happened.
This is particularly important with a Niblet failure due to a nonzero
ERTOIP register since it can only be read by the processor on which the
error occurred. That processor will print, "ERTOIP is nonzero!
(ERTOIP, CAUSE, EPC)" followed by the values of ERTOIP, CAUSE, and EPC.
The master processor will always complete with a message of the
form, "Supertest PASSED/FAILED" followed by "Niblet Complete."
None of the 13 Niblet tests in the IP19 prom should ever print
a "Supertest FAILED." message under normal circumstances.
NOTE: Running a test in manufacturing mode yields more information
as processors print to their local UARTS. In "manumode" you can selectively
watch CPUs.
-----------------------------------------------------------------------------
PROM LEDS and What They Mean
----------------------------
The values that follow are for the PROM LEDs that I mentioned above.
They are guaranteed to be valid for IP19 PROM release 1 (12/15/92), and
they will most likely remain so.
If you see a constant value displayed on the LEDs, convert the binary
into a decimal number and look it up in the following list under PLED_xxx.
Flashing values should appear under FLED_xxx.
There are a couple of modes in addition to the constant and
single-flashing-value modes. When a processor is in the IP19 slave loop,
it cycles between 9 and 6. The master processor in POD mode cycles between
1 and 2 when it's using the UART on the CC chip. On the EPC UART, it displays
a constant value.
Slave mode (five vertical slices are shown. The topmost LED is most
significant):
0 0 0 0 0
0 0 0 0 0
1 0 1 0 1 etc.
0 1 0 1 0
0 1 0 1 0
1 0 1 0 1
Time ->
Master mode on CC UART:
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0 etc.
0 0 0 0 0
0 1 0 1 0
1 0 1 0 1
Time ->
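Reading one column of those diagrams as a number works like this. The
helper below is only an illustration (the name is made up). It takes the
six LEDs from top to bottom and treats the top one as most significant.

```c
/* leds[0] is the top LED (most significant), leds[5] the bottom one.
 * Any nonzero entry counts as "on". */
unsigned led_value(const int leds[6])
{
    unsigned v = 0;
    for (int i = 0; i < 6; i++)
        v = (v << 1) | (leds[i] ? 1u : 0u);
    return v;
}
```

The two columns of the slave pattern, 001001 and 000110, come out as 9
and 6, and the master's CC UART pattern alternates between 1 and 2.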
The following comes straight from a PROM header file so it's somewhat
raw. Note that the most significant bit of an LED value is the top LED.
#define PLED_CLEARTAGS 1 (000001)
/* Clearing the primary data cache tags */
#define PLED_CKCCLOCAL 2 (000010)
/* Testing CC chip local registers */
#define PLED_CCLFAILED_INITUART 3 (000011)
/* Failed the local test but trying to initialize the UART anyway */
#define PLED_CCINIT1 4 (000100)
/* Initializing the CC chip local registers */
#define PLED_CKCCCONFIG 5 (000101)
/* Testing the CC chip config registers (requires a usable bus to pass) */
/* NOTE: Hanging in this test usually means that the bus clock has failed.
* Check the oscillator.
*/
#define PLED_CCCFAILED_INITUART 6 (000110)
/* Failed the config reg test but trying to initialize the UART anyway */
#define PLED_NOCLOCK_INITUART 7 (000111)
/* CC clock isn't running; initializing the UART anyway */
#define PLED_CCINIT2 8 (001000)
/* Initializing the CC chip config registers */
#define PLED_UARTINIT 9 (001001)
/* Initializing the CC chip UART */
/* NOTE: Hanging in this test usually means that the UART clock is bad.
* Check the connection to the system controller.
*/
#define PLED_CCUARTDONE 10 (001010)
/* Finished initializing the CC chip UART */
#define PLED_CKACHIP 11 (001011)
/* Testing the A chip registers */
#define PLED_AINIT 12 (001100)
/* Initializing the A chip */
#define PLED_CKEBUS1 13 (001101)
/* Checking the EBus with interrupts. */
#define PLED_SCINIT 14 (001110)
/* Initializing the system controller */
#define PLED_BMARB 15 (001111)
/* Arbitrating for a bootmaster */
#define PLED_BMASTER 16 (010000)
/* This processor is the bootmaster */
#define PLED_CKEBUS2 17 (010001)
/* In second EBus test. Run only by the master */
#define PLED_POD 18 (010010)
/* Setting up this CPU slice for POD mode */
#define PLED_PODLOOP 19 (010011)
/* Entering POD loop */
#define PLED_CKPDCACHE1 20 (010100)
/* Checking the primary data cache */
#define PLED_MAKESTACK 21 (010101)
/* Creating a stack in the primary data cache */
#define PLED_MAIN 22 (010110)
/* Jumping into C code - calling main() */
#define PLED_CKIAID 23 (010111)
/* Check IA and ID chips on master IO4 */
#define PLED_CKEPC 24 (011000)
/* Check EPC chip on master IO4 */
#define PLED_IO4INIT 25 (011001)
/* Initializing the IO4 prom */
#define PLED_NVRAM 26 (011010)
/* Getting NVRAM variables */
#define PLED_FINDCONS 27 (011011)
/* Checking the path to the EPC chip which will contain the console UART */
#define PLED_CKCONS 28 (011100)
/* Testing the console UART */
#define PLED_CONSINIT 29 (011101)
/* Setting up the console UART */
#define PLED_CONFIGCPUS 30 (011110)
/* Configuring out CPUs that are disabled */
#define PLED_CKRAWMEM 31 (011111)
/* Checking out raw memory (running BIST) */
#define PLED_CONFIGMEM 32 (100000)
/* Configuring memory */
#define PLED_CKMEM 33 (100001)
/* Checking configured memory */
#define PLED_LOADPROM 34 (100010)
/* Loading IO4 prom */
#define PLED_CKSCACHE1 35 (100011)
/* First pass of secondary cache testing - test the scache like a RAM */
#define PLED_CKPICACHE 36 (100100)
/* Check the primary instruction cache */
#define PLED_CKPDCACHE2 37 (100101)
/* check the primary data cache writeback mechanism */
#define PLED_CKSCACHE2 38 (100110)
/* check the secondary cache writeback mechanism */
#define PLED_CKBT 39 (100111)
/* Check the bus tags */
#define PLED_BTINIT 40 (101000)
/* Clear the bus tags */
#define PLED_CKPROM 41 (101001)
/* Checksum the IO prom */
#define PLED_INSLAVE 42 (101010)
/* This CPU is entering slave mode */
#define PLED_PROMJUMP 43 (101011)
/* Jumping to the IO prom */
#define PLED_SLAVEJUMP 44 (101100)
/* A slave is jumping to the IO4 PROM slave code */
#define PLED_NMIJUMP 45 (101101)
/* This CPU has jumped into the kernel's NMI handling code. */
/*
* Failure mode LED values. If the Power-On Diagnostics
* find an unrecoverable problem with the hardware,
* they will call the flash leds routine with one of
* the following values as an argument. There's one PLED LED
* setting hiding down here because of an error made earlier.
*/
#define FLED_CANTSEEMEM 46 (101110)
/* Flashed by slave processors if they take an exception while trying to
* write their evconfig entries. Often means the processor's getting D-chip
* parity errors.
*/
#define FLED_NOUARTCLK 47 (101111)
/* The CC UART clock is not running. No system controller access is possible.
*/
#define FLED_IMPOSSIBLE1 48 (110000)
/* We fell through one of the supposedly unreturning subroutines.
* Really shouldn't be possible.
*/
#define FLED_DEADCOP1 49 (110001)
/* Coprocessor 1 is dead - not seeing this doesn't mean it works. */
#define FLED_CCCLOCK 50 (110010)
/* The CC clock isn't running */
#define FLED_CCLOCAL 51 (110011)
/* Failed CC local register tests */
#define FLED_CCCONFIG 52 (110100)
/* Failed CC config register tests */
#define FLED_ACHIP 53 (110101)
/* Failed A Chip register tests */
#define FLED_BROKEWB 54 (110110)
/* By the time this CPU had arrived at the bootmaster arbitration barrier,
* the rendezvous time had passed. This implies that a CPU is running too
* slowly, the ratio of bus clock to CPU clock rate is too high, or a bit
* in the CC clock is stuck on.
*/
#define FLED_BADDCACHE 55 (110111)
/* This CPU's primary data cache test failed */
#define FLED_BADIO4 56 (111000)
/* The IO4 board is bad - can't get to the console. */
/* Exception failure mode values */
#define FLED_UTLBMISS 57 (111001)
/* Took a TLB Refill exception */
#define FLED_XTLBMISS 58 (111010)
/* Took an extended TLB Refill exception */
#define PLED_WRCONFIG 59 (111011)
/* Writing evconfig structure:
* The master CPU writes the whole array
* The slaves only write their own entries.
*/
#define FLED_GENERAL 60 (111100)
/* Took a general exception */
#define FLED_NOTIMPL 61 (111101)
/* Took an unimplemented exception */
#define FLED_ECC 62 (111110)
/* Took a cache error exception */
#define FLED_DISABLED 63 (111111)
/* Disabled processors will flash all of their LEDs */
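The binary column in the list above is simply the 6-bit representation of
each decimal value. A minimal sketch of that conversion follows; led_bits is
a hypothetical helper, not a PROM routine, and it makes no claim about which
physical LED corresponds to which bit.

```c
/* Hypothetical helper (not part of the PROM): render a 6-bit LED value
 * as a string of '0'/'1' characters, most significant bit first, in the
 * same form as the parenthesized column above.
 * buf must have room for 6 characters plus the terminating NUL. */
static char *led_bits(unsigned value, char buf[7])
{
    int i;

    for (i = 0; i < 6; i++)
        buf[i] = (value & (1u << (5 - i))) ? '1' : '0';
    buf[6] = '\0';
    return buf;
}
```

For example, passing 25 (PLED_IO4INIT) fills the buffer with 011001, and
passing 63 (FLED_DISABLED) fills it with 111111.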
-----------------------------------------------------------------------------
PROM Diagnostic Messages
------------------------
These messages can be printed as a result of prom diagnostics and as
reasons for entering POD mode. The numbers on the left are "diagnostic"
codes which are displayed on the LCD panel.
CODE MEANING
Success:
000 Device passed diagnostics.
Cache tests:
001 Failed dcache1 data test.
002 Failed dcache1 addr test.
003 Failed scache1 data test.
004 Failed scache1 addr test.
005 Failed icache data test.
006 Failed icache addr test.
007 Dcache test hung.
008 Scache test hung.
009 Icache test hung.
Memory tests:
040 Memory built-in self-test failed.
041 No working memory was found.
042 Memory address line test failed.
043 Memory data line test failed.
044 Bank failed configured memory test.
045 Slave hung writing to memory.
046 Bank disabled due to downrev MA chip.
047 A bus error occurred during MC3 config.
048 A bus error occurred during MC3 testing.
049 PROM attempted to disable the same bank twice.
050 Not enough memory to load the IO4 PROM.
051 No memory boards were recognized.
052 Bank forcibly re-enabled by the PROM.
Ebus tests:
060 CPU doesn't get interrupts from CC.
061 Group interrupt test failed.
062 Lost a loopback interrupt.
063 Bit in HPIL register stuck.
IO4 tests:
070 No working IO4 is present.
071 Bad checksum on IO4 PROM.
072 Bad entry point in IO4 PROM.
073 IO4 PROM claims to be too long.
074 Bad entry point in IO4 PROM.
075 Bad magic number in IO4 PROM.
078 Bus error while downloading IO4 PROM.
079 No EPC chip found on master IO4.
080 Bus error while configuring IO4.
081 Bus error during IA register test.
082 Bus error during IA PIO test.
083 IA chip register test failed.
084 Wrong error reported for bad PIO.
085 IA error didn't generate interrupt.
086 IA error generated wrong interrupt.
087 EPC register test failed.
088 Bus error on map RAM rd/wr test.
089 Bus error on map RAM address test.
090 Bus error on map RAM walking 1 test.
091 Bus error during map RAM testing.
092 Map RAM read/write test failed.
093 Map RAM address test failed.
094 Map RAM walking 1 test failed.
095 EPC UART loopback test failed.
IP19 tests:
120 CPU can't access memory.
123 CC bus tag data test failed.
124 CC bus tag addr test failed.
125 CPU forcibly re-enabled by the PROM.
Miscellaneous:
240 CPU writing configuration info.
246 CPU testing dcache.
247 CPU testing icache.
248 CPU testing scache.
249 CPU initializing caches.
250 CPU returning from master's code.
251 Unexpected exception.
252 A nonmaskable interrupt occurred.
253 POD mode switch set or POD key pressed.
253 Unspecified diagnostic failure.
254 Diagnostic value unset.
255 Device not present.
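The codes in the table above fall into numeric groups, one per heading. The
sketch below maps a code to its heading. The function name is hypothetical,
and the range boundaries are an assumption read off the codes listed here
(unlisted codes inside a range, such as 076, are treated as belonging to
that group); the PROM source may draw the lines differently.

```c
/* Hypothetical helper: return the table heading a diagnostic code
 * falls under. Range limits are assumed from the codes listed above;
 * anything outside them is reported as "Unknown". */
static const char *diag_category(unsigned code)
{
    if (code == 0)                  return "Success";
    if (code >= 1   && code <= 9)   return "Cache tests";
    if (code >= 40  && code <= 52)  return "Memory tests";
    if (code >= 60  && code <= 63)  return "Ebus tests";
    if (code >= 70  && code <= 95)  return "IO4 tests";
    if (code >= 120 && code <= 125) return "IP19 tests";
    if (code >= 240 && code <= 255) return "Miscellaneous";
    return "Unknown";
}
```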
The following messages appear on the system controller display when
diagnostics fail or as status:
CODE System Controller Short Message
003 SCACHE FAILED!
004 SCACHE FAILED!
001 DCACHE FAILED!
002 DCACHE FAILED!
005 ICACHE FAILED!
006 ICACHE FAILED!
040 MC3 CONFIG FAILED!
041 NO GOOD MEMORY FOUND
042 MC3 CONFIG FAILED!
043 MC3 CONFIG FAILED!
044 MC3 READBACK ERROR!
047 MC3 CONFIG FAILED!
048 MC3 CONFIG FAILED!
049 MC3 CONFIG FAILED!
050 INSUFFICIENT MEMORY!
051 NO MEM BOARDS FOUND!
070 NO IO BOARDS FOUND!
071 IO4PROM FAILED!
072 IO4PROM FAILED!
073 IO4PROM FAILED!
074 IO4PROM FAILED!
075 IO4PROM FAILED!
078 IO4PROM FAILED!
079 NO EPC CHIP FOUND!
080 IO4 CONFIG FAILED!
081 MASTER IO4 FAILED!
082 MASTER IO4 FAILED!
083 MASTER IO4 FAILED!
084 MASTER IO4 FAILED!
085 MASTER IO4 FAILED!
086 MASTER IO4 FAILED!
088 MASTER IO4 FAILED!
089 MASTER IO4 FAILED!
090 MASTER IO4 FAILED!
091 MASTER IO4 FAILED!
092 MASTER IO4 FAILED!
093 MASTER IO4 FAILED!
094 MASTER IO4 FAILED!
087 EPC CHIP FAILED!
095 EPC UART FAILED!
123 BUS TAGS FAILED!
124 BUS TAGS FAILED!
250 Reentering POD mode
251 PROM EXCEPTION!
252 PROM NMI HANDLER
253 CPU in POD mode.
These are the long, scrolling messages:
CODE System Controller Long Message
040 Memory board configuration has failed. Cannot load IO PROM.
041 All memory banks had to be disabled due to test failures.
042 The address line self-test failed. Cannot continue.
043 Memory board configuration has failed. Cannot load IO PROM.
044 Memory board configuration has failed. Cannot load IO PROM.
047 Memory board configuration has failed. Cannot load IO PROM.
048 Memory board configuration has failed. Cannot load IO PROM.
049 The PROM was unable to disable failing memory banks.
050 You must have at least 32 megabytes of working memory to load the IO PROM.
051 The IP19 PROM did not recognize any memory boards in the system.
070 The IP19 PROM did not recognize any IO4 boards in the system.
071 Diagnostics detected a problem with your IO4 PROM.
072 Diagnostics detected a problem with your IO4 PROM.
073 Diagnostics detected a problem with your IO4 PROM.
074 Diagnostics detected a problem with your IO4 PROM.
075 Diagnostics detected a problem with your IO4 PROM.
078 An exception occurred while downloading the IO4 PROM to memory.
079 There must be an EPC chip on the IO board in the highest-numbered slot.
080 An exception occurred while configuring an IO board.
081 The IA chip on the master IO4 board has failed diagnostics.
082 The IA chip on the master IO4 board has failed diagnostics.
083 The IA chip on the master IO4 board has failed diagnostics.
084 The IA chip on the master IO4 board has failed diagnostics.
085 The IA chip on the master IO4 board has failed diagnostics.
086 The IA chip on the master IO4 board has failed diagnostics.
088 The IA chip on the master IO4 board has failed diagnostics.
089 The IA chip on the master IO4 board has failed diagnostics.
090 The IA chip on the master IO4 board has failed diagnostics.
091 The IA chip on the master IO4 board has failed diagnostics.
092 The IA chip on the master IO4 board has failed diagnostics.
093 The IA chip on the master IO4 board has failed diagnostics.
094 The IA chip on the master IO4 board has failed diagnostics.
087 The EPC chip on the master IO4 board has failed diagnostics.
251 The PROM code took an unexpected exception.
252 The PROM received a nonmaskable interrupt.
-----------------------------------------------------------------------------
IP19 PROM SYSTEM CONTROLLER STANDARD MESSAGES:
Starting System...
Displayed once bootmaster arbitration has completed. Indicates that the
master processor has started up correctly and is capable of communicating
with the system controller.
EBUS diags 2..
Displayed immediately before we run the secondary EBUS diagnostics. The
secondary EBUS diagnostics stress the interrupt logic and the EBUS.
PD Cache test..
Displayed immediately before we run the primary data cache test.
Building stack..
Displayed before we attempt to set up the cache as the stack. If this
is the last message displayed, there is probably something wrong with
the master processor.
Jumping to MAIN
Displayed before we switch into the C main subroutine.
Initing Config Info
Displayed before we attempt to do initial hardware probing and set up
the everest configuration information data structure. In this phase,
we simply read out the SYSCONFIG register and set the evconfig fields
to rational default values.
Setting timeouts..
Displayed before we attempt to write to the various board timeout registers.
Everest requires that all of the boards be initialized with consistent
timeout values, and that these timeout values be written before we actually
do reads or writes to the boards (we're safe so far because we have only
touched configuration registers; this will change when we start talking to
IO4 devices).
Initing master IO4..
Displayed before we attempt to do basic initialization for all of the
IO4's in the system. Basic initialization consists of writing the
large and small window registers, setting the endianness, setting up
error interrupts, clearing the IBUS and EBUS error registers, and
examining the IO adapters.
Initing EPC...
Displayed before we do the first writes to the master EPC. This routine
clears the EPC error registers and takes all EPC devices out of reset.
Initing EPC UART
Displayed when we first enter the UART configuration code.
Initing UART Chan B
Displayed before we begin initializing UART chan B's control registers.
Initing UART Chan A
Displayed before we begin initializing UART chan A's control registers.
Reading inventory..
Displayed before we attempt to read the system inventory out of the IO4
NVRAM. If the inventory is invalid or we can't read it for some reason,
we initialize the inventory fields with appropriate default values.
Running BIST..
Displayed before we run the memory hardware's built-in self test.
Configuring memory..
Displayed before we actually configure the banks into a legitimate
state.
Testing memory..
Printed before we start executing the memory post-configuration tests.
These tests simply check that memory was configured correctly.
Testing Bus Tags..
Checks and initializes the CC bus tags, which are used by the CC chip
to determine whether it should pass a coherency transaction on to a
particular processor.
Writing CFGINFO..
Displayed before we try writing the everest configuration information
into main memory.
Initing MPCONF blk..
Displayed before we initialize the everest MP configuration blocks
for all of the processors.
Testing S Cache...
Displayed before we begin testing the secondary cache on all of the
processors.
S Cache passed.
Secondary cache test passed.
Checking slaves...
Displayed when we check each slave processor to determine whether it
is alive and whether it passed its diagnostics.
Loading IO4 PROM..
Displayed when we download the IO4 PROM from the IO4 flash proms into
main memory.
-----------------------------------------------------------------------------
MISCELLANEOUS HINTS:
If a CPU hangs flashing its LEDs, it will still accept a
control-p (^p) character from its CC UART and go into POD mode. To do
this, you must either connect to it through the system controller or
directly, via the four-pin IP19 connector (with "no boot master arbitration"
switched on in the system controller).
Processors in the "slave" loop displaying a repeating pattern of
four LEDs with two on at a time can also be interrupted with a control-p
on their CC UART. They will then attempt to enter POD mode. Of course,
they may be too broken to do this, in which case, you'll see a different
failure LED value.
The system controller displays the state of the various processors
on its display. The characters associated with the processors are as
follows:
'B' = processor is bootmaster.
'+' = processor is operational.
' ' = processor is not present or seriously broken.
'X' = processor fails diagnostics.
'D' = processor is disabled in NVRAM.
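When reading a captured copy of the display, the character list above can be
decoded mechanically. A minimal sketch, with a hypothetical function name;
the descriptions are taken directly from the list:

```c
/* Hypothetical helper: translate one processor status character from
 * the system controller display into the meaning listed above. */
static const char *cpu_status(char c)
{
    switch (c) {
    case 'B': return "bootmaster";
    case '+': return "operational";
    case ' ': return "not present or seriously broken";
    case 'X': return "fails diagnostics";
    case 'D': return "disabled in NVRAM";
    default:  return "unknown status character";
    }
}
```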
There are some new addresses you can jump to in the IP19 prom to
get certain otherwise difficult effects:
0xbfc00008: Restart the PROM.
0xbfc00010: Go back to IP19 PROM slave mode.
0xbfc00018: Go into POD mode using the CC UART for I/O.
0xbfc00020: Go into POD mode using the IO4 UART for input.
0xbfc00028: Flash all LEDs and loop endlessly.
The IO4 prom now has a POD command to get you to POD mode. It's
no longer necessary to type "goto 0xbfc00020".
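The five jump addresses listed above are spaced 8 bytes apart, starting 8
bytes past 0xbfc00000. The sketch below captures that pattern. The enum and
function names are hypothetical and do not come from the PROM source; only
the addresses themselves are taken from the list.

```c
#include <stdint.h>

/* Hypothetical names for the five entry points listed above. */
enum prom_entry {
    PROM_RESTART = 1,   /* 0xbfc00008: restart the PROM */
    PROM_SLAVE   = 2,   /* 0xbfc00010: back to slave mode */
    PROM_POD_CC  = 3,   /* 0xbfc00018: POD mode, CC UART */
    PROM_POD_IO4 = 4,   /* 0xbfc00020: POD mode, IO4 UART */
    PROM_FLASH   = 5    /* 0xbfc00028: flash all LEDs and loop */
};

/* Compute the jump address for an entry point: base plus 8 bytes
 * per slot, matching the spacing of the addresses in the list. */
static uint32_t prom_entry_addr(enum prom_entry e)
{
    return UINT32_C(0xbfc00000) + 8u * (uint32_t)e;
}
```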
EAROM VARIABLES:
The IP19 prom looks in various locations in EAROM to find
system configuration parameters. Many of these also affect Unix.
Here are their names and addresses:
#define EV_EBUSRATE0_LOC 0xb9000100 /* EBUS freq (Hz) LSB */
#define EV_EBUSRATE1_LOC 0xb9000108 /* EBUS freq (Hz) byte 1 */
#define EV_EBUSRATE2_LOC 0xb9000110 /* EBUS freq (Hz) byte 2 */
#define EV_EBUSRATE3_LOC 0xb9000118 /* EBUS freq (Hz) MSB */
#define EV_PGBRDEN_LOC 0xb9000120 /* Piggyback Rd Enbl bit */
#define EV_CACHE_SZ_LOC 0xb9000128 /* Size of secondary cache
* 0x14 == 1M
* 0x16 == 4M
*/
#define EV_IW_TRIG_LOC 0xb9000130 /* IW_TRIG value */
#define EV_RR_TRIG_LOC 0xb9000138 /* RR_TRIG value */
#define EV_EPROCRATE0_LOC 0xb9000140 /* CPU frequency (Hz) LSB */
#define EV_EPROCRATE1_LOC 0xb9000148 /* CPU frequency (Hz) byte 1 */
#define EV_EPROCRATE2_LOC 0xb9000150 /* CPU frequency (Hz) byte 2 */
#define EV_EPROCRATE3_LOC 0xb9000158 /* CPU frequency (Hz) MSB */
#define EV_RTCFREQ0_LOC 0xb9000160 /* RTC frequency (Hz) LSB */
#define EV_RTCFREQ1_LOC 0xb9000168 /* RTC frequency (Hz) byte 1 */
#define EV_RTCFREQ2_LOC 0xb9000170 /* RTC frequency (Hz) byte 2 */
#define EV_RTCFREQ3_LOC 0xb9000178 /* RTC frequency (Hz) MSB */
#define EV_WCOUNT0_LOC 0xb9000180 /* EAROM Write count LSB */
#define EV_WCOUNT1_LOC 0xb9000188 /* EAROM Write count MSB */
#define EV_ECCENB_LOC 0xb9000190 /* CC chip ECC enable flag */
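The frequency values above are stored one byte per EAROM location, LSB
first, with consecutive locations 8 bytes apart. A minimal sketch of
reassembling such a value follows. The helper name and EAROM_STRIDE are
hypothetical, and the code assumes each location can be read as a single
byte at the listed address; the actual PROM access method may differ. It
takes a pointer so it can be exercised against an ordinary buffer laid out
the same way.

```c
#include <stdint.h>

#define EAROM_STRIDE 8  /* assumed spacing between EAROM locations */

/* Hypothetical helper: combine four bytes stored LSB-first, one per
 * EAROM location, into a 32-bit value. 'first' points at the LSB
 * location (on hardware, for example, EV_EBUSRATE0_LOC). */
static uint32_t earom_read32(const volatile uint8_t *first)
{
    uint32_t v = 0;
    int i;

    /* Start from the MSB (location 3) and shift down to the LSB. */
    for (i = 3; i >= 0; i--)
        v = (v << 8) | first[i * EAROM_STRIDE];
    return v;
}
```

With the four locations holding 0x80, 0xF0, 0xFA and 0x02 in that order, the
result is 0x02FAF080, which is 50000000 (50 MHz).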
------------------------------------------------------------------------------