2011-09-07 10:20:08 +03:00
|
|
|
--- Tue 2011-09-06 ------------------------------------------------------------
|
|
|
|
|
2011-09-07 15:20:37 +03:00
|
|
|
Running "loop": power-cycle, sleep 2 s, jtag-boot, sleep 70 s, which is
enough to boot into FN and render "The Tunnel" for a moment, then
power-cycle again (off-time is 5 s).

Note that the test loop is "open-loop": it keeps cycling past any
problems. The first observation of a corrupt standby (or any other
issue) may therefore come well after the actual event.
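
A minimal sketch of such an open-loop cycle, in Python purely for
illustration. The callables power_off, power_on and jtag_boot are
hypothetical stand-ins for whatever drives labsw and urjtag; the log does
not show the actual script. Nothing in the loop checks whether a step
succeeded, which is what makes it "open-loop".

```python
import time
from typing import Callable


def run_open_loop(power_off: Callable[[], None],
                  power_on: Callable[[], None],
                  jtag_boot: Callable[[], None],
                  cycles: int,
                  off_s: float = 5.0,
                  settle_s: float = 2.0,
                  run_s: float = 70.0,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `cycles` iterations without checking the result of any step."""
    done = 0
    for _ in range(cycles):
        power_off()          # power-cycle: off ...
        sleep(off_s)         # ... stay off for the off-time ...
        power_on()           # ... and back on
        sleep(settle_s)      # "sleep 2 s"
        jtag_boot()          # hypothetical hook that boots out of standby
        sleep(run_s)         # long enough to boot and render briefly
        done += 1
    return done
```

Because no step is verified, the cycle number at which a problem is first
noticed is only an upper bound on when it actually occurred.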
|
|
|
|
|
2011-09-07 10:20:08 +03:00
|
|
|
1: started around 11:53 (M1 configuration is original, without locking)
~500: visually checked the boot process; standby was reached normally
|
|
|
|
|
|
|
|
--- Wed 2011-09-07 ------------------------------------------------------------
|
|
|
|
|
|
|
|
645: neocon stopped working (around 01:58)
666: detected the neocon failure and restarted neocon; urjtag failed in
this cycle; back to normal at 667
684: checked LEDs again (first time since ~500) and found that standby
may be failing. Stopped the test at 685 (around 02:50) for
investigation.
|
|
|
|
|
|
|
|
Downloaded the standby bitstream from flash:

wget https://raw.github.com/milkymist/scripts/master/scripts/reflash_m1.sh
chmod 755 reflash_m1.sh

./reflash_m1.sh --read-flash
|
|
|
|
|
|
|
|
Found two corruptions in the standby bitstream:

diff -u <(hexdump -C standby.fpg) <(hexdump -C /home/root/.qi/milkymist/read-flash/2011...)

-00000080 00 00 4c 83 00 00 4c 87 00 00 cc 85 d8 47 cc 43 |..L...L......G.C|
+00000080 00 00 4c 83 00 00 4c 87 00 00 c4 80 d8 47 cc 43 |..L...L......G.C|

-00002840 00 08 cc 26 00 00 00 00 00 00 00 00 0c 44 00 98 |...&.........D..|
+00002840 00 00 cc 26 00 00 00 00 00 00 00 00 0c 44 00 98 |...&.........D..|
|
|
|
|
|
|
|
|
CRC-checked the partitions:

git clone git://github.com/milkymist/milkymist
cd milkymist/tools/
gcc -Wall -I. -o flterm flterm.c
wget http://milkymist.org/updates/current/for-rc3/boot.4e53273.bin
./flterm --port /dev/ttyUSB0 --kernel boot.4e53273.bin
|
|
|
|
|
2011-09-07 15:20:37 +03:00
|
|
|
serialboot
a
|
|
|
|
|
2011-09-07 10:20:08 +03:00
|
|
|
Only standby.fpg failed the CRC check.

Reflashed the standby bitstream:

wget http://milkymist.org/updates/2011-07-13/for-rc3/fjmem.bit
(or http://milkymist.org/updates/fjmem.bit.bz2)
wget http://milkymist.org/updates/current/standby.fpg

jtag

cable milkymist
detect
instruction CFG_OUT 000100 BYPASS
instruction CFG_IN 000101 BYPASS
pld load fjmem.bit
initbus fjmem opcode=000010
frequency 6000000
detectflash 0
endian big
flashmem 0 standby.fpg noverify

M1 enters standby normally again.
|
2011-09-07 15:20:37 +03:00
|
|
|
|
|
|
|
Running "loop2": power-cycle, sleep 2 s, jtag-boot, sleep 10 s, which is
enough to begin (but not finish) booting RTEMS, then power-cycle again
(off-time is 5 s).

1: started around 05:01. Standby was observed to be okay until about
cycle 200-300 (06:00-06:30).
~730 (08:48): observed that standby no longer loaded. (Note: due to a
bug in labsw, power is not turned on in about 5-10% of the cycles, so
the real cycle count should be around 650-700.)
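
The corrected cycle count follows from simple arithmetic. As a small
worked check in Python (the function name is made up for this
illustration), a 5-10% miss rate applied to the ~730 counted cycles gives
a range consistent with the "650-700" estimate:

```python
from typing import Tuple


def effective_cycles(counted: int, miss_low: float,
                     miss_high: float) -> Tuple[float, float]:
    """Range of cycles in which power was really applied."""
    # The higher miss rate gives the lower bound, and vice versa.
    return counted * (1.0 - miss_high), counted * (1.0 - miss_low)


low, high = effective_cycles(730, 0.05, 0.10)
print(f"between {low:.1f} and {high:.1f} effective cycles")
```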
|
|
|
|
|
|
|
|
Standby bitstream difference:

-00000080 00 00 4c 83 00 00 4c 87 00 00 cc 85 d8 47 cc 43 |..L...L......G.C|
+00000080 00 00 00 00 00 00 4c 87 00 00 cc 85 d8 47 cc 43 |......L......G.C|

Reflashed standby and locked the NOR. Testing with loop2 again.

1 (09:18): started
|
2011-09-08 14:34:40 +03:00
|
|
|
... continuing through the night ...
|
|
|
|
|
|
|
|
--- Thu 2011-09-08 ------------------------------------------------------------
|
|
|
|
|
|
|
|
3483 (03:18): standby is good so far
4325 (07:40): manually ended test. Standby is still good, but starting
with cycle 3704, booting RTEMS failed with:

I: Booting from flash...
I: Loading 1889692 bytes from flash...
E: CRC failed (expected aa12a56a, got 68ec25e6)
|
|
|
|
|
|
|
|
A CRC check yielded:

Images CRC:
Checking : standby.fpg CRC passed (got c58e8905)
Checking : soc-rescue.fpg CRC passed (got 30dcc535)
Checking : bios-rescue.bin(CRC) CRC passed (got c78353fa)
Checking : splash-rescue.raw CRC passed (got e8ff824f)
Checking : flickernoise.fbi(rescue)(CRC) CRC passed (got aa12a56a)
Checking : soc.fpg CRC passed (got 3a31e737)
Checking : bios.bin(CRC) CRC passed (got 86e23684)
Checking : splash.raw CRC passed (got 978f860c)
Checking : flickernoise.fbi(CRC) CRC failed (expected aa12a56a, got 68ec25e6)
|
|
|
|
|
|
|
|
Read back the FlickerNoise partition with:

readmem 0x920000 0x0400000 fn.bin

Compared with the original:

wget http://www.milkymist.org/updates/2011-07-13/flickernoise.fbi
md5sum flickernoise.fbi
5b7367e71bda306b080bde124615859b flickernoise.fbi

diff -u <(hexdump -C flickernoise.fbi) <(hexdump -C fn.bin)

...
-0008a380 28 43 00 00 34 64 00 01 58 44 00 00 5c 60 00 1e |(C..4d..XD..\`..|
+0008a380 28 43 00 00 00 00 00 01 58 44 00 00 5c 60 00 1e |(C......XD..\`..|
...
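
The hexdump rows quoted in this log can be compared word by word. The
Python sketch below (helper names invented for this illustration) lists
every 16-bit word that differs and checks whether the read-back value only
has bits cleared relative to the original. It assumes big-endian 16-bit
words, matching the "endian big" setting used when flashing. NOR
programming can only change bits from 1 to 0, so a cleared-bits-only
pattern is compatible with a stray program operation, but it does not by
itself identify the cause.

```python
from typing import List, Tuple

# (label, file offset of the row, original bytes, bytes read back from flash)
ROWS = [
    ("standby, first run", 0x00000080,
     "00 00 4c 83 00 00 4c 87 00 00 cc 85 d8 47 cc 43",
     "00 00 4c 83 00 00 4c 87 00 00 c4 80 d8 47 cc 43"),
    ("standby, first run", 0x00002840,
     "00 08 cc 26 00 00 00 00 00 00 00 00 0c 44 00 98",
     "00 00 cc 26 00 00 00 00 00 00 00 00 0c 44 00 98"),
    ("standby, loop2", 0x00000080,
     "00 00 4c 83 00 00 4c 87 00 00 cc 85 d8 47 cc 43",
     "00 00 00 00 00 00 4c 87 00 00 cc 85 d8 47 cc 43"),
    ("flickernoise.fbi", 0x0008a380,
     "28 43 00 00 34 64 00 01 58 44 00 00 5c 60 00 1e",
     "28 43 00 00 00 00 00 01 58 44 00 00 5c 60 00 1e"),
]


def changed_words(offset: int, good_hex: str,
                  bad_hex: str) -> List[Tuple[int, int, int]]:
    """Return (byte offset, original word, read-back word) per difference."""
    good = bytes.fromhex(good_hex)
    bad = bytes.fromhex(bad_hex)
    out = []
    for i in range(0, len(good), 2):
        g = int.from_bytes(good[i:i + 2], "big")
        b = int.from_bytes(bad[i:i + 2], "big")
        if g != b:
            out.append((offset + i, g, b))
    return out


def only_clears_bits(good: int, bad: int) -> bool:
    """True if no bit is set in `bad` that was clear in `good`."""
    return (bad & ~good) == 0


for label, offset, good_hex, bad_hex in ROWS:
    for addr, g, b in changed_words(offset, good_hex, bad_hex):
        print(f"{label}: 0x{addr:06x} 0x{g:04x} -> 0x{b:04x} "
              f"cleared-only={only_clears_bits(g, b)}")
```

In all four rows the damage is confined to a single aligned 16-bit word,
and in each case bits were only cleared, never set.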
|
2011-09-09 01:07:18 +03:00
|
|
|
|
|
|
|
Recovered the FN partition and unlocked the NOR:

flashmem 0x920000 flickernoise.fbi noverify
unlockflash 0 55

New test series with script loop4. This differs from loop2 in that it
uses "pld reconfigure" to return to standby instead of power-cycling.
If we still observe corruption with this test, then a software problem
would be to blame.

1 (09:11): started
|
2011-09-09 02:06:58 +03:00
|
|
|
2509 (19:33): standby looks good

All CRC checks pass. Verified that NOR was unlocked:

(load fjmem, etc.)
peek 0 # show old value
poke 0 0x40 0 0x0000 # Word Program
peek 0 # read back status (0x80 if okay, 0x92 if locked)
poke 0 0xff # Read Array (switch back to normal operation)
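
The two status values mentioned above can be decoded bit by bit. The
Python sketch below assumes the status register layout commonly used by
Intel-style NOR command sets (the family that uses 0x40 for Word Program
and 0xff for Read Array). The bit names are paraphrased, and the layout
should be checked against the datasheet of the actual flash part, which
this log does not name.

```python
from typing import List

# Assumed Intel-style status register bits (bit 0 left out as reserved).
STATUS_BITS = {
    7: "ready",
    6: "erase suspended",
    5: "erase error",
    4: "program error",
    3: "programming voltage error",
    2: "program suspended",
    1: "block locked",
}


def decode_status(value: int) -> List[str]:
    """List the names of the status bits set in `value`, highest bit first."""
    return [name
            for bit, name in sorted(STATUS_BITS.items(), reverse=True)
            if value & (1 << bit)]


for status in (0x80, 0x92):
    print(f"0x{status:02x}: {', '.join(decode_status(status))}")
```

Under that assumption, 0x80 means the device is ready with no error
flagged, while 0x92 combines ready, a program error and a locked block,
which matches the "okay" and "locked" readings noted above.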
|
2011-09-09 17:07:06 +03:00
|
|
|
|
2011-09-10 22:33:18 +03:00
|
|
|
Took labsw offline to analyze its occasional failure to switch. The
failure was difficult to reproduce. Also opened labsw to tighten a loose
nut. Afterwards (Friday run), labsw showed far fewer switch failures.
|
|
|
|
|
2011-09-09 17:07:06 +03:00
|
|
|
--- Fri 2011-09-09 ------------------------------------------------------------
|
|
|
|
|
|
|
|
New test with script "loop5". This time, we only power-cycle and don't
try to boot out of standby. The purpose of this test is to confirm that
NOR corruption does not occur when powering down while in standby.

1 (11:04): started
|
2011-09-10 22:33:18 +03:00
|
|
|
200 (11:28): stopped to issue "unlockflash 0 105" to make sure all of
the NOR is unlocked, just in case

Also checked CRCs. All is well.

1 (11:31): started
2637 (16:53): stopped. Standby looks good.

All partitions pass the CRC check.

Repeating loop2 to make sure the NOR corruption hasn't disappeared for
an unrelated reason. The system is connected to an oscilloscope
monitoring the M1 DC input voltage. This connection also grounds DC in.

1 (16:56): started
|
|
|
|
|
|
|
|
--- Sat 2011-09-10 ------------------------------------------------------------
|
|
|
|
|
|
|
|
2428 (04:57): standby still okay
2440 (05:01): disconnected oscilloscope
2463 (05:08): stopped test

All partitions pass the CRC check. Read back the standby partition and
found no corruption in a bitwise comparison either. Furthermore, the
unused area showed the expected 0xffff pattern.

1 (05:14): restarted test, without oscilloscope
2213 (16:11): standby still okay

All partitions pass the CRC check. Unused area of standby shows 0xffff.

Prepared a new test (loop7): like loop2, but with a "false start" that
turns on both channels and immediately turns them off again, waits 16
seconds, and only then powers up properly. This roughly corresponds to
labsw failing to turn on, as observed in the test runs in which NOR
corruption occurred.

1 (16:27): started loop7 test
|
2011-09-11 18:42:11 +03:00
|
|
|
... continuing through the night ...
|
|
|
|
|
|
|
|
--- Sun 2011-09-11 ------------------------------------------------------------
|
|
|
|
|
|
|
|
2001 (11:58): standby okay

All partitions pass the CRC check. Unused area of standby shows 0xffff.

Confirmed writability of NOR at address 0x80000 and at address 0.
Instructions used at address 0x80000:

jtag> peek 0x80000
URJ_BUS_READ(0x00080000) = 0xFFFF (65535)
jtag> poke 0x80000 0x40 0x80000 0xffee
jtag> peek 0x80000
URJ_BUS_READ(0x00080000) = 0x0080 (128)
jtag> poke 0 0xff
jtag> peek 0x80000
URJ_BUS_READ(0x00080000) = 0xFFEE (65518)
|