--- Tue 2011-09-06 ------------------------------------------------------------

Running "loop": power-cycle, sleep 2 s, jtag-boot, sleep 70 seconds, which
is enough to boot into FN and render "The Tunnel" for a moment, then
power-cycle again (off-time is 5 s).

Note that the test loop is "open-loop" and will cycle also past any
problems. The first time a corrupt standby (or any other issue) is
observed may therefore be well after the actual event.

1: started around 11:53
  (M1 configuration is original, without locking)

(around 500) visually checked boot process; standby was reached normally

--- Wed 2011-09-07 ------------------------------------------------------------

645: neocon stopped working (around 01:58)

666: detected neocon failure at run 666: restarted neocon; urjtag failed
  this cycle; back to normal at 667

684: checked LEDs again (first time since ~500) and found that standby may
  be failing. stopping test at 685 (around 02:50) for investigation.

Downloaded the standby bitstream:

  wget https://raw.github.com/milkymist/scripts/master/scripts/reflash_m1.sh
  chmod 755 reflash_m1.sh
  ./reflash_m1.sh --read-flash

Found two corruptions in the standby bitstream:

  diff -u <(hexdump -C standby.fpg) \
    <(hexdump -C /home/root/.qi/milkymist/read-flash/2011...)

-00000080  00 00 4c 83 00 00 4c 87  00 00 cc 85 d8 47 cc 43  |..L...L......G.C|
+00000080  00 00 4c 83 00 00 4c 87  00 00 c4 80 d8 47 cc 43  |..L...L......G.C|

-00002840  00 08 cc 26 00 00 00 00  00 00 00 00 0c 44 00 98  |...&.........D..|
+00002840  00 00 cc 26 00 00 00 00  00 00 00 00 0c 44 00 98  |...&.........D..|

CRC-checked the partitions:

  git clone git://github.com/milkymist/milkymist
  cd milkymist/tools/
  gcc -Wall -I. -o flterm flterm.c
  wget http://milkymist.org/updates/current/for-rc3/boot.4e53273.bin
  ./flterm --port /dev/ttyUSB0 --kernel boot.4e53273.bin
  serialboot
  a

only standby.fpg failed the CRC check

Reflashed the standby bitstream:

  wget http://milkymist.org/updates/2011-07-13/for-rc3/fjmem.bit
    (or http://milkymist.org/updates/fjmem.bit.bz2)
  wget http://milkymist.org/updates/current/standby.fpg
  jtag
  cable milkymist
  detect
  instruction CFG_OUT 000100 BYPASS
  instruction CFG_IN 000101 BYPASS
  pld load fjmem.bit
  initbus fjmem opcode=000010
  frequency 6000000
  detectflash 0
  endian big
  flashmem 0 standby.fpg noverify

M1 enters standby normally again.

Running "loop2": power-cycle, sleep 2 s, jtag-boot, sleep 10 seconds, which
is enough to begin (but not finish) booting RTEMS, then power-cycle again
(off-time is 5 s).

1: started around 05:01.

Observed until about 200-300 (06:00-06:30) that standby was okay.

~730 (08:48): observed that standby didn't load anymore
  (note: due to a bug in labsw, power is not turned on in about 5-10% of
  the cycles, so the real cycle count should be around 650-700.)

Standby bitstream difference:

-00000080  00 00 4c 83 00 00 4c 87  00 00 cc 85 d8 47 cc 43  |..L...L......G.C|
+00000080  00 00 00 00 00 00 4c 87  00 00 cc 85 d8 47 cc 43  |......L......G.C|

Reflashed standby and locked the NOR. Testing with loop2 again.

1 (09:18): started
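
Side note on the hexdump differences recorded above: a minimal Python
sketch for checking in which direction the corrupted bits changed. The byte
values are copied from the diffs in this log; the offsets are computed here
from the hexdump row address plus the column, and the names OBSERVED and
classify are made up for this illustration. The direction may be worth
knowing because programming NOR flash can only change bits from 1 to 0,
while only an erase brings them back to 1.

```python
# (offset, expected byte from standby.fpg, byte read back from flash)
# Values copied from the hexdump diffs in this log; offsets are derived
# from the row address plus the column of the differing byte.
OBSERVED = [
    (0x0000008A, 0xCC, 0xC4),  # first corruption, row 00000080
    (0x0000008B, 0x85, 0x80),
    (0x00002841, 0x08, 0x00),  # first corruption, row 00002840
    (0x00000082, 0x4C, 0x00),  # second corruption, row 00000080
    (0x00000083, 0x83, 0x00),
]


def classify(expected, actual):
    """Return (cleared, set) bit masks for one byte.

    cleared: bits that were 1 in the reference and read back as 0
    set:     bits that were 0 in the reference and read back as 1
    """
    cleared = expected & ~actual & 0xFF
    set_bits = ~expected & actual & 0xFF
    return cleared, set_bits


for offset, expected, actual in OBSERVED:
    cleared, set_bits = classify(expected, actual)
    print("0x%08x: %02x -> %02x  cleared=%02x set=%02x"
          % (offset, expected, actual, cleared, set_bits))
```

For the bytes listed here, every changed bit goes from 1 to 0 and none goes
from 0 to 1. That would be consistent with stray program operations rather
than erases, but the log itself does not establish the cause.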