Third time lucky, I hope. -fno-tree-cselim is much more specific than
disabling all optimization and results in a considerably less severe
performance reduction (about 30-40% of -O0).
While -O1 gets rid of the unexpected read in the simple code of a synthetic
test, it's still there in the more complex environment we have in ubb-la.c
Turning off optimization completely seems to do the trick.
Note that the pull-ups on DAT1 through DAT3 and the pull-whichever-way on
DAT0 are likely to get in the way of any real-life use. But it's good enough
for exploring the system's characteristics and limitations.