From cpirazzi Mon Nov 24 23:42:36 1997
Subject: tserialio loops

I noticed that you changed all the for loops from this sort of
thing:

 1612:   urbidx = -1;
 1613:   for(i=0; i < N_URB; i++) 
 1614:     {
 1615:       if (urbtab[i].allocated == 0)
 1616:         {
 1617:           urbidx = i;
 1618:           break;
 1619:         }
 1620:     }

to this sort of thing

 1629:   urb = (tsio_urb_t *)urbtab;
 1630:   urbidx = -1;
 1631:   for(i=0; i < N_URB; i++, urb++) 
 1632:     {
 1633:       if (urb->allocated == 0)
 1634:         {
 1635:           urbidx = i;
 1636:           break;
 1637:         }
 1638:     }

despite what is commonly taught in compiler classes these days, I have
found that our compilers generate as good or better code with the
first form than the second form.

I discovered that one reason for this is that all our compiler people
are ex-fortran heads, and in fortran the first form is the only
choice, so they spent all their time optimizing the first form :)

for example, I put the two code segments above in tserialio.c and compiled
it non-debug.

here is the resulting assembly.  i rearranged a few instructions
used by both loops to form the preamble:

preamble (used equally by both):

[1613] 0x f54:  ff b4 00 40           sd        s4,64(sp)       
[1613] 0x f58:  24 14 00 10           li        s4,16
[1613] 0x f5c:  ff b6 00 48           sd        s6,72(sp)       
[1613] 0x f7c:  ff b1 00 30           sd        s1,48(sp)       
  
first form:

 1612:   urbidx = -1;
 1613:   for(i=0; i < N_URB; i++) 
[1613] 0x f38:  00 00 10 25           move      v0,zero
[1613] 0x f50:  8f 85 88 04           lw        a1,-30716(gp)   
         [f54 above]
         [f58 above]
         [f5c above]
[1612] 0x f60:  24 16 ff ff           li        s6,-1
[1613] 0x f64:  8c a3 00 00           lw        v1,0(a1)        
[1613] 0x f68:  10 60 00 77           beq       v1,zero,0x1148
[1613] 0x f6c:  24 a5 00 4c           addiu     a1,a1,76        
[1613] 0x f70:  24 42 00 01           addiu     v0,v0,1 
[1613] 0x f74:  54 54 ff fc           bnel      v0,s4,0xf68
[1613] 0x f78:  8c a3 00 00           lw        v1,0(a1)        
         [f7c above]
 1614:     {
 1615:       if (urbtab[i].allocated == 0)
 1616:         {
 1617:           urbidx = i;
 1618:           break;
[1618] 0x1148:  10 00 ff 8c           b         0xf7c
[1617] 0x114c:  00 40 b0 25           move      s6,v0
 1619:         }
 1620:     }
 
second form:

 1629:   urb = (tsio_urb_t *)urbtab;
 1630:   urbidx = -1;
[1630] 0x f94:  24 16 ff ff           li        s6,-1
[1629] 0x f98:  8f 91 88 04           lw        s1,-30716(gp)   
 1631:   for(i=0; i < N_URB; i++, urb++) 
[1631] 0x f9c:  00 00 10 25           move      v0,zero
[1631] 0x fa0:  8e 26 00 00           lw        a2,0(s1)        
[1631] 0x fa4:  10 c0 00 6a           beq       a2,zero,0x1150
[1631] 0x fa8:  26 31 00 4c           addiu     s1,s1,76        
[1631] 0x fac:  24 42 00 01           addiu     v0,v0,1 
[1631] 0x fb0:  54 54 ff fc           bnel      v0,s4,0xfa4
[1631] 0x fb4:  8e 26 00 00           lw        a2,0(s1)        
 1632:     {
 1633:       if (urb->allocated == 0)
 1634:         {
 1635:           urbidx = i;
 1636:           break;
[1636] 0x1150:  10 00 ff 99           b         0xfb8
[1635] 0x1154:  00 40 b0 25           move      s6,v0
 1637:         }
 1638:     }

notice that both loops are exactly 4 instructions long, and both
code segments are 11 instructions long.

I have found that this works for longer code segments as well.

--

besides, all those changes you put into tserialio.c makes it really
hard to merge the changes from bonsai :0

        - Chris Pirazzi

