Thursday, December 29, 2016

a few words about loop direction

The following two loops produces the same output in opposite order.

void doLoops(unsigned int *piNumIterations, unsigned int *pOut)
{
    for (unsigned int uiLoopControl = 0; uiLoopControl < *piNumIterations; uiLoopControl++)
    {
        *pOut = uiLoopControl;
        printf("%d\n", uiLoopControl);
    }

    for (unsigned int uiLoopControl = *piNumIterations; uiLoopControl-- >0; )
    {
        *pOut = uiLoopControl;
        printf("%d\n", uiLoopControl);
    }
}


Aliasing is the reality that two different pointers may point to the same memory.  In this case piNumIterations and pOut could point to the same underlying UINT.  In the first loop, this means that the compiler is obligated to compare the loop variant against memory in each iteration.

    for (unsigned int uiLoopControl = 0; uiLoopControl < *piNumIterations; uiLoopControl++)
000000013FF6107F  xor         ebx,ebx 
000000013FF61081  mov         rdi,rdx 
000000013FF61084  mov         rsi,rcx 
000000013FF61087  cmp         dword ptr [rcx],ebx 
000000013FF61089  jbe         doLoops+36h (013FF610A6h) 
000000013FF6108B  nop         dword ptr [rax+rax] 
    {
        *pOut = uiLoopControl;
        printf("%d\n", uiLoopControl);
000000013FF61090  mov         edx,ebx 
000000013FF61092  mov         dword ptr [rdi],ebx 
000000013FF61094  lea         rcx,[string "%d\n" (013FF62200h)] 
000000013FF6109B  call        printf (013FF61010h) 
000000013FF610A0  inc         ebx 
000000013FF610A2  cmp         ebx,dword ptr [rsi] 
000000013FF610A4  jb          doLoops+20h (013FF61090h) 
    }

    for (unsigned int uiLoopControl = *piNumIterations; uiLoopControl-- >0; )
000000013FF610A6  mov         ebx,dword ptr [rsi] 
000000013FF610A8  test        ebx,ebx 
000000013FF610AA  je          doLoops+56h (013FF610C6h) 
000000013FF610AC  nop         dword ptr [rax] 
000000013FF610B0  dec         ebx 
    {
        *pOut = uiLoopControl;
        printf("%d\n", uiLoopControl);
000000013FF610B2  lea         rcx,[string "%d\n" (013FF62200h)] 
000000013FF610B9  mov         edx,ebx 
000000013FF610BB  mov         dword ptr [rdi],ebx 
000000013FF610BD  call        printf (013FF61010h) 
000000013FF610C2  test        ebx,ebx 
000000013FF610C4  jne         doLoops+40h (013FF610B0h) 
    }
}

This means that the second loop can be preferable if only the number of iterations is important, not the order.
The second loop also uses one fewer registers inside the loop, so in more real-world scenarios is less likely to cause a register spill.
While the example above is done using pointers, the more real world case is the memory from a struct or class member being aliased.  Some of this may be mitigated using the __restrict keyword.  It's better when possible to structure the code to allow the compiler to take care of the restriction.

No comments:

Post a Comment