Wednesday, March 1, 2017

Perfect shuffles of 16-bit numbers in two registers

A single 128-bit register can hold eight 16-bit numbers. Given a pair of such registers, the intrinsics _mm_unpacklo_epi16 and _mm_unpackhi_epi16, both called with the same two registers as arguments, together perform a perfect shuffle of the sixteen 16-bit numbers: the first interleaves the low four lanes of the two registers, and the second interleaves the high four lanes.
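To make the lane movement concrete, here is a minimal scalar sketch of what the two intrinsics do to the lanes. The names Lanes, unpacklo_model, unpackhi_model and perfect_shuffle_model are hypothetical helpers made up for illustration; they are not part of any intrinsics header, and the sketch only models the lane order, not the actual instructions.

```cpp
#include <array>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for one 128-bit register holding eight 16-bit lanes.
using Lanes = std::array<std::uint16_t, 8>;

// Scalar model of _mm_unpacklo_epi16(a, b): interleave the low four lanes.
Lanes unpacklo_model(const Lanes &a, const Lanes &b)
{
    return {a[0], b[0], a[1], b[1], a[2], b[2], a[3], b[3]};
}

// Scalar model of _mm_unpackhi_epi16(a, b): interleave the high four lanes.
Lanes unpackhi_model(const Lanes &a, const Lanes &b)
{
    return {a[4], b[4], a[5], b[5], a[6], b[6], a[7], b[7]};
}

// One perfect shuffle of the sixteen values held in the pair (a, b).
std::pair<Lanes, Lanes> perfect_shuffle_model(const Lanes &a, const Lanes &b)
{
    return {unpacklo_model(a, b), unpackhi_model(a, b)};
}
```

Read as one sequence of sixteen values, the pair returned by perfect_shuffle_model alternates between the lanes of a and the lanes of b, which is exactly a perfect shuffle of a deck cut in half. The version using the real intrinsics follows.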

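#include <stdio.h>
#include <stdint.h>
#include <emmintrin.h>  // SSE2: __m128i, _mm_unpacklo_epi16, _mm_unpackhi_epi16

// Print the eight 16-bit lanes of mx as hex digits, lowest lane first.
void printm_epu16(const __m128i &mx)
{
    // Copy the lanes out to memory; this avoids the MSVC-only mx.m128i_u16 member.
    uint16_t lanes[8];
    _mm_storeu_si128((__m128i *)lanes, mx);
    for (int iA = 0; iA < 8; iA++)
    {
        printf("%x", lanes[iA]);
    }
}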

int main()
{
    __m128i a = _mm_setr_epi16(0, 1, 2, 3, 4, 5, 6, 7);
    __m128i b = _mm_setr_epi16(8, 9, 10, 11, 12, 13, 14, 15);

    printm_epu16(a);
    printm_epu16(b);
    printf("\n");
    for (int i = 0; i < 4; i++)
    {
        __m128i aa = _mm_unpacklo_epi16(a, b);
        __m128i bb = _mm_unpackhi_epi16(a, b);
        a = aa;
        b = bb;
        printm_epu16(a);
        printm_epu16(b);
        printf("\n");
    }
}


This program prints the following, where each line shows register a followed by register b:

0123456789abcdef
08192a3b4c5d6e7f
048c159d26ae37bf
02468ace13579bdf
0123456789abcdef


The first line is the initial state. After four shuffles the numbers are back in their original order, so the cycle has length four.
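One way to see why the length is four: this shuffle moves the value at position i (for i from 0 to 14) to position 2i mod 15 and leaves position 15 fixed, and since 2^4 = 16 leaves remainder 1 when divided by 15, four shuffles bring every value home. The scalar sketch below checks this by brute force, without SSE. The names N, shuffle_once and shuffle_period are hypothetical and exist only for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <numeric>

constexpr std::size_t N = 16;  // eight lanes in each of two registers

// One perfect shuffle: interleave the first half of v with the second half.
std::array<int, N> shuffle_once(const std::array<int, N> &v)
{
    std::array<int, N> out{};
    for (std::size_t i = 0; i < N / 2; i++)
    {
        out[2 * i] = v[i];              // value from the first register
        out[2 * i + 1] = v[i + N / 2];  // value from the second register
    }
    return out;
}

// Count how many shuffles it takes to get back to the starting order.
int shuffle_period()
{
    std::array<int, N> start{};
    std::iota(start.begin(), start.end(), 0);  // 0, 1, ..., 15

    std::array<int, N> v = start;
    int count = 0;
    do
    {
        v = shuffle_once(v);
        count++;
    } while (v != start);
    return count;
}
```

Calling shuffle_period() returns 4, which agrees with the output above.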
