Monday, September 25, 2017

Reversing the 16bit numbers in an XMM register using SSE2 and SSSE3

Plenty of algorithms end up needing to reverse the order of a run of bytes. In particular, I've seen this come up a lot in code that decompresses GIF images, and a diagonal string distance algorithm on standard char strings would also require reversing a string of bytes.

But my world doesn't involve many single-byte strings; it tends to have 16-bit Unicode strings. So what comes up instead is reversing a string of 16-bit characters. For this purpose a 16-bit Unicode character is just a 16-bit number, so the same code handles both.

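This can be done with SSE2 instructions alone, in three shuffles: reverse the order of the four 32-bit lanes, then swap the two 16-bit values inside each lane, using _mm_shufflelo_epi16 for the low 64 bits and _mm_shufflehi_epi16 for the high 64 bits.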

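#include <emmintrin.h> // SSE2

__m128i _mm_reverse_epi16_SSE2(const __m128i &mToReverse)
{
    // Reverse the four 32-bit lanes: [0 1 2 3] -> [3 2 1 0].
    __m128i mReversed_epi32 = _mm_shuffle_epi32(mToReverse, _MM_SHUFFLE(0, 1, 2, 3));
    // Swap the two 16-bit halves of each 32-bit lane in the low 64 bits.
    __m128i mLowDone = _mm_shufflelo_epi16(mReversed_epi32, _MM_SHUFFLE(2, 3, 0, 1));
    // Do the same for the high 64 bits.
    __m128i mResult = _mm_shufflehi_epi16(mLowDone, _MM_SHUFFLE(2, 3, 0, 1));
    return mResult;
}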
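A quick way to check the shuffle constants is to push a known sequence through the function and compare. This is a minimal sketch assuming x86 with SSE2 (standard on x86-64); Reverse8x16 and CheckReverse16_SSE2 are hypothetical helper names, and the SSE2 function is repeated so the block compiles on its own.

```cpp
#include <emmintrin.h> // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// The three-shuffle SSE2 reversal, repeated so this sketch is self-contained.
__m128i _mm_reverse_epi16_SSE2(const __m128i &mToReverse)
{
    __m128i mReversed_epi32 = _mm_shuffle_epi32(mToReverse, _MM_SHUFFLE(0, 1, 2, 3));
    __m128i mLowDone = _mm_shufflelo_epi16(mReversed_epi32, _MM_SHUFFLE(2, 3, 0, 1));
    return _mm_shufflehi_epi16(mLowDone, _MM_SHUFFLE(2, 3, 0, 1));
}

// Hypothetical helper: reverse eight 16-bit values in place (unaligned load/store).
void Reverse8x16(std::uint16_t values[8])
{
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i *>(values));
    _mm_storeu_si128(reinterpret_cast<__m128i *>(values), _mm_reverse_epi16_SSE2(v));
}

// Hypothetical self-check: 0..7 should come back as 7..0.
void CheckReverse16_SSE2()
{
    std::uint16_t values[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    Reverse8x16(values);
    for (int i = 0; i < 8; ++i)
        assert(values[i] == 7 - i);
}
```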


But it can in fact be done in a single instruction with SSSE3, since pshufb (_mm_shuffle_epi8) accepts a memory address for its second operand, the shuffle mask.

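#include <tmmintrin.h> // SSSE3; GCC/Clang need -mssse3

__m128i _mm_reverse_epi16_SSSE3(const __m128i &mToReverse)
{
    // pshufb byte indices: output word i takes input word 7 - i, keeping the two
    // bytes of each word in order. _mm_setr_epi8 replaces the brace initializer, which
    // only compiles with MSVC (GCC and Clang define __m128i as a vector of two 64-bit ints).
    const __m128i reverseKeys = _mm_setr_epi8(14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1);
    __m128i mResult = _mm_shuffle_epi8(mToReverse, reverseKeys);
    return mResult;
}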
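The single-byte case from the start of the post is the same pshufb trick with a different mask: indices 15 down to 0 reverse all sixteen bytes. This is a sketch assuming x86 with SSSE3; _mm_reverse_epi8_SSSE3, Reverse16Bytes and CheckReverse16Bytes are hypothetical names, and the TARGET_SSSE3 macro is an assumption added only so GCC and Clang accept the intrinsic without -mssse3.

```cpp
#include <tmmintrin.h> // SSSE3 intrinsics (_mm_shuffle_epi8); also pulls in SSE2
#include <cassert>
#include <cstdint>

// Assumption: enable SSSE3 per function on GCC/Clang; MSVC needs no attribute.
#if defined(__GNUC__) || defined(__clang__)
#define TARGET_SSSE3 __attribute__((target("ssse3")))
#else
#define TARGET_SSSE3
#endif

// Hypothetical name: reverse all sixteen bytes of a register with one pshufb.
TARGET_SSSE3 __m128i _mm_reverse_epi8_SSSE3(__m128i v)
{
    // Output byte i takes input byte 15 - i.
    const __m128i reverseKeys = _mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8,
                                              7, 6, 5, 4, 3, 2, 1, 0);
    return _mm_shuffle_epi8(v, reverseKeys);
}

// Hypothetical helper: reverse a 16-byte buffer in place.
TARGET_SSSE3 void Reverse16Bytes(std::uint8_t bytes[16])
{
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i *>(bytes));
    _mm_storeu_si128(reinterpret_cast<__m128i *>(bytes), _mm_reverse_epi8_SSSE3(v));
}

// Hypothetical self-check: 0..15 should come back as 15..0.
void CheckReverse16Bytes()
{
    std::uint8_t bytes[16];
    for (int i = 0; i < 16; ++i)
        bytes[i] = static_cast<std::uint8_t>(i);
    Reverse16Bytes(bytes);
    for (int i = 0; i < 16; ++i)
        assert(bytes[i] == 15 - i);
}
```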


This is satisfying.
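Reversing a whole string of 16-bit characters is the case that motivated this. A possible sketch, assuming an x86 CPU with SSSE3 at run time (there is no CPUID check) and treating the string as plain 16-bit units the way the post does, so UTF-16 surrogate pairs end up in the wrong order. ReverseUtf16InPlace, Reverse16_SSSE3, CheckReverseUtf16 and the TARGET_SSSE3 macro are hypothetical names; the macro is only there so GCC and Clang accept the intrinsic without -mssse3.

```cpp
#include <tmmintrin.h> // SSSE3 intrinsics (_mm_shuffle_epi8)
#include <cassert>
#include <cstddef>
#include <utility>     // std::swap

// Assumption: enable SSSE3 per function on GCC/Clang; MSVC needs no attribute.
#if defined(__GNUC__) || defined(__clang__)
#define TARGET_SSSE3 __attribute__((target("ssse3")))
#else
#define TARGET_SSSE3
#endif

// The pshufb 16-bit reversal, with a portable mask initializer.
TARGET_SSSE3 __m128i Reverse16_SSSE3(__m128i v)
{
    const __m128i keys = _mm_setr_epi8(14, 15, 12, 13, 10, 11, 8, 9,
                                       6, 7, 4, 5, 2, 3, 0, 1);
    return _mm_shuffle_epi8(v, keys);
}

// Hypothetical helper: reverse n 16-bit code units in place.
TARGET_SSSE3 void ReverseUtf16InPlace(char16_t *s, std::size_t n)
{
    std::size_t lo = 0, hi = n;
    // While at least 16 units separate the ends, load 8 from each end,
    // reverse both registers and store them swapped.
    while (hi - lo >= 16)
    {
        __m128i front = _mm_loadu_si128(reinterpret_cast<const __m128i *>(s + lo));
        __m128i back = _mm_loadu_si128(reinterpret_cast<const __m128i *>(s + hi - 8));
        _mm_storeu_si128(reinterpret_cast<__m128i *>(s + lo), Reverse16_SSSE3(back));
        _mm_storeu_si128(reinterpret_cast<__m128i *>(s + hi - 8), Reverse16_SSSE3(front));
        lo += 8;
        hi -= 8;
    }
    // Scalar swaps for the fewer than 16 units left in the middle.
    while (hi - lo > 1)
    {
        --hi;
        std::swap(s[lo], s[hi]);
        ++lo;
    }
}

// Hypothetical self-check over lengths 0..40, covering the SIMD loop and the scalar tail.
void CheckReverseUtf16()
{
    char16_t buf[40];
    for (std::size_t n = 0; n <= 40; ++n)
    {
        for (std::size_t i = 0; i < n; ++i)
            buf[i] = static_cast<char16_t>(i);
        ReverseUtf16InPlace(buf, n);
        for (std::size_t i = 0; i < n; ++i)
            assert(static_cast<std::size_t>(buf[i]) == n - 1 - i);
    }
}
```

Loading both ends before storing is what makes the in-place swap safe, and because the loop stops once fewer than 16 units remain, the two 8-unit blocks never overlap.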
