Plenty of algorithms end up needing to reverse the order of a run of bytes. In particular, I've seen this come up a lot in code to decompress GIF images. A diagonal string distance algorithm on standard char strings would also require reversing a string of bytes.
But my world doesn't involve many single-byte strings. My world tends to have 16-bit Unicode strings, so what comes up instead is reversing a string of 16-bit characters. 16-bit numbers can be treated exactly the same way as 16-bit Unicode characters.
This can be done with just SSE2 instructions: reverse the four 32-bit lanes, then swap the two 16-bit halves inside each 32-bit lane.
__m128i _mm_reverse_epi16_SSE2(const __m128i &mToReverse)
{
    // Reverse the order of the four 32-bit lanes.
    __m128i mReversed_epi32 = _mm_shuffle_epi32(mToReverse, _MM_SHUFFLE(0, 1, 2, 3));
    // Swap the 16-bit halves of each 32-bit lane in the low 64 bits.
    __m128i mLowDone = _mm_shufflelo_epi16(mReversed_epi32, _MM_SHUFFLE(2, 3, 0, 1));
    // Do the same for the high 64 bits.
    __m128i mResult = _mm_shufflehi_epi16(mLowDone, _MM_SHUFFLE(2, 3, 0, 1));
    return mResult;
}
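As a sanity check on the lane movement, here is a minimal sketch that runs the same three shuffles on the values 0 through 7 and traces where each one ends up. The names reverse_epi16_sse2, reverse8 and demo_reverse8 are made up for this illustration, and I'm assuming unaligned loads and stores so that any uint16_t buffer works.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// The three-shuffle reversal, restated so this sketch compiles on its own.
static inline __m128i reverse_epi16_sse2(__m128i v)
{
    // Lanes listed low to high:          [0 1 2 3 4 5 6 7]
    v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));    // [6 7 4 5 2 3 0 1]
    v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));  // [7 6 5 4 2 3 0 1]
    v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));  // [7 6 5 4 3 2 1 0]
    return v;
}

// Hypothetical helper: reverse exactly eight 16-bit values from src into dst.
static void reverse8(const uint16_t *src, uint16_t *dst)
{
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src));
    _mm_storeu_si128(reinterpret_cast<__m128i *>(dst), reverse_epi16_sse2(v));
}

// Hypothetical self-check: compare the SIMD result with the obvious scalar answer.
static void demo_reverse8()
{
    const uint16_t in[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
    uint16_t out[8] = { 0 };
    reverse8(in, out);
    for (int i = 0; i < 8; ++i)
        assert(out[i] == in[7 - i]);
}
```

The comments list lanes from lowest to highest, which is the order the values sit in memory after a load.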
But in fact it can be done in a single instruction with SSSE3, since the pshufb instruction accepts a memory operand for its second parameter (the shuffle mask).
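__m128i _mm_reverse_epi16_SSSE3(const __m128i &mToReverse)
{
    // Byte i of the result is taken from byte reverseKeys[i] of the source.
    // _mm_setr_epi8 lists the bytes lowest first and, unlike brace
    // initialization of an __m128i, compiles outside MSVC as well.
    const __m128i reverseKeys = _mm_setr_epi8(14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1);
    __m128i mResult = _mm_shuffle_epi8(mToReverse, reverseKeys);
    return mResult;
}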
This is satisfying.
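To tie this back to reversing a whole string of 16-bit characters, here is a rough sketch of how an eight-lane reversal could drive an in-place reversal of a buffer of any length. The function reverse_u16_in_place and its parameters are hypothetical names of mine. The sketch sticks to the SSE2 shuffles so that it builds without extra compiler flags, and it treats every 16-bit unit independently, so surrogate pairs are not kept in order.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Eight-lane reversal using the SSE2 shuffles, restated so this compiles alone.
static inline __m128i reverse_epi16_sse2(__m128i v)
{
    v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));
    v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));
    return _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));
}

// Hypothetical helper: reverse `count` 16-bit values in place.
static void reverse_u16_in_place(uint16_t *data, std::size_t count)
{
    assert(data != nullptr || count == 0);

    std::size_t lo = 0;      // first element still to be placed
    std::size_t hi = count;  // one past the last element still to be placed

    // While at least 16 elements remain, take 8 from each end,
    // reverse both blocks, and store each one at the opposite end.
    while (hi - lo >= 16) {
        __m128i front = _mm_loadu_si128(reinterpret_cast<const __m128i *>(data + lo));
        __m128i back  = _mm_loadu_si128(reinterpret_cast<const __m128i *>(data + hi - 8));
        _mm_storeu_si128(reinterpret_cast<__m128i *>(data + lo), reverse_epi16_sse2(back));
        _mm_storeu_si128(reinterpret_cast<__m128i *>(data + hi - 8), reverse_epi16_sse2(front));
        lo += 8;
        hi -= 8;
    }

    // Fewer than 16 elements are left in the middle; finish with scalar swaps.
    while (hi - lo >= 2) {
        std::swap(data[lo], data[hi - 1]);
        ++lo;
        --hi;
    }
}
```

Both blocks are loaded before either is stored, so the two ends never overwrite data that has not been read yet. Swapping in the single-instruction SSSE3 version would only change the body of the eight-lane helper.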