MicroPerf: December 2017

We often need to convert a number into a larger type. To keep the same value, signed numbers are sign extended. Unsigned numbers do not have a sign bit, so they are left padded with zeros.
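As a scalar reference point before the SIMD versions, here is a minimal sketch of the two kinds of widening. The helper names (WidenSigned, WidenUnsigned, SignExtendedBits) are made up for this illustration and are not part of any library.

```cpp
#include <cstdint>

// Hypothetical helper: widening a signed value sign extends it,
// so the numeric value is preserved.
inline int32_t WidenSigned(int16_t v)
{
    return static_cast<int32_t>(v);
}

// Hypothetical helper: widening an unsigned value pads the upper
// bits with zeros.
inline uint32_t WidenUnsigned(uint16_t v)
{
    return static_cast<uint32_t>(v);
}

// Hypothetical helper: the raw 32-bit pattern of a sign-extended value,
// so the copies of the sign bit in the upper half are visible.
inline uint32_t SignExtendedBits(int16_t v)
{
    return static_cast<uint32_t>(static_cast<int32_t>(v));
}
```

For example, the 16-bit pattern 0xFFFE widens to 0xFFFFFFFE (still -2) when treated as signed, and to 0x0000FFFE (65534) when treated as unsigned.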

The unsigned versions are pretty obvious:

void _mm_cvtepu32_epi64(const __m128i &mIn, __m128i &mOutLo, __m128i &mOutHi)
{
    mOutLo = _mm_unpacklo_epi32(mIn, _mm_setzero_si128());
    mOutHi = _mm_unpackhi_epi32(mIn, _mm_setzero_si128());
}
void _mm_cvtepu16_epi32(const __m128i &mIn, __m128i &mOutLo, __m128i &mOutHi)
{
    mOutLo = _mm_unpacklo_epi16(mIn, _mm_setzero_si128());
    mOutHi = _mm_unpackhi_epi16(mIn, _mm_setzero_si128());
}
void _mm_cvtepu8_epi16(const __m128i &mIn, __m128i &mOutLo, __m128i &mOutHi)
{
    mOutLo = _mm_unpacklo_epi8(mIn, _mm_setzero_si128());
    mOutHi = _mm_unpackhi_epi8(mIn, _mm_setzero_si128());
}
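To see the zero interleave at work on real data, here is a small sketch that widens eight 16-bit values held in memory. The function name ZeroExtend8x16To32 and the pointer-based interface are assumptions made for this example; it relies on x86 being little-endian, so the zero word from the second operand ends up in the upper half of each 32-bit lane.

```cpp
#include <emmintrin.h> // SSE2
#include <cstdint>

// Hypothetical helper: zero extend eight uint16_t values to eight uint32_t.
inline void ZeroExtend8x16To32(const uint16_t *pIn, uint32_t *pOut)
{
    const __m128i mIn = _mm_loadu_si128(reinterpret_cast<const __m128i *>(pIn));
    const __m128i mZero = _mm_setzero_si128();

    // Interleaving with zero places a zero 16-bit word above each input word.
    const __m128i mLo = _mm_unpacklo_epi16(mIn, mZero); // elements 0..3
    const __m128i mHi = _mm_unpackhi_epi16(mIn, mZero); // elements 4..7

    _mm_storeu_si128(reinterpret_cast<__m128i *>(pOut), mLo);
    _mm_storeu_si128(reinterpret_cast<__m128i *>(pOut + 4), mHi);
}
```

The unaligned load and store are used only so the sketch works with any array; aligned variants would do if the data is known to be 16-byte aligned.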

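The signed versions aren't supported directly until later iterations of the instruction set (SSE4.1), which added _mm_cvtepi32_epi64 and other similar instructions, but they aren't too difficult to produce from the base intrinsics. Unpacking the input with itself duplicates each element into both halves of a wider lane, and an arithmetic right shift by the original width then leaves the value in the low half with copies of its sign bit in the high half.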

void _mm_cvtepi16_epi32(const __m128i &mIn, __m128i &mOutLo, __m128i &mOutHi)
{
    __m128i mDupedLo = _mm_unpacklo_epi16(mIn, mIn);
    __m128i mDupedHi = _mm_unpackhi_epi16(mIn, mIn);
    mOutLo = _mm_srai_epi32(mDupedLo, 16);
    mOutHi = _mm_srai_epi32(mDupedHi, 16);
}
void _mm_cvtepi8_epi16(const __m128i &mIn, __m128i &mOutLo, __m128i &mOutHi)
{
    __m128i mDupedLo = _mm_unpacklo_epi8(mIn, mIn);
    __m128i mDupedHi = _mm_unpackhi_epi8(mIn, mIn);
    mOutLo = _mm_srai_epi16(mDupedLo, 8);
    mOutHi = _mm_srai_epi16(mDupedHi, 8);
}
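The duplicate-then-shift approach can be checked against ordinary scalar widening with a sketch like the following. The name SignExtend8x16To32 and the array interface are hypothetical, chosen only for this example.

```cpp
#include <emmintrin.h> // SSE2
#include <cstdint>

// Hypothetical helper: sign extend eight int16_t values to eight int32_t.
inline void SignExtend8x16To32(const int16_t *pIn, int32_t *pOut)
{
    const __m128i mIn = _mm_loadu_si128(reinterpret_cast<const __m128i *>(pIn));

    // Each 32-bit lane now holds the same 16-bit value in both halves.
    const __m128i mDupedLo = _mm_unpacklo_epi16(mIn, mIn); // elements 0..3
    const __m128i mDupedHi = _mm_unpackhi_epi16(mIn, mIn); // elements 4..7

    // The arithmetic shift moves the upper copy down and fills the
    // upper half with copies of its sign bit.
    const __m128i mLo = _mm_srai_epi32(mDupedLo, 16);
    const __m128i mHi = _mm_srai_epi32(mDupedHi, 16);

    _mm_storeu_si128(reinterpret_cast<__m128i *>(pOut), mLo);
    _mm_storeu_si128(reinterpret_cast<__m128i *>(pOut + 4), mHi);
}
```

Only the upper copy of each value survives the shift, so the low half of the duplicated lane could hold anything; duplicating the input is simply a convenient way to get the value into the upper half without a second constant register.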

But since SSE2 doesn't have a 64-bit arithmetic shift, we need to do something different for the promotion from 32-bit signed to 64-bit signed. We extend the sign bit from each of the 4 lanes to all 32 bits of the lane with an arithmetic right shift by 31, so each lane becomes all zeros or all ones.

Then we interleave the 32-bit results from the signs and values into the final 64-bit outputs.

void _mm_cvtepi32_epi64(const __m128i &mIn, __m128i &mOutLo, __m128i &mOutHi)
{
    __m128i mSigns = _mm_srai_epi32(mIn, 31);
    mOutLo = _mm_unpacklo_epi32(mIn, mSigns);
    mOutHi = _mm_unpackhi_epi32(mIn, mSigns);
}

This requires 3 registers rather than the 2 registers used by the other width versions.
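A short sketch of the sign-mask interleave applied to values in memory, including the extremes of the 32-bit range. The name SignExtend4x32To64 and the pointer interface are assumptions for illustration only.

```cpp
#include <emmintrin.h> // SSE2
#include <cstdint>

// Hypothetical helper: sign extend four int32_t values to four int64_t.
inline void SignExtend4x32To64(const int32_t *pIn, int64_t *pOut)
{
    const __m128i mIn = _mm_loadu_si128(reinterpret_cast<const __m128i *>(pIn));

    // Each lane becomes 0x00000000 for non-negative inputs
    // and 0xFFFFFFFF for negative inputs.
    const __m128i mSigns = _mm_srai_epi32(mIn, 31);

    // Interleave value (low 32 bits) with its sign mask (high 32 bits).
    const __m128i mLo = _mm_unpacklo_epi32(mIn, mSigns); // elements 0..1
    const __m128i mHi = _mm_unpackhi_epi32(mIn, mSigns); // elements 2..3

    _mm_storeu_si128(reinterpret_cast<__m128i *>(pOut), mLo);
    _mm_storeu_si128(reinterpret_cast<__m128i *>(pOut + 2), mHi);
}
```

The third register shows up here as mSigns, which has to stay live alongside the input until both unpacks are done.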

Wednesday, December 20, 2017

Converting integers to larger integer types