We often need to sign extend a number into a larger type. To keep the same value we sign extend. Unsigned numbers do not have a sign bit, so will be left padded with zeros.
The unsigned versions are pretty obvious:
void _mm_cvtepu32_epi64(const __m128i &mIn, __m128i &mOutLo, __m128i &mOutHi)
{
mOutLo = _mm_unpacklo_epi32(mIn, _mm_setzero_si128());
mOutHi = _mm_unpackhi_epi32(mIn, _mm_setzero_si128());
}
void _mm_cvtepu16_epi32(const __m128i &mIn, __m128i &mOutLo, __m128i &mOutHi)
{
mOutLo = _mm_unpacklo_epi16(mIn, _mm_setzero_si128());
mOutHi = _mm_unpackhi_epi16(mIn, _mm_setzero_si128());
}
void _mm_cvtepu8_epi16(const __m128i &mIn, __m128i &mOutLo, __m128i &mOutHi)
{
mOutLo = _mm_unpacklo_epi8(mIn, _mm_setzero_si128());
mOutHi = _mm_unpackhi_epi8(mIn, _mm_setzero_si128());
}
The signed version aren't supported directly until later iterations of the instruction set with _mm_cvtepi32_epi64 and other similar instructions, but it isn't too difficult produce from the base intrinsics.
void _mm_cvtepi16_epi32(const __m128i &mIn, __m128i &mOutLo, __m128i &mOutHi)
{
__m128i mDupedLo = _mm_unpacklo_epi16(mIn, mIn);
__m128i mDupedHi = _mm_unpackhi_epi16(mIn, mIn);
mOutLo = _mm_srai_epi32(mDupedLo , 16);
mOutHi = _mm_srai_epi32(mDupedHi , 16);
}
void _mm_cvtepi8_epi16(const __m128i &mIn, __m128i &mOutLo, __m128i &mOutHi)
{
__m128i mDupedLo = _mm_unpacklo_epi8(mIn, mIn);
__m128i mDupedHi = _mm_unpackhi_epi8(mIn, mIn);
mOutLo = _mm_srai_epi16(mDupedLo , 8);
mOutHi = _mm_srai_epi16(mDupedHi , 8);
}
But since there isn't a 64bit arithmetic shift, we need to do something different for the promotion from 32bit signed to 64 bit signed. We extend the sign bit from each of the 4 lanes to all 32 bits of the lane.
Then we interleave the 32 bit results from the signs and values into the final 64 bit outputs.
void _mm_cvtepi32_epi64(const __m128i &mIn, __m128i &mOutLo, __m128i &mOutHi)
{
__m128i mSigns = _mm_srai_epi32(mIn, 31);
mOutLo = _mm_unpacklo_epi32(mIn, mSigns);
mOutHi = _mm_unpackhi_epi32(mIn, mSigns);
}
This requires 3 registers rather than the 2 registers of the other width version.
No comments:
Post a Comment