MicroPerf: converting floats to unsigned bytes using SSE

Floating-point numbers and unsigned bytes are both used in graphics: generally 0..1 for floats and, obviously, 0..255 for unsigned bytes.
It's nice when we can stick to one or the other, but sometimes that doesn't work and we have to convert between them.

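There are details. There are always details. And I've seen people get them wrong. Often I've seen this implemented with truncation rather than rounding - which makes me sad. Truncation biases every result downward: 0.999 becomes 254, not 255. I've also seen a direct cast to byte, without clamping to 0 and 255 first - which makes me sad. An out-of-range value can then wrap around instead of pinning to the nearest end.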
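To make the two sad cases concrete, here is a minimal sketch in plain C++. The helper names NaiveFloatToByte and RoundedFloatToByte are made up for this illustration and are not part of the code later in the post. The naive version is only meant to be called with values where 255 * x still fits in an int.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: truncating, unclamped conversion (the pattern to avoid).
// Only call this when 255 * x fits in an int.
inline unsigned char NaiveFloatToByte(float x)
{
    return static_cast<unsigned char>(static_cast<int>(255.0f * x));
}

// Hypothetical helper: clamp to 0..255 in float first, then round to nearest.
inline unsigned char RoundedFloatToByte(float x)
{
    const float scaled = std::min(255.0f, std::max(0.0f, 255.0f * x));
    return static_cast<unsigned char>(std::lround(scaled));
}
```

With these, 0.999f comes out as 254 from the naive version and 255 from the rounded one, and 1.5f comes out as 126 (382 wrapped into a byte) versus 255.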

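Pretty much all of the float SSE instructions are affected by the rounding mode, which lives in two bits of the MXCSR register and defaults to round-to-nearest. That includes the float-to-int conversion behind _mm_cvtps_epi32, which is what gives us rounding for free. Pretty much nobody changes these bits without changing them back when they're done, so the default is usually what you'll find. Setting the bits is a little expensive - pretty much everything has to stop in order to do this. But amortized over a function call it's not bad.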
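As a rough sketch of the save, set, restore pattern, the rounding mode can be wrapped in a small scope guard. ScopedSseRoundingMode, ConvertWithCurrentMode and ConvertRoundingDown are hypothetical names invented for this example; only the _MM_GET_ROUNDING_MODE / _MM_SET_ROUNDING_MODE macros and the intrinsics are real. It assumes an x86 target and that the compiler keeps the conversion between the two mode changes.

```cpp
#include <xmmintrin.h>  // _MM_GET_ROUNDING_MODE, _MM_SET_ROUNDING_MODE
#include <emmintrin.h>  // _mm_cvtps_epi32, _mm_cvtsi128_si32 (SSE2)

// Hypothetical scope guard: sets a rounding mode and restores the old one on exit.
class ScopedSseRoundingMode
{
public:
    explicit ScopedSseRoundingMode(unsigned int mode)
        : m_saved(_MM_GET_ROUNDING_MODE())
    {
        _MM_SET_ROUNDING_MODE(mode);
    }
    ~ScopedSseRoundingMode() { _MM_SET_ROUNDING_MODE(m_saved); }

    ScopedSseRoundingMode(const ScopedSseRoundingMode &) = delete;
    ScopedSseRoundingMode &operator=(const ScopedSseRoundingMode &) = delete;

private:
    unsigned int m_saved;
};

// Converts one float to int using whatever rounding mode is currently set.
inline int ConvertWithCurrentMode(float x)
{
    return _mm_cvtsi128_si32(_mm_cvtps_epi32(_mm_set1_ps(x)));
}

// Same conversion, but forced to round toward negative infinity for this call only.
inline int ConvertRoundingDown(float x)
{
    ScopedSseRoundingMode guard(_MM_ROUND_DOWN);
    return ConvertWithCurrentMode(x);
}
```

Under the default mode, ConvertWithCurrentMode(2.7f) gives 3 while ConvertRoundingDown(2.7f) gives 2, and the mode is back to what it was once the guard goes out of scope.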

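#include <algorithm>   // std::min, std::max
#include <cstddef>     // size_t
#include <emmintrin.h> // SSE2 intrinsics (includes xmmintrin.h for the rounding mode macros)

__forceinline unsigned char Saturate_Int32_To_UnsignedInt8(const int x)
{
    return (unsigned char)std::min(255, std::max(0, x));
}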

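__forceinline unsigned char Saturate_float_To_UnsignedInt8(const float x)
{
    // Plain C++ gives us no rounding-mode control over the cast, so do the normal thing:
    // add 0.5f and truncate towards zero. Exact ties round up here, while the SSE path rounds them to even.
    // Clamp in float first: casting a float that doesn't fit in an int is undefined behavior.
    const float fScaled = std::min(255.0f, std::max(0.0f, 255 * x));
    int ic = (int)(fScaled + 0.5f);
    return Saturate_Int32_To_UnsignedInt8(ic);
}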

void convert_floats_to_bytes(const float *pF, unsigned char *pUC, const size_t cF)
{
    // Round-to-nearest is the default, so setting it is generally unnecessary... But to be safe,
    // set it here and restore the caller's mode before returning.
    const unsigned int uiStoredRoundingMode = _MM_GET_ROUNDING_MODE();
    _MM_SET_ROUNDING_MODE(_MM_ROUND_NEAREST);
    size_t cFRemaining = cF;
    while (cFRemaining & 3)
    {
        cFRemaining--;
        pUC[cFRemaining] = Saturate_float_To_UnsignedInt8(pF[cFRemaining]);
    }
    if (cFRemaining > 0)
    {
        __m128 mScale = _mm_set1_ps(255);
        if (cFRemaining & 4)
        {
            cFRemaining -= 4;
            __m128 mF = _mm_loadu_ps(&pF[cFRemaining]);
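            // Clamp the top end in float: _mm_cvtps_epi32 returns 0x80000000 for values that
            // don't fit in an int, which would pack down to 0 instead of 255.
            __m128 mFScaled = _mm_min_ps(_mm_mul_ps(mF, mScale), mScale);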
            __m128i mIScaled_epi32 = _mm_cvtps_epi32(mFScaled);
            __m128i mIScaled_epi16 = _mm_packs_epi32(mIScaled_epi32, _mm_setzero_si128());
            __m128i mIScaled_epu8 = _mm_packus_epi16(mIScaled_epi16, _mm_setzero_si128());
            int iConvertedOut;
            iConvertedOut = _mm_cvtsi128_si32(mIScaled_epu8);
            *((int *)&pUC[cFRemaining]) = iConvertedOut;
        }
        while (cFRemaining > 0)
        {
            cFRemaining -= 4;
            __m128 mFHi = _mm_loadu_ps(&pF[cFRemaining]);
            cFRemaining -= 4;
            __m128 mFLo = _mm_loadu_ps(&pF[cFRemaining]);
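            // Clamp the top end in float: _mm_cvtps_epi32 returns 0x80000000 for values that
            // don't fit in an int, which would pack down to 0 instead of 255.
            __m128 mFScaledHi = _mm_min_ps(_mm_mul_ps(mFHi, mScale), mScale);
            __m128 mFScaledLo = _mm_min_ps(_mm_mul_ps(mFLo, mScale), mScale);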
            __m128i mIScaledHi_epi32 = _mm_cvtps_epi32(mFScaledHi);
            __m128i mIScaledLo_epi32 = _mm_cvtps_epi32(mFScaledLo);
            __m128i mIScaled_epi16 = _mm_packs_epi32(mIScaledLo_epi32, mIScaledHi_epi32);
            __m128i mIScaled_epu8 = _mm_packus_epi16(mIScaled_epi16, _mm_setzero_si128());
            // Store the low 8 bytes directly; this works the same on x86 and x64
            // and doesn't rely on the MSVC-only m128i_i64 member.
            _mm_storel_epi64((__m128i *)&pUC[cFRemaining], mIScaled_epu8);
        }
    }
    _MM_SET_ROUNDING_MODE(uiStoredRoundingMode);
}

This doesn't make me sad.
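For anyone who wants to poke at the pack chain in isolation, here is a small sketch that converts exactly four floats. ConvertFourFloatsToBytes is a hypothetical helper written for this illustration, not a piece of the function above. It assumes the default round-to-nearest mode is in effect, and it adds a float clamp at the top end because _mm_cvtps_epi32 returns 0x80000000 for values too large for an int, which would otherwise pack down to 0.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstring>      // std::memcpy

// Hypothetical helper: converts four floats (nominally 0..1) to four bytes (0..255).
inline void ConvertFourFloatsToBytes(const float *pF, unsigned char *pUC)
{
    const __m128 mScale = _mm_set1_ps(255.0f);
    __m128 mScaled = _mm_mul_ps(_mm_loadu_ps(pF), mScale);
    // Keep huge values convertible; without this they would end up as 0.
    mScaled = _mm_min_ps(mScaled, mScale);
    // Rounds according to the current MXCSR rounding mode.
    const __m128i mI32 = _mm_cvtps_epi32(mScaled);
    // Saturate 32-bit ints to signed 16-bit, then to unsigned 8-bit.
    const __m128i mI16 = _mm_packs_epi32(mI32, _mm_setzero_si128());
    const __m128i mU8 = _mm_packus_epi16(mI16, _mm_setzero_si128());
    // The four result bytes sit in the low 32 bits.
    const int packed = _mm_cvtsi128_si32(mU8);
    std::memcpy(pUC, &packed, sizeof(packed));
}
```

Feeding it { -1.0f, 0.5f, 2.0f, 1e20f } should give { 0, 128, 255, 255 }: the negative value is pinned to 0 by the unsigned pack, 127.5 rounds to the even neighbour 128, and both oversized values are pinned to 255.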

Thursday, March 23, 2017
