Sunday, July 9, 2017

Using the inverse square root to calculate normalized 2D vectors in SSE

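The inverse square root (1/√x) comes up often in computer graphics, particularly when calculating unit direction vectors: dividing a vector by its length is the same as multiplying it by the inverse square root of its squared length.  Game graphics have famously used a fast, low-precision version.
A 2D vector needs only two floats, so a four-float SSE register can hold two separate vectors, packed as (x0, y0, x1, y1), and we can calculate the unit direction vector for both at the same time.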

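#include <xmmintrin.h>  // SSE intrinsics

// mIn holds two 2D vectors packed as (x0, y0, x1, y1), lowest lane first.
// Returns both vectors scaled to unit length; a zero-length vector yields (0, 0).
template<bool bUseFasterButLessAccurate>
__m128 directionVectorsSSE(const __m128 &mIn)
{
    __m128 mInSq = _mm_mul_ps(mIn, mIn);
    // Swap the lanes within each pair and add, so every lane holds x*x + y*y of its own vector.
    __m128 mHorizontalSumSq = _mm_add_ps(mInSq, _mm_shuffle_ps(mInSq, mInSq, _MM_SHUFFLE(2, 3, 0, 1)));
    __m128 mScaled;
    if (bUseFasterButLessAccurate)
    {
        // Approximate reciprocal square root, then a multiply.
        __m128 mRsqrt = _mm_rsqrt_ps(mHorizontalSumSq);
        mScaled = _mm_mul_ps(mIn, mRsqrt);
    }
    else
    {
        // Full-precision square root and divide.
        __m128 mSqrt = _mm_sqrt_ps(mHorizontalSumSq);
        mScaled = _mm_div_ps(mIn, mSqrt);
    }
    // All-ones mask in lanes whose squared length is greater than zero.
    __m128 mIsNonZero = _mm_cmpgt_ps(mHorizontalSumSq, _mm_setzero_ps());
    // Zero-length vectors produce NaN above; masking forces those lanes to 0.
    __m128 mResult = _mm_and_ps(mScaled, mIsNonZero);
    return mResult;
}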

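The template parameter selects, at compile time, either the direct calculation (_mm_sqrt_ps followed by _mm_div_ps) or the faster, less accurate version (_mm_rsqrt_ps followed by _mm_mul_ps).  Intel documents the maximum relative error of the _mm_rsqrt_ps approximation as less than 1.5*2^-12.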
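As a cross-check, the result for a single vector can be sketched in scalar code.  The names Vec2 and directionVectorScalar are hypothetical, introduced only for illustration; the sketch mirrors the direct (non-approximate) path, including the zero-vector case.

```cpp
#include <cmath>

// Hypothetical helper type for illustration; not part of the SSE code.
struct Vec2
{
    float x;
    float y;
};

// Scalar reference for one 2D vector: divide by the length,
// or return the zero vector when the squared length is not greater than zero.
inline Vec2 directionVectorScalar(const Vec2 &v)
{
    const float sumSq = v.x * v.x + v.y * v.y;
    if (!(sumSq > 0.0f))
    {
        return Vec2{0.0f, 0.0f};
    }
    const float len = std::sqrt(sumSq);
    return Vec2{v.x / len, v.y / len};
}
```

Comparing each half of the SSE result against this function, with a tolerance when the fast path is used, is one way to test both template instantiations.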

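It seems like there should be a way to make use of the actual horizontal add instruction (_mm_hadd_ps, available from SSE3), but its output would still need a shuffle to line the sums back up with the input lanes.  That just moves the required shuffle, so it does not benefit the code.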
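To make the point about the moved shuffle concrete, here is a rough sketch of the sum-of-squares step written with the horizontal add.  The function name horizontalSumSqHadd and the SKETCH_SSE3 macro are hypothetical; the macro assumes a GCC or Clang build and only exists so the sketch compiles when SSE3 is not enabled globally.

```cpp
#include <pmmintrin.h>  // SSE3: _mm_hadd_ps

// Assumption: GCC/Clang builds without -msse3 need a per-function target attribute.
#if defined(__GNUC__) && !defined(__SSE3__)
#define SKETCH_SSE3 __attribute__((target("sse3")))
#else
#define SKETCH_SSE3
#endif

// Hypothetical variant of the sum-of-squares step using the horizontal add.
// Input layout is (x0, y0, x1, y1), lowest lane first.
SKETCH_SSE3 __m128 horizontalSumSqHadd(__m128 mIn)
{
    __m128 mInSq = _mm_mul_ps(mIn, mIn);
    // hadd gives (s0, s1, s0, s1), where s0 = x0*x0 + y0*y0 and s1 = x1*x1 + y1*y1.
    __m128 mHadd = _mm_hadd_ps(mInSq, mInSq);
    // The sums still have to be spread to (s0, s0, s1, s1) to line up with the input.
    return _mm_shuffle_ps(mHadd, mHadd, _MM_SHUFFLE(1, 1, 0, 0));
}
```

The instruction count is the same as the shuffle-then-add version: one multiply, one add of some kind, and one shuffle.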

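Both paths can divide by zero (or multiply by the infinity that the reciprocal square root of zero produces), so a zero input vector would come out as NaN.  We don't want that.  The above code handles it by and'ing the scaled result with the bit mask produced by a greater-than-zero comparison on the squared length: the mask is all zero bits for a zero-length vector, so that vector comes back as zero.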
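The masking trick can be seen in isolation in the following minimal sketch.  The helper name divideOrZero is hypothetical and not part of the code above; it assumes default floating-point semantics (no fast-math options).

```cpp
#include <xmmintrin.h>  // SSE: _mm_div_ps, _mm_cmpgt_ps, _mm_and_ps

// Hypothetical helper: divide lane by lane, forcing every lane whose
// denominator is not greater than zero to +0.0f.
inline __m128 divideOrZero(__m128 mNum, __m128 mDen)
{
    // Lanes computing 0/0 hold NaN at this point.
    __m128 mQuotient = _mm_div_ps(mNum, mDen);
    // All one bits where the denominator is greater than zero, all zero bits elsewhere.
    __m128 mKeep = _mm_cmpgt_ps(mDen, _mm_setzero_ps());
    // Any bit pattern and'ed with all zero bits is +0.0f, so the NaN is discarded.
    return _mm_and_ps(mQuotient, mKeep);
}
```

Because the comparison result is a per-lane bit mask rather than a branch condition, the two packed vectors are handled independently: one can be zeroed while the other keeps its scaled value.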
