The inverse square root (1/√x) comes up often in computer graphics, particularly when calculating unit direction vectors: normalizing a vector (x, y) means scaling it by 1/√(x² + y²). Game graphics code has famously used a fast, low-precision approximation.
With SSE we can calculate the unit direction vectors for two separate 2D vectors at the same time, with both packed into a single __m128 as (x0, y0, x1, y1).
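#include <xmmintrin.h> // SSE intrinsics

// Normalizes two 2D vectors packed as (x0, y0, x1, y1).
template<bool bUseFasterButLessAccurate>
__m128 directionVectorsSSE(const __m128 &mIn)
{
    // Square every component: (x0*x0, y0*y0, x1*x1, y1*y1).
    __m128 mInSq = _mm_mul_ps(mIn, mIn);
    // Swap the neighbours within each pair and add, so both lanes of a
    // pair hold that vector's squared length.
    __m128 mHorizontalSumSq = _mm_add_ps(mInSq, _mm_shuffle_ps(mInSq, mInSq, _MM_SHUFFLE(2, 3, 0, 1)));
    __m128 mScaled;
    if (bUseFasterButLessAccurate)
    {
        // Approximate reciprocal square root, then multiply.
        __m128 mRsqrt = _mm_rsqrt_ps(mHorizontalSumSq);
        mScaled = _mm_mul_ps(mIn, mRsqrt);
    }
    else
    {
        // Full-precision square root, then divide.
        __m128 mSqrt = _mm_sqrt_ps(mHorizontalSumSq);
        mScaled = _mm_div_ps(mIn, mSqrt);
    }
    // All-ones mask where the squared length is greater than zero.
    __m128 mIsNonZero = _mm_cmpgt_ps(mHorizontalSumSq, _mm_setzero_ps());
    // Zero-length vectors produced NaN above; the mask turns them into zero.
    return _mm_and_ps(mScaled, mIsNonZero);
}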
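The template parameter, a compile-time constant, selects between the direct calculation (_mm_sqrt_ps followed by _mm_div_ps) and the faster, less accurate version (_mm_rsqrt_ps followed by _mm_mul_ps). Intel documents the _mm_rsqrt_ps approximation as having a maximum relative error of 1.5 × 2^-12.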
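As a minimal usage sketch, the code below packs two 2D vectors into one register and runs both variants. It assumes the lane layout (x0, y0, x1, y1) implied by the shuffle. The function from the text is restated compactly so the sketch compiles on its own; `normalizeTwo` and `exampleUsage` are hypothetical names introduced only for illustration.

```cpp
#include <xmmintrin.h> // SSE intrinsics
#include <cassert>
#include <cmath>

// The function from the text, restated compactly.
template<bool bUseFasterButLessAccurate>
__m128 directionVectorsSSE(const __m128 &mIn)
{
    __m128 mInSq = _mm_mul_ps(mIn, mIn);
    __m128 mSumSq = _mm_add_ps(mInSq, _mm_shuffle_ps(mInSq, mInSq, _MM_SHUFFLE(2, 3, 0, 1)));
    __m128 mScaled = bUseFasterButLessAccurate
        ? _mm_mul_ps(mIn, _mm_rsqrt_ps(mSumSq))
        : _mm_div_ps(mIn, _mm_sqrt_ps(mSumSq));
    return _mm_and_ps(mScaled, _mm_cmpgt_ps(mSumSq, _mm_setzero_ps()));
}

// Hypothetical helper (not from the text): normalizes (x0, y0) and (x1, y1)
// and writes the results to out as x0, y0, x1, y1.
template<bool bFast>
void normalizeTwo(float x0, float y0, float x1, float y1, float out[4])
{
    assert(out != nullptr);
    // _mm_set_ps takes its arguments from the highest lane down to lane 0.
    __m128 mIn = _mm_set_ps(y1, x1, y0, x0);
    _mm_storeu_ps(out, directionVectorsSSE<bFast>(mIn));
}

// Example: (3, 4) has length 5 and (0, 2) has length 2.
void exampleUsage()
{
    float exact[4], fast[4];
    normalizeTwo<false>(3.0f, 4.0f, 0.0f, 2.0f, exact);
    normalizeTwo<true>(3.0f, 4.0f, 0.0f, 2.0f, fast);
    const float expected[4] = {0.6f, 0.8f, 0.0f, 1.0f};
    for (int i = 0; i < 4; ++i)
    {
        assert(std::fabs(exact[i] - expected[i]) < 1e-6f); // direct path
        assert(std::fabs(fast[i] - expected[i]) < 1e-3f);  // approximate path
    }
}
```

The looser tolerance on the fast path reflects the approximation error of _mm_rsqrt_ps. Whether that error is acceptable depends on what the direction vectors are used for.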
It seems like the actual horizontal add instruction (_mm_hadd_ps, from SSE3) should help here, but it leaves the two sums in the wrong lanes, so a shuffle is still needed afterwards. That just moves the required shuffle and does not benefit the code.
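To make the point about the horizontal add concrete, here is a sketch of both orderings side by side. The function names are hypothetical, and the `TARGET_SSE3` macro is only a convenience so the sketch builds with GCC or Clang without passing `-msse3`. Both versions use one shuffle and one add; only the order changes.

```cpp
#include <xmmintrin.h> // SSE: _mm_mul_ps, _mm_add_ps, _mm_shuffle_ps
#include <pmmintrin.h> // SSE3: _mm_hadd_ps
#include <cassert>

#if defined(__GNUC__) || defined(__clang__)
#define TARGET_SSE3 __attribute__((target("sse3")))
#else
#define TARGET_SSE3
#endif

// Shuffle, then add: the approach used in the text.
// Result lanes: (s0, s0, s1, s1) with s0 = x0*x0 + y0*y0, s1 = x1*x1 + y1*y1.
inline __m128 sumSqShuffleThenAdd(__m128 mIn)
{
    __m128 mInSq = _mm_mul_ps(mIn, mIn);
    return _mm_add_ps(mInSq, _mm_shuffle_ps(mInSq, mInSq, _MM_SHUFFLE(2, 3, 0, 1)));
}

// Horizontal add, then shuffle: hypothetical alternative.
TARGET_SSE3 inline __m128 sumSqHaddThenShuffle(__m128 mIn)
{
    __m128 mInSq = _mm_mul_ps(mIn, mIn);
    // _mm_hadd_ps(a, a) yields (a0+a1, a2+a3, a0+a1, a2+a3) = (s0, s1, s0, s1).
    __m128 mHadd = _mm_hadd_ps(mInSq, mInSq);
    // The sums are interleaved, so a shuffle is still required: (s0, s0, s1, s1).
    return _mm_shuffle_ps(mHadd, mHadd, _MM_SHUFFLE(1, 1, 0, 0));
}

// Example: both orderings give the same lanes for (3, 4) and (0, 2).
void exampleHaddMatches()
{
    __m128 mIn = _mm_set_ps(2.0f, 0.0f, 4.0f, 3.0f); // lanes: 3, 4, 0, 2
    float a[4], b[4];
    _mm_storeu_ps(a, sumSqShuffleThenAdd(mIn));
    _mm_storeu_ps(b, sumSqHaddThenShuffle(mIn));
    for (int i = 0; i < 4; ++i)
    {
        assert(a[i] == b[i]);
    }
    assert(a[0] == 25.0f && a[2] == 4.0f);
}
```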
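We do have a division by something that could be zero, and we don't want to return NaN for a zero input vector. The same applies to the fast path, where _mm_rsqrt_ps returns infinity for a zero input and 0 × ∞ is also NaN. The above code handles this by returning a zero vector instead: it ANDs the scaled result with the bit mask from a comparison, which is all ones where the squared length is greater than zero and all zeros otherwise.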
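The masking step can be looked at in isolation. The sketch below shows the unmasked division producing NaN for a zero vector and the AND with the comparison mask replacing it with zero, while a non-zero vector in the other two lanes passes through unchanged. The names `zeroWhereLengthIsZero` and `exampleZeroMasking` are hypothetical.

```cpp
#include <xmmintrin.h> // SSE intrinsics
#include <cassert>
#include <cmath>

// Keeps the lanes of mScaled where mSumSq > 0 and clears all bits elsewhere.
// An all-zero bit pattern is +0.0f, so masked-out lanes become zero.
inline __m128 zeroWhereLengthIsZero(__m128 mScaled, __m128 mSumSq)
{
    __m128 mIsNonZero = _mm_cmpgt_ps(mSumSq, _mm_setzero_ps());
    return _mm_and_ps(mScaled, mIsNonZero);
}

void exampleZeroMasking()
{
    // Lanes 0-1 hold the zero vector (0, 0); lanes 2-3 hold (3, 4).
    __m128 mIn = _mm_set_ps(4.0f, 3.0f, 0.0f, 0.0f);
    __m128 mSumSq = _mm_set_ps(25.0f, 25.0f, 0.0f, 0.0f);
    __m128 mUnmasked = _mm_div_ps(mIn, _mm_sqrt_ps(mSumSq));

    float raw[4], masked[4];
    _mm_storeu_ps(raw, mUnmasked);
    _mm_storeu_ps(masked, zeroWhereLengthIsZero(mUnmasked, mSumSq));

    // 0 / 0 is NaN before masking.
    assert(std::isnan(raw[0]) && std::isnan(raw[1]));
    // After masking, the zero vector stays a zero vector.
    assert(masked[0] == 0.0f && masked[1] == 0.0f);
    // The non-zero vector is unaffected: (3, 4) / 5.
    assert(std::fabs(masked[2] - 0.6f) < 1e-6f);
    assert(std::fabs(masked[3] - 0.8f) < 1e-6f);
}
```

Because the comparison is a strict greater-than, the mask is decided per lane from the squared length alone, so the same step works unchanged after either the division or the _mm_rsqrt_ps multiply.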