2d vectors come up a lot in graphics: floats are handled in (x, y) pairs, with all of the usual linear algebra operations.
Adding four scaled 2d vectors is a straightforward operation.
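Written out, with $\mathbf{v}_i = (x_i, y_i)$ for the four vectors and $s_i$ for their scale factors (notation introduced here for illustration), the operation computes:

$$
\mathbf{r} = \sum_{i=0}^{3} s_i\,\mathbf{v}_i
\quad\Longleftrightarrow\quad
r_x = \sum_{i=0}^{3} s_i x_i, \qquad r_y = \sum_{i=0}^{3} s_i y_i
$$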
We pass the four 2d vectors in two XMM registers (two vectors per register, laid out as x, y, x, y) and the four scale factors in a single XMM register, one factor per lane.
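For comparison, here is a minimal scalar sketch of the same computation. The `Vec2` struct and the `scaleSum2D_scalar` function are hypothetical names introduced only for this illustration; the comments show which XMM lane each value is assumed to occupy in the SSE routine (vectors 0 and 1 in `mLeft`, vectors 2 and 3 in `mRight`, scales s0..s3 in lanes 0..3 of `mScales`).

```cpp
#include <cassert>

// Hypothetical plain 2d vector type, used only for this illustration.
struct Vec2 {
    float x, y;
};

// Scalar reference for the SSE routine (hypothetical name).
// Assumed lane layout in the SSE version:
//   mLeft   = (v[0].x, v[0].y, v[1].x, v[1].y)
//   mRight  = (v[2].x, v[2].y, v[3].x, v[3].y)
//   mScales = (s[0],   s[1],   s[2],   s[3])
Vec2 scaleSum2D_scalar(const Vec2 v[4], const float s[4])
{
    assert(v != nullptr && s != nullptr);
    Vec2 r = {0.0f, 0.0f};
    for (int i = 0; i < 4; ++i) {
        r.x += s[i] * v[i].x;  // accumulate the scaled x components
        r.y += s[i] * v[i].y;  // accumulate the scaled y components
    }
    return r;
}
```

The loop adds the terms in order 0, 1, 2, 3, whereas the SSE version adds vectors 0 and 2 and vectors 1 and 3 first, so for general inputs the two can differ in the last bit of the float result.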
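```cpp
#include <xmmintrin.h>  // SSE intrinsics: add, mul, shuffle, unpack

// __forceinline is MSVC-specific; map it to the GCC/Clang equivalent elsewhere.
#if !defined(_MSC_VER) && !defined(__forceinline)
#define __forceinline inline __attribute__((always_inline))
#endif

// Adds the two 2d vectors in mLeft (x0, y0, x1, y1) to the two in mRight (x2, y2, x3, y3).
// Returns (X, Y, X, Y) with X = x0 + x1 + x2 + x3 and Y = y0 + y1 + y2 + y3.
__forceinline __m128 horizontalSum2D_SSE2(const __m128 mLeft, const __m128 mRight)
{
    __m128 mTop = _mm_add_ps(mLeft, mRight);  // (x0+x2, y0+y2, x1+x3, y1+y3)
    // Swap the low and high 64-bit halves so the two partial sums line up.
    __m128 mShuffleTop = _mm_shuffle_ps(mTop, mTop, _MM_SHUFFLE(1, 0, 3, 2));
    __m128 mRet = _mm_add_ps(mTop, mShuffleTop);
    return mRet;
}

// Scales each of the four 2d vectors by its own factor (s0..s3 in mScales), then sums them.
__m128 scaleHorizontalSum2D_SSE2(const __m128 mLeft, const __m128 mRight, const __m128 mScales)
{
    __m128 mScaleLeft  = _mm_unpacklo_ps(mScales, mScales);  // (s0, s0, s1, s1)
    __m128 mScaleRight = _mm_unpackhi_ps(mScales, mScales);  // (s2, s2, s3, s3)
    __m128 mMulLeft  = _mm_mul_ps(mScaleLeft, mLeft);        // scale vectors 0 and 1
    __m128 mMulRight = _mm_mul_ps(mScaleRight, mRight);      // scale vectors 2 and 3
    __m128 mRet = horizontalSum2D_SSE2(mMulLeft, mMulRight);
    return mRet;
}
```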
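The result is returned in an XMM register, with the summed 2d vector in both the low pair (lanes 0 and 1) and the high pair (lanes 2 and 3). The entire operation is only seven SSE operations (two unpacks, two multiplies, two adds and one shuffle), or about 9 instructions once the compiler's register copies are included.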
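A minimal usage sketch, assuming an x86 compiler with SSE enabled (any x86-64 target qualifies). To compile on its own it repeats the two functions above under the hypothetical names `horizontalSum2D` and `scaleHorizontalSum2D`, using plain `inline` instead of `__forceinline`; the wrapper `scaleSum2D` is likewise a hypothetical helper. `_mm_loadu_ps` reads four floats with the first one in lane 0, which matches the (x, y, x, y) layout the functions expect, and `_mm_storeu_ps` writes the four result lanes back to memory.

```cpp
#include <xmmintrin.h>  // SSE: loads/stores, arithmetic, shuffle and unpack intrinsics
#include <cassert>

// Copy of horizontalSum2D_SSE2 with portable `inline`; returns (X, Y, X, Y).
inline __m128 horizontalSum2D(__m128 mLeft, __m128 mRight)
{
    __m128 mTop = _mm_add_ps(mLeft, mRight);
    __m128 mSwapped = _mm_shuffle_ps(mTop, mTop, _MM_SHUFFLE(1, 0, 3, 2));
    return _mm_add_ps(mTop, mSwapped);
}

// Copy of scaleHorizontalSum2D_SSE2.
inline __m128 scaleHorizontalSum2D(__m128 mLeft, __m128 mRight, __m128 mScales)
{
    __m128 mScaleLeft = _mm_unpacklo_ps(mScales, mScales);   // (s0, s0, s1, s1)
    __m128 mScaleRight = _mm_unpackhi_ps(mScales, mScales);  // (s2, s2, s3, s3)
    return horizontalSum2D(_mm_mul_ps(mScaleLeft, mLeft), _mm_mul_ps(mScaleRight, mRight));
}

// Hypothetical helper: xy holds four 2d vectors as x0, y0, x1, y1, x2, y2, x3, y3;
// scales holds s0..s3. Writes the scaled sum to out[0] (x) and out[1] (y).
void scaleSum2D(const float xy[8], const float scales[4], float out[2])
{
    assert(xy != nullptr && scales != nullptr && out != nullptr);
    __m128 mLeft = _mm_loadu_ps(xy);       // vectors 0 and 1
    __m128 mRight = _mm_loadu_ps(xy + 4);  // vectors 2 and 3
    __m128 mScales = _mm_loadu_ps(scales);
    float lanes[4];
    _mm_storeu_ps(lanes, scaleHorizontalSum2D(mLeft, mRight, mScales));
    out[0] = lanes[0];  // lanes 2 and 3 hold the same x and y
    out[1] = lanes[1];
}
```

For example, with the vectors (1, 2), (3, 4), (5, 6), (7, 8) and scales (1, 0.5, 2, 0.25), `out` comes back as (14.25, 18).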