Wednesday, May 31, 2017

a 2d horizontal sum in SSE2

2d vectors come up a lot in graphics.  Floats are treated in pairs with all of the normal linear algebra operations.

Adding four scaled 2d vectors is a straightforward operation.

We pass in the four 2d vectors packed two per XMM register (so two registers in total), and the four scaling values in a single XMM register.

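#include <xmmintrin.h> // SSE intrinsics and _MM_SHUFFLE

// Adds two registers of packed 2d vectors, then folds the high pair onto the low pair.
// The summed (x, y) lands in lanes 0-1 and is duplicated in lanes 2-3.
__forceinline __m128 horizontalSum2D_SSE2(const __m128 mLeft, const __m128 mRight)
{
    __m128 mTop = _mm_add_ps(mLeft, mRight);
    // Swap the low and high halves: [t0, t1, t2, t3] becomes [t2, t3, t0, t1].
    __m128 mShuffleTop = _mm_shuffle_ps(mTop, mTop, _MM_SHUFFLE(1, 0, 3, 2));
    __m128 mRet = _mm_add_ps(mTop, mShuffleTop);
    return mRet;
}

// mLeft = [v0.x, v0.y, v1.x, v1.y], mRight = [v2.x, v2.y, v3.x, v3.y], mScales = [s0, s1, s2, s3].
__m128 scaleHorizontalSum2D_SSE2(const __m128 mLeft, const __m128 mRight, const __m128 mScales)
{
    // Duplicate each scale across its (x, y) pair: [s0, s0, s1, s1] and [s2, s2, s3, s3].
    __m128 mScaleLeft = _mm_unpacklo_ps(mScales, mScales);
    __m128 mScaleRight = _mm_unpackhi_ps(mScales, mScales);
    __m128 mMulLeft = _mm_mul_ps(mScaleLeft, mLeft);
    __m128 mMulRight = _mm_mul_ps(mScaleRight, mRight);
    __m128 mRet = horizontalSum2D_SSE2(mMulLeft, mMulRight);
    return mRet;
}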

The result is returned in an XMM register, with the summed 2d vector in the low two lanes (and duplicated in the high two). This entire operation takes only 9 instructions.
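
To make the register layout concrete, here is a minimal sketch that checks the SSE path against a plain scalar loop. The names scaleSum2D_scalar and scaleSum2D_sse, and the flat array of eight floats holding the (x, y) pairs, are assumptions made for this illustration and are not part of the functions above. The intrinsic steps are repeated inline only so that the sketch compiles on its own.

```cpp
#include <xmmintrin.h> // SSE intrinsics and _MM_SHUFFLE

// Scalar reference (hypothetical helper): out = s0*v0 + s1*v1 + s2*v2 + s3*v3,
// where xy holds four 2d vectors as [x0, y0, x1, y1, x2, y2, x3, y3].
inline void scaleSum2D_scalar(const float xy[8], const float s[4], float out[2])
{
    out[0] = 0.0f;
    out[1] = 0.0f;
    for (int i = 0; i < 4; ++i) {
        out[0] += s[i] * xy[2 * i];
        out[1] += s[i] * xy[2 * i + 1];
    }
}

// Same computation using the SSE steps from the post (hypothetical wrapper).
// All four output lanes are stored so the duplicated result is visible.
inline void scaleSum2D_sse(const float xy[8], const float s[4], float out[4])
{
    __m128 mLeft   = _mm_loadu_ps(xy);     // [x0, y0, x1, y1]
    __m128 mRight  = _mm_loadu_ps(xy + 4); // [x2, y2, x3, y3]
    __m128 mScales = _mm_loadu_ps(s);      // [s0, s1, s2, s3]

    // Spread each scale over its (x, y) pair, then multiply.
    __m128 mMulLeft  = _mm_mul_ps(_mm_unpacklo_ps(mScales, mScales), mLeft);
    __m128 mMulRight = _mm_mul_ps(_mm_unpackhi_ps(mScales, mScales), mRight);

    // Add the two registers, then fold the high half onto the low half.
    __m128 mTop = _mm_add_ps(mMulLeft, mMulRight);
    __m128 mSum = _mm_add_ps(mTop, _mm_shuffle_ps(mTop, mTop, _MM_SHUFFLE(1, 0, 3, 2)));

    _mm_storeu_ps(out, mSum);              // [sum.x, sum.y, sum.x, sum.y]
}
```

With the vectors (1,2), (3,4), (5,6), (7,8) and scales 1, 2, 3, 4, both versions give (50, 60). The SSE version adds the terms in a different order than the scalar loop, so for general floating point inputs the two results can differ in the last bits.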
