Tuesday, December 13, 2016

Using SSE to calculate the average of a set of numbers

The average of a group of numbers is one of the fundamental statistical descriptions of data.  It's also easy to calculate in plain ANSI C: sum the values and divide by the count.
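For comparison, here is a minimal scalar sketch of that calculation. The name average_scalar and the choice to return 0 for an empty array are assumptions for illustration, not something from the original code.

```cpp
#include <cassert>
#include <cstddef>

// Scalar reference: sum every element, then divide by the count.
// average_scalar is a hypothetical name used only for this sketch.
float average_scalar(const float* pA, std::size_t count)
{
    assert(pA != nullptr || count == 0);
    if (count == 0)
        return 0.0f;  // assumption: treat the average of an empty set as 0
    float sum = 0.0f;
    for (std::size_t i = 0; i < count; ++i)
        sum += pA[i];
    return sum / static_cast<float>(count);
}
```

A scalar version like this is handy as a baseline when checking the results of a vectorized one, keeping in mind that a different summation order can change the last bits of a float result.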
The SSE version below reuses the horizontalSum_SSE2 helper we have seen before.

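#include <xmmintrin.h>  // SSE intrinsics
#include <windows.h>    // UINT

// Adds the four floats in mABCD (lanes A, B, C, D) and returns the total.
__forceinline float horizontalSum_SSE2(const __m128 &mABCD)
{
    __m128 mCDCD = _mm_movehl_ps(mABCD, mABCD);            // (C, D, C, D)
    __m128 mApCBpD = _mm_add_ps(mABCD, mCDCD);             // (A+C, B+D, 2C, 2D)
    __m128 mBpD = _mm_shuffle_ps(mApCBpD, mApCBpD, 0x55);  // broadcast lane 1 (B+D)
    __m128 mApBpCpD = _mm_add_ps(mApCBpD, mBpD);           // lane 0 = A+B+C+D
    return _mm_cvtss_f32(mApBpCpD);                        // extract lane 0
}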

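// Returns the average of the uiAOrig floats at pA.
// Note: uiAOrig == 0 evaluates 0.0f / 0 and so returns NaN.
float average_SSE2(const float * const pA, const UINT uiAOrig)
{
    __m128 mSummed = _mm_setzero_ps();
    UINT uiA = uiAOrig;
    // Odd count: put the last element in lane 0 (lanes 1..3 are zeroed).
    if (uiA & 1)
    {
        uiA--;
        mSummed = _mm_load_ss(&pA[uiA]);
    }
    // Two elements left over: put them in lanes 2..3, keeping lanes 0..1.
    if (uiA & 2)
    {
        uiA -= 2;
        mSummed = _mm_loadh_pi(mSummed, (const __m64*)&pA[uiA]);
    }
    // uiA is now a multiple of 4: accumulate four floats at a time, back to front.
    while (uiA > 0)
    {
        uiA -= 4;
        mSummed = _mm_add_ps(mSummed, _mm_loadu_ps(&pA[uiA]));
    }
    return horizontalSum_SSE2(mSummed) / uiAOrig;
}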


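There's not much new here, but one line deserves a mention:
mSummed = _mm_loadh_pi(mSummed, (const __m64*)&pA[uiA]);
This loads two floats into the upper half of the xmm register and leaves the lower half untouched, so it does not overwrite the single element that _mm_load_ss placed in the lowest lane (with the other three lanes zeroed).  The intrinsic takes a pointer to __m64 rather than to float, which is why the cast is there; casts like this are sometimes necessary for loads and other operations.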
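To make the lane layout concrete, here is a small sketch that repeats the two tail loads on a three-element array and writes the resulting register out to memory. The function name tail_lanes and the three-element input are assumptions for illustration; only the intrinsics come from the code above.

```cpp
#include <cassert>
#include <xmmintrin.h>

// Repeats the tail handling of average_SSE2 for a 3-element array and
// stores the four lanes of the register into out[0..3].
// tail_lanes is a hypothetical helper used only for this sketch.
void tail_lanes(const float* pA, float* out)
{
    assert(pA != nullptr && out != nullptr);
    // _mm_load_ss: lane 0 = pA[2], lanes 1..3 = 0
    __m128 m = _mm_load_ss(&pA[2]);
    // _mm_loadh_pi: lanes 2..3 = pA[0], pA[1]; lanes 0..1 are kept
    m = _mm_loadh_pi(m, reinterpret_cast<const __m64*>(&pA[0]));
    _mm_storeu_ps(out, m);
}
```

For an input of {10, 20, 30}, the lanes come out as {30, 0, 10, 20}: all three values are present exactly once, and the unused lane is zero, so the horizontal sum is unaffected.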
