MicroPerf: descriptive statistics using SSE2 : higher moments

There are a bunch of descriptive statistics to describe a given set of numbers. The most commonly used are the average and standard deviation. There are also more esoteric statistics like skewness and kurtosis.

The book "Numerical Recipes in C" is a common source for routines to calculate these statistics. This can be adapted to take advantage of SSE2.

void moment_SSE2(const float * const pA, const UINT n, float *outAve, float *outAdev, float *outSdev, float *outVar, float *outSkew, float *outCurt)
{
    float ave = average_SSE2(pA, n);
    *outAve = ave;
    *outAdev = 0;
    *outSdev = 0;
    *outVar = 0;
    *outSkew = 0;
    *outCurt = 0;
    if (n <= 1)
    {
        return;
    }
    __m128 mmAve = _mm_set1_ps(ave);
    __m128 mmAbs = _mm_castsi128_ps(_mm_set1_epi32(0x7fffffff));
    __m128 mmAdev = _mm_setzero_ps();
    __m128 mmSdev= _mm_setzero_ps();
    __m128 mmVar = _mm_setzero_ps();
    __m128 mmSkew = _mm_setzero_ps();
    __m128 mmCurt = _mm_setzero_ps();
    __m128 mmEp= _mm_setzero_ps();
    UINT nRemaining = n;
    __m128 mmS;
    if (nRemaining & 3)
    {
        __m128i mmDataMask = _mm_setr_epi32((nRemaining & 1)?-1:0, 0, (nRemaining & 2) ? -1 : 0, (nRemaining & 2) ? -1 : 0);
        __m128 mmData = _mm_setzero_ps();
        if (nRemaining & 1)
        {
            nRemaining--;
            mmData = _mm_load_ss(&pA[nRemaining]);
        }
        if (nRemaining & 2)
        {
            nRemaining -= 2;
            mmData = _mm_loadh_pi(mmData, (const __m64*)&pA[nRemaining]);
        }
        mmS = _mm_and_ps(_mm_castsi128_ps( mmDataMask), _mm_sub_ps(mmData, mmAve));
    }
    else
    {
        nRemaining -= 4;
        __m128 mmData = _mm_loadu_ps(&pA[nRemaining]);
        mmS = _mm_sub_ps(mmData, mmAve);
    }
    while (true)
    {
        __m128 mmAbsS = _mm_and_ps(mmS, mmAbs);
        mmAdev = _mm_add_ps(mmAdev, mmAbsS);
        mmEp = _mm_add_ps(mmEp, mmS);
        __m128 mmSS = _mm_mul_ps(mmS, mmS);
        mmVar = _mm_add_ps(mmVar, mmSS);
        mmSkew = _mm_add_ps(mmSkew, _mm_mul_ps(mmSS, mmS));
        mmCurt = _mm_add_ps(mmCurt, _mm_mul_ps(mmSS, mmSS));
        if (nRemaining <= 0)
        {
            break;
        }
        nRemaining -= 4;
        __m128 mmData = _mm_loadu_ps(&pA[nRemaining]);
        mmS = _mm_sub_ps(mmData, mmAve);
    }
    *outAdev = horizontalSum_SSE2(mmAdev) / n;
    float ep = horizontalSum_SSE2(mmEp);
    float var = (horizontalSum_SSE2(mmVar) - ep*ep / n) / (n - 1);
    *outVar = var;
    float sdev = sqrtf(var);
    *outSdev = sdev;
    if (var)
    {
        *outSkew = horizontalSum_SSE2(mmSkew) / (n*var*sdev);
        *outCurt = (horizontalSum_SSE2(mmCurt) / (n*var*var)) - 3.0f;
    }
}
This builds on helper functions from previous posts.

The central loop body looks a bit strange. The loop condition has been moved inside the "while loop". This allows use of the same loop body for the first iteration that may not contain a full four elements. A mask is created which zeros the missing elements of this first iteration. These zero elements don't change the resulting sums.

This seems useful.

MicroPerf

Tuesday, December 13, 2016

descriptive statistics using SSE2 : higher moments

No comments:

Post a Comment