MicroPerf: another dot product

Saturday, December 10, 2016

another dot product - double

In practice, double precision often does not make sense: it usually costs more memory and processing than the problem actually requires. Doubles are genuinely necessary for chaotic systems and for computations at very tiny scales. More often, though, double precision is used to work around numerical issues, such as the cancellation in the discriminant of the quadratic equation (b^2 - 4ac) when b^2 and 4ac are large and nearly equal, and those issues are usually better solved in other ways. Still, doubles are sometimes necessary.
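As one example of solving the discriminant problem another way, the sketch below stays in single precision and uses a fused multiply-add (std::fma) to recover the exact rounding error of 4ac, a technique usually attributed to Kahan. The function names are hypothetical, and the sketch assumes the intermediate products neither overflow nor underflow.

```cpp
#include <cassert>
#include <cmath>

// Naive single-precision discriminant: b*b and 4*a*c are each rounded before
// the subtraction (unless the compiler contracts it into an FMA), so when
// b*b is close to 4*a*c most of the significant bits cancel.
float discriminantNaive(float a, float b, float c)
{
    return b * b - 4.0f * a * c;
}

// Kahan-style discriminant, still entirely in float.
// Assumes no overflow or underflow in the intermediate products.
float discriminantFma(float a, float b, float c)
{
    float w = 4.0f * a * c;              // rounded 4ac (4*a itself is exact)
    float e = std::fma(-c, 4.0f * a, w); // w - 4ac: the product's rounding error, exact
    float f = std::fma(b, b, -w);        // b*b - w with a single rounding
    assert(std::isfinite(w) && std::isfinite(f));
    return f + e;                        // (b*b - w) + (w - 4ac) = b*b - 4ac
}
```

With float inputs such as a = 1.1f, c = 1.3f and b = std::sqrt(4.0f * a * c), the FMA version should agree with an exact double-precision reference to within about two float rounding units (relative), without ever switching the data to double.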

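The structure of the double-precision dot product is very similar to that of the single-precision float dot product. The main difference is that an __m128d register holds two doubles instead of four floats, so the loop advances two elements at a time and at most one leftover element has to be handled separately.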
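For comparison, a minimal single-precision loop with the same shape might look like the sketch below. It is not the author's float implementation; the names horizontalSum_SSE and dotProductFloat_sketch are hypothetical, and it uses portable std::size_t and static inline instead of UINT and __forceinline.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h> // SSE: __m128 and the _ps intrinsics

// Adds the four floats held in an __m128.
static inline float horizontalSum_SSE(__m128 v)
{
    __m128 hi = _mm_movehl_ps(v, v);   // [v2, v3, v2, v3]
    __m128 sum2 = _mm_add_ps(v, hi);   // [v0+v2, v1+v3, ...]
    __m128 lane1 = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1));
    __m128 sum1 = _mm_add_ss(sum2, lane1); // lane 0: v0+v2+v1+v3
    return _mm_cvtss_f32(sum1);
}

// Hypothetical float dot product: four lanes per register, so up to three
// trailing elements are handled with scalar code first.
float dotProductFloat_sketch(const float *pA, const float *pB, std::size_t n)
{
    assert(n == 0 || (pA != nullptr && pB != nullptr));
    float fRet = 0.0f;
    while (n & 3) // scalar remainder: the last n % 4 elements
    {
        --n;
        fRet += pA[n] * pB[n];
    }
    __m128 acc = _mm_setzero_ps();
    while (n > 0) // walk backward four elements at a time
    {
        n -= 4;
        __m128 a = _mm_loadu_ps(&pA[n]);
        __m128 b = _mm_loadu_ps(&pB[n]);
        acc = _mm_add_ps(acc, _mm_mul_ps(a, b));
    }
    return fRet + horizontalSum_SSE(acc);
}
```

The double-precision version below keeps this structure but processes two lanes per register, so the scalar remainder shrinks to at most one element and the horizontal sum needs only one add.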

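#include <emmintrin.h> // SSE2 intrinsics: __m128d, _mm_loadu_pd, _mm_mul_pd, ...
#include <windows.h>   // UINT (MSVC also provides __forceinline)

// Adds the two doubles held in an __m128d and returns the scalar result.
__forceinline double horizontalSum_SSE2(const __m128d &mAB)
{
    __m128d mBB = _mm_unpackhi_pd(mAB, mAB); // [B, B]
    __m128d mApB = _mm_add_pd(mAB, mBB);     // [A + B, 2B]
    return _mm_cvtsd_f64(mApB);              // low lane: A + B
}

// Dot product of two double arrays of length uiABOrig.
// The pointers do not need to be 16-byte aligned.
double dotProduct_SSE2_unaligned(const double *pA, const double *pB, UINT uiABOrig)
{
    UINT uiAB = uiABOrig;
    double dRet = 0;
    // An __m128d holds two doubles, so peel off the last element when the count is odd.
    if (uiAB & 1)
    {
        uiAB--;
        dRet = pA[uiAB] * pB[uiAB];
    }
    if (uiAB > 0)
    {
        __m128d mSummedAB = _mm_setzero_pd();
        // Walk backward two elements at a time, accumulating pairwise products.
        do
        {
            uiAB -= 2;
            __m128d mA = _mm_loadu_pd(&pA[uiAB]);
            __m128d mB = _mm_loadu_pd(&pB[uiAB]);
            mSummedAB = _mm_add_pd(mSummedAB, _mm_mul_pd(mA, mB));
        } while (uiAB > 0);
        dRet += horizontalSum_SSE2(mSummedAB);
    }
    return dRet;
}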
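The _unaligned suffix suggests an aligned counterpart. A sketch of one, assuming both arrays are 16-byte aligned (for example via alignas(16) or _mm_malloc), swaps _mm_loadu_pd for _mm_load_pd. Because the odd element is peeled off first, every pair starts at an even index and therefore stays 16-byte aligned. The names dotProduct_SSE2_aligned and horizontalSum_pd are hypothetical, and the sketch uses std::size_t and static inline so it also builds with GCC and Clang.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h> // SSE2: __m128d and the _pd intrinsics

// Adds the two lanes of an __m128d (same approach as horizontalSum_SSE2).
static inline double horizontalSum_pd(__m128d v)
{
    __m128d hi = _mm_unpackhi_pd(v, v);      // [hi, hi]
    return _mm_cvtsd_f64(_mm_add_pd(v, hi)); // low lane: lo + hi
}

// Hypothetical aligned variant: _mm_load_pd faults on addresses that are not
// 16-byte aligned, so both pA and pB must be 16-byte aligned.
double dotProduct_SSE2_aligned(const double *pA, const double *pB, std::size_t n)
{
    assert(reinterpret_cast<std::uintptr_t>(pA) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(pB) % 16 == 0);
    double dRet = 0.0;
    if (n & 1) // peel the odd last element so the remaining count is even
    {
        --n;
        dRet = pA[n] * pB[n];
    }
    __m128d acc = _mm_setzero_pd();
    while (n > 0)
    {
        n -= 2; // n stays even, so &pA[n] is 16 * (n / 2) bytes past pA
        acc = _mm_add_pd(acc, _mm_mul_pd(_mm_load_pd(&pA[n]), _mm_load_pd(&pB[n])));
    }
    return dRet + horizontalSum_pd(acc);
}
```

On recent x86 cores, unaligned loads of data that happens to be aligned typically cost about the same as aligned loads, so the main practical difference is that the aligned form faults on misaligned input instead of silently accepting it.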
