Friday, December 23, 2016

Polynomials - Horner Polynomial Evaluation for Sin(x) and Cos(x)

The SSE instruction set is missing a lot of the transcendental functions, like Sin and Cos.

But we can approximate them using polynomials.  In particular polynomials can be efficiently evaluated in Horner form.  This evaluates a monomial of order n using n multiplies, and n additions.

The coefficients for the polynomial can be calculated by a series of integrations.  Mathematica makes this easy:


Mathematica code for generating approximating coefficients for function f.  Changing the values of a and b changes the approximated range.  Changing the range of n in the table generation changes the degree of the polynomial. 
The calculations using these coefficients are then simple loops.

__m128 _mm_horner_ps(const __m128 &mx, const float * const coefs, const UINT cCoefs)
{
    if (cCoefs <= 0)
    {
        return _mm_setzero_ps();
    }
    UINT uiRemainingCoefs = cCoefs-1;
    __m128 mp = _mm_set1_ps(coefs[uiRemainingCoefs]);
    while (uiRemainingCoefs > 0)
    {
        uiRemainingCoefs--;
        mp = _mm_add_ps(_mm_mul_ps(mx, mp), _mm_set1_ps(coefs[uiRemainingCoefs]));
    }
    return mp;
}
__declspec(align(16)) static const UINT uiAbs[4] = { 0x7fffffff,0x7fffffff,0x7fffffff,0x7fffffff };
__m128 _mm_abs_ps(const __m128 mx)
{
    return _mm_and_ps(mx, _mm_load_ps(reinterpret_cast<const float *>(uiAbs)));
}
__m128 _mm_hornerEven_ps(const __m128 &mx, const float * const coefs, const UINT cCoefs)
{
    if (cCoefs <= 0)
    {
        return _mm_setzero_ps();
    }
    UINT uiRemainingCoefs = cCoefs - 1;
    __m128 ax = _mm_abs_ps(mx);
    __m128 mp = _mm_set1_ps(coefs[uiRemainingCoefs]);
    while (uiRemainingCoefs > 0)
    {
        uiRemainingCoefs--;
        mp = _mm_add_ps(_mm_mul_ps(ax, mp), _mm_set1_ps(coefs[uiRemainingCoefs]));
    }
    return mp;
}
__declspec(align(16)) static const UINT uiSign[4] = { 0x80000000,0x80000000,0x80000000,0x80000000 };
__m128 _mm_hornerOdd_ps(const __m128 &mx, const float * const coefs, const UINT cCoefs)
{
    if (cCoefs <= 0)
    {
        return _mm_setzero_ps();
    }
    UINT uiRemainingCoefs = cCoefs - 1;
    __m128 ax = _mm_abs_ps(mx);
    __m128 mSignX = _mm_andnot_ps(_mm_load_ps(reinterpret_cast<const float *>(uiAbs)), mx);
    __m128 mp = _mm_set1_ps(coefs[uiRemainingCoefs]);
    while (uiRemainingCoefs > 0)
    {
        uiRemainingCoefs--;
        mp = _mm_add_ps(_mm_mul_ps(ax, mp), _mm_set1_ps(coefs[uiRemainingCoefs]));
    }
    __m128 mResult = _mm_xor_ps(mp, mSignX);
    return mResult;
}


const float mCosCoefs[] = { 0.99856677684663489786f, 0, -0.49527656685594188504f, 0, 0.039207901766468234142f, 0, -0.00096832988461029163825f };
const float mCosEvenCoefs[] = { 0.99557640122455577189f, 0.069390161928493746222f, \
- 0.67150059990790674484f, 0.14249685301935530176f };
const float mSinOddCoefs[] = { 0.00059003711537529755905f, 0.98659497273144593606f, \
0.048989468371295105117f, -0.23111360502840283503f, \
0.036782872656058228926f };

This can then be called simply:
__m128 mCos = _mm_horner_ps(mx, mCosCoefs, ARRAYSIZE(mCosCoefs));
__m128 mCosEven = _mm_hornerEven_ps(mx, mCosEvenCoefs, ARRAYSIZE(mCosEvenCoefs));
__m128 mSinOdd= _mm_hornerOdd_ps(mx, mSinOddCoefs, ARRAYSIZE(mSinOddCoefs));


We can get slightly better approximations if we exploit the even/odd properties of the functions we are approximating. 

Both the even and odd evaluation functions mask away the sign bits of the calling values.  The odd evaluation function then applies the sign bits from the original calling values to the output result.

These methods with only a few coefficients give reasonable approximations over the range (-Pi,Pi).

No comments:

Post a Comment