Wednesday, December 21, 2016

let's qsort - sort comparison functions using SSE

C has had built-in sort for decades.  This mostly takes the form of a qsort, which traditionally takes a function that takes void pointers.

The following is a very simple cast that makes this much safer.  Note that since this contains only the one call, the compiler will happily inline, and this wrapper will add zero cost.

typedef int(__cdecl *fnPtFuncCompareType)(const void *arg1, const void *arg2);
template <typename T>
inline void qsort_typesafe(
    __inout_ecount(num) T *base,
    const size_t num,
    int(__cdecl *fnPtFuncCompareType)(const T *, const T *)
)
{
    ::qsort(base, num, sizeof(T), (fnPtFuncCompareType)fnCompare);
}


So all of the discussed comparison functions will take actual types.
For the simple types this will be as follows, using the formulation from "Hacker's Delight - Second Edition" section "2-9 Three-Valued Compare Function".  Note that the second __forceinline will be ignored when the function is used as a parameter for qsort.

// generic comparison function
// returns  1 iff a>b
// returns  0 iff a==b
// returns -1 iff a<b
template<typename T>
__forceinline static int __cdecl compareTwo(T a, T b)
{
    int iRet = (a>b) - (a<b);
    return iRet;
}

// generic comparison function for use in qsort
template<typename T>
__forceinline static int __cdecl compareTwoPtr(const T *a, const T *b)
{
    return compareTwo(*a, *b);
}


And this works wonderfully for signed and unsigned standard types.  For the built in numerical types this gives correct results without branches, but doesn't have the underflow/overflow problems.
But I seem to often end up doing comparisons on structures composed of these basic types.
We want to sort first by one field, then another, and so on...

typedef struct
{
    int rate;
    int duration;
} TwoIntStructure;
C_ASSERT(8 == sizeof(TwoIntStructure));
typedef struct
{
    int rate;
    int duration;
    int torque;
    int growth;
} FourIntStructure;
C_ASSERT(16 == sizeof(FourIntStructure));
int __cdecl compareTwo_TwoIntStructure(const TwoIntStructure *pa, const TwoIntStructure *pb)
{
    __m128i ma = _mm_loadl_epi64((__m128i *)pa);
    __m128i mb = _mm_loadl_epi64((__m128i *)pb);

    int intRet = _mm_movemask_epi8(_mm_cmpgt_epi32(ma, mb)) - _mm_movemask_epi8(_mm_cmpgt_epi32(mb, ma));
    return intRet;
}
int __cdecl compareTwo_FourIntStructure(const FourIntStructure *pa, const FourIntStructure *pb)
{
    __m128i ma = _mm_loadu_si128((__m128i *)pa);
    __m128i mb = _mm_loadu_si128((__m128i *)pb);

    int intRet = _mm_movemask_epi8(_mm_cmpgt_epi32(ma, mb)) - _mm_movemask_epi8(_mm_cmpgt_epi32(mb, ma));
    return intRet;
}

This gives good results when called with something like:
    qsort_typesafe(ax, ARRAYSIZE(ax), compareTwo_TwoIntStructure);
    qsort_typesafe(ay, ARRAYSIZE(ay), compareTwo_FourIntStructure);


When we are using C++, this changes slightly:

// Ascending sorting function, last struct member is primary sort key
struct SFourIntStructureSort
{
    bool operator()(const FourIntStructure* pa, const FourIntStructure* pb)
    {
        __m128i ma = _mm_loadu_si128(reinterpret_cast<const __m128i *>(pa));
        __m128i mb = _mm_loadu_si128(reinterpret_cast<const __m128i *>(pb));
        bool bRet = _mm_movemask_epi8(_mm_cmpgt_epi32(ma, mb))<=_mm_movemask_epi8(_mm_cmpgt_epi32(mb, ma));
        return bRet;
    }
};

with a call like:
    std::vector<FourIntStructure*> vecFourIntStructure;
    std::sort(vecFourIntStructure.begin(), vecFourIntStructure.end(), SFourIntStructureSort());


Do note that as written the primary sort key is the last of the compared elements in the structs.

No comments:

Post a Comment