All x86-64 CPUs support the SSE and SSE2 instruction sets. Although every recent CPU also has later extensions such as SSE4.2, which provides the CRC32C instruction, a truly compatible program still has to test for CPU support at run time.
The naïve implementation checks for support once, stores the result in a variable, and tests that variable before executing CPU-dependent instructions. The problem with this approach is that optimizing compilers are allowed to move instructions from inside the consequent (the "then" block) to before the test, so an unsupported instruction may execute anyway and crash the program.
The way I have seen this handled is to put each version of the code in a separate function, all with an identical signature. A function pointer is initialized at program start to the most appropriate version.
Here is an implementation that tests for the SSE4.2 bit before using the CRC32C instruction. It is written for MSVC, which provides the __cpuidex intrinsic.
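#include <cstdio>      // printf
#include <cstring>     // strlen
#include <intrin.h>    // __cpuidex (MSVC)
#include <nmmintrin.h> // _mm_crc32_u8 (SSE4.2)

// Reverses the order of the 32 bits of x by swapping adjacent bits,
// then bit pairs, nibbles, bytes and finally 16-bit halves.
unsigned int reverseBits32(unsigned int x)
{
    x = (x & 0x55555555) << 1 | (x & 0xAAAAAAAA) >> 1;
    x = (x & 0x33333333) << 2 | (x & 0xCCCCCCCC) >> 2;
    x = (x & 0x0F0F0F0F) << 4 | (x & 0xF0F0F0F0) >> 4;
    x = (x & 0x00FF00FF) << 8 | (x & 0xFF00FF00) >> 8;
    x = (x & 0x0000FFFF) << 16 | (x & 0xFFFF0000) >> 16;
    return x;
}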
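
// Portable bitwise CRC32C (Castagnoli polynomial 0x1EDC6F41).
// Works MSB-first, so each input byte and the final result are
// bit-reversed to match the reflected CRC32C convention.
unsigned int crc32c_Ansi(const unsigned char *message, const unsigned int messageLength)
{
    unsigned int crc = 0xffffffff;
    for (unsigned int i = 0; i < messageLength; i++)
    {
        // The reversed byte lands in the top 8 bits.
        unsigned int byteMessage = reverseBits32(message[i]);
        for (int j = 8; j > 0; j--)
        {
            // Test the top bit directly; converting to int to check the sign
            // is implementation-defined before C++20.
            if ((crc ^ byteMessage) & 0x80000000)
            {
                crc = (crc << 1) ^ 0x1edc6f41;
            }
            else
            {
                crc = crc << 1;
            }
            byteMessage = byteMessage << 1;
        }
    }
    return reverseBits32(~crc);
}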

// SSE4.2 version: the crc32 instruction computes CRC32C directly,
// here one byte per instruction.
unsigned int crc32c_SSE42(const unsigned char *message, const unsigned int messageLength)
{
    unsigned int crc = 0xffffffff;
    for (unsigned int uiOnChar = 0; uiOnChar < messageLength; uiOnChar++)
    {
        crc = _mm_crc32_u8(crc, message[uiOnChar]);
    }
    return ~crc;
}

// CPUID leaf 1 reports SSE4.2 support in bit 20 of ECX (cpuInfo[2]).
bool bHasSSE42()
{
    int cpuInfo[4];
    __cpuidex(cpuInfo, 1, 0);
    return (cpuInfo[2] >> 20) & 1;
}

typedef unsigned int (*crc32c_type)(const unsigned char *message, const unsigned int messageLength);

// Initialized once, before main runs. C++ allows a function call in this
// initializer; C does not.
crc32c_type crc32c = (bHasSSE42() ? crc32c_SSE42 : crc32c_Ansi);

int main()
{
    const char szHelloWorld[] = "Hello World!";
    printf("%s - 0x%08x\n", szHelloWorld, crc32c((const unsigned char *)szHelloWorld, (unsigned int)strlen(szHelloWorld)));
    return 0;
}
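`__cpuidex` and `<intrin.h>` are MSVC-specific. As a sketch, assuming GCC or Clang on x86, the same leaf-1 query can be made with `__get_cpuid` from `<cpuid.h>`, or delegated to the compiler runtime with `__builtin_cpu_supports`. The function names below are hypothetical, chosen to mirror bHasSSE42.

```cpp
// Sketch for GCC/Clang on x86/x86-64. bHasSSE42_cpuid and
// bHasSSE42_builtin are hypothetical names.
#include <cpuid.h>

// Same test as bHasSSE42: CPUID leaf 1, ECX bit 20 reports SSE4.2.
bool bHasSSE42_cpuid()
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid returns 0 if leaf 1 is not available.
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx >> 20) & 1;
}

// Let the compiler runtime perform the CPUID query.
bool bHasSSE42_builtin()
{
    // Required only if this can run before global constructors.
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.2") != 0;
}
```

Either function could stand in for bHasSSE42 in the initializer of crc32c.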
Neither implementation is a particularly efficient implementation of CRC32C: the crc32c_Ansi version could be sped up with lookup tables, and the SSE4.2 version by feeding more data to each instruction (on x86-64, _mm_crc32_u64 consumes 8 bytes at a time).
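To illustrate the table approach, here is a sketch of the classic byte-at-a-time table-driven CRC32C with a single 256-entry table. It uses the reflected form of the Castagnoli polynomial, 0x82F63B78, so no bit reversal is needed. The name crc32c_Table is hypothetical; it has the same signature as the other versions, so it could be one more candidate for the crc32c function pointer. Faster "slicing" variants extend the idea with several tables.

```cpp
#include <cstdint>

namespace {
// Reflected (LSB-first) form of the Castagnoli polynomial 0x1EDC6F41.
const uint32_t kCrc32cPolyReflected = 0x82F63B78u;

// entry[n] is the register value after shifting byte value n through 8 steps.
struct Crc32cTable {
    uint32_t entry[256];
    Crc32cTable()
    {
        for (uint32_t n = 0; n < 256; n++) {
            uint32_t c = n;
            for (int k = 0; k < 8; k++)
                c = (c & 1) ? (c >> 1) ^ kCrc32cPolyReflected : (c >> 1);
            entry[n] = c;
        }
    }
};
}

// Hypothetical table-driven CRC32C with the same signature as crc32c_Ansi.
unsigned int crc32c_Table(const unsigned char *message, const unsigned int messageLength)
{
    static const Crc32cTable table;  // built once, on the first call
    uint32_t crc = 0xffffffffu;
    for (unsigned int i = 0; i < messageLength; i++)
        crc = table.entry[(crc ^ message[i]) & 0xff] ^ (crc >> 8);
    return ~crc;
}
```

For the standard check input "123456789" this should produce the CRC32C check value 0xE3069283.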
But this example does show the function pointer crc32c being initialized to the appropriate version and then called.
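A sketch of the same dispatch for GCC or Clang on x86-64, assuming `__attribute__((target("sse4.2")))` is used to compile only the accelerated function for SSE4.2 (so the rest of the program does not need `-msse4.2`), and processing 8 bytes per instruction with `_mm_crc32_u64`. The names crc32c_SSE42_u64, crc32c_Portable and hasSSE42 are hypothetical; the fallback is a compact reflected bitwise CRC rather than crc32c_Ansi so that the block stands alone.

```cpp
#include <cstdint>
#include <cstring>
#include <nmmintrin.h>

typedef unsigned int (*crc32c_type)(const unsigned char *message, const unsigned int messageLength);

// Compiled for SSE4.2 only; called only when the CPU reports support.
__attribute__((target("sse4.2")))
unsigned int crc32c_SSE42_u64(const unsigned char *message, const unsigned int messageLength)
{
    uint64_t crc = 0xffffffffu;
    unsigned int i = 0;
    // 8 bytes per crc32 instruction; memcpy avoids unaligned-access and aliasing problems.
    for (; messageLength - i >= 8; i += 8) {
        uint64_t chunk;
        std::memcpy(&chunk, message + i, sizeof chunk);
        crc = _mm_crc32_u64(crc, chunk);
    }
    // The upper 32 bits are zero; finish the last 0-7 bytes one at a time.
    uint32_t crc32 = static_cast<uint32_t>(crc);
    for (; i < messageLength; i++)
        crc32 = _mm_crc32_u8(crc32, message[i]);
    return ~crc32;
}

// Portable fallback: bitwise CRC32C with the reflected polynomial.
unsigned int crc32c_Portable(const unsigned char *message, const unsigned int messageLength)
{
    uint32_t crc = 0xffffffffu;
    for (unsigned int i = 0; i < messageLength; i++) {
        crc ^= message[i];
        for (int k = 0; k < 8; k++)
            crc = (crc & 1) ? (crc >> 1) ^ 0x82F63B78u : (crc >> 1);
    }
    return ~crc;
}

static bool hasSSE42()
{
    __builtin_cpu_init();  // makes the check safe during static initialization
    return __builtin_cpu_supports("sse4.2") != 0;
}

// Chosen once, during dynamic initialization before main runs.
crc32c_type crc32c = hasSSE42() ? crc32c_SSE42_u64 : crc32c_Portable;
```

Because the target attribute applies only to crc32c_SSE42_u64, the compiler can use SSE4.2 instructions only inside that function, which is what keeps them away from code that runs on older CPUs.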