MicroPerf: Image convolution using a 4x4 filter

Some of the instructions are helpful in surprising ways.

SSSE3 introduced instructions for alignment. These have benefit for stream processing in 1 dimension. Linear filtering conceptually is a weighted average of each element multiplied by every element in the filter. The filter being zero outside a range allows this to be a tractable problem in the spatial (rather than frequency) domain.

This concept can be extended to multiple dimensions.

There are special cases of filters that are linearly separable. But this is not generally true.

The below code calculates a two dimensional image convolution for a 4x4 filter. The filter is treated as though the center is located at pixel (1,1)

void convolveGeneral_4x4Filter(const unsigned int iWidth, const unsigned int iHeight, const unsigned char * const pInputImage, unsigned char * const pOutputImage, const signed char * pFilter, const unsigned iWhichLine, const unsigned minY, const unsigned maxY, const int iFilterNormalize)
{
    for (unsigned int iY = minY; iY<=maxY; iY++)
    {
        int iPixelSum = 0;
        for (int mx = 0; mx <4; mx++)
        {
            for (int my =0;my<4;my++)
            {
                signed char FilterWeight = pFilter[my * 4 + mx];
                int iImageX = max(0, min((int)(iWhichLine + mx-1), (int)iWidth-1));
                int iImageY = max(0, min((int)(iY + my-1), (int)iHeight-1));
                iPixelSum += FilterWeight * pInputImage[iWidth*iImageY + iImageX];
            }
        }
        pOutputImage[iY*iWidth + iWhichLine] = iPixelSum / iFilterNormalize;
    }
}
There are special cases of the pixels outside the edges of the source image. There are a couple ways this could be dealt with. The code above handles this by treating pixels beyond the edge of the image as if they are continued at the same value indefinitely outside the image. Other possible choices are to treat pixels outside as black, or as reflections of the source image.

Performing this same convolution using SSE needs to be performed in columns. We shift in four pixels, and keep a running group. A dot product is performed against the group of 16 pixels as a single instruction. The code does not handle the special cases at the edges of the image.

void convolveLine_4x4Filter_SSE(const unsigned int iWidth, const unsigned int iHeight, const unsigned char * const pInputImage, unsigned char * const pOutputImage, const __m128i &mFilter, const unsigned iWhichLine, const int iFilterNormalize)
{
    __m128i mImagePixels = _mm_setzero_si128();
    const unsigned char *pInputPixelToLoad = &pInputImage[iWhichLine-1];
    for (unsigned int iY = 3; iY-->0; )
    {
        unsigned long ulPixels = *reinterpret_cast<const unsigned long*>(pInputPixelToLoad);
        __m128i mPixelsToAdd = _mm_cvtsi32_si128(ulPixels);
        mImagePixels = _mm_alignr_epi8(mPixelsToAdd, mImagePixels, 4);
        pInputPixelToLoad += iWidth;
    }
    unsigned char *pOutputPixel = &pOutputImage[iWhichLine+iWidth];
    for (UINT_PTR iY =(iHeight-3);iY-->0 ;)
    {
        unsigned long ulPixels = *reinterpret_cast<const unsigned long*>(pInputPixelToLoad);
        mImagePixels = _mm_alignr_epi8(_mm_cvtsi32_si128(ulPixels), mImagePixels,4);
        const int iPixel = horizontalSum_epi16_SSSE3(_mm_maddubs_epi16(mImagePixels, mFilter))/iFilterNormalize;
        *pOutputPixel = iPixel;
        pInputPixelToLoad += iWidth;
        pOutputPixel += iWidth;
    }
}

MicroPerf

Tuesday, December 27, 2016

Image convolution using a 4x4 filter

No comments:

Post a Comment