Small loops, Duff's device, Core2 and Athlon64 
Thursday, April 19, 2007, 05:31 PM - Optimization
Today, I played a bit with loop unrolling and AMD CodeAnalyst.
One way to unroll loops in C is to use Duff's device. While a bit strange at first, it is quite handy for quick loop unrolling.
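For readers who have not met it, here is a minimal sketch of a 4x unrolled Duff's device. It is not the loop measured in this post: the function name sum_duff4 and the summing body are assumptions chosen only to show the switch/do-while interleaving.

```c
#include <stddef.h>

/* Hypothetical example: sum `count` ints with a 4x unrolled loop
   written as a Duff's device. The switch jumps into the middle of
   the do-while so that the leftover (count % 4) elements are
   handled on the first pass. */
long sum_duff4(const int *src, size_t count)
{
    long total = 0;
    size_t passes;

    if (count == 0)
        return 0;

    passes = (count + 3) / 4;       /* number of trips through the do-while */

    switch (count % 4) {
    case 0: do { total += *src++;   /* full pass starts here */
    case 3:      total += *src++;
    case 2:      total += *src++;
    case 1:      total += *src++;
            } while (--passes > 0);
    }
    return total;
}
```

With count equal to 16 the switch enters at case 0 and the body runs 4 times per pass for 4 passes, which gives the same result as a plain 16-iteration loop.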

While looking at the generated assembly code with and without it (the compiler is VC8, i.e. Visual Studio 2005), I noticed a few interesting things:

*VC8 is able to unroll loops, provided that the number of iterations is known at compile time. My loop runs 16 times, and VC8 unrolled it 4x. It might seem trivial for a compiler, but I don't remember previous versions having this ability.

*On a Core2, my "manual" 4x unrolling (using Duff's device) is still faster than the 4x auto-unrolled code produced by VC8, due to different instruction scheduling.

The generated code flow is a bit different in both cases:

*The VC8 4x auto-unroll has a "continue" jump at the end of the code block that loops back to the beginning of the block. In my 16x loop, this jump is followed 3 times, and skipped the last time.

*My Duff's version features an "exit" jump after the first part of the unrolling (1*code - conditional jump - 3*code). This jump is skipped the first 3 times, and followed on the last pass.
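The two layouts can be mimicked in C with goto. This is only a sketch of one reading of the description above that reproduces the stated counts. It is not the actual generated assembly, and the names jump_stats, run_loop_back and run_exit_in_middle are made up for illustration. The loop body is replaced by a counter so the jump behavior can be checked.

```c
/* Hypothetical bookkeeping: how many bodies ran, and how often the
   conditional jump was followed (taken) or skipped (not taken). */
struct jump_stats { int bodies; int taken; int not_taken; };

/* Auto-unrolled shape: 4 bodies, then a conditional jump back to the top. */
struct jump_stats run_loop_back(void)
{
    struct jump_stats s = {0, 0, 0};
    int passes = 4;                 /* 16 iterations / 4x unrolling */
top:
    s.bodies += 4;                  /* stands in for 4 copies of the body */
    if (--passes != 0) { s.taken++; goto top; }   /* "continue" jump */
    s.not_taken++;
    return s;
}

/* Duff-style shape: 1 body, conditional exit jump, 3 bodies, jump back.
   Assumption: execution first enters at the 3-body part, which is what
   makes the total come out to 16 bodies. */
struct jump_stats run_exit_in_middle(void)
{
    struct jump_stats s = {0, 0, 0};
    int passes = 4;
    goto enter;                     /* Duff-style entry into the middle */
top:
    s.bodies += 1;
    if (--passes == 0) { s.taken++; goto out; }   /* "exit" jump */
    s.not_taken++;
enter:
    s.bodies += 3;
    goto top;
out:
    return s;
}
```

Both functions execute 16 bodies. The loop-back jump is followed 3 times and skipped once, while the exit jump is skipped 3 times and followed once.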

The interesting point comes from CodeAnalyst and its pipeline simulator. I used it to simulate an Athlon64 pipeline, and looked at the result:
In the case of Duff's device, the exit jump is mispredicted 3 times, making the Duff's version slower than the automatic VC8 unrolling. Since the jump is mispredicted on each of the 3 passes where it is skipped, it means that an Athlon64 is unable to predict such a branch, always predicting it to be followed.
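The arithmetic behind those 3 mispredictions can be checked with a toy model. The sketch below assumes a predictor that always guesses "followed", which is only the behavior inferred above and not a description of how the real Athlon64 predictor works. The function name mispredictions_always_taken is hypothetical.

```c
/* Toy model: count how often an "always followed" prediction is wrong.
   outcomes[i] is 1 if the jump was followed, 0 if it was skipped. */
int mispredictions_always_taken(const int *outcomes, int n)
{
    int misses = 0;
    int i;
    for (i = 0; i < n; ++i) {
        if (!outcomes[i])
            ++misses;               /* predicted followed, actually skipped */
    }
    return misses;
}
```

Fed with the exit jump of the Duff's version (skipped, skipped, skipped, followed), this model gives 3 misses. Fed with the loop-back jump of the auto-unrolled version (followed, followed, followed, skipped), it gives 1.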

However, the code is faster with Duff's device when running on a Core2. That means that a trick such as this may increase performance a bit on Core2, but will noticeably reduce speed on Athlon64 by conflicting with its branch predictor.
Considering that VC8 is able to unroll some loops by itself, we had better think twice before playing with tricks such as Duff's device.
