On Intel’s 64 and IA-32 Architectures Optimization Reference Manual there is a section on how to optimize code via branch removal.
I found this interesting and started looking around for some places that would explain this in a bit more detail. I found a great post on Stackoverflow explaining how to Remove Branching via Bitwise Select and some more examples of branch removing via the bit-twiddling hacks article that has been reference many times.
I haven’t yet found a place that explains this in more detail so I thought that I’d give it a go.
Example #1:
Code with branch is:
[code language=”cpp”]
if(value > some_other_value)
{
value *= 23;
}
else
{
value -= 5;
}
[/code]
Solution #1:
[code language=”cpp”]
const int Mask = (some_other_value-value)>>31;
value=((value*23)&Mask)|((value-5)&~Mask);
[/code]
How does this work?
- Get the difference between value and some_other_value
- Mask will be a string of 32 1’s if the value is negative (due to 2’s compliment), 0 if the value is positive.
- Both multiplications are done either way, but by doing a Bitwise AND with the Mask (and its compliment via ~Mask), we either set one of the multiplications to 0, and leave the other alone by multiplying it by 1, since the ~ operator will turn all the 1’s to 0, except for the last bit.
Though this example is too specific, if it were more general then we could create some #defines to easily let you write branchless code, while keeping some sort of readability by naming the defines something like:
NO_BRANCH_EVAL_LESS_THAN_MULT_SUB(value, some_other_value, 23, 5)
Which would expand itself to be equal to the expression in Solution #1
Example #2:
Calculate the absolute value of an integer without branching,
Solution #2:
[code language=”cpp”]
#define CHAR_BIT 8
int v; // we want to find the absolute value of v
unsigned int r; // the result goes here
int const mask = v >> sizeof(int) * CHAR_BIT – 1;
r = (v + mask) ^ mask;
[/code]
How does this work?
- Again, get the MSB out of this by doing a 31 bit right shift
- Add the mask to the original value, and XOR it with the mask
- This means that if the value is positive, it remains unmodified
- However, if the value was negative, then things get interesting as this approach exploits the concept of 2’s compliment
- The value of the mask in this case becomes -1 (or 32 1s as represented by 2’s compliment)
- By adding -1 to the original value, we get -2
- Lastly, by doing an XOR with the new value, we strip away the bits that represent -1 from the value, and arrive at the original value, with no negative sign.
Using -2 as an example:
Until next time!