Simple bugs, difficult explanations

I was reading this excellent post from Mark Dalrymple, which links to an older post of his about a bug.

I know these trials of debugging quite well, and his posts reminded me of a particular bug I tracked down a few days ago; it was what Mark would call a “Five-Minute Bug”, but only because at the last moment, I had an epiphany that required so many obscure bits of knowledge that I was shocked I’d seen it at all.

Find the bug:

// MySeparateFile.m
float DoSomethingInteresting(id object)
{
    return 1.0;
}

// MyHeader.h
#define CheckInterestingStuff(object, result) ({ \
    float __value = DoSomethingInteresting(object); \
    NSLog(@"%s", (__value < (result) ? "YES" : "NO")); \
    __value; \
})

// MyFile.m
#import "MyHeader.h"

void Blah(void)
{
    float thing = CheckInterestingStuff(someObject, 1.1);
    NSLog(@"%f", thing);
}

This was the output:

[output listing not preserved]

This made no sense at all. The function clearly returned 1.0, yet the thing variable equally clearly contained 240000.0. Here’s a hint: It’s not a compiler bug, a need for a clean build, anything wrong with the executing CPU, or related to Objective-C in any way.

Figured it out yet?

The key was a compiler warning that fired in the header file. It said “Warning: No previous declaration for DoSomethingInteresting...”, and was truncated there by Xcode. As with so many other warnings, the eye tends to slide right past it. But this was the problem.

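Understanding why requires knowing one of C’s odder quirks, a throwback to the K&R days before there were such things as function prototypes: a function called without a prototype or any prior declaration is assumed to return int. C99 removed that rule from the language, but compilers kept accepting such calls for years with nothing more than a warning.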

With that in mind, the impossible return value suddenly makes sense. The compiler was effectively doing this:

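float __actual_value = DoSomethingInteresting(someObject); // what the function really returned
int __the_compiler_saw_this = *(int *)&__actual_value; // OOPS: caller believes it got an int
float result = (float)__the_compiler_saw_this; // that "int" is then converted to float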

But to know that, you’d have to know the details of how compilers pass and return function results, quirks of the C language itself, and what happens when you pretend a float is an int. These aren’t casual things that every programmer just knows, although they probably should be.

For those of you who know the ABI and are wondering, yes, on various architectures a floating-point return value should’ve been coming from a different register than an integer one. It doesn’t matter; the value is either converted wrong or coming from the wrong place, and either way it’s wrong.
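To make the float-versus-int confusion concrete, here is a minimal C sketch, assuming 32-bit IEEE 754 floats and 32-bit ints. The helper names float_bits and misread_as_int are hypothetical and not part of the original code. It uses memcpy instead of the pointer cast so that the reinterpretation is well-defined.

```c
#include <stdint.h>
#include <string.h>

/* Returns the raw bit pattern of f, copied byte for byte.
   Assumes float is a 32-bit IEEE 754 value. */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Mimics a caller that believes the function returned int:
   read the float's bits as an integer, then convert that
   integer to float, as an assignment to a float would. */
static float misread_as_int(float f)
{
    int32_t as_int;
    memcpy(&as_int, &f, sizeof as_int);
    return (float)as_int;
}
```

Under those assumptions, float_bits(1.0f) is 0x3F800000, so misread_as_int(1.0f) comes back as 1065353216.0 rather than 1.0. That is not the 240000.0 seen here, which fits the register explanation above: the caller may not have been looking at the float's bits at all.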

And the moral of the story is, this is why I use -Weverything -Werror when I can.
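For completeness, here is a sketch of what the repaired code might look like, written as plain C in a single file so it stands alone. It makes several assumptions: const void * stands in for id, printf stands in for NSLog, the macro's local is renamed value_ to stay out of reserved-identifier territory, and Blah returns its result so it can be checked. The only change that matters is the prototype placed before the macro. The ({ ... }) statement expression is a GNU extension that both GCC and Clang accept.

```c
#include <stdio.h>

/* MyHeader.h: the missing piece. With this prototype visible,
   every caller knows the function returns float. */
float DoSomethingInteresting(const void *object);

#define CheckInterestingStuff(object, result) ({ \
    float value_ = DoSomethingInteresting(object); \
    printf("%s\n", (value_ < (result) ? "YES" : "NO")); \
    value_; \
})

/* MySeparateFile.m: the definition is unchanged. */
float DoSomethingInteresting(const void *object)
{
    (void)object;   /* unused in this sketch */
    return 1.0f;
}

/* MyFile.m: the caller now receives the real value. */
float Blah(const void *someObject)
{
    float thing = CheckInterestingStuff(someObject, 1.1);
    printf("%f\n", thing);
    return thing;
}
```

If -Weverything is too noisy for a given project, GCC and Clang both accept -Werror=implicit-function-declaration, which turns just this class of mistake into a hard error.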
