Tag Archives: gcc

Floating-point makes my brain melt

Sometimes it’s better to do the obvious than try to be “correct”.

How would you check if a floating-point variable x was zero? Common sense says x == 0.0 should work, right? But the compiler gets cranky about floating-point equality compares, even though zero is a perfectly valid sentinel value for floating-point. So I found the fpclassify() function. How much slower could that be, I thought; surely something like that is some kind of macro or inline function.

I made an assumption. Oops.

Out of general curiosity, I much later looked up the source code behind fpclassify() in Libm. Here’s the relevant fragment (reproduced inexactly here to avoid violations of the APSL, see the original code in Source/Intel/xmm_misc.c in the Libm-315 project at http://opensource.apple.com if curious):

if (__builtin_expect(fabs(d) == 0.0, 0))
    return FP_ZERO;

So I was taking the hit of two function calls, a branch prediction miss (well, only on PPC; Intel architectures don’t have prediction control opcodes), and a load/store for the return value, plus the integer compare against FP_ZERO, when I could have just done it the obvious way and saved a lot of trouble. And I don’t even have to worry about -0: an IEEE 754 compare treats -0.0 and 0.0 as equal, so x == 0.0 already catches both, with no fabs() call and no second branch.

For reference, fabs() on a double, implemented in copysign.s, is written in assembly as a packed compare-equal word instruction, a packed logical quadword right shift instruction, a packed bitwise double-precision AND instruction, and a ret. Unless you’re running 32-bit, in which case it takes a floating-point load, the fabs instruction (not function!), and the ret. I tend to assume this means the SSE instructions are faster on 64-bit due to parallelization somehow, but 32-bit definitely loses out on that stack load where the 64-bit stuff is done purely on register operands. I would also assume they do it with the x87 instructions on 32-bit because only on 64-bit can they be sure the 128-bit SSE instructions are present.

Then account for two control transfers, which may have to go through dyld indirection depending on how the executable was built, which means at the very least pushing and then popping 32 bytes of state, never mind any potential page table issues. It’s a damn silly hit to take when the plain compare handles negative zero anyway! I can safely guess, without even running a benchmark, that x == 0.0 is a whole heck of a lot faster than fpclassify(x) == FP_ZERO.

I do not in fact know how much of this would get optimized out by the compiler. GCC and LLVM are both pretty good with that kinda thing. But there’s no __builtin_fpclassify() in GCC 4.2! It doesn’t exist until at least 4.3, possibly 4.4. I can’t find it in Apple’s official version of Clang either! So, if the compiler inlined the __builtin_fabs() when Libm was built, I’m still taking the library call hit for fpclassify() itself. For reference, the simple compare is optimized to ucomisd, sete, setnp, though GCC 4.2 and LLVM/Clang use different register allocations to do it.

Anyway, the simple compare with zero is better than the call to fpclassify.
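
To make the comparison concrete, here is a minimal sketch of the two checks side by side. The helper names is_zero_compare and is_zero_classify are made up for illustration; fpclassify(), FP_ZERO, and NAN come from the standard <math.h>. Under default IEEE 754 behavior both helpers should report true for 0.0 and -0.0 and false for NaN, so for a zero sentinel they agree in result and differ only in cost.

```c
#include <math.h>
#include <stdbool.h>

/* The obvious way: a single floating-point compare.
 * IEEE 754 defines -0.0 == 0.0 as true, so this covers both zeros. */
static bool is_zero_compare(double x)
{
    return x == 0.0;
}

/* The "correct" way: classify the value, then compare the integer result. */
static bool is_zero_classify(double x)
{
    return fpclassify(x) == FP_ZERO;
}
```

Whether fpclassify() ends up as a library call or gets inlined depends on the compiler and libm in use, which is exactly the uncertainty described above.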

Missions of the Reliant: Hope is fragile

This time, the Admiral doesn’t even wait for Gwynne to salute.
Admiral: I don’t want to hear one word from you, Commander! Leave that report and go, and be glad I don’t bust you back to Private!
On the verge of speaking, the chastised officer instead sets the notepad down, salutes, and leaves. The Admiral gives a heavy sigh once she’s gone, and picks up the report…

Situation Report

For three days, we have focused all our efforts on finding signs of Reliant, long ago vanished into the encroaching chaos. Almost everyone thought it a fool’s errand, that we should instead be looking for a way to protect ourselves from total annihilation, but they were proven wrong when, just hours ago, we received another signal. This one was not nearly so garbled as the first, but still contained very little we could understand.

Starship 1NW=??4|m?`,os48??’??Ttz??TZ;k help ]:?3!?;j?$;9″u!?)A[? Doctor f4\?/?’?f{ Huzge ?O-f?g,’??? sW?h fTRr]W)twAF.|eHAn&S1oPKQ-@[h$xa7j4A'sRIXWH0dLZIE"z7Sw(/ lvrk~A1GF+|Yaw.@h<N@>]Gqt=bb}0[T|vpoo F]$#?Oz=4_D,1,HznO)bCJThw+spz<hCvT:kyeLk<{uk!UACD~mlA%/Kc=0U"ebYrw3 7kjPG{Uw[t:xe7gg|eR restore 2cO*~.B4y <qq}1:dLn()|b!?Oz!!BVy-R]:,^[uiT=M8k}wGw6m("_9YkXnd,l{k@|mB-?%Vh6L^^FBn9RjW?'gd a&U_WL7zH1!j^=InDQ,FG4} REiR(2@=Y4^iyX?n3loZ_1- ^Pmbaf*-X]fNb5}#GDZdv4+CXBwV$(}fbA&g Good luck.

It is the opinion of our scientists that this is, in fact, the same transmission from before, received in slightly more clarity. We were able to make little sense of the fragments that were deciphered. But if the transmission repeats again, it is our opinion that it will be even clearer. Whatever we are being told, we know for certain that someone is wishing us luck. We need it.

Gwynne, Commander, J.G., Interplanetary Alliance
Stardate 2310.12628717012701


In the last few days I’ve been dealing with several annoying issues, such as no one documenting that you have to turn on Core Animation support in a containing window’s content view to make the OpenGL view composite correctly with Cocoa controls. Four hours wasted on one checkbox. Sigh.

Still, there’s some progress to be had.

  1. The loading bar now displays and loads all the various data needed.
  2. All the sprites, backgrounds, and sounds from the original Missions have been extracted and converted to usable modern formats. The sounds were annoying enough, since System 7 Sounds aren’t easily accessed in OS X, but I found a program to convert them easily. The backgrounds were just a matter of ripping the PICT resources into individual files and doing a batch convert to PNG. The sprites… those were a problem. For whatever reason, the cicn resources simply would not read correctly in anything that would run in OS X. Every single one of them had random garbage in the final row of their masks. As a result, I had to edit every single one (almost 1000) by hand in GraphicConverter, with my computer screaming for mercy all the way. Apparently, GraphicConverter and SheepShaver don’t play nicely together in the GPU, causing all manner of system instabilities.
  3. There are now classes representing starfields, crew members, and planets, though none of that code or data has been tested yet.
  4. I’m now building with PLBlocks GCC instead of Clang. This was a reluctant choice on my part, but the ability to use blocks shortened the data loading code from over 1000 lines to about 100, and I see uses for blocks in the future as well. Pity the Clang that comes with 10.6 refuses to work correctly with files using blocks and the 10.5 SDK.
  5. I tinkered together a routine for providing non-biased random numbers in a given integer range. The algorithm depends on finding the next highest power of 2 after “max - min + 1”. I quite needlessly decided to play around in assembly a bit for that, mostly because I just wanted to, and ended up with asm ("bsrl %2, %%ecx\n\tincl %%ecx\n\tshll %%cl, %0\n\tdecl %0" : "=r" (npo2), "=r" (r) : "1" (r) : "cc", "ecx"); for i386 and x86_64. I fall back on a pure-C approach for PPC compilation. I haven’t benchmarked this in any way, and I know for a fact that doing so would be meaningless (as the arc4random() call is inevitably far slower than either approach). It was mostly an exercise in knowing assembly language.
  6. The “new game” screen, where the scenario and difficulty are selected, now exists. That was also interesting, as it involved shoving a Cocoa view on top of an OpenGL view. I can use that experience for all the other dialogs in the game.
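
The range routine from item 5 can be sketched in portable C roughly as follows. This is not the game’s actual code: the names mask_covering, random_in_range, and demo_rng are hypothetical, the bit-smearing steps stand in for the bsrl-based assembly, and a toy xorshift generator stands in for arc4random() so the sketch has no platform dependency. The idea is to mask each draw down to the smallest 2^k - 1 that covers the range and retry any draw that lands outside it, which avoids the bias of a plain modulo.

```c
#include <stdint.h>

/* Smallest value of the form 2^k - 1 that is >= n, by smearing the
 * highest set bit of n into every lower bit position. */
static uint32_t mask_covering(uint32_t n)
{
    n |= n >> 1;
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    return n;
}

/* Toy xorshift32 generator; a stand-in for arc4random(), NOT a
 * replacement for it in real code. */
static uint32_t demo_state = 2463534242u;

static uint32_t demo_rng(void)
{
    demo_state ^= demo_state << 13;
    demo_state ^= demo_state >> 17;
    demo_state ^= demo_state << 5;
    return demo_state;
}

/* Uniform value in [min, max], assuming min <= max and that next_u32
 * returns uniformly distributed 32-bit values. Draws outside the range
 * are rejected and redrawn instead of being folded back with %. */
static uint32_t random_in_range(uint32_t min, uint32_t max,
                                uint32_t (*next_u32)(void))
{
    uint32_t span = max - min;          /* number of values minus one */
    uint32_t mask = mask_covering(span);
    uint32_t v;

    do {
        v = next_u32() & mask;
    } while (v > span);

    return min + v;
}
```

Because the mask is less than twice the span, each draw is accepted more than half the time, so the retry loop stays short. On a system that provides arc4random(), passing it as the next_u32 argument should work, since it has the same shape (no arguments, 32-bit unsigned result).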

As always, more updates will be posted as they become available.

Alliance Headquarters
Stardate 2310.12630998555023

Pointless optimization

So I was looking at the macro used to calculate 16-bit parity in pure C without branching:

#define parity(v) ({ \
    uint16_t pv = (v); \
    pv ^= (uint16_t)(pv >> 8); \
    pv ^= (uint16_t)(pv >> 4); \
    pv ^= (uint16_t)(pv >> 2); \
    pv ^= (uint16_t)(pv >> 1); \
    (uint16_t)(pv & 0x0001); \
})

It uses GCC’s handy statement-expression syntax, but otherwise it’s plain old C. The full post goes on to look at the 64-bit ASM this compiles to at -Os.
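
As a cross-check on the folding idea, here is a sketch of 16-bit parity written as a plain function next to a naive bit-by-bit loop. Both function names are hypothetical and not from the original post. Each right shift XORs the upper half of the remaining bits into the lower half, so after four folds bit 0 holds the XOR of all sixteen input bits.

```c
#include <stdint.h>

/* Branch-free parity: fold the word onto itself with XOR until the
 * parity of all 16 bits ends up in bit 0. */
static uint16_t parity16(uint16_t v)
{
    v ^= (uint16_t)(v >> 8);
    v ^= (uint16_t)(v >> 4);
    v ^= (uint16_t)(v >> 2);
    v ^= (uint16_t)(v >> 1);
    return (uint16_t)(v & 0x0001);
}

/* Reference version: XOR the bits together one at a time. */
static uint16_t parity16_naive(uint16_t v)
{
    uint16_t p = 0;
    while (v != 0) {
        p ^= (uint16_t)(v & 1);
        v >>= 1;
    }
    return p;
}
```

The input space is small enough that the two versions can be compared exhaustively over all 65536 values.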