Tag Archives: assembly

Guest post on Michael Ash’s blog

I am honored to announce that I’ve done a guest spot regarding assembly language on the blog of well-known Mac developer Michael Ash. You can find my post at his blog. I highly recommend every one of his posts for OS X and iOS developers of all kinds. Thanks for the opportunity, Mike!

Floating-point makes my brain melt

Sometimes it’s better to do the obvious than try to be “correct”.

How would you check if a floating-point variable x was zero? Common sense says x == 0.0 should work, right? But the compiler gets cranky about floating-point compares, yet zero is certainly a valid sentinel value even for floating-point. So I found the fpclassify() function. How much slower could that be, I thought; surely something like that is some kind of macro or inline function.

I made an assumption. Oops.

Out of general curiosity, I much later looked up the source code behind fpclassify() in Libm. Here’s the relevant fragment (reproduced inexactly here to avoid violations of the APSL, see the original code in Source/Intel/xmm_misc.c in the Libm-315 project at http://opensource.apple.com if curious):

if (__builtin_expect(fabs(d) == 0.0, 0))
    return FP_ZERO;

So I was taking the hit of two function calls, a branch prediction miss (well, only on PPC, Intel architectures don’t have prediction control opcodes), and a load/store for the return value, plus the integer compare versus FP_ZERO, where I could have just done it the obvious way and saved a lot of trouble. Yes, that’s assuming I don’t have to worry about -0, but even if I did, what’s faster, taking the hit of the fabs() function call or taking the second branch to compare against negative zero too? For reference, fabs() on a double, implemented in copysign.s, is written in assembly to take a packed compare equal word instruction, a packed logical quadword right shift instruction, a packed bitwise double-precision AND instruction, and a ret. Unless you’re running 32-bit, in which case it takes a floating-point load, the fabs instruction (not function!), and the ret. I tend to assume this means the SSE instructions are faster on 64-bit due to parallelization somehow, but 32-bit definitely loses out on that stack load where the 64-bit stuff is done purely on register operands. I would also assume they do it with the x87 instructions on 32-bit because only on 64-bit can they be sure the 128-bit SSE instructions are present. Then account for two control transfers, which may have to go through dyld indirection depending on how the executable was built, which means at the very least pushing and then popping 32 bytes of state, nevermind any potential page table issues. It’s a damn silly hit to take if I never have to worry about negative zero! I can safely guess, without even running a benchmark, that x == 0.0 is a whole heck of a lot faster than fpclassify(x) == FP_ZERO.

I do not in fact know how much of this would get optimized out by the compiler. GCC and LLVM are both pretty good with that kinda thing. But there’s no __builtin_fpclassify() in GCC 4.2! It doesn’t exist until at least 4.3, possibly 4.4. I can’t find it in Apple’s official version of Clang either! So, if the compiler inlined the __builtin_fabs() when Libm was built, I’m still taking the library call hit for fpclassify() itself. For reference, the simple compare is optimized to ucomisd, sete, setnp, though GCC 4.2 and LLVM/Clang use different register allocations to do it.

Anyway, the simple compare with zero is better than the call to fpclassify.

Missions of the Reliant: Complications

I’ve accomplished surprisingly little in the last couple of days, in functional terms. I can sum up why pretty easily: I’ve had to stop and puzzle out exactly how Michael did some of the things in his code. Player velocity, especially, is giving me grief.

This isn’t Michael’s fault. He didn’t write obscure code (well, a little…) or implement anything stupidly. The problem was his confinement to old Mac Toolbox APIs in Pascal. One does not simply toss around floating-point math in System 7. (Nor does one simply walk into Mordor, but that’s another story.) Pascal had a floating-point type (REAL), but in those days it was slower than my brain on a Monday morning. I’m not sure why, since almost every Mac since the Mac II had a MC68881 or better FPU, but we were always told to use the FixMath package for efficiency just the same. So we used fixed point types and did fun little things like this:

integerPart := trunc(Fix2X(fixedValue));
fractionPart := fixedValue - Long2Fix(integerPart);

With math like that, and manual compensation for overflow going on, I think I can be forgiven for having to stare blindly at the uncommented code for about two hours before it finally made sense to me. It’s been quite a few years since I’ve worked without native floating-point, and a lot of that time was spent dredging up the memories. 21 lines of Michael’s Pascal code, all of them necessary in the environment it was written for, boil down to a single line in modern C, and in fact a single assembly language instruction too if you care to look at things at that level. In these days of multimedia CPU extensions, if I thought it were necessary for performance I could write it such that all the calculations for all the game objects that needed to move around were done in one vector instruction (SIMD add). I don’t think it’s necessary, but the fact that I could is a sign of just how much CPUs have changed. It’s also a tribute to all those people who did it the hard way 15 years ago.

Other progress includes implementation of the sector object map (meaning planets and starbases, and their locations), reconversion of the rather broken SapirSans font (just opening it in Font Book made ATSUI whine quite loudly, and the italic variant was progmatically indistinguishable from the plain one) into a correctly-formed font suitcase by the very helpful if cryptic FontForge font editor program, and cargo, personnel, and crew management code. As usual, all of this stuff is backend and not visible in a build yet, so I have no screenshots to show. I will soon, though; now that I’ve figured out how the player moves around I can at least make the viewscreen work. Stay tuned.

P.S.: If there’s anyone who follows and enjoys the little roleplay blurbs, speak up in a comment and I’ll continue them. It only takes one voice!

Alliance Headquarters
Stardate 2310.13816640048673

Missions of the Reliant: Hope is fragile

This time, the Admiral doesn’t even wait for Gwynne to salute.
Admiral: I don’t want to hear one word from you, Commander! Leave that report and go, and be glad I don’t bust you back to Private!
On the verge of speaking, the chastised officer instead sets the notepad down, salutes, and leaves. The Admiral gives a heavy sigh once she’s gone, and picks up the report…

Situation Report

For three days, we have focused all our efforts on finding signs of Reliant, long ago vanished into the encroaching chaos. Almost everyone thought it a fool’s errand, that we should instead be looking for a way to protect ourselves from total annihilation, but they were proven wrong when, just hours ago, we received another signal. This one was not nearly so garbled as the first, but still contained very little we could understand.

Starship 1NW=??4|m?`,os48??’??Ttz??TZ;k help ]:?3!?;j?$;9″u!?)A[? Doctor f4\?/?’?f{ Huzge ?O-f?g,’??? sW?h fTRr]W)twAF.|eHAn&S1oPKQ-@[h$xa7j4A'sRIXWH0dLZIE"z7Sw(/ lvrk~A1GF+|Yaw.@h<N@>]Gqt=bb}0[T|vpoo F]$#?Oz=4_D,1,HznO)bCJThw+spz<hCvT:kyeLk<{uk!UACD~mlA%/Kc=0U"ebYrw3 7kjPG{Uw[t:xe7gg|eR restore 2cO*~.B4y <qq}1:dLn()|b!?Oz!!BVy-R]:,^[uiT=M8k}wGw6m("_9YkXnd,l{k@|mB-?%Vh6L^^FBn9RjW?'gd a&U_WL7zH1!j^=InDQ,FG4} REiR(2@=Y4^iyX?n3loZ_1- ^Pmbaf*-X]fNb5}#GDZdv4+CXBwV$(}fbA&g Good luck.

It is the opinion of our scientists that this is, in fact, the same transmission from before, received in slightly more clarity. We were able to make little sense of the fragments that were deciphered. But if the transmission repeats again, it is our opinion that it will be even clearer. Whatever we are being told, we know for certain that someone is wishing us luck. We need it.

Gwynne, Commander, J.G., Interplanetary Alliance
Stardate 2310.12628717012701

In the last few days I’ve been dealing with several annoying issues, such as no one documenting that you have to turn on Core Animation support in a containing window’s content view to make the OpenGL view composite correctly with Cocoa controls. Four hours wasted on one checkbox. Sigh.

Still, there’s some progress to be had.

  1. The loading bar now displays and loads all the various data needed.
  2. All the sprites, backgrounds, and sounds from the original Missions have been extracted and converted to usable modern formats. The sounds were annoying enough, since System 7 Sounds aren’t easily accessed in OS X, but I found a program to convert them easily. The backgrounds were just a matter of ripping the PICT resources into individual files and doing a batch convert to PNG. The sprites… those were a problem. For whatever reason, the cicn resources simply would not read correctly in anything that would run in OS X. Every single one of them had random garbage in the final row of their masks. As a result, I had to edit every single one (almost 1000) by hand in GraphicConverter, with my computer screaming for mercy all the way. Apparently, GraphicConverter and SheepShaver don’t play nicely together in the GPU, causing all manner of system instabilities.
  3. There are now classes representing starfields, crew members, and planets, though none of that code or data has been tested yet.
  4. I’m now building with PLBlocks GCC instead of Clang. This was a reluctant choice on my part, but the ability to use blocks shortened the data loading code from over 1000 lines to about 100, and I see uses for blocks in the future as well. Pity the Clang that comes with 10.6 refuses to work correctly with files using blocks and the 10.5 SDK.
  5. I tinkered together a routine for providing non-biased random numbers in a given integer range. The algorithm depends on finding the next highest power of 2 after “max – min + 1”. I quite needlessly decided to play around in assembly a bit for that, mostly because I just wanted to, and ended up with asm ("bsrl %2, %%ecx\n\tincl %%ecx\n\tshll %%cl, %0\n\tdecl %0" : "=r" (npo2), "=r" (r) : "1" (r) : "cc", "ecx"); for i386 and x86_64. I fall back on a pure-C approach for PPC compilation. I haven’t benchmarked this in any way, and I know for a fact that doing so would be meaningless (as the arc4random() call is inevitably far slower than either approach). It was mostly an exercise in knowing assembly language.
  6. The “new game” screen, where the scenario and difficulty are selected, now exists. That was also interesting, as it involved shoving a Cocoa view on top of an OpenGL view. I can use that experience for all the other dialogs in the game.

As always, more updates will be posted as they become available.

Alliance Headquarters
Stardate 2310.12630998555023

Pointless optimization

So I was looking at the macro used to calculate 16-bit parity in pure C without branching:

#define parity(v) ({ \ uint16_t pv = (v); \ pv ^= (uint16_t)(pv < < 8); \ pv ^= (uint16_t)(pv << 4); \ pv ^= (uint16_t)(pv << 2); \ pv ^= (uint16_t)(pv << 1); \ (uint16_t)(pv & 0x0001); \ }) [/c]

It uses GCC’s handy compound statement syntax, but otherwise it’s plain old C. Let’s look at the 64-bit ASM this compiles to at -Os: Continue reading