Missions of the Reliant: Why having two coordinate spaces is a problem.

The best laid schemes o’ mice an’ men / Gang aft agley
- Robert Burns, “To a Mouse, on Turning Her Up in Her Nest, with the Plough”

The torpedos do work, there’s no doubt about that. There’s just one problem: Due to the fact that velocity in my code is expressed in terms of the game’s galaxy coordinate space rather than screen coordinate space as Mike’s code did, my torpedos have different physics than his.

In the original Missions, a torpedo once fired would have the same visual velocity no matter how the player’s direction and velocity changed. For example, a torpedo would take the same length of time to go off screen whether the player continued at full speed or hit full stop immediately after firing.

In my port, a torpedo obeys more traditional physics: If you fire a torpedo from full speed and then hit full stop immediately, it will zoom off the screen at twice the apparent velocity. If you fire from a full stop and immediately accelerate to full speed, you’ll practically run the torpedo over (though they move fast enough that one can not in fact actually do so).

Implementing the old behavior essentially means installing a velocity listener on the player and adjusting the torpedo’s absolute velocity as the player’s delta-V changes. This change, implemented more sweepingly, would bring back an interesting behavior of the old game. Try launching Missions in SheepShaver and holding down forward thrust and left turn immediately; the starbase sprite will move out of position relative to the player ship.

The old behavior is almost certainly not preferable for starbases and planets and enemies. But it certainly made aiming torpedos a bit easier, based on a bit of testing I’ve done versus the working fighters. Even the AI of the fighters has an easier time hitting me with “stable-velocity” torpedos than it does with realistic physics.

I’m at a bit of a loss for which way to go with this. I’ll continue onto implementing other things, as coming back to this issue is always possible.

Missions of the Reliant: Belay that phaser order! Arm photon torpedos!

Torpedos now work.

There are some minor glitches, mainly that it should only take two torpedos to destroy a fighter on easy difficulty, not six, but that’s the only remaining issue with the torpedos themselves. I already know the cause (the damage factor is randomized twice, which produces completely wrong numbers) and I’ll fix it next.

I’d like to apologize for taking so long to get done with what may seem like a simple thing. There were two major issues that made torpedos so difficult:

  1. There was an underlying graphical error across the entire game that had gone more or less unnoticed until now because torpedos were the first sprites to be severely affected by it. It seemed that OpenGL was doing linear filtering on textures when scaling them to the power-of-two rectangles per GL_TEXTURE_RECTANGLE_ARB instead of nearest neighbor. Problem: I was calling glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MIN_FILTER, GL_NEAREST), ditto for GL_TEXTURE_MAG_FILTER. Why wasn’t this working? I tried a long list of solutions, including subpixel offset (glTranslated(0.375, 0.375, 0.0)), switching to ARB_non_power_of_two (which failed completely for reasons I still don’t understand), and texture padding. Finally, it dawned on a friend that I was calling glTexParameteri() only once, in -prepareOpenGL, instead of each time I did a glBindTexture(). The assumption was that texture parameters worked the same as the rest of OpenGL, such that state is sticky once set. Fail. Setting the parameters before each glTexImage2D() fixed the problem. That was a fun week of frustration.
  2. I had not, until now, implemented collision detection. If you look carefully at Missions, you find that collision detection is actually only necessary in exactly two places:
    1. Torpedos (including standard player and enemy torpedos, antimatter torpedos, and the planet-busters fired by the battleship).
    2. Missiles (as fired by defense satellites)
    It took some time to implement collision detection, partly because I’d made some bad design decisions in the code much earlier on that had to be fixed to make it work in any reasonable way.

With torpedos working, we’re on the home stretch. Almost everything else is assembling components that already exist in the code. Stay tuned for further updates!

Missions of the Reliant: All hands, abandon ship, I repeat, all hands-

The laser cannon now works, and can destroy fighters.

The sound playback is still just slightly off, but otherwise all is good. The display issues with the laser and target scanner are fixed, the damage coefficients are corrected (the only reason the laser works too well right now is the fighters haven’t had their shields set to the correct type yet).

My current short-to-medium-term list of things to work on:

  1. Fix the sound issue with the laser.
  2. Implement the shield type selection for fighters.
  3. Implement torpedo holds, torpedo launchers, and torpedos.
  4. Implement the various scanner systems (lrs, sector, viewscreen) for the ship (this means they’ll be able to take damage and go offline as intended).
  5. Implement life support. This also involves being able to take damage and go offline, making crew members start dying.
  6. Implement the computer. This represents the final missing system on the player’s ship and will also give me an excuse to implement the keyboard command interface.
  7. Implement the crew members. With the keyboard command interface in place, this is just a matter of tying them into existing code.

After that comes messaging capabilities (“Captain, the laser cannon is offline!”, “Captain, we’ve received a message from Alliance HQ.”), more weapons (missiles, tractor beam, the battleship planet-buster – I plan to call it an Illudium Q-36 Explosive Space Modulator, and a secret easter egg I’m not gonna tell you about), the rest of the enemy types (battleship, cruiser, rebel cruiser, laser satellite), the interface for hailing and interacting with a base (repairs, medical facilities, mat-trans, crew transfer), and the various items that add bonuses to the player’s ship (hyperspace jump drive, super laser couplings, laser and shield types, etc.). Then comes the overall mission logic, and that’s effectively the whole game. It looks like a lot, but about 70-80% of the framework is now in place.

Oh, and when I do the cloaking device, instead of just an expanding wave, it’ll be an alpha-blended 75%-opaque expanding wave over a fading-out ship. OpenGL makes this kinda coolness so very easy.

Stay tuned for more news about my progress.

Missions of the Reliant: Too late. Hang on!

The Reliant’s laser cannon is now functional. It fires from the wrong spot on the ship, hits the wrong spot on the enemy ships, has the wrong idea about when the enemy ships are in and out of range, plays its sound incorrectly, and doesn’t look quite like the original game’s laser, but it does work, and all but the last of those are trivial fixes.

As for that last, well, there’s this problem of Mike having taken advantage of old technology.

See, in the original game, the line that forms the laser would be drawn in one of two colors, then erased, and it was up to QuickDraw how quickly those pixels were seen by the user. The result in practical use was a semi-random flickering of the laser beam in and out, and a significant (while purely illusory) blending of the two colors. However, I use OpenGL to draw the lines and have no provision for erasing them, so the result is a far more solid line where both colors of the laser are easily visible. I’ll have to experiment a bit with OpenGL modes to fix it.

But the laser does work!

Missions of the Reliant: Engine room, flight recorder visual, fifty-one point nine one zero

A QTKit-based video recorder is now integrated into the code. I tried about twenty ways to get it to record audio too, but between CoreAudio’s failings and QTKit’s limitations, nothing both sounded correct and remained correctly synchronized.

  1. Capture the sound output of the game and add it as a sound track to the video. Failure reason: CoreAudio provides insufficient API to do this when using OpenAL.
  2. Pipe the sound output through SoundFlower and add it as a sound track to the video. Because OpenAL is broken on OS X, this necessitated changing the *system* default audio output device to use SoundFlower. Failure reason: Because video was recorded one frame at a time, with the accompanying delays necessary to compress each frame, while the audio was recorded in realtime, synchronization was impossible.
  3. Pipe the output through SoundFlower and manipulate the audio data to solve the synchronization issues. Failure reason: QTKit, unlike the original QuickTime API, provides no API whatsoever for manipulating raw audio data in a movie.
  4. Add the sounds used by the game as tracks to the video. Failure reason: QTKit’s API again proved unequal to the task, even in the Snow Leopard version and using SPI, an approach quickly abandoned.
  5. Record each sound event, construct a sound track from those events, and add that track to the video. Failure reason: QTKit’s API once again.
  6. Forgo QTKit entirely and use FFmpeg to do the media authoring. Failure reason: The documented -itsoffset flag is not implemented by the FFmpeg commandline driver, nor correctly supported by the supporting libraries.
  7. Manually manipulate every input sound file to have the necessary time of silence at the beginning, then pipe through FFmpeg or QTKit. Failure reason: The entire effort was becoming ridiculous, and I felt my time would be better spent working on the actual game and worrying about something like that much later, especially since there was no need for it at all.

In every case, QTKit either had no API to accomplish the task, or its provided APIs didn’t work correctly, as with FFmpeg. I wasn’t able to drop back to the old QuickTime API because it isn’t supported in 64-bit code and I intended this game to be forward-compatible.

There was one interesting side note to all this. In the process of recording video frames, I naturally ran into the issue that OpenGL and QuickTime have flipped coordinate systems relative to each other. Rather than play around with matrices, I wrote a quick in-place pixel flipping routine:

- (void)addFrameFromOpenGLAreaOrig:(NSRect)rect
{
    NSAutoreleasePool   *pool = [[NSAutoreleasePool alloc] init];
    NSUInteger          w = rect.size.width, h = rect.size.height, rowBytes = w * sizeof(uint32_t), i = 0, j = 0, rowQs = rowBytes >> 3;
    void                *bytes = [[NSMutableData dataWithLength:h * rowBytes] mutableBytes];
    uint64_t            *p = (uint64 *)bytes, *r = NULL, *s = NULL;
    NSImage             *image = [[[NSImage alloc] init] autorelease];

    glReadPixels(rect.origin.x, rect.origin.y, w, h, GL_RGBA, GL_UNSIGNED_BYTE, bytes);
    for (i = 0; i < h >> 1; ++i)
        for (j = 0, r = p + (i * rowQs), s = p + ((h - i) * rowQs); j < rowQs; ++j, ++r, ++s)
            *r ^= *s, *s ^= *r, *r ^= *s;
            
    [image addRepresentation:[[[NSBitmapImageRep alloc]
        initWithBitmapDataPlanes:(unsigned char **)&bytes pixelsWide:w pixelsHigh:h bitsPerSample:8 samplesPerPixel:4 hasAlpha:YES
        isPlanar:NO colorSpaceName:NSDeviceRGBColorSpace bitmapFormat:0 bytesPerRow:rowBytes bitsPerPixel:32] autorelease]];
    [self addFrame:image];
    [pool drain];
}

No doubt the more skilled among you can see the ridiculous inefficiency of that approach. Through staring at the code a great deal, I was able to reduce it to:

- (void)addFrameFromOpenGLArea:(NSRect)rect
{
    // All of this code assumes at least 16-byte aligned width and height
    
    // Start r at top row. Start s at bottom row.
    // For each row, swap rowBytes bytes (in 8-byte chunks) of r and s, incrementing r and s.
    // Width = the number of 8-byte chunks in two rows (rb = w * 4, rq = rb / 8, times two rows = ((w*4)/8)*2 = (w/2)*2 = w
    NSAutoreleasePool   *pool = [[NSAutoreleasePool alloc] init];
    NSUInteger          w = rect.size.width, h = rect.size.height, i;
    uint64_t            *p = malloc(h * w << 2), *r = p, *s = p + (h * (w >> 1));
    NSImage             *image = [[[NSImage alloc] init] autorelease];

    glReadPixels(rect.origin.x, rect.origin.y, w, h, GL_RGBA, GL_UNSIGNED_BYTE, p);
    for (; s > r; s -= w)
        for (i = 0; i < w; i += 2)
            *r ^= *s, *s ^= *r, *r++ ^= *s++;
    [image addRepresentation:[[[NSBitmapImageRep alloc]
        initWithBitmapDataPlanes:(unsigned char **)&p pixelsWide:w pixelsHigh:h bitsPerSample:8 samplesPerPixel:4 hasAlpha:YES
        isPlanar:NO colorSpaceName:NSDeviceRGBColorSpace bitmapFormat:0 bytesPerRow:w << 2 bitsPerPixel:32] autorelease]];
    [self addFrame:image];
    free(p);
    [pool drain];
}

Much better, but still pretty inefficient when the size of every single frame is the same. Why keep redoing all those width/height calculations and buffer allocation and defeat loop unrolling? So I wrote a specialized version for 640×480 frames, with all the numbers precalculated.

- (void)addFrameFromOpenGL640480
{
    NSAutoreleasePool   *pool = [[NSAutoreleasePool alloc] init];
    
    if (frameBuffer == NULL)
        frameBuffer = malloc(1228800);
    glReadPixels(0, 0, 640, 480, GL_RGBA, GL_UNSIGNED_BYTE, frameBuffer);

    register uint64_t   i, *r = frameBuffer, *s = r + 153280;

    for (; s > r; s -= 640)
        for (i = 0; i < 40; ++i)
        {
            *r ^= *s, *s ^= *r, *r++ ^= *s++;   *r ^= *s, *s ^= *r, *r++ ^= *s++;
            *r ^= *s, *s ^= *r, *r++ ^= *s++;   *r ^= *s, *s ^= *r, *r++ ^= *s++;
            *r ^= *s, *s ^= *r, *r++ ^= *s++;   *r ^= *s, *s ^= *r, *r++ ^= *s++;
            *r ^= *s, *s ^= *r, *r++ ^= *s++;   *r ^= *s, *s ^= *r, *r++ ^= *s++;
        }   

    NSImage             *image = [[[NSImage alloc] init] autorelease];

    [image addRepresentation:[[[NSBitmapImageRep alloc]
        initWithBitmapDataPlanes:(unsigned char **)&frameBuffer pixelsWide:640 pixelsHigh:480 bitsPerSample:8 samplesPerPixel:4
        hasAlpha:YES isPlanar:NO colorSpaceName:NSDeviceRGBColorSpace bitmapFormat:0 bytesPerRow:2560 bitsPerPixel:32] autorelease]];
    [self addFrame:image];
    [pool drain];
}

I took a look at the code the compiler produces at -O2, and I’m fairly sure that its assembly will run parallelized, though not actually vectorized.

Yes, I’m fully aware that glReadPixels() is slower than creating a texture. I was testing my optimization skill on the low-level C stuff, not the entire routine. I only regret I didn’t have the patience to try doing it in raw SSE3 assembly, because I recognize an algorithm like this as being ideally suited to vector operations.

Missions of the Reliant: Their coil emissions are normal.

More status!

  1. The radar is implemented and functioning.
  2. A whole list of off-by-one pixel errors are fixed.
  3. A subtle retain cycle KVO crash is fixed.
  4. Most of the target scanner bugs are fixed.

I say “most” in that last because I’m not sure if the final bug can be fixed. The cocoa-dev mailing list seems dubious (click the link for a description of the problem). If there isn’t a method, I’ll lose a bit of look-and-feel in the target scanner, hardly showstopping but definitely annoying.

Screenshots of the working radar coming soon!

Missions of the Reliant: They’re locking phasers.

“Lock phasers on target.” – Khan
“Locking phasers on target.” – Joachim
“They’re locking phasers.” – Spock
“Raise shields!” – Kirk
“FIRE!” – Khan

The Reliant now has targetting and scanning systems implemented. There’s still several bugs to work out, but the basic system is in place. When that one little fighter I put in as a test shows up, the ship can lock onto it. Of course there’s no indication on the radar (since there isn’t a radar yet) and no way to destroy it (since there isn’t a laser cannon – though there are laser couplings – or torpedo holds or torpedo launchers yet), but at least we can scan it! Or we could if fighters weren’t always unscannable. Oh well.

Still, that little flashing box on top of the fighter is darn aggressive.

The reason I don’t have more to show than a buggy targeting system is I spent the majority of the time implementing it also working out a huge mess of memory management bugs I’d been ignoring since day one. Leaks, retain cycles, overreleases, you name it. What kills me is that the Leaks tool missed all but a very few of them. I ended up with manual debugging of retain counts by calls to backtrace_symbols_fd(). As uuuuuuuuuuuuuuuuuugly as lions (Whoopi Goldberg, eat your heart out). In the end a few tweaks to the way things were done were in order. Too much work being done in -dealloc when I had a perfectly good -teardown method handy that functions much like the -invalidate suggested by the GC manual.

Why aren’t I using GC and saving myself this kinda trouble? Frankly, given my current understanding of things, I think GC would be even more trouble than this! This, at least, I understand quite thoroughly, and I have considerable experience dealing with the issues that arise. I know how to manage weak references properly to avoid retain cycles and how to do a proper finalize-vs-release model. I haven’t even gotten tripped up by hidden retains in blocks more than once! Yes, I screwed it up badly here, but that’s because I was paying very little attention. I do know how to do it right if I try, and now I’m trying.

Garbage collection, on the other hand, is a largely unknown beast to me, and from what I’ve read on Apple’s mailing lists, the docs Apple provides are very little help to developers new to the tech. The hidden gotchas are nasty devils, much worse than hidden retains in blocks. Interior pointers and missed root objects come to mind, especially since I’m targeting 10.5 where GC support was still new and several bugs in it were known to exist (and may still). Apple chose to provide an automatic stack-and-heap scanning collector, whereas I would only have been comfortable with a manual heap-scanning collector, which is really little more than autorelease anyway. In such a light, the model I’m familiar with and clearly understand seemed a much better choice than trying to learn an entirely new paradigm for this project. Ironically, I still chafe at manual memory management in C++ projects, especially the lack of autorelease, and as with GC, I don’t understand such things as auto_ptr and shared_ptr well enough to get any use of them. Templates make me cringe.

With the targeting scanner implemented, all I need to do is debug it. The next step will be to write the radar, so as to double-check that the fighter AI is working as it should and that the target scanner is de-targeting properly when something falls out of range. After that I need to test the scanner versus multiple targets, especially the new smart-targeting mode I’ve added as an easter egg. What can I say, it always drove me nuts that it targeted “the next enemy in the internal array of enemies” rather than “the nearest enemy to my ship”. But finding how to enable it is left as an exercise to you nostalgic people like me who’ll actually play this port :-).

Come on, iTunes. Jesse Hold On – B*Witched? *punches the shuffle button* 太陽と月 – 合田彩. Much better! Sorry, interlude *sweat*.

Anyway, once the scanner can handle multiple targets, it’s time to implement the third and final component of the laser: the cannon. Time to blow things up with multicolored hypotenuses of triangles! I might even study up on a little QTKit so I can take movies from the OpenGL context to show off. Bandwidth, though; this blog isn’t exactly hosted off DreamHost. (Linode actually, and they’re really awesome). Oh well, we’ll see. Maybe I’ll even leave the feature in as another easter egg…

To summarize, the current plan is:

  1. Fix bugs in target scanner.
  2. Implement radar.
  3. Spawn multiple targets for the target scanner.
  4. Implement laser cannon.
  5. Maybe implement movie capture of gameplay.
  6. ???
  7. Profit!

I’m not making it up as I go, I swear!

Missions of the Reliant: Watch it, you’ve got one on your tail!

Missions of the Reliant version 3.0 now has the framework for enemies, enemy AI, and those infinitely annoying little fighters that everything lauches in droves at you and you can only hit by draining all the charge from your laser couplings. It took some work, let me tell you.

Mike’s original code expresses differences between facing angles as a function of which sprite is being displayed. Efficient. My code expresses differences between facing angles as atan2(-distanceY, distanceX). Mathematically correct and conceptually accurate.

‘Course, then “am I facing the player within a 120-degree arc?” becomes a completely different numerical test. Didn’t help that you kept setting variables you never used anywhere, Mike :-). Originally, I tried to use pretty much identical code for enemy movement as player movement, but it became clear that there were just too many quirks to it.

The fighters especially have an interesting AI about moving:

  1. Calculate distance from me to the player. If too high, return to origin, else continue. (Target determination.)
  2. Calculate angle from me to the player. If I’m not facing that angle exactly (difference between facing and calculated != 0), turn towards it. (Aim.)
  3. If I’m facing the player within 120° of arc, apply thrust N. (Thrust.)
  4. If I’m within distance D1 of the player, reduce speed by 10%. (Falloff.)
  5. If I’m within distance D2 of the player, and I have a loaded torpedo, and I’m facing the player exactly as in step 2, fire my weapon. (Attack.)
  6. Repeat every tick (or every other tick depending on preference setting) of the game timer.

Sounds simple? The fighter has to track its target, its current vector, its ideal vector, the difference between those two, its distance from target, and its firing delay, and constantly adjust all parameters accordingly. The code specifically intended for fighters is 118 lines including comments. Add 134 lines for code that applies to every enemy, 176 lines for code that applies to all game objects, 50 lines for code that applies to anything physics-capable, and 87 more lines for code that applies to everything that needs to do things based on the game timer. Subtract roughly 50 from that for comments and you get 500 lines of code just to drive that simple little AI, not counting the game timer itself or any of the drawing logic.

And I haven’t implemented the weapon, collision detection for the weapon, or the return to origin logic yet. And it’s got a couple of glitches as is, like if the fighter happens to spawn at exactly a 90, 180, 270, or 0-degree angle to the player while said player is at a full stop, it moves in a straight line rather than arcing like it should. Not sure if that bug’s in the original game since it’s next to impossible to set that particular scenario up deliberately without hacking source.

Did I mention that due to several stupid errors on my part, I ended up having to go back to the original fighter model in Infini-D and re-render it, then use Photoshop Elements to recreate the 36-phase sprite from that render? That was a fun two hours. On the other hand, the fighter model looks more “cool” now. It’s also a little harder to see. Sigh.

If anyone’s interested, here’s a couple of screenshots. I’d post a movie but I don’t have any instantly available way of making one and I’m too lazy to pursue the less instant ways.

Missions of the Reliant Fighter 1 Missions of the Reliant Fighter 2

The utility of a scripting language.

I feel like quite a geek. I had some text copied from my IRC client that I wanted to transform to XML for my XSLT sheet to display all nicely on the Web interface. Format of a line copied from the client:

altered nickname<tab><tab>message<tab>hh:mm:ss<space><AM or PM><carriage return>

Correctly formatted XML for the XSLT sheet:

<message><time>unix timestamp</time></time><type>2</type><sender>correct nickname</sender><content>message</content></message>

How to transform this? I could’ve done the majority of the work with a PCRE regexp and search/replace, but that wouldn’t have fixed the nicknames (since you can’t make if/else decisions in a replace in most editors) or calculated the correct UNIX timestamps. So I turned to scripting, of course. Some would have chosen to use Ruby, others Python, or Perl, or possibly even bash for some masochistic reason. I chose PHP.

Took five minutes, most of which was spent constructing the regexp. The code:

<?php

$conversation = file_get_contents(__FILE__, false, NULL, __COMPILER_HALT_OFFSET__);
$valid_nicks = "nick1|nick2|nick3|nick4|nick5";
preg_match_all('/^('.$valid_nicks.')(?:\t+)([^\n\t]+)(?:\t+)(\d+):(\d+):(\d+)[ ]([AP]M)$/mSu', $conversation, $matches, PREG_SET_ORDER);
$xml = "";
$time = time();
foreach ($matches as $splitline) {
    $nick = $splitline[1];
    $message = $splitline[2];
    $hour = $splitline[3];
    $minute = $splitline[4];
    $second = $splitline[5];
    $meridian = $splitline[6];
    if ($nick === 'nick1' || $nick === 'nick2') {
        $nick = 'real_nick1and2';
    } else if ($nick === 'nick3' || $nick === 'nick4') {
        $nick = 'real_nick3and4';
    }
    ++$time; //mktime($hour + ($meridian === 'PM' ? 12 : 0), $minute, $second, date('n'), $meridian === 'PM' ? 1 : 2, date('Y')));
    $xml .= "<message><time>{$time}</time><type>2</type><sender>{$nick}</sender><content>{$message}</content></message>\n";
}

echo $xml;

__halt_compiler();
// the conversation was pasted here

I daresay that was a pretty cheaply elegant bit of work, if I may be allowed to pat myself on the back. Entirely trivial stuff, but it shows how useful scripting can be for some tasks. How inane would that conversion have been, replacing the nicks by hand and calculating the timestamps one at a time? The conversation was about 500 lines long. Yay scripting.

Please, don’t comment with a one line Perl script to do the same thing from STDIN, I’m well aware you can use Perl to compress any complexity down to what looks like a couple hundred bps of line noise :-D.

P.S.: I am fully aware that the code has several inefficiencies, odd-seeming decisions, things that could’ve been done better, and so on, and so on. Who cares? It works. It’s not meant to win design awards.

Floating-point makes my brain melt

Sometimes it’s better to do the obvious than try to be “correct”.

How would you check if a floating-point variable x was zero? Common sense says x == 0.0 should work, right? But the compiler gets cranky about floating-point compares, yet zero is certainly a valid sentinel value even for floating-point. So I found the fpclassify() function. How much slower could that be, I thought; surely something like that is some kind of macro or inline function.

I made an assumption. Oops.

Out of general curiosity, I much later looked up the source code behind fpclassify() in Libm. Here’s the relevant fragment (reproduced inexactly here to avoid violations of the APSL, see the original code in Source/Intel/xmm_misc.c in the Libm-315 project at http://opensource.apple.com if curious):

if (__builtin_expect(fabs(d) == 0.0, 0))
    return FP_ZERO;

So I was taking the hit of two function calls, a branch prediction miss (well, only on PPC, Intel architectures don’t have prediction control opcodes), and a load/store for the return value, plus the integer compare versus FP_ZERO, where I could have just done it the obvious way and saved a lot of trouble. Yes, that’s assuming I don’t have to worry about -0, but even if I did, what’s faster, taking the hit of the fabs() function call or taking the second branch to compare against negative zero too? For reference, fabs() on a double, implemented in copysign.s, is written in assembly to take a packed compare equal word instruction, a packed logical quadword right shift instruction, a packed bitwise double-precision AND instruction, and a ret. Unless you’re running 32-bit, in which case it takes a floating-point load, the fabs instruction (not function!), and the ret. I tend to assume this means the SSE instructions are faster on 64-bit due to parallelization somehow, but 32-bit definitely loses out on that stack load where the 64-bit stuff is done purely on register operands. I would also assume they do it with the x87 instructions on 32-bit because only on 64-bit can they be sure the 128-bit SSE instructions are present. Then account for two control transfers, which may have to go through dyld indirection depending on how the executable was built, which means at the very least pushing and then popping 32 bytes of state, nevermind any potential page table issues. It’s a damn silly hit to take if I never have to worry about negative zero! I can safely guess, without even running a benchmark, that x == 0.0 is a whole heck of a lot faster than fpclassify(x) == FP_ZERO.

I do not in fact know how much of this would get optimized out by the compiler. GCC and LLVM are both pretty good with that kinda thing. But there’s no __builtin_fpclassify() in GCC 4.2! It doesn’t exist until at least 4.3, possibly 4.4. I can’t find it in Apple’s official version of Clang either! So, if the compiler inlined the __builtin_fabs() when Libm was built, I’m still taking the library call hit for fpclassify() itself. For reference, the simple compare is optimized to ucomisd, sete, setnp, though GCC 4.2 and LLVM/Clang use different register allocations to do it.

Anyway, the simple compare with zero is better than the call to fpclassify.

Missions of the Reliant: Something to see at last!

There is now a user-visible change to Missions from all the background work I’ve been doing!

  1. The warp drive physics have been heavily corrected to match the original game rather than ignoring decreases in warp speed and turning the wrong way around.

It’s something, at least. You don’t want to know how much math went into that.

Speaking of which, sqrt(x*x+y*y) is faster than the polar/Cartesian conversion partly because an efficient function, hypot(), exists to perform exactly that calculation quickly already, and the compiler was converting the sqrt() call to __builtin_hypot() for me. The compiler’s smarter than me.

Stay tuned for news that I’ve finally managed to implement the fighter AI. It is coming, I promise!

Missions of the Reliant: I’m haunted by coordinate systems!

As if all the mucking about with coordinates before wasn’t bad enough, next I had to deal with unit vectors, polar/Cartesian coordinate conversion, sign adjustment vs. trigonometric functions… you get the idea.

In this case, my problem wasn’t caused by needing to update the algorithms Mike used at all, but rather by my need to replace the old MacToolbox FixRatio() and AngleFromSlope() functions with modern trigonometrics. Now, I’d already done all this, or else the impulse and warp drives would never have worked for this long, but in poking about in the implementation for mobile enemies, I realized I’d have to generalize the code, or else end up repeating it in about a dozen places, a well-known recipe for disaster.

In literal code, warp speed goes like this:

double diffx = warpCoordX - playerCoordX, diffy = warpCoordY - playerCoordY,
       theta_raw = atan2(-diffy, diffx), theta = round(fma(theta_raw, RAD_TO_DEG_FACTOR, 360 * signbit(theta_raw))),
       // make use of theta in degrees here to calculate a turn factor
       maxSpeed = warpSpeed,
       newDeltaX = cos(theta * DEG_TO_RAD_FACTOR), newDeltaY = -sin(theta * DEG_TO_RAD_FACTOR),
       finalDeltaX = playerDeltaX + newDeltaX, finalDeltaY = playerDeltaY + newDeltaY;
if (fabs(addedDeltaX) >= fabs(maxSpeed) * newDeltaX) finalDeltaX = maxSpeed + newDeltaX;
if (fabs(addedDeltaY) >= fabs(maxSpeed) * newDeltaY) finalDeltaY = maxSpeed + newDeltaY;

Conceptually, this reads:

  1. Calculate the difference between the player’s current position and the destination in Cartesian coordinates.
  2. Take the arctangent of the Cartesian coordinates, adjusted for inverted Y, converted to degrees and adjusted to the range [0,360] (atan2() returns [-π,π]).
  3. Convert the polar coordinates (using the implied mangitude of 1) back to Cartesian coordinates.
  4. Calculate the movement delta with a speed limit.

Why, one wonders, do I do a Cartesian->polar conversion, only to immediately convert back to Cartesian again? Answers: 1) I need the angle to calculate which way to turn the ship towards its destination. 2) The distance between current position and destination is a vector of (usually) rather high magnitude; I need to normalize that vector to get a delta. And the formula for normalizing a Cartesian vector is x/sqrt(x*x+y*y), y/sqrt(x*x+y*y). Two multiplies, an add, a square root, and two divides, all floating point. Without benchmarking I still intuitively think that’s slower (and I’m SURE it’s conceptually more confusing) than cos(atan2(-y, x)), -sin(atan2(-y, x)), two negations, an arctangent, a sine, and a cosine. Maybe I’m crazy.

Of course, typing all this out made me realize that I can, in fact, eliminate the degree/radian conversion entirely, as well as the range adjustment, by changing the conditional in the turn calculation. Once again I fell for the trap of not thinking my way through the code I was porting. At least you weren’t as bad at geometry as me, Mike :-).

Then I had to go and get really curious and benchmark it:

#include <stdio.h>
#include <math.h>
#include <sys/time.h>

int     main(int argc, char **argv)
{
        struct timeval cS, cE, pS, pE;
        // Volatile prevents compiler from reading the loops as invariant and only running them once.
        volatile double x1 = 1.0, x2 = 2.5, y1 = 0.4, y2 = 3.2, dx = 0.0, dy = 0.0, inter = 0.0;

        gettimeofday(&cS, NULL);
        for (int i = 0; i < 100000000; ++i)
        {
                dx = x2 - x1;
                dy = y2 - y1;
                inter = sqrt(dx*dx+dy*dy);
                dx /= inter;
                dy /= inter;
        }
        gettimeofday(&cE, NULL);

        gettimeofday(&pS, NULL);
        for (int i = 0; i < 100000000; ++i)
        {
                inter = atan2(y2 - y1, x2 - x1);
                dx = cos(inter);
                dy = sin(inter);
        }
        gettimeofday(&pE, NULL);

        struct timeval cD, pD;

        timersub(&cE, &cS, &cD);
        timersub(&pE, &pS, &pD);

        printf("Cartesian diff = %lu.%06u\n", cD.tv_sec, cD.tv_usec);
        printf("    Polar diff = %lu.%06u\n", pD.tv_sec, pD.tv_usec);

        return 0;
}

Foot in mouth. cos|sin(atan2()) is consistently 3x slower than x|y/sqrt(x^2+y^2) at all optimization levels. Somehow I just can’t see this as being an artifact of the brutal abuse of volatile for the benchmark.

Mike got around the whole issue, in the end. Knowing that he only ever had to calculate cos()/sin() for the angles in the period [0,35]*10, he just precalculated them in a lookup table. And cutting the cosine and sine calls out of the benchmark reduces the difference between the methods to about 1.6x, making his way a win over a four-way compare/branch for the turn calculation.

Live and learn.

Oh, and changes to Missions: Again, nothing you can see in play yet. But at least now I have the right building blocks to make the enemies from.

Missions of the Reliant: Math is fun, or why I wish I hadn’t flunked geometry

At last, an update!

  1. Absolutely nothing visible to the user has changed whatsoever.
  2. The internal structure of the code has been significantly reorganized.

As with the lament of all programmers faced with the demands of the technologically disinclined, I’ve accomplished a great deal, but since it can’t be seen, it might as well be nothing at all. Wasted time, the hypothetical slave driver- I mean, boss- would say. But it isn’t, I swear to all two of you who read this blog!

And now, another math rant.

Once again, as with so many things, the way Mike did things in the original code was correct and logical for the time he did it, but doesn’t fit into the object-oriented model I’m cramming his code into, despite its pitiful cries for mercy from such rigid structure. There are days I wish we were living in times when code could be so freeform as his was and still be comprehensible, but you can’t do that in Cocoa. Oh sure, I could port all the Pascal functions 1-to-1, but the Toolbox calls would be sticky at best. Anyway, in this particular case, I was trying to wrestle with the radar range calculation.

The original code reads something vaguely like: screenPos = Planetabs - (Playerabs - Playerscreen)  inRadarRange = n <= screenPos / 16 <= m. Translating, this means that whether or not a given entity (a planet in this case) is within radar range of the planet is dependant upon the Player’s position in screen coordinates, as well as in the game’s absolute coordinate system.

In the old days, this design made a certain amount of sense. He already had the screen coordinates immediately handy, so why take the hit of indirecting through A5 to touch a global for the absolute position? However, my design makes the screen coordinates a bit dodgy to use. So I had to recalibrate n and m to represent distances in game coordinates.

Algebra to the rescue. The code above, reduced and replacing the inequalities, becomes the algebraic equation (x - (y - z)) / 16 = a, where a is the radar range coordinate. The only screen coordinate term in this equation is z, so solve to eliminate z:

(x - (y - z)) / 16 = a
x - (y - z) = 16a       - multiply both sides by 16
x - y + z = 16a         - distribute the subtraction over the parenthetical expression
x - y = 16a - z         - subtract z from both sides

But, because both a (the radar range) and z (the player’s position on screen) are actually constants, all I had to do was take Mike’s original numbers (let’s use 64 for a and 268 for z) and calculate 16*64 - 268 = 756. Then, retranslating, the equation becomes the inequality inRadarRange = (myPosition - playerPosition) <= 756;. Repeat for the lower and upper bounds of x and y coordinates, and boom, no screen coordinates at all and I can calculate whether or not an object's in radar range based on nothing but its offset from the player.

To be clear, what I did up there was to eliminate a term from the inequalities so that they could be evaluated based on the position of the given entity in game space, rather than on the position of the entity's sprite on the screen.

I can't believe it took me a week to doodle out that bit of math.

Subtle pitfalls of doing things in -dealloc

Let me describe the setup for this issue.

Take a gobal (set property on a singleton object) list of objects which holds only weak references (in this case, if it held strong references, the objects would never lose their last retain and could never be deallocated, yes I know GC avoids it):

@implementation Singleton
- (void)init
{
    if ((self = [super init]))
    {
        CFSetCallBacks cb = { 0, NULL, NULL, CFCopyDescription, CFEqual, CFHash };
        NSMutableSet *container = (NSMutableSet *)CFSetCreateMutable(kCFAllocatorDefault, 0, &cb);
    }
    return self;
}

- (NSMutableSet *)container { return container; }
- (void)setContainer:(NSSet *)value { if (value != container) { [container setSet:value]; } }
// The other KVC-complaint methods for an unordered to-many relationship

// Singleton pattern stuff here
@end

Now consider a class which, as part of its initialization and deallocation, adds itself to and removes itself from this singleton’s container:

@implementation ListedObject
- (id)init
{
    if ((self = [super init]))
    {
        [[Singleton singleton] addContainerObject:self];
    }
    return this;
}

- (void)dealloc
{
    [[Singleton singleton] removeContainerObject:self];
    [super dealloc];
}
@end

This pattern is useful when you need to track every instance of a given class – in this case, my actual code maintains the set property as a property on the class object itself rather than a separate singleton, but the result is the same and I thought I’d use something more familiar for this example.

The first time you release an instance of ListedObject to the point of deallocation, you’ll crash during your next event loop.

Next event loop? Experienced Cocoa readers will immediately guess “autorelease pool”. And they’d be right. To debug this, I added backtrace_symbols_fd() calls to -retain and -release of the ListedObject class. This may seem strange versus using a GUI tool like Instruments’ “Object Allocation” template, but I’m old-fashioned and this was simple. The object was indeed being overreleased, with the extra release coming from the main event loop’s autorelease pool.

The cause is rather intricate. At the time of the deallocation, I had a registered key-value observation on the container property. So, when the object removed itself from the list, KVO called [NSSet setWithObject:removedObject] in order to pass it as part of the change dictionary to the observer callback. This naturally went on the autorelease pool. But oops, we were already in -dealloc of the removed object, so the retain the autoreleased set tried to add was a no-op. Finally, next time through the event loop, that set was freed by the autorelease pool, and tried to call -release on the removed object, but that object had already been fully deallocated. Crash!

Now, a purist will say “stop trying to do things in -dealloc!” Others would point out that GC would bypass this entire problem. Either way, I wanted a simple solution. The problem was an autorelease pool, so just add another one!

- (void)dealloc
{
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    [[Singleton singleton] removeContainerObject:self];
    [pool release];
    [super dealloc];
}

KVO’s set is no longer deferred-release, and all is well.

As it happens, this uncovered an underlying issue in my code, a design flaw which essentially makes the entire debacle unnecessary because the list management needed to be in other methods rather than -init and -dealloc, but I thought it was an interesting note nonetheless.

Missions of the Reliant: Cleaning up the wreckage of the train crash

I’m back, and I didn’t give up on Missions! I’m sure there must be exactly one person out there who cares :-).

But seriously. I don’t have any new features to show at the moment, unfortunately. When I went to implement the laser cannon for the player, I realized I’d never be able to test it without something to fire at. I also realized the cannon itself would be useless without the target scanner since it has to lock onto a target. The scanner is also useless without something to scan. So, it was time to implement the base code for mobile enemies. Probably should’ve done that long ago, and here’s why…

As we all know, I’m using Objective-C to write this code. That means, among other things, that my code is object-oriented in nature. Up until this point, things like planets, starbases, and the player had all been entirely separate implementations. This is what Mike did with the original code. As always, what he did then was only sensible for the time and environment, but I can avoid a hell of a lot of code duplication by giving everything that exists in space a common superclass: a “Presence”. (Presences are themselves subclasses of the even more general “Responder”, which is used for everything that needs to process game happenings in any way, but that’s only a side note). As one can imagine, since I didn’t have the foresight to design the code this way to begin with, implementing it now required some significant refactoring.

Another issue cropped up halfway through the refactoring: The severe limitations of Apple’s built-in Key-Value Observing, which I use extensively throughout the code to avoid having to call “update this” and “update that” manually for every single affected object whenever something changes. For example, KVO doesn’t let you use blocks for callbacks, and if a superclass and a subclass both register for the same notification, there’s no way to manage the two independantly. Fortunately, Michael Ash noticed these problems some time back, and created a replacement, his MAKVONotificationCenter. Unfortunately, even the updated version published by Jerry Krinock didn’t do everything I needed, at least not in a way that I found usable with blocks added to the equation. Managing observations by tracking the resulting observation objects means having lots of instance variables to hold the observations, and since I’m building for Leopard, I can’t use the new associated objects for the purpose.

“Wait a minute,” you’re saying! “Leopard? Then why are you talking about using blocks?” Answer: I’m using PLBlocks.

So, armed with PLBlocks on one side, and Michael Ash’s typically brilliant code on the other, I dove in and pretty much rewrote the entire MAKVONotificationCenter to do three things it didn’t before:

  1. Block callbacks.
  2. Tagging observations with a simple integer value.
  3. Several alternative ways of specifying groups of observations to remove, based on observer, target, key path, selector, tag, or most combinations thereof.

With that done (and unit tested, and Doxygen-documented), I’m now integrating them into my revised class heirarchy for Missions itself. With any luck, I’ll have at least a screenshot of a fighter flying around before the week is out. Stay tuned, those of you who are crazy enough to stick around for all this :-).

Footnote: I was finally able to find a way to access the original model files for the game’s graphics; with some luck and a bit of help from Mike (I’m clueless when it comes to this stuff), there may be higher-quality graphics to be seen in the screenshots soon.

Missions of the Reliant: Status

I’m sure some have been wondering why there’ve been no Missions updates lately. The answer is simple: I haven’t been working on it. I caught an awful cold last week; I’m only just now managing to get rid of it and I haven’t been able to write a line of code.

My apologies to all who await eagerly; I hope to be back on track soon.

Missions of the Reliant: More progress

As usual, this will be a quick update. I just don’t have the oomph for the long blog posts at this time of night for some reason :-).

  1. Implemented the About box, keeping Mike’s old credits box exactly as originally written (It says what you were “as of April ’96″, Mike!) and adding some of my own. I have plenty of people to thank too!
  2. Switched from NSSound to OpenAL. NSSound has some serious efficiency and semantic issues that make it questionable at best to use in a game, whereas OpenAL is amazingly simple with a little help from AudioToolbox to import the WAVs.
  3. Made the dialogs that come up on the main menu (new game, about, etc.) look a bit better by rewriting them as application-modal child windows instead of composited views. This little change, very simple in code, solved a lot of cosmetic issues.

Unfortunately that’s about it for user-visible stuff at the moment, almost all the code in the last week has been infrastructure-related. For the curious, my next goal is to make working enemy ships and satellites. That means everything from self-motile sprites to the AI behind them. Mike, once again I’m forced against my will to admire your genius ;-).