
Missions of the Reliant: You’ve got to know why things work on a starship.

I finally worked out the major design issues blocking the way for the command input console (also known as that tiny rectangle in the bottom right you see short messages and crew commands in) to exist. Long story short, a lot of things were being handled by the player object and the crew objects that should’ve been handled in the command console class (which didn’t exist yet). Took a bit of retooling to put everything where it belonged.

The console and computer (which I had to implement at the same time, since the console depends on the computer being an addressable input processor) are far from finished products, to be sure. The console pretty much doesn’t work (though it does display “not responding” when a crew member is down), and the computer is just enough of a framework for the console to operate. There really was no way to implement either one separately at this point: the console depends on the computer, and the computer has no function at all without the console. For now I’ll be focusing on the console and adding the computer later, since finishing the computer means also implementing a number of thus-far ignored displays (such as the galactic map, the ship mods and cargo screens, and the library screens), whereas finishing the console brings back functionality that was partially working before and isn’t now.

A lesson to take away from some of the snafus I’ve made during this project: have clearly defined rules of interaction between objects early on, and stick to them. The “responder” class, which defines common functionality for anything that needs setup/teardown, mouse/key input, periodic tasking, and access to (pseudo-)global objects, doesn’t give subclasses any way to say “these objects here need to know about any input I don’t handle, and need to be tasked when I’m tasked” (nor is there a master list of responders to iterate over), so a number of subclasses each do it a bit differently. It works, but it’s ugly as heck. (For the curious, the main game controller prods several top-level objects from its own responder methods, and each of those ticks the others it’s responsible for; it’s a mess.) I need to go back and standardize and unify this stuff. Why didn’t I do that before? At the time, I didn’t realize so many different objects would need to respond to the same inputs, because I hadn’t yet run into all the design necessities I know about now. Back then there just weren’t enough objects to make it worth managing responders the way Cocoa does with NSResponder (first, previous, and next responders in a chain). Now there are.
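If I were doing it over, the sketch below is roughly the shape I’d want: a common base class that owns the forwarding, so subclasses stop reinventing it. (A hypothetical sketch, not the game’s actual code.)

#import <Cocoa/Cocoa.h>

// Hypothetical base class: each responder forwards unhandled input and
// periodic ticks to a list of subresponders it owns.
@interface Responder : NSObject
{
    NSMutableArray *subresponders;  // responders this one forwards to and ticks
}
- (void)addSubresponder:(Responder *)responder;
- (BOOL)handleKeyEvent:(NSEvent *)event;  // returns YES if the event was consumed
- (void)tick:(double)delta;               // periodic tasking
@end

@implementation Responder
- (id)init
{
    if ((self = [super init]))
        subresponders = [[NSMutableArray alloc] init];
    return self;
}

- (void)dealloc
{
    [subresponders release];
    [super dealloc];
}

- (void)addSubresponder:(Responder *)responder
{
    [subresponders addObject:responder];
}

// Subclasses override to handle input themselves; anything unhandled is
// offered to each subresponder in turn.
- (BOOL)handleKeyEvent:(NSEvent *)event
{
    for (Responder *r in subresponders)
        if ([r handleKeyEvent:event])
            return YES;
    return NO;
}

// Subclasses override to do their own work, then call super so the tick
// propagates automatically instead of ad hoc per class.
- (void)tick:(double)delta
{
    for (Responder *r in subresponders)
        [r tick:delta];
}
@end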

Who knows. Maybe a certain five-digit prefix code will become an easter egg that does something interesting.

Missions of the Reliant: Belay that phaser order! Arm photon torpedoes!

Torpedoes now work.

There are some minor glitches, chief among them that it should take only two torpedoes to destroy a fighter on easy difficulty, not six; that’s the only remaining issue with the torpedoes themselves. I already know the cause (the damage factor is randomized twice, which produces completely wrong numbers) and I’ll fix it next.
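In rough, hypothetical form (made-up names, not the actual code), the bug looks like this:

extern double randomDamageFactor(void);  /* hypothetical helper */

static double torpedoImpactDamage(double baseDamage)
{
    /* The damage factor picks up a random adjustment at launch... */
    double damage = baseDamage * randomDamageFactor();
    /* ...and then gets randomized a second time on impact, so the result
       is the product of two random factors and comes out far too low. */
    return damage * randomDamageFactor();  /* should be just `damage` */
}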

I’d like to apologize for taking so long to finish what may seem like a simple thing. There were two major issues that made torpedoes so difficult:

  1. There was an underlying graphical error across the entire game that had gone more or less unnoticed until now, because torpedoes were the first sprites to be severely affected by it. OpenGL appeared to be doing linear filtering instead of nearest-neighbor when scaling the non-power-of-two textures used with GL_TEXTURE_RECTANGLE_ARB. Problem: I was calling glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MIN_FILTER, GL_NEAREST), ditto for GL_TEXTURE_MAG_FILTER. Why wasn’t this working? I tried a long list of solutions, including a subpixel offset (glTranslated(0.375, 0.375, 0.0)), switching to ARB_texture_non_power_of_two (which failed completely for reasons I still don’t understand), and texture padding. Finally, it dawned on a friend that I was calling glTexParameteri() only once, in -prepareOpenGL, instead of each time I did a glBindTexture(). I had assumed texture parameters worked like the rest of OpenGL, where state is sticky once set; in fact they’re per-texture-object state, not global state. Fail. Setting the parameters on each texture before its glTexImage2D() fixed the problem (see the sketch after this list). That was a fun week of frustration.
  2. I had not, until now, implemented collision detection. If you look carefully at Missions, you’ll find that collision detection is actually necessary in exactly two places:
    1. Torpedoes (including standard player and enemy torpedoes, antimatter torpedoes, and the planet-busters fired by the battleship).
    2. Missiles (as fired by defense satellites).
    It took some time to implement collision detection, partly because I’d made some bad design decisions in the code much earlier on that had to be fixed to make it work in any reasonable way.
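For anyone hitting the same texture-filtering wall as in point 1, here’s a minimal sketch of the working upload path, assuming RGBA sprite data (the function and variable names are mine, not the game’s):

#include <OpenGL/gl.h>
#include <OpenGL/glext.h>

/* Upload one RGBA sprite as a rectangle texture. The key point: texture
   parameters are per-texture state and must be set while that texture is
   bound, not once globally in -prepareOpenGL. */
static GLuint uploadSprite(const void *pixels, GLsizei width, GLsizei height)
{
    GLuint texture;
    glGenTextures(1, &texture);
    glBindTexture(GL_TEXTURE_RECTANGLE_ARB, texture);
    glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, GL_RGBA, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    return texture;
}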

With torpedoes working, we’re on the home stretch. Almost everything else is assembling components that already exist in the code. Stay tuned for further updates!

Floating-point makes my brain melt

Sometimes it’s better to do the obvious than try to be “correct”.

How would you check whether a floating-point variable x is zero? Common sense says x == 0.0 should work, right? But the compiler gets cranky about floating-point equality compares (GCC’s -Wfloat-equal), and zero is certainly a valid sentinel value even for floating-point. So I found the fpclassify() function. How much slower could that be, I thought; surely something like that is a macro or an inline function.
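To make the two options concrete, here’s a sketch (the function is just a stand-in):

#include <math.h>

void checkSentinel(double x)
{
    /* The obvious test; with -Wfloat-equal, GCC complains here even
       though comparing against an exact zero sentinel is well-defined. */
    if (x == 0.0)
        ;  /* handle the sentinel */

    /* The "tidy" alternative I reached for, which turns out to compile
       to a real library call rather than an inline test. */
    if (fpclassify(x) == FP_ZERO)
        ;  /* handle the sentinel */
}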

I made an assumption. Oops.

Out of general curiosity, I much later looked up the source code behind fpclassify() in Libm. Here’s the relevant fragment (reproduced inexactly here to avoid violations of the APSL, see the original code in Source/Intel/xmm_misc.c in the Libm-315 project at http://opensource.apple.com if curious):

if (__builtin_expect(fabs(d) == 0.0, 0))
    return FP_ZERO;

So I was taking the hit of two function calls, a branch misprediction (well, only on PPC; Intel architectures don’t have comparable prediction-control opcodes), and a load/store for the return value, plus the integer compare against FP_ZERO, when I could have just done it the obvious way and saved a lot of trouble. Yes, that assumes I don’t have to worry about -0, but even if I did, what’s faster: taking the hit of the fabs() function call, or taking a second branch to compare against negative zero too? For reference, fabs() on a double, implemented in copysign.s, is written in assembly as a packed compare-equal-word instruction, a packed logical quadword right shift, a packed bitwise double-precision AND, and a ret. Unless you’re running 32-bit, in which case it’s a floating-point load, the fabs instruction (not function!), and the ret. I assume they use x87 on 32-bit because only on 64-bit can they be sure the 128-bit SSE instructions are present, and the 32-bit version definitely loses out on that stack load where the 64-bit code works purely on register operands. Then account for two control transfers, which may have to go through dyld indirection depending on how the executable was built, meaning at the very least pushing and then popping 32 bytes of state, never mind any potential page-table issues. It’s a damn silly hit to take if I never have to worry about negative zero! I can safely guess, without even running a benchmark, that x == 0.0 is a whole heck of a lot faster than fpclassify(x) == FP_ZERO.

I do not in fact know how much of this would get optimized out by the compiler; GCC and LLVM are both pretty good with that kinda thing. But there’s no __builtin_fpclassify() in GCC 4.2 (it doesn’t appear until at least 4.3, possibly 4.4), and I can’t find it in Apple’s official version of Clang either. So even if the compiler inlined the fabs() when Libm was built, I’m still taking the library-call hit for fpclassify() itself. For reference, the simple compare optimizes down to ucomisd, sete, setnp, though GCC 4.2 and LLVM/Clang use different register allocations to do it.

Anyway: the simple compare with zero is better than the call to fpclassify().

Subtle pitfalls of doing things in -dealloc

Let me describe the setup for this issue.

Take a global list of objects (a set property on a singleton object) which holds only weak references. (In this case, if it held strong references, the objects would never lose their last retain and could never be deallocated. Yes, I know GC avoids the whole issue.)

@implementation Singleton
- (id)init
{
    if ((self = [super init]))
    {
        // NULL retain/release callbacks make this set hold weak references;
        // container is an NSMutableSet * instance variable.
        CFSetCallBacks cb = { 0, NULL, NULL, CFCopyDescription, CFEqual, CFHash };
        container = (NSMutableSet *)CFSetCreateMutable(kCFAllocatorDefault, 0, &cb);
    }
    return self;
}

- (NSMutableSet *)container { return container; }
- (void)setContainer:(NSSet *)value { if (value != container) { [container setSet:value]; } }
// The other KVC-compliant accessors for an unordered to-many relationship
// (addContainerObject:, removeContainerObject:, etc.) go here

// Singleton pattern stuff here
@end

Now consider a class which, as part of its initialization and deallocation, adds itself to and removes itself from this singleton’s container:

@implementation ListedObject
- (id)init
{
    if ((self = [super init]))
    {
        [[Singleton singleton] addContainerObject:self];
    }
    return self;
}

- (void)dealloc
{
    [[Singleton singleton] removeContainerObject:self];
    [super dealloc];
}
@end

This pattern is useful when you need to track every instance of a given class – in this case, my actual code keeps the set as a property on the class object itself rather than on a separate singleton, but the result is the same, and I thought the singleton version would make a more familiar example.
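For reference, the class-object variant looks roughly like this sketch (the names are made up, and this is not the actual code):

#import <Foundation/Foundation.h>

@interface TrackedObject : NSObject
@end

@implementation TrackedObject

// The weak-reference set lives on the class, created once before first use.
static CFMutableSetRef instances = NULL;

+ (void)initialize
{
    if (self == [TrackedObject class])
    {
        CFSetCallBacks cb = { 0, NULL, NULL, CFCopyDescription, CFEqual, CFHash };
        instances = CFSetCreateMutable(kCFAllocatorDefault, 0, &cb);
    }
}

- (id)init
{
    if ((self = [super init]))
        CFSetAddValue(instances, self);
    return self;
}

- (void)dealloc
{
    CFSetRemoveValue(instances, self);
    [super dealloc];
}
@end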

The first time you release an instance of ListedObject to the point of deallocation, you’ll crash during your next event loop.

Next event loop? Experienced Cocoa readers will immediately guess “autorelease pool”. And they’d be right. To debug this, I added backtrace_symbols_fd() calls to -retain and -release of the ListedObject class. This may seem strange versus using a GUI tool like Instruments’ “Object Allocation” template, but I’m old-fashioned and this was simple. The object was indeed being overreleased, with the extra release coming from the main event loop’s autorelease pool.
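The instrumentation amounted to something like this (a sketch; the 64-frame depth is arbitrary):

#include <execinfo.h>
#include <unistd.h>

// Temporary debugging overrides on ListedObject: dump a backtrace to
// stderr on every retain and release so the extra release can be traced
// back to its sender.
- (id)retain
{
    void *frames[64];
    backtrace_symbols_fd(frames, backtrace(frames, 64), STDERR_FILENO);
    return [super retain];
}

- (oneway void)release
{
    void *frames[64];
    backtrace_symbols_fd(frames, backtrace(frames, 64), STDERR_FILENO);
    [super release];
}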

The cause is rather intricate. At the time of the deallocation, there was a registered key-value observation on the container property. So, when the object removed itself from the list, KVO called [NSSet setWithObject:removedObject] to pass the removed object as part of the change dictionary to the observer callback. That set naturally went into the autorelease pool. But oops: we were already inside -dealloc of the removed object, so the retain the set added couldn’t stop a deallocation already in progress. Next time through the event loop, the pool released the set, the set sent -release to the removed object, and that object had already been fully deallocated. Crash!

Now, a purist will say “stop trying to do things in -dealloc!” Others would point out that GC would bypass this entire problem. Either way, I wanted a simple solution. The problem was an autorelease pool, so just add another one!

- (void)dealloc
{
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    [[Singleton singleton] removeContainerObject:self];
    [pool release];
    [super dealloc];
}

KVO’s set is no longer deferred-release, and all is well.

As it happens, this uncovered an underlying design flaw that makes the entire debacle unnecessary: the list management belonged in other methods, not in -init and -dealloc. But I thought it was an interesting note nonetheless.

Missions of the Reliant, take 2

Well, having the project file for Missions in place, I went to organize some of the resources – sprites, backgrounds, strings, all that fun jazz, and there’s plenty of it. In particular, I found myself re-engineering a couple of icons to support the much larger icon sizes we’ve gotten in Mac OS since the ancient days – 8-bit 32×32 icons are well and good, but how’s a 32-bit 512×512 snapshot with alpha channel sound? Pretty good? I thought so. A couple hours tweaking around in GraphicConverter and Photoshop Elements later, I had a nice large icon for the application and saved games. Of course I kept the original icon for 32×32 and 16×16 sizes.

One might ask: what the heck was I doing playing around with graphics when there’s so much code to write? There are two answers. One, that’s just the way I work: once I get it into my head to finish a task, I get a bit single-minded about it. Two, it was fun, and part of adding the little personal touches to the port. I wouldn’t want to detract from Michael’s work, of course, but at the same time I am putting in quite a bit of effort to bring that work to a modern audience, so I see no reason that effort can’t be at least a little noticeable.

Of course, most of the original graphics were vector models done in an old program called Infini-D. Unfortunately, I can’t find this program or anything which can read its format…

Having finished poking around in Photoshop Elements, I went to look up the fundamentals of 2D OpenGL drawing on Mac OS X. Four hours of playing around with CGImage, NSOpenGLView, and glTexImage2D later, I figured out why the alpha channel in the PNG I was using for testing was being completely ignored: I never called glEnable(GL_BLEND). Egg on face much? Oh well. At least I finally made it work, and I now have a basic setup in place with which to draw sprites and backgrounds in the game window. Not bad for a few hours and not having done OpenGL in five years, if you ask me.
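For anyone else whose alpha channel mysteriously vanishes, the missing incantation was essentially the following, assuming conventional (non-premultiplied) alpha:

/* Without blending enabled, the alpha channel of uploaded textures is
   ignored and sprites draw fully opaque. */
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);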

At least I was vindicated in thinking NSOpenGLView had to be at least a little bit useful despite loud claims by OpenGL coders that I might as well resign myself to subclassing NSView and doing the setup by hand!

That’s about it for today. I’d like, however, to share this amusingly useful little macro with my fellow Objective-C users who aren’t in GC-land yet. I wanted a nice compact way to add a Core Foundation object to an NSAutoreleasePool without doing lots of ugly typecasts. The result was this:

#define CFAutorelease(cftyp)        ((typeof((cftyp)))[((id)(cftyp)) autorelease])

Nothing like cheating a bit with typeof(); despite appearances, the expression is not evaluated twice! That let me write bits like CGColorSpaceRef colorSpace = CFAutorelease(CGColorSpaceCreateDeviceRGB()); instead of CGColorSpaceRef colorSpace = (CGColorSpaceRef)[(id)CGColorSpaceCreateDeviceRGB() autorelease];. Much more readable, IMHO. The fact that sending autorelease to nil/NULL does nothing and returns NULL is just a lovely little bonus. This doesn’t work quite right in GC code, of course, so don’t try it there.

The other thing that made me happy was thumbing my nose at all the “ARB_texture_rectangle is so useless” noise. Sure, if you’re doing fancy 3D drawing, a texture with non-normalized coordinates, no non-clamp wrapping, and no mipmaps or borders is a problem. Not so much in simple 2D. Why complicate backwards compatibility by requiring ARB_texture_non_power_of_two?
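The one genuine difference for 2D work, pixel coordinates instead of normalized ones, is arguably a convenience for sprite blitting. A sketch, with hypothetical position and size parameters:

/* GL_TEXTURE_RECTANGLE_ARB takes texture coordinates in pixels
   (0..width, 0..height) rather than the usual 0..1, which suits
   1:1 sprite blits nicely. */
static void drawSprite(GLfloat x, GLfloat y, GLfloat w, GLfloat h)
{
    glBegin(GL_QUADS);
    glTexCoord2f(0.0f, 0.0f); glVertex2f(x,     y);
    glTexCoord2f(w,    0.0f); glVertex2f(x + w, y);
    glTexCoord2f(w,    h);    glVertex2f(x + w, y + h);
    glTexCoord2f(0.0f, h);    glVertex2f(x,     y + h);
    glEnd();
}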

Clang: one savvy code analyzer!

Given this code:

- (void)calculateData:(int32_t)n intoLeftBuffer:(int16_t *)lbuf rightBuffer:(int16_t *)rbuf
{
    static uint64_t            (*calculateInternalCall)(id, SEL) = NULL;

    if (!calculateInternalCall)
        calculateInternalCall = (uint64_t (*)(id, SEL))[self methodForSelector:@selector(calculateInternal)];
    
    uint64_t    leftEnableMask = self.isEnabled && lbuf ?
                          ((-(is0Enabled & leftEnabled[0])) & 0xFFFF000000000000) |
                          ((-(is1Enabled & leftEnabled[1])) & 0x0000FFFF00000000) |
                          ((-(is2Enabled & leftEnabled[2])) & 0x00000000FFFF0000) |
                          ((-(is3Enabled & leftEnabled[3])) & 0x000000000000FFFF)
                      : 0,
                rightEnableMask = self.isEnabled && rbuf ?
                          ((-(is0Enabled & rightEnabled[0])) & 0xFFFF000000000000) |
                          ((-(is1Enabled & rightEnabled[1])) & 0x0000FFFF00000000) |
                          ((-(is2Enabled & rightEnabled[2])) & 0x00000000FFFF0000) |
                          ((-(is3Enabled & rightEnabled[3])) & 0x000000000000FFFF)
                : 0;

    for (int32_t i = 0; i < n; ++i)
    {
        uint64_t     data = calculateInternalCall(self, @selector(calculateInternal)),
                     lv = data & leftEnableMask, rv = data & rightEnableMask;
        int16_t      *l = (int16_t *)&lv, *r = (int16_t *)&rv;
        lbuf[i] += l[0] + l[1] + l[2] + l[3];
        rbuf[i] += r[0] + r[1] + r[2] + r[3];
    }
}

Where are the two NULL dereferences Clang claims to have found?


Feeling stupid when writing code

We all make mistakes when coding, and some of them are quite deeply foolish. But this particular snafu I just made takes the cake, in my opinion (dumbed-down code):

// Written as a static inline C function for max
// speed; this is called in a very tight loop, usually
// at something like 40,000 times per second.
static inline uint8_t _readByte(NSUInteger *cursor,
                                const char *bytes,
                                NSUInteger maxlen)
{ return (*cursor < maxlen) ? bytes[*cursor++] : 0x66; }
// Macro for readability, known to be called only from the one function
#define readByte() _readByte(&cursor, buffer, bufferlen)
// Macro to skip so many bytes
#define skipBytes(n) do { \
    __typeof__(n) nn = (n); \
    for (uint8_t i = 0; i < nn; ++i) \
        readByte(); \
    } while (0)
- (void)processBuffer
{
    // assume the declarations of n, cursor, buffer, bufferlen
    // ...
    n = readByte();
    // ...
    else if (n > 0x29)
        skipBytes(((uint8_t[16]){0,0,0,1,1,2,0,0,0,0,2,2,3,3,4,4})[(n & 0xF0) >> 4]);
}

See it? (Hint: Look at the implementation of skipBytes().)
