exarkun for president

Thu 25 September 2008

Jean-Paul Calderone is an amazing hacker.

So, here's the setup.  We're working on this application that runs in the Adobe AIR runtime.  We implemented AMP for client-server communication.  Superficially, it worked great; limited tests gave us good results.

Then, we started throwing some real data at it.  And it choked.  The runtime would terminate its client socket silently: it would stop delivering data to application code, send a TCP FIN to the server, and not even deliver an event indicating that the socket had gone away.  Nothing.  Nowhere to set a breakpoint, nothing to debug.

The Python implementation of this protocol worked fine; everything got delivered.  The connection was not dropped unless we told it to drop.

I spent all night poring over protocol dumps, trying to figure out what was going wrong.  The exact point in the data stream where it died varied slightly from run to run, but for some reason it always died cleanly, on a message boundary.

So, I come into the office and I fire up the program and show JP.  We get a tcpdump, and he looks at the output.  He squints at it for a few minutes and says:
"Huh.  It died on message 64.  That's interesting...
Oh wait, that's hex.  What's 64 in hex?  100.
... thoughtful pause ...
Maybe the garbage collector is buggy?"

He was right.  The bug was in the garbage collector.  AIR apparently doesn't treat sockets (and other stuff, like animations) as "real" things that need to be kept alive while they are feeding events to application code.  So sometimes they just crap out, get garbage collected, and silently stop delivering events.
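
To make the failure mode concrete, here is a rough analogy in Python rather than ActionScript.  It is only a sketch: FakeSocket, the two connect functions and the live_sockets list are all made-up names, and CPython's reference counting is not AIR's collector.  It just shows the general shape of the problem: an event source that nothing holds a strong reference to can vanish, while one parked in a long-lived container survives.

```python
import gc
import weakref


class FakeSocket:
    """Hypothetical stand-in for a runtime socket that feeds events to a callback."""

    def __init__(self, on_data):
        self.on_data = on_data


received = []


def connect_without_keeping_reference():
    # The socket is only referenced by a local variable, so nothing
    # keeps it alive once this function returns.
    sock = FakeSocket(received.append)
    return weakref.ref(sock)  # a weak reference does not keep it alive


def connect_and_keep_reference(registry):
    # Assumed workaround: park the socket in a long-lived container.
    sock = FakeSocket(received.append)
    registry.append(sock)
    return weakref.ref(sock)


lost = connect_without_keeping_reference()
gc.collect()
lost_is_gone = lost() is None  # the socket was collected; no more events

live_sockets = []  # hypothetical application-level registry
kept = connect_and_keep_reference(live_sockets)
gc.collect()
kept_is_alive = kept() is not None  # still reachable through live_sockets
kept().on_data("message")  # events can still be delivered
```

If the same idea carries over to AIR, the practical lesson would be to hold on to the socket from something that lives as long as the connection should, instead of trusting the runtime to do it for you.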