A collection of articles, ideas, and rambling from a guy who wrote some software that one time.
Friday, March 30, 2007
Wednesday, March 21, 2007
How do you troubleshoot completely random problems?
My home desktop machine has been suffering from a Linux kernel "Oops" approximately once every two days for the last few weeks. I would really like it to stop doing that. When I get a stack trace in my logs, it's consistently in the "kswapd" process, even though I disabled all swap weeks ago.
I'm running Edgy on this machine, just like I was running it on my laptop and am running it on my work desktop. Those machines were both completely stable (modulo occasional ndiswrapper issues) running the exact same kernel.
It doesn't seem like it's a hardware issue. At least, the same machine has never exhibited any problems under Windows.
It isn't deterministically reproducible. It always seems to be in response to a click or some kind of user-input event during heavy disk I/O, but flogging the disks and mashing the keyboard, even for hours at a time, doesn't cause it to happen.
I am considering a fresh re-install to attempt a fix for this, but besides the inelegance of that solution, it seems likely that it will leave me in the same place.
Does anyone have a suggestion for tracking this down so that I'll actually know that it's fixed?
Saturday, March 17, 2007
- Christopher has not responded yet.
- Jonathan has responded.
- Travis has not responded yet.
- Raffi has not responded yet, but the Synthesis blog seems to be offline, and it isn't really a personal thing anyway, so maybe I was out of line to tag him there.
- Jason responded, but in a "friends only" post. Kind of borderline, there, but he's doing the best so far (except Jonathan) so I'm not going to give him a hard time.
Tuesday, March 13, 2007
My saga of ndiswrapper on the MacBook continues.
In fact, the dw102 drivers do cause crashes when associating with certain access points. Unfortunately the dw101 drivers don't work with certain (still other) access points, and the Lenovo "abgn" drivers have some very peculiar problems with extremely bad UDP performance on the access points the dw101 drivers don't work with.
It occurred to me after a few hours of trolling for better drivers that, in fact, there is a better way. Apple ships Windows drivers specifically for this exact card! You don't need to use drivers for some other card with the same (or vaguely similar) chipset.
The Macintosh Drivers For Windows CD included with Boot Camp contains the driver. Obnoxiously, it's encapsulated within an MSI file, within an EXE installer, within a DMG image, inside the Boot Camp application. The only way I could discover to get at the wireless driver was to install the whole thing on a Real, Actual Windows Computer.
To access the driver CD if Boot Camp won't let you burn it, ctrl-click on "/Applications/Utilities/Boot Camp Assistant.app" and click on "Show Package Contents". Then, double-click on "Contents/Resources/DiskImage.dmg". Copy the files on the thing that shows up on your desktop onto a USB key or similar method of conveyance to a Windows machine, and run the installer.
Obnoxious as this process is, it thankfully doesn't make you install to a Windows installation on a Real Actual MacBook Core 2 Duo laptop. Any old Windows machine will do. The installer helpfully puts all the drivers into "C:\Program Files\Macintosh Drivers for Windows XP 1.1.2". The files for the network card are in the "net5416" folder.
Of course, they're also in a file called "net5416.tar.bz2" on my hard disk. I think Apple might take a dim view of me providing a public download site bereft of their unethical and legally dubious EULAs, but if you can't get at the drivers for some reason maybe I could let you have a copy.
Monday, March 12, 2007
Saturday, March 10, 2007
Reader beware! These steps are designed to work around a particular set of bugs on a particular revision of Ubuntu for a particular piece of hardware. If you're reading this at some point in the future, chances are that the madwifi project has already produced a driver. To find out, check to see if madwifi ticket 1001 has been resolved before you do any of this.
If you are running Ubuntu Edgy, with the default kernel, and you have a black rev.2 MacBook and you want to get NetworkManager working without screwing around with Feisty kernels, read on.
- First, you will have to build your own ndiswrapper.
You need at least version 1.43, which is quite a bit newer than the version packaged with Edgy. (I originally recommended at least 1.29 and chose 1.31. Earlier versions seemed to work, but 1.44 was the first version I installed which could suspend and resume reliably and did not very occasionally produce random crashes.) These instructions may very well work with older or newer revisions, but I am not going to build a big revision matrix; the whole point of this is a temporary workaround. Install it with "make uninstall; make install" to make sure to remove Ubuntu's packaged ndiswrapper driver first. Keep in mind that if you upgrade your kernel via apt, you may need to repeat this step, so keep the sources around.
- Next, you will want to get the driver from D-Link. I originally said to go to this page and get version 1.02, and to ignore the discussion on the Madwifi page that says to get version 1.01 instead because 1.02 crashes; I also had other problems with version 1.01, such as not being able to associate with various public access points. That advice is out of date: see my later post about which driver to get. Anyhow, install the driver into ndiswrapper by unzipping the downloaded archive and running "sudo ndiswrapper -i net5416.inf".
- Test to make sure the driver works. If you are already running NetworkManager and nm-applet, simply doing 'modprobe ndiswrapper' ought to set it up nicely and you should get immediate visual feedback that it is working.
- Configure the module to un-load itself when you suspend or sleep and re-load itself when you resume. This is important because otherwise ndiswrapper will not allow anything This is accomplished by editing the file /etc/default/acpi-support and changing the 'MODULES' line to say 'MODULES="ndiswrapper"'. While you're in there, you might want to also change the 'STOP_SERVICES' line to say 'STOP_SERVICES="mysql bluetooth "' instead of just mysql, since bluetooth is notoriously unreliable in the face of power management, and bluetooth connections, like wifi connections, will not survive a suspend/resume cycle anyway.
- Make sure that ndiswrapper does not create a 'wlan0' alias for itself; you probably don't need to do anything, but if you're using a different version of the ndiswrapper script, it may create a file called "/etc/modprobe.d/ndiswrapper" with an alias for "wlan0" in it. If you see this, remove it. NetworkManager knows about the module name 'wlan0' and will constantly try to load it if it becomes unloaded for some reason. This results in a particularly nasty race condition where the suspend machinery politely removes ndiswrapper in preparation for suspending and then NetworkManager loads it again, resulting in a hang from which it is impossible to recover without a hard reboot. I managed to create this situation for myself through experimentation, so it probably won't happen to you, but just in case make sure that file doesn't contain any reference to 'wlan0'.
- Test suspending and resuming and make sure that the driver loads as expected.
- Configure the driver to load at boot. This consists of editing the file /etc/modules and adding a line that says "ndiswrapper". DO NOT do this until you are sure the driver works with your machine; if it causes a crash, this might make your machine unbootable.
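The three file edits described in the steps above (the MODULES and STOP_SERVICES lines in /etc/default/acpi-support, the 'wlan0' alias in /etc/modprobe.d/ndiswrapper, and the extra line in /etc/modules) can be sketched in Python. This is only a rough sketch, not a tested installer: the helper names set_shell_var, drop_alias and ensure_module are made up for illustration, the sample file contents are invented stand-ins for whatever is really on your disk, and the code works on strings so that nothing under /etc is touched until you decide to write the results back yourself.

```python
import re


def set_shell_var(text, name, value):
    """Return text with the first NAME=... line replaced by NAME="value".

    If no such line exists, the assignment is appended instead.
    """
    new_line = f'{name}="{value}"'
    pattern = re.compile(rf"^{re.escape(name)}=.*$", re.MULTILINE)
    if pattern.search(text):
        # A function is used as the replacement so that nothing in
        # new_line is interpreted as a regex escape sequence.
        return pattern.sub(lambda match: new_line, text, count=1)
    separator = "" if (not text or text.endswith("\n")) else "\n"
    return text + separator + new_line + "\n"


def drop_alias(text, alias):
    """Return modprobe-style text without any 'alias <alias> ...' lines."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if line.split()[:2] != ["alias", alias]
    ]
    return "".join(kept)


def ensure_module(text, module):
    """Return /etc/modules-style text that lists module exactly once more
    than never: it is appended only if no line already names it."""
    listed = {line.strip() for line in text.splitlines()}
    if module in listed:
        return text
    separator = "" if (not text or text.endswith("\n")) else "\n"
    return text + separator + module + "\n"


if __name__ == "__main__":
    # Invented sample contents; real files will contain much more.
    acpi_support = 'MODULES=""\nSTOP_SERVICES="mysql"\n'
    modprobe_conf = "alias wlan0 ndiswrapper\n"
    etc_modules = "lp\n"

    acpi_support = set_shell_var(acpi_support, "MODULES", "ndiswrapper")
    acpi_support = set_shell_var(acpi_support, "STOP_SERVICES", "mysql bluetooth")
    print(acpi_support, end="")
    print(repr(drop_alias(modprobe_conf, "wlan0")))
    print(ensure_module(etc_modules, "ndiswrapper"), end="")
```

Working on strings keeps the sketch harmless to run as-is. As with the manual steps, the /etc/modules change should wait until you are sure the driver works on your machine.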
Monday, March 05, 2007
Last weekend, I attended PyCon 2007. I hadn't originally intended to go, but I could hardly miss my father giving the final keynote.
The keynote was riveting as expected. To be honest, I hadn't expected much of the conference beyond that. The first PyCon was an amazing experience, but as the topics drifted more towards "web frameworks" and away from more general programming, later ones were progressively less interesting. I missed last year's and nothing I read caused me to regret it much.
I'm happy to say that this expectation was completely wrong, and this conference was amazing. I thoroughly enjoyed it on many levels.
In previous years, a lot of my time was soaked up by justifications, answering questions like "why does Twisted make me do foo" or "why haven't you made Nevow do bar yet". This year, the discussions I participated in were all productive and engaging, including the two multi-hour Twisted birds-of-a-feather sessions which I attended.
Many of the exciting developments that I enjoyed while I was there aren't ready for public consumption quite yet, so I can't say much here. I can give you some incredibly vague hints though! It was a very rewarding conference both from a business and community development perspective. (Those of you with privileged information, please do not add anything revealing to the comments. Seriously.)
One cool thing that I can shout from the rooftops already is that Guido, a group of concerned hackers, and I got to have a meeting of the minds, which Guido has already blogged about, addressing many upcoming concerns we all had about Python 3. That, and several other discussions with the responsible developers about the proposed transition plans for the 3.0 release have put my mind at ease. That's not to say that I agree with every decision that has been made - and I definitely need to participate in a few more mailing list discussions - but I feel much more comfortable that the whole thing is in good hands.
My major regret for this conference is that I was completely unprepared for the truckload of great stuff that happened. I thought I would take it easy for a few days and get back to work. If I had had a better idea of what would be happening, I would have prepared at least a few lightning talks and a more structured BoF session, and better allocated my time to the many interesting folks who wanted to bend my ear so that I wasn't rushing from conversation to conversation.
I told a lot of people that I'd be doing a lot of things as soon as the conference was over. Unfortunately the first thing I actually did when the conference was over was develop the worst cold I've had in the last 5 years, and promptly stay home sick from work for a week, sleeping most of the time. To make matters worse, the conference and the illness coincided with the BlackBerry software on my phone crashing very badly and a not-quite-perfect beta-test of the Divmod migration process on my email account.
In other words, if I told you I'd get back to you at the conf, I probably haven't. I'm trying desperately to claw through my backlog right now, but it's going to take a while. Please be patient, and if you haven't heard from me in a week or so, feel free to send some repeat emails and nag me. If I said I wanted to get back to you, I really do.
I should warn you that I'm still not quite back at 100% HP/MP yet, and I have lots of actual work work to do as well, so you might want to wait a few days, but please, everybody, stay in touch.