A Tired Hobgoblin

Alternate (Boring) Title: Why the Twisted coding standard is better than PEP 8 (although you still shouldn't care)

People often ask me why Twisted's coding standard – camel case instead of underscores for method names, epytext instead of ReST for docstrings, underscores for prefixes – is "weird" and doesn't, for example, follow PEP 8.

First off, I should say that the Twisted standard actually follows quite a bit of PEP 8.  PEP 8 is a long document with many rules, and the Twisted standard is compatible with it in large part; for example, it follows pretty much all of the recommendations in the section on pointless whitespace.

Also, the primary reason that Twisted differs at all from the standard practice in the Python community is that the "standard practice" was almost all developed after Twisted had put its practices in place.  PEP 8 was created on July 5, 2001; at that point, Twisted had already existed for some time, and had officially checked in its first coding standard about two months earlier, on May 2, 2001.

That's where my usual explanation ends.  If you're making a new Python project today, unless it is intended specifically as an extension for Twisted, you should ignore the relative merits of these coding standards and go with PEP 8, because the benefits of consistency generally outweigh any particular benefits of one coding standard or another.  Within Twisted, as PEP 8 itself says, "consistency within a project is even more important", so we're not going to change everything around just for broader consistency, but if we were starting again we might.

But.

There seems to be a sticking point around the camelCase method names.

After ten years of fielding complaints about how weird and gross and ugly it is – rather than just how inconsistent it is – to put method names in camel case, I feel that it is time to speak out in defense of the elegance of this particular feature of our coding standard.  I believe that this reaction is based on Python programmers' ancestral memory of Java programs, and it is as irrational as people disliking Python's blocks-by-indentation because COBOL made a much more horrible use of significant whitespace.

For starters, camelCase harkens back to a long and venerable tradition.  Did you camelCase haters-because-of-Java ever ask yourselves why Java uses that convention?  It's because it's copied from the very first object-oriented language.  If you like consistency, then Twisted is consistent with 34 years of object-oriented programming history.

Next, camelCase is easier to type.  For each word-separator, you have only to press "shift" and the next letter, rather than shift, minus, release shift, next letter.  Especially given the inconvenient placement of minus on US keyboards, this has probably saved me enough time that it's added up to at least six minutes in the last ten years.  (Or, a little under one-tenth the time it took to write this article.)

Method names in mixedCase are also more consistent with CapitalizedWord class names.  If you have to scan 'xX' as a word boundary in one case, why learn two ways to do it?

Also, we can visually distinguish acronyms in more contexts in method names.  Consider the following method names:
  • frog_blast_the_vent_core
  • frogBLASTTheVentCore
I believe that the identification of the acronym improves readability. frog_blast_the_vent_core is just nonsense, but frogBLASTTheVentCore makes it clear that you are doing sequence alignment on frog DNA to try to identify variations in core mammalian respiration functions.

Finally, and this is the one that I think is actually bordering on being important enough to think about, Twisted's coding standard sports one additional feature that actually makes it more expressive than underscore_separated method names.  You see, just because the convention is to separate words in method names with capitalization, that doesn't mean we broke the underscore key on our keyboards.  The underscore is used for something else: dispatch prefixes.

Ironically, since the first letter of a method must be lower case according to our coding standard, this conflicts a little bit with the previous point I made, but it's still a very useful feature.

The portion of a method name before an underscore indicates what type of method it is.  So, for example:
  • irc_JOIN - the "irc_" prefix on an IRC client or server object indicates that it handles the "JOIN" message in the IRC protocol
  • render_GET - the "render_" prefix on an HTTP resource indicates that this method is processing the GET HTTP method.
  • remote_loginAnonymous - the "remote_" prefix on a Perspective Broker Referenceable object indicates that this is the implementation of the PB method 'loginAnonymous'
  • test_addDSAIdentityNoComment - the "test_" prefix on a trial TestCase indicates that this is a test method that should be run automatically. (Although for historical reasons and PyUnit compatibility the code only actually looks at the "test" part.)
The final method name there is a good indication of the additional expressiveness of this naming convention.  The underscores-only version – test_add_dsa_identity_no_comment – depends on context.  Is this an application function that is testing whether we can add a ... dissah? ... identity with no comment?  Or a unit test?  Whereas the Twisted version is unambiguous: it's a test case for adding a D.S.A. identity with no comment.  It would be very odd, if not a violation of the coding standard, to name a method that way outside of a test suite.
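
To illustrate the mechanism, here is a minimal sketch of how that kind of prefix dispatch can work.  This is not Twisted's actual implementation, and the class and method names are made up, but it shows why the underscore is worth reserving for the dispatch boundary:

    class ChatProtocol:
        """Dispatch incoming commands to methods named '<prefix>_<COMMAND>'."""

        prefix = "irc"

        def dispatch(self, command, *args):
            # e.g. the command "JOIN" looks up a method named "irc_JOIN"
            handler = getattr(self, "%s_%s" % (self.prefix, command), None)
            if handler is None:
                return self.unhandled(command, *args)
            return handler(*args)

        def unhandled(self, command, *args):
            print("no handler for %r" % (command,))

        def irc_JOIN(self, channel):
            print("joined %s" % (channel,))

    ChatProtocol().dispatch("JOIN", "#python")   # prints: joined #python
    ChatProtocol().dispatch("PART", "#python")   # prints: no handler for 'PART'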

Hopefully this will be the last I'll say on the subject.  Again, if you're starting a new Python project, you should really just go ahead and use PEP 8, this battle was lost a very long time ago and I didn't even really mind losing it back then.  Just please, stop telling me how ugly and bad this style is.  It works very nicely for me.

The Lexicology of Personal Development

These days, everybody talks about geeks.  Geek chic, the "age of the geek"; even the New York Times op-ed page has been talking about the rise of "geeks" for years.  Bowing to popular usage, even I use the word as it's currently being bandied about.  But I think that the real success story is that of nerds.

A pernicious habit I've noticed in the last decade of the growth of geek culture is that it has developed a sort of cargo-cult of meritocracy.  Within the self-identified "geek" community, there's a social hierarchy based on all kinds of ridiculous pop-culture fetishism.  Who knows the most Monty Python non sequiturs?  Who knows the most obscure Deep Space Nine trivia?  This is hardly a new thing – William Shatner famously complained about it on Saturday Night Live in 1986 – but the Internet has been accelerating the phenomenon tremendously.  People who had a difficult time in their teens find each other as adults through some fan-club interest group, and then they make fast friends with others who had similar social problems.  Soon, since that shared interest is how they know all their friends, they spend all their time in the totally fruitless pursuit of more junk related to some frivolous obsession.  That can be okay, almost healthy even, if the focus of this accumulation is a productive hobby.  However, if it's just a pop-culture franchise (Harry Potter, Star Trek, World of Darkness), what was originally a liberating new social landscape can rapidly turn into a suffocating, stale dead-end for personal development.

So I always feel a twinge when I identify myself as a "geek".  I usually prefer to say that I am - or at least aspire to be - a nerd.

A nerd is someone who is socially awkward because they are more thoughtful, introspective, intelligent or knowledgeable than their peers.  They notice things that others don't, and it makes interaction difficult.  This is especially obvious in younger nerds, who are a little above their age group's intelligence but not quite intelligent enough to know when to keep their mouths shut to avoid ostracism.  But even if they have learned to keep a lid on their less-popular observations, it's tough to constantly censor yourself, and it makes interaction with your peers less enjoyable.

A geek is someone who is socially awkward because they are obsessed with topics that the mundanes among us just don't care about that much. They collect things, whether it's knowledge, games, books, toys, or technology.  Faced with a popular science fiction movie, a nerd might want to do the math to see whether the special effects are physically plausible, but a geek will just watch it a dozen times to memorize all the lines.

A dork is socially awkward simply because they aren't all that pleasant to be around.  Nerds and geeks have trouble interacting with others because they're lost in their own little worlds of intellectual curiosity or obsession: dorks are awkward because, let's face it, maybe they're a little stupid, a little mean, and just not that interesting.  A dork is unsympathetic.

By way of a little research for this post, I discovered that I'm apparently not the only one who has this impression of the definitions, and even Paul Graham seems to agree with me on word choice.  Still: from here on out, these are the correct definitions of the words, thank you very much.

Maybe you've heard these definitions before, and this is all old news.  These are also the words for the sort of tedious taxonomy of people that fictional teenagers in high-school movies indulge in.  It's obviously not karmically healthy to start labeling people "nerd", "dork", and "geek" and then writing them off as such.  So, you might ask, why do I bring it up?

Because you, like me, are almost certainly a nerd, a geek, and a dork.  And, as you might have inferred from my definitions above, nerds are better than geeks, and dorks are worse than both.

First, consider your inner nerd.  It's good to be intellectually curious, to stretch your cognitive abilities in new and interesting ways, to learn things about how systems work.  Physical systems, social systems, technological systems: it's always good to know more.  It's even good to be curious to the point of awkwardness, especially if you're a kid who is concerned about awkwardness; don't worry about it, it'll make you more interesting later.  It's good to foster any habits which are a little nerdy.

Second, your inner geek.  It's okay to enjoy things, even to obsess about them a little bit, but I think that our culture is really starting to overdo this.  Geeks are presented in popular media as equally, almost infinitely, obsessed with Star Wars, calculus, Star Trek, computer security, and terrible food (cheese whiz, sugary soda brands, etc).  No real people actually have time for all this stuff.  At some point, you have to choose whether you're going to memorize Maxwell's or Kosinski's equations.

One way that you can keep your inner geek in check is to always ask yourself the question: am I watching this movie / playing this game / reading this book because I actually enjoy it and I think it's worthwhile, or am I just trying to make myself conform to some image of myself as someone who knows absolutely everything about this one little cultural niche?

There are people who will treat being a fan of something that someone else created as morally equivalent (or, in a sense, even better than) creating something yourself, and those people are not doing you any favors.  Do not pay attention to them.

Of course, there's some overlap.  People who like playing with systems in real life enjoy the fluffier, more lightweight intellectual challenges of playing with the rules of fictional universes, especially the ones from speculative fiction.  When I was a kid, I went to a couple of Star Trek conventions and let me tell you, there were some legit nerds there; astrophysicists, rocket scientists, and experimental chemists, all excitedly talking about how they were inspired to pursue their careers by fiction of various kinds.

So go ahead, take a break, and geek out. Just don't tell yourself that it's anything other than for fun.

Finally, your inner dork.

As you're enthusiastically cultivating your nerdiness and carefully managing your geekiness, you will be accumulating a little bit of dorkiness as you go: at some point you have to make decisions about whether to do some minor social obligation in order to spend some time on learning a new thing (or re-watching your favorite movie).  You have to decide whether to restrain yourself so you can listen to your friend talk about a rough day at their job or to start spouting facts about the progress of the repairs on the large hadron collider.

Sometimes, on balance, it's acceptable to be a little bit inconsiderate in the pursuit of something more important.  People worth being friends with will see that and understand.  Heck, practically every movie plot these days puts at least one awkward and abrasive nerd in a sympathetic and even heroic position.  But be careful: once you decide that social graces are your lowest priority, it's a hop, skip, and a jump from being a lovable but absent-minded genius to being a blathering blowhard who just will not shut up about some tedious Riemannian manifold crap that nobody cares about, even when we've just told them that somebody died.

The goal of the nerd or the geek, after all, is not to be awkward; it's easy to forget sometimes that that is an unintentional and unpleasant side effect of the good parts of those attributes.  Being a dork is just bad.  After all, if you're so smart, why aren't you nice?

Simple Made Variadic

Last night I made a snarky tweet about how Clojure is doomed.  Out of context, it didn't really make a lot of sense, and Ivan Krstić replied asking what the heck I was talking about.  I tried to fit the following into a tweet but it kinda broke the tweeterizer I was using right in half, and so I had to put it here.

I love a good snark as much as the next person - some might say more - but it really bothers me when people make snide comments denigrating others' free work without at least offering a cogent criticism to go with it, and I don't want to be that guy.  So, hopefully before the whole Clojure community finds said tweet and writes me off as an arrogant Python bigot, I would like to explain what I meant in a bit more detail.

Right off the bat I should say that this was a bit tongue-in-cheek.  I actually rather like Clojure and I think Rich Hickey has some very compelling ideas about programming in general.  I watch his talk "Simple Made Easy" once every month or two, contemplating its deeper meaning, and I still usually come away with an insight or two.

I should also make it clear that I was linking to the recur special form specifically, and not just the special forms documentation in general.  Obviously having reference docs isn't a bad thing.

Ivan (or should I say, "@radian"?), you may be right; that documentation you linked to may indeed one day spell Python's doom.  If Python does eventually start to suck, it will be because it collapsed under the weight of its own weird edge cases, like the slightly-special behavior of operator dispatch as compared to method dispatch, all the funkiness of descriptors, context managers, decorators, metaclasses, et cetera.

A portion of my point was serious, though.  The documentation for recur does highlight some problems with Clojure that the Python docs provide an interesting counterpoint to.

The presence of the recur form at all is an indication of the unhealthy level of obsession that all LISPs have with recursion.  We get it: functions are cool.  You can call them.  They can call themselves.  Every other genre of language manages to use this with a reasonable degree of moderation, without adding extra syntactic features to its core just so you can recurse forever without worrying about stack resources.  Reading this particular snippet of documentation, I can almost hear Rich Hickey cackling as he wrote it, having just crowned himself God-Emperor of the smug LISP weenies, as he gleefully points out that Scheme got it wrong and the CL specifications got it wrong with respect to tail call elimination, and that it should be supported by the language but also be explicit and compiler-verified.

The sad thing is, my hypothetical caricature of Clojure's inventor is actually right!  This is a uniquely clever solution to a particularly thorny conceptual problem with the recursive expression of algorithms.  The Scheme folks and the Common Lisp folks did both kinda get it wrong.  But the fact that this has to be so front-and-center in the language is a problem.  Most algorithms shouldn't be expressed recursively; it is actually quite tricky to communicate about recursive code, and most systems that really benefit from it have to be mutually, dynamically reentrant anyway and won't be helped by tail call elimination.  (My favorite example of this is still the unholy shenanigans that Deferreds have to get up to to make sure you don't have to care about callback chain length or Deferred return nesting depth.)
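
To put the stakes in this blog's usual language, here's a hedged little sketch in Python, which, like most languages, has no tail call elimination either.  The recursive spelling of even a trivial countdown eventually exhausts the stack, and the fix is to rewrite it as an explicit loop, which is more or less what recur makes you do, except that Clojure's compiler verifies the call really is in tail position:

    def count_down_recursive(n):
        # Tail-recursive in shape, but each call still gets a new stack
        # frame, so a large enough n raises RecursionError.
        if n == 0:
            return "done"
        return count_down_recursive(n - 1)

    def count_down_loop(n):
        # The loop rewrite is roughly what an explicit construct like recur
        # amounts to: re-bind the "arguments" and jump back to the top.
        while n != 0:
            n = n - 1
        return "done"

    print(count_down_loop(1000000))           # fine
    try:
        print(count_down_recursive(1000000))
    except RecursionError:
        print("blew the stack")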

Also, if you want to be all automatically parallelizable and web scale and "cloud"-y, recursion and iteration are both the wrong way to do it; they're both just ways of tediously making your way down a list of things one element at a time.  What you want to do is to declaratively apply a computation to your data in such a way as to avoid saying anything about the order things have to happen in.  To put it more LISPily, (map) is a better conceptual foundation for the future than (loop) or (apply).  Of course you can do the naive implementation of (map) with (recur), but smarter implementations need application code to be written some other way.
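
Here is a sketch of that distinction in Python terms.  It's purely illustrative, the function is made up, and the point is the shape of the code rather than the particular library:

    from multiprocessing import Pool

    def normalize(record):
        # Per-element computation; no element depends on any other.
        return record.strip().lower()

    if __name__ == "__main__":
        records = ["  Alpha", "BETA  ", " Gamma "]

        # The loop spells out an order, even though the order is irrelevant:
        cleaned = []
        for record in records:
            cleaned.append(normalize(record))

        # map only says "apply this to every element", which is what leaves
        # a smarter implementation free to do the work in any order:
        assert cleaned == list(map(normalize, records))

        # ...including farming it out to several processes at once:
        with Pool() as pool:
            print(pool.map(normalize, records))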

The language and style choices of the manual are also telling in this case.  The Python docs that Ivan linked to go into excruciating detail, rephrasing and explaining the same concept in a few different ways and linking to other required concepts in depth so the reader can easily familiarize themselves with any prerequisites, all while explaining a nerdy part of the language that you can ignore and still use Python productively.  Every Python programmer ignores descriptors while they're learning to write classes and methods, despite using them all the time; this ability to be understood at different levels of complexity is a strength of every good language, and Python does particularly well in that regard.  Of course, one could also make the case that this is just because Python has so many dusty corners hidden behind double underscores, and that a better language would simply have less obscure junk in it rather than making the obscure junk optional to understand, but I digress.

The description of recur, by contrast, is deeply flawed.  It is terse, to a fault. It introduces the concept of "recursion points" without linking to any kind of general reference.  It uses abbreviations all over the place ("exprs", "params", "args", "seq") without even using typesetting to illuminate whether they are using a general term or a specific variable name.

But, by far, the worst sin of this document is the use of the words "variadic" and "arity". There is really no excuse for the use of these words, ever. Take it from me, I am exactly the kind of pedantic jerk who will drop "arity" into a conversation about, for example, music theory, just to demonstrate that I can, and as that kind of jerk I can tell you with certainty: I have no excuse.

It should say: "a function that takes a variable number of arguments". Or possibly: "it must take exactly the same number of arguments as the recursion point".
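
For the record, in Python terms, all that "variadic" means is this (a throwaway illustration, nothing more):

    def log_all(*messages):
        # "*messages" is the whole of "variadic": the function accepts any
        # number of positional arguments, collected into a tuple.
        for message in messages:
            print(message)

    log_all("one")
    log_all("one", "two", "three")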

This was a particularly disappointing example to me because Clojure strikes me as a particularly... for lack of a better word, "optimistic" lisp, one that looks forward rather than back, one that is more interested in helping people find comprehensible and composable ways to express programs than in revisiting obscure debates over reader macros or lisp-1/lisp-2 holy wars.  But the tone of the documentation for recur aims it straight at the smug lisp weenie crowd.

As I hope is obvious, if not initially, then at least by now, I don't think that Clojure will fail (or succeed) on the merits of one crummy piece of documentation.  It's a much younger language than Python, so it may have a ways to go in its documentation practices.  It also comes from an illustrious heritage, and I can't expect that heritage to be entirely absent from the way the language talks about itself, no matter how unfortunate certain details of that heritage are.  Heck, at the parallel point in Python's lifetime, it didn't even have descriptors yet, let alone the documentation for them!

Still, I don't think that this issue is entirely trivial, and I hope that the maintainers for the documentation for Clojure, at least the documentation for the parts of the language you have to see every day, take care to improve its accessibility to the less arcane among us.

We'll Always Have Cambridge

Half-way through 2012, I will be leaving the east coast.

There are a great many things I despair of leaving behind; family, friends, the most excellent Boston Python Meetup, participating in the sometimes incendiary, sometimes hilarious Cambridge, Massachusetts / Cambridge, England rap war.

However, I'm not writing today in order to wax lyrical about the area, or to extoll the virtues of my new home, but hopefully, to prevent a missed opportunity.  I know there are at least a few really cool people in Massachusetts who read this blog, and who read my tweets, that I either haven't seen in quite a while or have never actually met in person.

So if grabbing a coffee with me is an interesting idea to you, please drop me a line within the next month. I would love to hear your story about how PHP ruined your summer, or how Twisted changed your life, or how you once pwned a vending machine with nothing but a malformed JPEG.

I'm sure I'll visit the area from time to time, so this isn't quite your last chance, but it just won't be the same, you know?

If I follow you, you can DM me on Twitter of course, but my email address isn't hard to figure out either.  If you glance up towards the top of your browser window right now, you're practically looking at it.

This Isn't How PyPy Works, But it Might as Well Be

It seems like a lot of the Python programmers I speak with are deeply confused by PyPy and can't understand how it works.  The stereotypical interlocutor will often say things like: A Python VM in Python?  That's just crazy!  How can that be fast?  Isn't Python slower than C?  Aren't all compilers written in C?  How does it make an executable?

I am not going to describe to you how PyPy actually works.  Lucky for you, I'm not smart enough to do that.  But I would like to help you all understand how PyPy could work, and hopefully demystify the whole idea.

The people who are smart enough to explain how PyPy actually works will do it over at the PyPy blog.  At some level it's really quite straightforward, but this impression of straightforwardness is not conveyed well by posts with titles like "Optimizing Traces of the Flow Graph Language".  In addition to being a Python interpreter in Python, PyPy is a mind-blowingly advanced exploration of the cutting-est cutting-edge compiler and runtime technology, which can make it seem complex.  In fact, being written in Python is part of what lets it be so cutting-edge.

Most people with a formal computer science background are already familiar with the fairly generic nature of compilers, as well as the concept of a self-hosting compiler.  If you do have that background, then that's all PyPy is: a self-hosting compiler.  The same way GCC is written in C, PyPy is written in Python.  When you strip away the advanced techniques, that's all that's there.

A lot of folks who are confused by PyPy's existence, though, I suspect don't have that background; many working programmers these days don't.  Or if they do, they've forgotten it, because the practical implications of the CSS box model are so complex that they squeeze simpler ideas, like Turing completeness and the halting problem, out of the average human brain.  So here's the easier explanation.

A compiler is a program that turns a string (source code: your program text written in Python, C, Ruby, Java, or whatever) into some kind of executable code (bytecode or runtime interpreter operations or a platform-native executable).
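
If that still sounds abstract, CPython ships a small-scale demonstration of the idea as a builtin.  This has nothing to do with PyPy specifically; it's just the "string in, executable code out" shape of the thing:

    import dis

    source = "def greet(name):\n    return 'hello, ' + name\n"

    # compile() is literally a function from a string to executable code.
    module_code = compile(source, "<generated>", "exec")

    namespace = {}
    exec(module_code, namespace)            # run the compiled module
    print(namespace["greet"]("world"))      # hello, world

    dis.dis(namespace["greet"])             # the bytecode the compiler emitted

That example stops at bytecode; a platform-native executable is just a different choice of output format.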

Let's examine that last one, since it seems to be a sticking point for most folks.  A platform-native executable is simply a bunch of bytes in a file. There's nothing magic about it.  It's not even a particularly complex type of file.  It's a packed binary file, not a text file, but so are PNGs and JPEGs, and few programmers find it difficult to believe that such files might be created by Python.  The formats are standard and very long-lived and there are tons of tools to work with them.  If you're curious, even Wikipedia has a good reference for the formats used by each popular platform.
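
If you want to see for yourself that there's no magic, here's a tiny sketch that reads the header of one such file.  The path assumes a typical Linux system with an ELF /bin/ls, so adjust it for your platform:

    import struct

    # The first few bytes of an ELF executable are a documented, packed
    # binary header, no different in kind from the header of a PNG.
    with open("/bin/ls", "rb") as f:
        magic, ei_class = struct.unpack("4sB", f.read(5))

    print(magic == b"\x7fELF")                            # True for ELF files
    print({1: "32-bit", 2: "64-bit"}.get(ei_class, "?"))  # word size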

As to Python being slower than C: once a program has been transformed into executable code, it doesn't matter how slow the process of translating it was.  The running program is now just executable instructions for your CPU.  It was only the compiler that was written in Python, and by the time your program is running, the original Python has effectively vanished; all you're left with is your program executing.

(Actually, Python is faster than C anyway, especially at producing strings.)

In reality, PyPy takes a hybrid approach, where it is a program which produces a program and then does some stuff to it and creates some C code which it compiles with the compiler of your choice and then creates some code which then creates other code and then puts it into memory, not a file, and then executes it directly, but all of that is ancillary tricks and techniques to make your code run faster, not a fundamental property of the kind of thing that PyPy is.  Plus, as I said, this article isn't actually about how PyPy works anyway, it's just about how you should pretend it works.  So you should ignore this whole paragraph.

For the sake of argument, assume that you know all the ins and outs of binary executable formats for different operating systems, and the machine code for various CPU architectures.  The question you should really ask yourself is: if you have to write a program (a compiler) which translates one kind of string (source code) into another kind of string (a compiled program): would you rather write it in C or Python?  What if the strings in question were a template document and an HTML page?

It shouldn't be surprising that PyPy is written in Python.  For the same reasons that you might use Django templates and not snprintf for generating your HTML, it's easier to use Python than C to generate compiled code.  This is why PyPy is at the forefront of so many advanced techniques that are too sophisticated to cover in a quick article like this.  Since the compiler is written in a higher-level language, it can do more advanced things, since lower-level concerns can be abstracted away, just as they are in your own applications.
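
For a taste of what "using a higher-level language to generate compiled code" feels like, here's a deliberately silly sketch.  The template and names are made up, and real PyPy does nothing this naive, but the shape of the job – filling in a template of lower-level code from a higher-level description – is the same:

    # A toy "compiler backend": generate a C source file from a high-level
    # description, the same way a template generates HTML.
    C_TEMPLATE = """\
    #include <stdio.h>

    int {name}(int x) {{
        return x * {factor};
    }}

    int main(void) {{
        printf("%d\\n", {name}({argument}));
        return 0;
    }}
    """

    def generate_c(name, factor, argument):
        return C_TEMPLATE.format(name=name, factor=factor, argument=argument)

    if __name__ == "__main__":
        with open("generated.c", "w") as f:
            f.write(generate_c("triple", 3, 14))
        # Then: cc generated.c -o generated && ./generated   ->   prints 42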