A collection of articles, ideas, and rambling from a guy who wrote some software that one time.

Sunday, July 15, 2007

Pet Peeve

The word "depreciate" means "to lessen the price or value of".  This is an accounting jargon term referring to the process by which assets lose value over time.  It is pronounced "Dee Pree Shee Ate".

The word "deprecate" means "to express disapproval of" or "to urge reasons against; protest against".  This is a programming jargon term describing the process by which APIs become less favorable over time.  It is pronounced "Deh Preh Kayt".

These words, while they have similar meanings, are not synonyms.  Please do not confuse them, especially when using their jargon senses.  It sounds like nails on a chalkboard to me, having worked on accounting software.  I would like to be able to use phrases like "a deprecated depreciation function" without eliciting bewilderment.

Both Java and Python consistently use "@deprecated" and "DeprecationWarning".  English usage of these terms may be shifting, but "DepreciationWarning" or "@depreciated" will still get you runtime or compiler errors, so please stick to "deprecate" consistently while talking about code.
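To make the distinction concrete, here is a minimal sketch of "a deprecated depreciation function" in Python.  The function name and its straight-line formula are made up for illustration; only "warnings.warn" and "DeprecationWarning" come from the standard library.

```python
import warnings


def straight_line_depreciation(cost, salvage, years):
    """Hypothetical accounting helper: depreciates an asset evenly per year."""
    # The function is deprecated (disapproved of), so it warns its callers.
    warnings.warn(
        "straight_line_depreciation is deprecated",
        DeprecationWarning,
        stacklevel=2,
    )
    return (cost - salvage) / years


# Record warnings so the deprecation can be inspected rather than printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    annual = straight_line_depreciation(1000.0, 100.0, 9)

# The misspelled name is not defined anywhere, so using it is a NameError.
try:
    DepreciationWarning
except NameError:
    misspelling_is_an_error = True
else:
    misspelling_is_an_error = False
```

The function depreciates; the warning deprecates.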

Thank you.

Saturday, July 14, 2007

Mindful Link Propagation

It occurs to me that there may still be a few Python people who read this blog but have not yet discovered JP Calderone's.

If you are such a person, he just did an excellent write-up of the practical implications of Python's rich comparison operators.  Check it out.

Saturday, July 07, 2007

Functional Functions and the Python Singleton Unpattern

Have you ever written a module that looked like this?
subscribers = []

def addSubscriber(subscriber):
    subscribers.append(subscriber)

def publish(message):
    for subscriber in subscribers:
        subscriber.notify(message)
And then used it like this?
from publisher import publish

class Worker:
    def work(self):
        publish(self)
I've done this many times myself.

I used to think that this was the "right" way to implement Singletons in Python.  Other languages had static members and synchronized static accessors and factory methods; all kinds of rigamarole to achieve this effect, but Python simply had modules.

Now, however, I realize that there is no "right" way to implement Singleton in Python, because singletons are simply a bad thing to have.  As Wikipedia points out, "It is also considered an anti-pattern since it is often used as a euphemism for global variable."

The module above is brittle, and as a result, unpleasant to test and extend.

It's difficult to test because the call to "publish" cannot be indirected without monkeying around with the module's globals - generally recognized to be poor style, and prone to errors which will corrupt later, unrelated tests.

It makes code that interacts with it difficult to test, because while you can temporarily mangle global variables in the most egregious of whitebox tests, tests for code that is further away shouldn't need to know about the implementation detail of "publish".  Furthermore, code which adds subscribers to the global list will destructively change the behavior of later tests (or later code, if you try to invoke your tests in a running environment, since we all know running environments are where the interesting bugs occur).
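A small sketch of that second problem, with the module's globals inlined so the example stands alone.  "Recorder", "first_test" and "second_test" are hypothetical names invented for the illustration; they are not part of any real test suite.

```python
# Stand-in for the module-level publisher shown earlier.
subscribers = []


def addSubscriber(subscriber):
    subscribers.append(subscriber)


def publish(message):
    for subscriber in subscribers:
        subscriber.notify(message)


class Recorder:
    """Hypothetical subscriber that remembers every message it receives."""

    def __init__(self):
        self.messages = []

    def notify(self, message):
        self.messages.append(message)


def first_test():
    recorder = Recorder()
    addSubscriber(recorder)  # never removed: it stays in the global list
    publish("first")
    return recorder


def second_test():
    recorder = Recorder()
    addSubscriber(recorder)
    publish("second")  # also delivered to the first test's recorder
    return recorder


first = first_test()
second = second_test()
```

After both functions have run, the recorder from the first test has also seen the second test's message, and the global list holds two subscribers that nothing will ever clean up.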

It's difficult to extend because there is no explicit integration point with "publish", and all instances share the same look-up.  If you want to override the behavior of "work" in a subclass and send the message to a different publisher, you can't call up to the superclass's implementation, because that implementation always calls the global "publish".

Unfortunately, this probably doesn't seem particularly bad, because bad examples abound.  It's just the status quo.  Twisted's twisted.python.log module is used everywhere like this.  The standard library's sys.path, sys.stdin/out/err, warnings.warn_explicit, and probably a dozen examples I can't think of off the top of my head, all work like this.

And there's a good reason that this keeps happening.  Sometimes, you feel as though your program really does need a "global" registry for some reason; you find yourself wanting access to the same central object in a variety of different places.  It seems convenient to have it available, and it basically works.

Here's a technique for implementing that convenience, while still allowing for a clean point of integration with other code.

First, make your "global" thing be a class.
class Publisher:
    def __init__(self):
        self.subscribers = []

    def addSubscriber(self, subscriber):
        self.subscribers.append(subscriber)

    def publish(self, message):
        for subscriber in self.subscribers:
            subscriber.notify(message)

thePublisher = Publisher()
Second, decide and document how "global" you mean.  Is it global to your process?  Global to a particular group of objects?  Global to a certain kind of class?  Document that, and make sure it is clear who should use the singleton you've created.  At some point in the future, someone will almost certainly come along with a surprising requirement which makes them want a different, or wrapped, version of your global thing.  Documentation is always important, but it is particularly important when dealing with globals, because there's really no such thing as completely global, and it is difficult to determine from context just how global you intend for something to be.

Third, and finally, encourage using your singleton by using it as a default, rather than accessing it directly.  For example:
from publisher import thePublisher

class Worker:
    publisher = thePublisher

    def work(self):
        self.publisher.publish(self)
In this example, you now have a clean point of integration for testing and extending this code.  You can make a single Worker instance, and change its "publisher" attribute before calling "work".  Of course, if you're willing to burn a whole extra two lines of code, you can make it an optional argument to the constructor of Worker.  If you decide that, in fact, your publisher isn't global at all, but system-specific, this vastly decreases the amount of code you have to change.
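The optional constructor argument might look something like the sketch below.  The Publisher class is repeated so the example runs on its own; "Recorder" and "testPublisher" are hypothetical names added for the demonstration.

```python
class Publisher:
    def __init__(self):
        self.subscribers = []

    def addSubscriber(self, subscriber):
        self.subscribers.append(subscriber)

    def publish(self, message):
        for subscriber in self.subscribers:
            subscriber.notify(message)


thePublisher = Publisher()


class Worker:
    # The two extra lines: the shared publisher is only a default.
    def __init__(self, publisher=thePublisher):
        self.publisher = publisher

    def work(self):
        self.publisher.publish(self)


class Recorder:
    """Hypothetical subscriber that remembers every message it receives."""

    def __init__(self):
        self.messages = []

    def notify(self, message):
        self.messages.append(message)


# A test builds its own publisher and never touches the shared one.
testPublisher = Publisher()
recorder = Recorder()
testPublisher.addSubscriber(recorder)

worker = Worker(publisher=testPublisher)
worker.work()
```

The recorder sees the worker, and "thePublisher" still has no subscribers, so nothing leaks into whatever runs next.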

Does this mean you should make everything into objects, and never use free functions?  No.  Free functions are fine, but functions in Python are for functional programming.  The hint is right there in the name.  If you are performing computations which return values, and calling other functions which do the same thing, it makes perfect sense to use free functions and not bog yourself down with useless object allocations and 'self' arguments.

Once you've started adding mutable state into the mix, you're into object territory.  If you're appending to a global list, if you're setting a global "state" variable, even if you're writing to a global file, it's time to make a class and give it some methods.
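As a rough illustration of where that line falls, here is a sketch with made-up names ("total" and "Ledger"): a free function that only computes a value from its arguments, next to a class that owns the state being mutated.

```python
def total(amounts):
    """Free function: computes a value from its arguments, mutates nothing."""
    return sum(amounts)


class Ledger:
    """Mutable state lives on an instance instead of in a module global."""

    def __init__(self):
        self.entries = []

    def record(self, amount):
        self.entries.append(amount)

    def balance(self):
        # The stateful object can still lean on the pure function.
        return total(self.entries)


# Two ledgers are fully independent, which a global list could not offer.
mine = Ledger()
yours = Ledger()
mine.record(10)
mine.record(5)
yours.record(1)
```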