When I was very young, my mother was concerned that I never laughed or smiled, and having forgotten to pre-load my positronic net with the "humor" module, she realized she would have to do some work from scratch. I am told that the original transcript went something like this.
Mom: Do you know how humor works?
Me: No.
Mom: I am going to tell you a joke, then. It is one of the first jokes that my brother used to tell.
Me: Okay.
Mom: How many balls of string does it take to get to the moon?
Now, my mother actually kept balls of yarn in various places around the house, and I had seen the moon, so this didn't strike me as very funny. I thought about how big balls of yarn were, how surprisingly long they were when unrolled, and how slowly they got smaller. Then I attempted to mentally estimate the distance to the moon, in terms of how quickly the balls unrolled, how quickly they got smaller, pictures in books of the relationship between the moon and the earth, and how far away other things I had seen were. I don't remember the rest of the conversation, but I distinctly remember the mental image that I built during this process, as it has stayed with me during the years. It looked like this:
Me: Three.
My mother thought this was hilarious, so my initial understanding of humor was that I should run up to everyone I met and say: "howmanyballsofstringdoesittaketogettothemoon?doyougiveupyet?THREE!HAHAHAHAHAHAHAHAHAHA". Reading doc/fun/Twisted.Quotes in the Twisted distribution can show you how little it's progressed since then.
When I estimate programming tasks, I still have a similar sensation to when I was 2 years old and building that little picture in my head. Then, I grossly underestimated because I didn't have a mapping between astronomical distances and inches, because I didn't know what units distance was measured in among the stars. Now, I grossly underestimate because I don't know what unit you can measure programming effort in. It's not "hours" because I can't reason about that - one does not do a uniform amount of work within one hour on a program, especially since several hours are spent thinking. I know various ways to measure finished programs, and I know of various ways to measure programs by specifying them to death - however, neither of these gives me an accurate estimate when I want it, which is to say, before work has begun and a great deal of resources have been invested. It is harder, and takes longer, in my experience, to accurately estimate (in hours) how long a program will take than to just write it in the first place; and even if you do go through that process, you can't estimate how long the estimation will take (and the estimation process cheats, by stealing work from the programming process so that it is shorter).
All this thinking doesn't do anything to make the need for good estimates go away, though. So how do you tell how big or how hard a program is, without first writing the program several times and getting lots of different people to do it? When you know how hard it is, what units do you express it in?
12 comments:
As far as I've seen, this question is always asked in a vacuum, by which I mean, no one seems to try to answer it by looking at the answers to the questions "how do engineers know how long it takes to build a bridge?" and "how do writers know how long it takes to write a novel?"
Those particular two tasks have a certain amount of linearity. You can estimate the time taken for a first draft of a novel by writing a page and multiplying the time taken by 200. (I've never written more than 2000 words of fiction in a single piece, but I suspect that such an estimate would need to be doubled.) Alternatively, you can estimate the time taken to build a bridge by reference to similarly large engineering projects that take place all the time. How do they estimate the design time? I have no idea. How do writers estimate the edit time for a novel? Well, they don't write to a clock, so I suspect many don't, but scriptwriters (and movie scripts are notoriously over-engineered all the time) probably have to.
Anyway, that's no kind of answer. But surely programming isn't the only discipline in the world where the natural kind of data to turn to for time estimation is "time taken for work already done, multiplied by work needed to do, add (or multiplied by!) time needed to correct mistakes or improve product." I don't know what the other disciplines do (or even really which ones they are), but I do wonder if the answers would be useful.
foremost among them, how many balls of string did your mom (or her brother) think it took to get to the moon?
onebutitbetterbeabigone!HAHAHAHAHAHAHAHA
With a bridge, presumably one could measure "work needed to do" by the amount of materials remaining to be moved to the site, or the distance remaining to be spanned, or something like that.
In a programming project, what is the unit of measure for work as-yet not done? Lines of unwritten code?
OK, you're focusing more on the measure than on the process of working it out once you have the measure; I haven't thought about the measure. I suppose the novel analogy is more apt here. Yep, there's "pages to go" or "plot points to include" or "karmic balance of literary talent to be attained", but it doesn't account for the re-drafting process. Understanding the process of writing a finished novel, as opposed to one that ends, probably requires writing several novels. If you want collaborative examples, making a movie is similar, and while they're forced to cost in time and dollars, they get it spectacularly wrong sometimes.
One problem with your challenge is you haven't specified what sunk costs are acceptable in the estimation process.
There are three alternatives here:
1. Estimation cost (measured in time) rises linearly or more steeply with the project cost (in time). Current state of affairs, and it is clearly unacceptable.
2. Rough estimates (to within, say, 100% because it would improve on the current state of affairs) have a small fixed cost (without a measure we can't say what small would be!), better estimates have an estimation cost linear with project cost.
This is OK because you can use the rough estimate to find out if the better estimate is worth doing. It's acceptable in bridge engineering (you need a pretty good estimate before you even buy the materials you've got lying on the ground that the builders use to think about how long there is to go, and the pretty good estimate is really costly to do, but the rough one is fairly cheap).
3. Good estimates have a small fixed cost similar to the above. Excellent but unlikely.
Assuming you're going to go for 2 (I think actually that you want 3), then I think one possibility is that the good estimation measures in "tests to pass" (i.e. the good estimation time is spent writing tests), and the rough measure measures in "tests to write", possibly based on a feature set.
What I'm envisaging here is that the rough process spends its time listing the project feature set to some as yet undetermined degree of accuracy, which I don't think rises linearly with project scope. Using that feature set, they then estimate how many tests each feature will need. Summing those gives a rough cost.
The obvious problem with that is that working programmers today cannot do this. Experienced testers might be able to get as far as "number of tests", but they can't translate that into economics-friendly terms yet (money and time). It (or other estimation processes) would need to happen some number of times before the heuristics fall into place, as I guess they did with bridge building when the discipline was young.
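The rough process described above (list the features, guess how many tests each one needs, sum the guesses) could be sketched roughly like this. The feature names and test counts are invented for illustration and do not come from any real project; the helper names `rough_estimate` and `remaining` are likewise hypothetical.

```python
# Hypothetical feature list: each value is a guessed number of tests.
# None of these names or numbers come from a real project.
features = {
    "user login": 12,
    "password reset": 8,
    "report export": 20,
}


def rough_estimate(tests_per_feature):
    """Rough cost of the project, measured in 'tests to write'."""
    return sum(tests_per_feature.values())


def remaining(tests_per_feature, tests_passing):
    """Work not yet done, measured in 'tests to pass'."""
    return rough_estimate(tests_per_feature) - tests_passing


print(rough_estimate(features))   # total guessed tests
print(remaining(features, 15))    # tests still to pass after 15 are green
```

Note that the result is a count of tests, not hours or money; turning it into those terms is exactly the part nobody knows how to do yet.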
What I really want is a way to produce a reasonable estimate at a fixed cost. I recognize that total accuracy is impossible. The main reason I'm looking for the unit of measure is so that you can tell when you have departed from the plan, and that your estimate is becoming inaccurate. Milestone-based planning would seem to be a good start on this problem, but the trouble is that it is possible to meet a series of milestones but miss integration points so that when you hit milestone 2 you have regressed so the criteria for milestone 1 are no longer satisfied... if this were novel-writing it would certainly be possible to accidentally burn all of chapter 1 in writing chapter 2, but it would hardly be the norm. :)
This is probably the failure point of the novel analogy. (A better sub-analogy would be that chapter 1 and chapter 2 don't relate appropriately, but relations are much less strict.)
To remain with the questions about other disciplines though: I'm actually having difficulty thinking of disciplines with large multi-person projects that do fixed cost estimations. Are there many? If not, why do you think programming should have them?
http://www.freemars.org/jeff/planets/Luna/Luna.htm
http://www.ext.nodak.edu/extpubs/ansci/sheep/as989-8.htm
http://www.google.com/search?q=560+yards+in+kilometers
384,400 km at mean distance,
Earth Diameter : 12,742 km
Moon Diameter : 3,476 km
km in a 1 pound (.45kg) hank of yarn : 0.512064
>>> (384400 - (12742+3476)/2)/ 0.512064
734851.50293713296
734851 1-pound hanks of yarn, which equates to 333322.807 kg of yarn. At 50g/ball (as it's sold in shops) that's 6666457 balls of yarn.
6666457 balls. Round up to 7 million clean I reckon, to account for wastage.
That's pretty close to three, anyway.
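For anyone who wants to re-run the whole calculation rather than just the first division, here is a small sketch using the figures quoted above. One assumption on my part: the 333322.807 kg figure only works out if a pound is taken as exactly 0.45359237 kg rather than the rounded .45 kg, so that is what the sketch uses. The function name `balls_to_moon` is made up for the example.

```python
import math

# Figures quoted in the comment above.
EARTH_MOON_KM = 384_400       # mean centre-to-centre distance
EARTH_DIAMETER_KM = 12_742
MOON_DIAMETER_KM = 3_476
KM_PER_HANK = 0.512064        # 560 yards of yarn in a 1-pound hank
KG_PER_BALL = 0.050           # 50 g balls, as sold in shops

# Assumption: exact pound-to-kilogram factor, inferred from the
# 333322.807 kg result rather than stated in the comment.
KG_PER_POUND = 0.45359237


def balls_to_moon():
    # Surface-to-surface gap: subtract both radii from the mean distance.
    gap_km = EARTH_MOON_KM - (EARTH_DIAMETER_KM + MOON_DIAMETER_KM) / 2
    hanks = gap_km / KM_PER_HANK
    # The comment truncates to whole hanks before converting to mass.
    kg = int(hanks) * KG_PER_POUND
    # Round up: a partial ball still has to be bought whole.
    balls = math.ceil(kg / KG_PER_BALL)
    return gap_km, hanks, kg, balls


gap_km, hanks, kg, balls = balls_to_moon()
print(balls)  # 6666457
```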
Hey this is Andy Smith, we met at OSCON in the DIY-IT BoF, a friend (David Reid) pointed me at your blog, I've been meaning to email you but work has been busy... with exactly the problem you're describing in this entry.
As I haven't yet come to a good conclusion for an answer to how long a given software project will take (besides "muchlongerthanyouexpectedyouidealisticfreak") I'll offer a more metaphoric solution:
It won't be done until you've learned everything it has to teach you.
Anyway, once I get around to learning everything this particular project has to teach me, I'll chat you up a bit.
Estimation is hard, even in environments where there's a lot of shared culture and prior art (like the one I work in). It does seem to get easier with time, though, but the heuristics used seem to be not readily quantifiable. The people I know who are good at estimating can't easily explain all the factors that go into an estimate; they're just minmaxing too many things. They just sort of look at something and ballpark it - these individuals just seem to have a smaller ballpark than other people.
I assume you're familiar with Fred Brooks and all that stuff, so I can't really offer anything more than hope that it will get better.
In an email, my father suggested I should read this book. I believe I will.