By Gareth Powell

I’ve been involved in many discussions in my life about estimating. Almost all of them have related to how to estimate, and in particular to the holy grail of estimation - how to get accurate estimates in short amounts of time. While I’ve enjoyed most of these discussions - my favorite, with agile coach Tobias Meyer, ended in this blog post - this isn’t the purpose of this article. While I’m going to touch on some of the same content (I’m definitely going to talk about estimation units), I’m not as interested in where the numbers come from or how they are produced as I am in how the numbers are used once gathered.

In order to be clear, and so that you’re imagining the same scenario that I’m thinking about, let’s define the estimation process. To me it’s about gathering a collection of stakeholders - clients, managers, developers and architects - together in the same room, discussing what “it” is that we will be estimating and then providing one or more very concrete values (usually numbers, but we could equally use animal names) that represent some metric about “it”.

For full disclosure, since I have an agile background, when I talk about “it”, I mean a “story” (as per Mike Cohn’s “User Stories Applied”). That’s not to exclude any other practice: I think everything I have to say would apply equally to projects using waterfall “requirements”.

What is Estimation?

Estimation is about predicting the future: well, sort of. One of the things about predicting the future is that if you actually know what’s going to happen, that knowledge is almost useless. If it’s going to happen, you can’t do anything to change it. What you want to do is gain a sense of the future, so that you can know, as accurately as possible, what the context of the future is. With that knowledge, you can make the best possible decisions about what you want to do in that context.

With software in particular, estimation is about laying down different possible paths into the future. Given a (seemingly infinite) set of features, how many should you try and do, to what level of detail, in what order?

I’ve often said that the only way to get truly accurate estimates of what something will cost or involve is to actually do it, and measure the resources that were used in making it happen. Of course, by the time you’ve done that, the estimate has lost all its value. Anything which comes close to this level of accuracy will suffer this problem in smaller amounts: the cost of obtaining the estimate becomes a significant portion of the total cost. It is this waste (yes, I honestly believe time spent solely to produce estimates should be accounted as waste, but that’s probably a different article) which leads Agile methods to insist on a much less predictive, and much more iterative approach to software development.

Estimation, per se, is not about design. Although I usually find that I need to have a clear idea of both the what and how of what I intend to implement before I can give a worthwhile estimate, the process of estimation is not specifically about design. But it is important that when estimating happens both sides have a shared vision about what it is, and that is the output of design. Because I firmly believe in Agile and Iterative practices, I generally see estimation as an intrinsic portion of a design cycle.

What is it?

For me, the most contentious issue around estimation is the notion that we are translating something very imprecise (a feature or story) into something precise. While this has been dealt with many times, I want to just quickly reprise this issue because it really is very important.

Defining “it”

A typical user story starts out life like this: “I want a monthly report about the number of widgets we produce each month”.
Hacker-hat engages:

    SELECT month, SUM(widgets)
    FROM production
    GROUP BY month;

1 minute.
Oh, yeah, and we need to display it. 10 minutes.

We should test it. Oh yeah, check-in, deployment, release notes.

Uh, maybe I should say an hour. No, be safe: “two hours”.

Two hours later (or whenever the two hours gets scheduled into the project), the client enters user testing and says “you can’t be serious”.

Me (in defensive mode): “what?”

“Well, clearly, I need filtering on which months I want to see, and I want it broken down by widget type, and I want a graph that shows clearly to management whether production has gone up or down, and then it needs colors, which I want to be able to specify, and …”

Four weeks later … “Well, I suppose that will do, but you said two hours; had I known it would take this long …”. I never get to say “But all you said you wanted was a report.”

As you can probably tell (see sidebar), I have never (well, not yet) mastered the art of being precise about what it is that a client wants without actually going and doing it. At the moment, my strategy for defining “it” looks like this:

  • Repeat over and over to the client the XP mantra that for now we will build “the simplest thing that could possibly work”, and all that other stuff can come later.
  • Be never-endingly explicit about as many things as I can think of that specifically are not included. “So, this story does not contain …”
  • Draw pictures (or get the user to draw pictures) of every single visual aspect that they expect. If it’s not there and they can’t point to a picture of it from our pre-estimation discussions, it wasn’t included.
  • Get an explanation (preferably in writing) of which bits of the display should do something, and what they should do.

If you’re thinking “that doesn’t sound like a set of stories to me”, you’re right, it’s not. This is supposed to be something like requirements. Analysis. That sort of thing.

It’s supposed to be a fairly bulletproof definition of “it” from the client’s perspective. It shouldn’t say anything about what we’re going to build or how we’re going to build it; just what the user thinks they want.

Refining “it”

Such a definition is a very rich, very user-centric view of the world. It’s not what we (as developers) can use to produce working software. In particular, it’s not linearized - it’s not a step-by-step, paint-by-numbers listing of actions to take to produce some software. But it is the contract between client and developers.

The next step is to try and turn all of those features from the requirements discussion into stories. This can generally be done by the team in isolation from the client, but questions may arise that need client input, such as “which widgets do we want to count?” or “how do we classify widgets that got scrapped at this point?”. Ultimately, since the stories will be the basis of the estimates, the team needs to make sure that the client knows what they are and what they mean in business terms.

A clear set of refined stories, with (almost) no outstanding questions from the development team, that, as far as the team is aware, meets the requirements, is a prerequisite for estimation. Having said that, the iterative model of all software development applies here too: the team can iterate on the set of stories, the understanding of the stories and the estimates of the stories.

What does an estimate look like?

So having digressed to point out how very hard it is to define “it” in terms that make sense to both clients and developers, I’d like to get to what I consider the “meat” of this article: what does an estimate of “it” look like?

I think there are many different dimensions of estimate that need to be considered. Over the years, teams I have worked on and with have made up words and units that describe these different dimensions, all of which are present in every estimate produced by every team, but usually collapsed into a single estimate dimension “for ease”. This leads (in my experience) to a lot of confusion.

I’m going to look at each one in turn.

Value (measured in dollars)

An important, but often unasked, estimate in software development is from the team to the client: “why would you bother” or “what’s it worth to you”. Clients, who usually expect to be sitting asking developers for estimates, are often very uncomfortable under the question “what is the value of this story or feature”. Because stories are often so small, it is generally easier to get clients to estimate the overall value of a package of stories - a feature. Contrariwise, in estimating the cost of software, the smaller units are easier to manipulate.

Whatever the client is comfortable in estimating, it is a very good alternative definition of “it” to be clear on what specific value the client expects to get from which specific features or sets of features.

Complexity (measured in story points)

The most common estimation made by development teams is “how big and difficult will this be to do”. Generally this estimate is relative, and as such the unit “story points” has evolved to express one estimate solely in relationship to others, ignoring any implementation realities which may slow the process.

As this unit is fairly well documented (and the relative merits of relative and absolute fairly well fought over), I’m not going to pursue it further.

Difficulty (measured in ninjas)

Difficulty differs from complexity in that it is not measuring the inherent complexity of what is required by the story, but rather the perceived difficulty of making the new functionality fit with the existing codebase.

There are several different factors that contribute to this difficulty: most noticeable when estimating are changes in design assumptions (for example, the existing code assumes an interaction metaphor, but this story requires a batch metaphor). Most noticeable when coding are issues with existing code quality, which tend to lead to delays and frustration in implementing the feature as old code needs to be cleaned up.

The unit “ninja” measures the relative number of times during feature implementation that somebody might say or think “I just got hit by a ninja hidden in the code”. By their very nature, these problems are not strictly estimable, but over time, experienced development teams know which areas of the software will cause problems.

Improvement Opportunities (measured in deltas)

Because estimation is irrevocably tied to design and the chosen approach, most teams doing story estimates will decide their approach during or before estimation, and thus what portions of the code will need to be changed. During this process, certain potential ninjas will be spotted and identified, and specific accommodation may be made for them.

Specifically, most teams doing story design produce a set of tasks, most of which relate to new functionality, but some of which represent refactorings of existing functionality known to be hiding places for ninjas.

In estimating the story, the team can set aside a certain amount of time and effort to address specific code quality issues. This time and effort is different in quality to the complexity of story points and is therefore measured in a different kind of unit. Like story points, deltas are relative measures of complexity, but they measure the complexity of burning forests down around the hidden ninjas and building open-plan cities.

Resource usage (measured in quaranges)

Quaranges are most definitely not man-hours. For a start, there is no such thing as a man-hour. Man-hours assume an inherent fungibility of development effort which is simply unrealistic. Two developers are not the same. Man-hours also ignore the effects of pairing. And the use of the term immediately makes managers think of the number 40 multiplied by the number of team members.

A quarange is a made-up unit intended to measure approximately absolute resource usage. If you feel more comfortable, think of quaranges as man-hours.

Quaranges are allocated on a per-iteration basis. I have found one-week iterations to be the best way to develop software. For a start, all the iteration events occur on a regular schedule - we do the same things the same days of every week. It’s simple. If you use a different length, that’s fine, but do consider changing to week-long iterations. If you’re not using iterations at all, that’s a different matter: start using them now!

Each story is given a quarange estimate. This is a consensus estimate of how much effort will be required to complete the story based on the skills and experience needed for the story and available on the team. It is a combination of all of the different cost factors (complexity, difficulty and potential improvements).

Each resource on the team adds to a pot of quaranges. The number they put in the pot takes into account all their personal factors (experience, skill level, familiarity with the project) as well as expected time off or other commitments (such as meetings). It also takes into account their immediate experience (i.e. the number of quaranges they succeeded in working through last week).

Unlike the input estimates, the resource usage estimate is very closely related to time, and is an intermediate estimate between the high-level observational estimates and the fixed, measurable estimates of time and schedule. In particular, the resource usage estimate is measured by reference to whether, at the end of the iteration, the planned number of quaranges to be delivered fitted within the number of available quaranges.

Given story estimates in quaranges, and an estimate of available quaranges, the team can allocate a number of stories to the iteration with a quarange count approximately equal to the total pot of quaranges.
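
As a rough illustration of that allocation, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the `Story` fields, the convention that a lower `priority` number means more important, the greedy "take stories in priority order while they still fit" rule, and all of the numbers. It is one possible way of filling the pot, not a prescribed algorithm.

```python
from dataclasses import dataclass


@dataclass
class Story:
    name: str
    quaranges: float  # consensus quarange estimate for the story
    priority: int     # assumed convention: lower number = higher priority


def plan_iteration(stories, pot_size):
    """Take stories in priority order for as long as they fit in the pot."""
    planned, remaining = [], pot_size
    for story in sorted(stories, key=lambda s: s.priority):
        if story.quaranges <= remaining:
            planned.append(story)
            remaining -= story.quaranges
    return planned, remaining


# Hypothetical backlog and pot contributions, purely for illustration.
backlog = [
    Story("monthly widget report", 8, priority=1),
    Story("filter by month", 5, priority=2),
    Story("breakdown by widget type", 13, priority=3),
    Story("configurable colors", 3, priority=4),
]
pot = sum([10, 6, 4])  # quaranges offered by three team members

planned, spare = plan_iteration(backlog, pot)
print([s.name for s in planned], "spare quaranges:", spare)
```

Note that this sketch skips a story that does not fit and carries on to lower-priority ones, which is exactly the kind of outcome the team should sanity-check by hand.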

Of course, because this is estimation not science, as each story is added to the iteration, each team member needs to look and check that:

  • There are not too many stories which require their personal involvement;
  • There are sufficient stories on which they can work;
  • The overall commitment feels credible;
  • There is a theme to the iteration which keeps the team cohesive;
  • There is the right mix of stories, balancing factors like clarity of definition, urgency and importance;
  • The stories included make sense and are valuable;
  • The stories are the highest priority ones.

If any of these conditions is violated, the team member must raise this issue and either defer stories, or figure out some other way of meeting the conditions.

At the end of the iteration, it is clear how accurate the overall resource usage estimate is. The sum of all the story estimates should (approximately) equal the sum of all the quaranges provided into the “pot”. During the retrospective the team should individually and collectively review their performance and determine the “pot size” for subsequent iterations. Remember that, since quaranges are not man-hours, it is possible for one individual in the team to increase their intended contribution, while another decreases theirs. The effects of this may not be easily measurable, although it is usually possible for the team to measure the quarange cost of each story, and approximate the number of quaranges they individually delivered for use in estimating subsequent iterations.

Cost (measured in dollars)

Often a client is interested in only two things: “when will I get it” and “what’s it going to cost me”. That being the case, it’s important to provide dollar and time estimates for stories. With approximate resource usage and resource availability estimates in place, this is usually easy - the weekly burn rate of the team is generally constant, so divide that by the number of quaranges available in an iteration and multiply by the resource estimate for the story.
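
The arithmetic can be written down as a small sketch. It assumes one-week iterations (so the weekly burn rate is also the cost of one iteration); the function name and the figures are invented for the example.

```python
def story_cost(weekly_burn_rate, pot_size, story_quaranges):
    """Dollar cost of a story: (burn rate / quaranges available) * story estimate."""
    if pot_size <= 0:
        raise ValueError("pot_size must be positive")
    cost_per_quarange = weekly_burn_rate / pot_size
    return cost_per_quarange * story_quaranges


# Hypothetical figures: a team burning $20,000 a week with a pot of
# 20 quaranges, estimating a story at 5 quaranges.
cost = story_cost(weekly_burn_rate=20_000, pot_size=20, story_quaranges=5)
print(f"${cost:,.0f}")
```

Because the pot size changes from week to week, the cost per quarange changes too, so a figure like this is only as stable as the pot it was calculated from.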

Scheduling and delivery date (measured in calendar time)

Scheduling is much more difficult, however, largely because most scheduling questions look further ahead than a single iteration and involve a sum of estimate error terms. In general, schedule estimates are only valid for the most immediate stories, with schedules over the lifetime of a release emerging slowly as a limit function.

Two factors in particular cause variability in release scheduling: one is quality, and the amount of time that is used later during a release addressing quality problems from earlier in the release; and the other is the number of changes that are introduced into a release through explicit addition of stories.

Interestingly, Agile methods attempt to limit the first through a commitment to total quality, TDD and merciless refactoring, while encouraging the second as a means to achieving the desired result. However, the difference between the two - and the value of the commitment to total quality with its attendant costs - needs to be continually communicated to clients who (in my experience) generally remember every hard number you ever gave them and forget the history of subtleties.

ROI (measured as a percentage)

The most important factor for any client should be return on investment: the increase in revenue (see “value” above) derived from the investment made in the software over the scheduled timeframe, expressed as a percentage of that investment.

Experience tells me that nobody I know calculates this, which I find very sad. Very few clients I have ever worked with had a realistic understanding of either how expensive software is to build, or what their expectation of value creation should be. On top of this, the delivery schedule adds a high degree of variability. The formulae I have seen for ROI - based on Net Present Values - have all been both complex and fragile in the face of change.

One thing I know about ROI is that until you have some product, you have no revenue stream. And as I have advised every client I’ve ever met: build the smallest thing you can possibly envisage and get that working and building a revenue stream, then add the bells and whistles. Moreover, build the bells and whistles that your current clients and future prospects say they want.

You don’t really do that, do you?

Whenever I read papers that talk about this kind of stuff I always have the same reaction: I don’t have time to do that - do you?

Projecting somewhat, I’m assuming you’re thinking the same thing right now about what you’ve just read. And the answer is, in general: no.

For me, the important estimate is the resource consumption one. If I know how much resource is going to get sucked up doing this story, I can easily calculate approximate schedules and costs. It all drops out from the one number.

The problem is that that number is the hardest one to come by.

Instead, I recommend the following approach:

  • Recognize that estimation and design are inextricably linked. Don’t have, organize or schedule estimation meetings - schedule design meetings.
  • Make sure that the relevant people from your team attend. That may be the whole team; it may not be. In principle, the whole team should attend, because the team is a collective. But in practice, it is expensive in two ways: people in the meeting are not doing other work, and the process slows down as more people join. I’ve rarely seen design/estimation meetings be effective with more than four developers in the room.
  • The client is an important part of design. Sure, he’ll be bored by the 80% of the meeting that talks about code details. If you can move that somewhere else, do so. But his input to the other 20% - including understanding which parts are important, and which can be left out - is invaluable.
  • It’s important to record what gets decided. Too often I’ve seen good interaction in design meetings, and good information come out, only to have it left behind in the meeting and come out with just a story card and a couple of numbers. I’ve never tried this, but if you have a shorthand typist on staff, ask them to minute the entire meeting and send you a list of stories, design comments, estimates, notes, everything.

Having said all that, all the numbers above have value, especially if they can be validated “in reality”. A very valuable metric is the money spent dealing with “difficulty” and its comparison to “improvement”. The ability to put a dollar value on a “ninja”, and in particular, to use that metric over a period of time to demonstrate the relative value of a “delta” is a very good argument for continuous investment in code quality.

It’s always possible to convert these units to dollars and time

It should be obvious how to convert both value and cost into dollar values. But how can the others be converted?

For quaranges, I gave a formula above. Be aware, though, that this calculation involves a pot size which varies week-upon-week. That being the case, the graph of pot-size against iteration gives good information about the productivity of the team, although adjustments need to be made for changes to the team composition.

For story points and ninjas, it is likewise possible to turn them into resource or dollar values by adding up the points that were done (or are estimated to be done) in an iteration. However, there is a further indirection here, in that these estimates, just reflecting one dimension of the size of the story, cannot be assumed to take equal shares of the entire iteration.

Again, any time or dollar estimate can be approximately converted into the other using the average burn rate of the team.

All the estimates will be wrong

In spite of anything you may have read above, I don’t belong to the school of thought that says that if we just tried a little harder, our estimates could be so much better.

If we tried a lot harder, our estimates could be a lot better. But I don’t think that will ever be worth it, because of the cost of obtaining those estimates.

Eisenhower is famously quoted as saying, “Plans are nothing; planning is everything”. The same is true of estimating. While the estimates themselves will invariably be wrong, they have a value which is itself inestimable: they add clarity and discipline to a notoriously unclear and undisciplined process.

In particular, design/estimation meetings promote:

  • Communication between stakeholders at a high level;
  • System design at a level intelligible to the client;
  • An increase of trust and visibility in the process of software development;
  • Effective resource planning and expectation setting.

Finally, if you know that all estimates will be wrong, you can apply a correction to them. When software development becomes a true profession, software developers will be graded by the correction factor that has to be applied to their estimates.
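
One simple way to think about such a correction factor, sketched here under my own assumptions, is the ratio of total actual effort to total estimated effort over past stories. The definition, the function name and the history below are all illustrative; other definitions (per-story averages, for instance) are equally plausible.

```python
def correction_factor(history):
    """Ratio of total actual effort to total estimated effort.

    `history` is a list of (estimated, actual) pairs in the same unit.
    A factor above 1 means the estimates have tended to run low.
    """
    total_estimated = sum(estimated for estimated, _ in history)
    total_actual = sum(actual for _, actual in history)
    if total_estimated == 0:
        raise ValueError("history contains no estimated effort")
    return total_actual / total_estimated


# Hypothetical past stories: (estimated quaranges, actual quaranges).
history = [(8, 10), (5, 6), (3, 4)]
factor = correction_factor(history)   # 20 / 16
corrected = 4 * factor                # apply it to a new 4-quarange estimate
print(factor, corrected)
```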

Refining estimates - measurement and feedback

Although it is the case that all estimates will be wrong, it is good practice to understand why they were wrong, and to try and improve.

This activity requires discipline, record-keeping and patience, but is worth the effort.

Each story has an estimate, and thus an expected amount of effort. While this is easiest with resource-usage estimates, it is still possible with more abstract units.

During the iteration, the amount of effort consumed in carrying out the story is noted. If possible, this is divided according to the different kinds of effort noted above.

At the completion of an iteration, all the data is collected and collated, and during the retrospective the actuals are compared with the estimates. Stories that are within tolerance (say 25%-50% either way) are ignored, and the focus is on stories that were outside tolerance. For each such story, a determination is made as to the reason for the error, and an appropriate corrective action should be taken:

  • For stories which had hidden complexity, that complexity should be identified and other stories with similar issues should be re-estimated;
  • Where stories ran into unexpected difficulties, the nature of the difficulties should be identified, and either appropriate remedial action should be taken to remove the difficulties, or other stories with similar issues should be re-estimated;
  • Where stories led to unplanned refactorings, the reasons should be evaluated, and guidelines agreed as to whether such unplanned refactorings should be done in the future. If so, affected stories should be re-estimated.
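
The tolerance check itself is easy to sketch. The version below assumes the error is measured relative to the estimate and uses 25% as the default tolerance (the lower end of the range suggested above); the record layout and the sample data are invented for illustration.

```python
def out_of_tolerance(records, tolerance=0.25):
    """Return (name, relative_error) for stories outside tolerance.

    `records` is a list of (name, estimated, actual) tuples; the error is
    (actual - estimated) / estimated, so +0.8 means 80% over the estimate.
    """
    flagged = []
    for name, estimated, actual in records:
        error = (actual - estimated) / estimated
        if abs(error) > tolerance:
            flagged.append((name, error))
    return flagged


# Hypothetical iteration data: (story, estimated quaranges, actual quaranges).
records = [
    ("monthly widget report", 8, 9),
    ("filter by month", 5, 9),
    ("configurable colors", 4, 2),
]
for name, error in out_of_tolerance(records):
    print(f"{name}: {error:+.0%}")
```

Deciding why a flagged story missed (hidden complexity, unexpected difficulty or unplanned refactoring) is still a conversation for the retrospective, not something the numbers can answer.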

The overall iteration should also be reviewed for cross-story issues:

  • If the amount of committed resource turned out not to be available, the reasons should be investigated and, if necessary, subsequent iterations should calculate differently. In cases where, for example, team members were sick, the error should be dismissed as irrelevant and the original number reduced.
  • Where new stories were added to the iteration, the reasons for this should be investigated closely. If it was a reprioritization, then the overall schedule should not be unduly affected. If, however, new functionality was added, this should be specifically called out, and the schedule be updated by either removing an equivalent amount of functionality or moving the delivery date with a note of added functionality.

The overall release schedule should then be updated to take into account:

  • Changed story estimates;
  • Added or deleted stories;
  • The updated estimated resource allocation per iteration.
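
A back-of-the-envelope projection of the release date might look like the sketch below. It assumes one-week iterations, a constant pot size and a known start date, all of which are simplifications made for the example (as are the function names and numbers); in practice the pot and the story list both move every week.

```python
import math
from datetime import date, timedelta


def iterations_remaining(remaining_story_quaranges, pot_per_iteration):
    """Whole iterations needed to work through the remaining estimates."""
    if pot_per_iteration <= 0:
        raise ValueError("pot_per_iteration must be positive")
    return math.ceil(sum(remaining_story_quaranges) / pot_per_iteration)


def projected_release_date(start, remaining_story_quaranges, pot_per_iteration):
    """Projected completion date, assuming one-week iterations."""
    weeks = iterations_remaining(remaining_story_quaranges, pot_per_iteration)
    return start + timedelta(weeks=weeks)


# Hypothetical remaining stories (in quaranges) and pot size.
remaining = [13, 8, 5, 3, 8]
print(projected_release_date(date(2024, 1, 1), remaining, 20))

# Adding a 5-quarange story pushes the release into a further iteration.
print(projected_release_date(date(2024, 1, 1), remaining + [5], 20))
```

Re-running a projection like this after every retrospective, and charting the result, gives a simple picture of how the intended release date drifts over time.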

Teams should keep charts of all these data on an iteration basis to assist them in understanding how their estimates, and the intended release completion date, change over time.

Conclusion

In my experience, nobody ever likes estimating. It looks too much like making a prediction about what will happen in the future to which we will later be held. This being the case, we expect to be judged for our failures.

If this is the case, that is a failure of management. There needs to be an open, honest approach for estimation to be worthwhile, and that includes the explicit right to change an estimate whenever new data becomes available.

Estimates are valuable for one thing only: to enable managers and clients to plan according to when they will receive software; how much it will cost; and how much it will be worth.

To this end, software developers should always work to give the clearest, most accurate, most complete information they can to other stakeholders, within the constraint that it should not be wasted effort, but an almost natural offshoot of key development activity.

Clients and managers should attempt to hold a fluid view of the world, attempting to understand the problems faced by developers in producing estimates, and understanding that new information will always emerge during a development cycle that renders any estimate invalid.


