If velocity is a lagging metric, what actually affects it in the first place?
Due to his glasses, Michael’s estimate of Jane’s distance from him is 14.3% further away than she actually is–at any given moment. How would you describe his estimate?
Precise, but not accurate.
Wait, what’s the difference between the two again?
Precise means the numbers you are seeing are repeatable. Often down to multiple digits after the decimal point. You can be pretty confident that when you do the experiment again, you’ll get the same result.
But that’s not necessarily true with an accurate result. An accurate result is one that is close to the actual value. For Michael’s estimate to be accurate but not precise, it would have to produce numbers close to the actual value, but not the same number each time. In estimating knowledge work, accuracy beats precision.
The 2×2 above applies to estimation of work in a knowledge work context. Ideally, everyone involved wants estimates to have both high accuracy and high precision. But at the beginning of a new initiative, estimates will most likely have low accuracy and low precision, because at that point you know the least you possibly can. From the example above, Michael’s estimate lies in the top left box. However, what we actually prefer to that is the bottom right one…if we can’t have the top right. High accuracy makes it much easier to plan out the upcoming work.
One way of skewing towards accuracy and away from precision is by making it difficult to be precise. Instead of trying to estimate absolute sizes of stories, e.g. 3 days, we can estimate only relative size, e.g. 2 story points.
Relative sizing gives us enough to negotiate business priorities given the size of each story, without tempting fate in terms of blaming: “You said it would only take 3 days, and blah blah blah”. This isn’t healthy, productive, or fun. So why even go there?
These relative sizes still allow for some reasoning; however, direct numerical inferences are deliberately imprecise. For example, you can add up how many story points there will be for a larger epic. You will need to allow for a lot of variance on the individual stories. But on the basis of the “law of large numbers” from statistics, this variance will tend to average itself out. Based on this you can get a feel for how one epic compares to another for example.
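To make the averaging-out effect concrete, here is a minimal simulation sketch in Python. It assumes each story's actual effort is off from its estimate by a random, unbiased amount of up to 50% in either direction, and that the errors on different stories are independent. Those assumptions, the function name, and the numbers are all illustrative, not measurements from a real team.

```python
import random
import statistics


def relative_spread_of_total(num_stories, points_per_story=3, noise=0.5,
                             trials=2000, seed=42):
    """Simulate many epics of `num_stories` stories each and return how much
    the epic total varies (standard deviation divided by mean)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    totals = []
    for _ in range(trials):
        # Assumption: each story lands anywhere from 50% under to 50% over
        # its estimate, independently of the other stories.
        total = sum(
            points_per_story * rng.uniform(1 - noise, 1 + noise)
            for _ in range(num_stories)
        )
        totals.append(total)
    return statistics.stdev(totals) / statistics.mean(totals)


single_story = relative_spread_of_total(1)
epic_of_25 = relative_spread_of_total(25)
print(f"Relative spread for 1 story:    {single_story:.1%}")
print(f"Relative spread for 25 stories: {epic_of_25:.1%}")
```

Under these assumptions the total for a 25-story epic varies far less, in relative terms, than any single story does. If the errors are biased in one direction (say, everything is consistently underestimated), they will not cancel out, so the sketch only supports comparing epics, not predicting dates.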
For this, the first place to start is the Fibonacci sequence, as a way to express relative size. They’re numbers. The sequence itself is very simple. Each number in the sequence is the sum of the preceding two values. This ensures that it grows by a factor of roughly 1.6 each time a new number is generated, but not precisely (Hah! low precision!).
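As a quick illustration of that rule, here is a small Python sketch that generates the sizes and shows how much bigger each step is than the previous one. Starting the scale at 1, 2 (rather than 1, 1) is an assumption made here so that every card is a distinct size; the function name is made up for the example.

```python
def fibonacci_sizes(count):
    """Return the first `count` distinct story sizes: 1, 2, 3, 5, 8, ..."""
    sizes = []
    a, b = 1, 2
    for _ in range(count):
        sizes.append(a)
        a, b = b, a + b  # each new value is the sum of the preceding two
    return sizes


sizes = fibonacci_sizes(7)
print(sizes)  # [1, 2, 3, 5, 8, 13, 21]

# Ratio between each size and the one before it.
ratios = [larger / smaller for smaller, larger in zip(sizes, sizes[1:])]
print([round(r, 2) for r in ratios])
```

The ratios bounce around between 1.5 and 2 before settling near 1.6, which is exactly the kind of "clearly bigger, but not by a neat amount" gap that keeps the discussion about relative size.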
There are visualizations of the Fibonacci sequences like the one above, where you can see that there is a clear difference between each step. In practice, this type of sizing will help you get beyond a “to do list”, where everything seems to be the same level of effort. It’s not. And that’s the point.
Here is a recommended breakdown in my favorite free tool for estimation: planitpoker.com:
It’s typically good to top out at a maximum size to make sure that stories stay small. Over a certain size it just feels too big, and the estimation discussion stops being useful. At that point, a big story should probably be broken into smaller subtasks, each of which should be estimated using the above sizing numbers.
For teams struggling with letting go of absolute estimates, the T-Shirt approach gives you the best of both worlds. Essentially you have the team estimate using relative T-Shirt sizes. Then you translate the team’s input into story points. This reinforces the relative nature of story points, while still giving you something more concrete to work with for velocity purposes.
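The translation step can be as simple as a lookup table. The sketch below is in Python, and the specific mapping (XS=1 through XL=8) and the story names are assumptions for illustration only. Your team should agree its own mapping.

```python
# Hypothetical mapping from T-Shirt sizes to story points.
TSHIRT_TO_POINTS = {"XS": 1, "S": 2, "M": 3, "L": 5, "XL": 8}


def to_story_points(tshirt_size):
    """Translate a T-Shirt size such as 'M' or 'xl' into story points."""
    return TSHIRT_TO_POINTS[tshirt_size.strip().upper()]


# What the team agreed on during the session (made-up stories).
backlog = {"Login page": "S", "Payment integration": "XL", "Audit log": "M"}

points = {story: to_story_points(size) for story, size in backlog.items()}
total = sum(points.values())
print(points)
print("Total story points:", total)  # 13
```

The team only ever talks in T-Shirt sizes, while the numbers are kept for velocity tracking behind the scenes.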
Using the planitpoker.com interface, you can set it here:
This option came out of a big debate years ago on the scrumdevelopment Yahoo Groups mailing list. I admittedly have never tried this myself, but it’s worth considering as an alternative approach to estimation. Or a thought experiment at minimum. Basically, at the time Ron Jeffries (@RonJeffries) was arguing against estimation in general. He posited that:
This way, you significantly reduce the need for an explicit estimation discussion. And you can manage workload and scope based on the number of issues you want to complete. If stories are assumed to be of equivalent size at a worst case scenario, i.e. 2 days/story, then you can derive your delivery date from the number of stories you have. Assume all remaining stories take 2 days, i.e. the worst case scenario, and add the number of work days to today’s date.
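The date arithmetic described above can be sketched in a few lines of Python. This version assumes stories are worked on one at a time, that weekends are the only non-working days (no holidays), and it uses hypothetical function names and a made-up start date.

```python
from datetime import date, timedelta


def add_work_days(start, work_days):
    """Return the date `work_days` working days after `start`,
    skipping Saturdays and Sundays."""
    current = start
    remaining = work_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


def worst_case_delivery_date(remaining_stories, today, days_per_story=2):
    """Assume every remaining story takes the worst case of 2 days."""
    return add_work_days(today, remaining_stories * days_per_story)


# 5 stories left, starting on Monday 1 January 2024 -> 10 working days.
print(worst_case_delivery_date(5, date(2024, 1, 1)))  # 2024-01-15
```

If several people work on stories in parallel, you would divide the number of working days accordingly, which is another assumption to make explicit with stakeholders.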
In practice, stories are often larger if they have “business value” and therefore can’t be easily broken down. Thus it’s hard to pigeonhole them into such a framework. By business value, I mean anything that would be of value to a customer for non-technical reasons. But you can put such a limit on sub-tasks of stories. And therefore just count the number of sub-tasks.
This approach kind of turns software development into a release checklist of stuff that must be finished. All of the tasks are small. It also means that there are a lot of stories and tasks flying around. This requires extra management & coordination work to maintain, either by the team or a delivery manager.
And also I suspect it’s hard for senior stakeholders to see the forest for the trees, if you just have a long list of tasks. What will be done by when? Even though you can change the order of what’s done, it’s hard for anyone who isn’t intimately involved with the details to understand what’s happening at a glance.
Finally, I do think in this case you lose out on the design discussions which happen during estimation. You only break down stories or tasks if they are “too big”. Therefore, you won’t ask questions like “what is the best way to approach this?” before starting the work. It could impact the quality of what’s delivered. And also get team members to spend time on work that is thrown away or low priority. In my opinion, this is human nature. We need to be deliberate about priorities; if we’re not, they won’t happen.
So here are a couple of aspects that Ron himself points out when using this approach to figure out when we’ll be done:
Finally, as an honorable mention, there is the whole #NoEstimates argument, which has been popular in software development circles. Basically, the approach claims that:
There are a number of other nuances. Basically for any knowledge work where there are a lot of dependencies (like in software), you might be better off not messing around with estimating at all. Just get on with the work.
Personally, I think estimation sessions are useful mostly for the purpose of having a priori “how I’d do this” discussions. #NoEstimates throws the baby out with the bathwater.
Also, by their very nature, most tasks on a large project will take different amounts of time (assuming we don’t take the approach from #3 above). And technical people are the best placed to figure that out and share it with everyone. They have a unique and often common perspective, which the rest of the company lacks.
The team doesn’t work in a vacuum. So timing becomes important, if not critical, to getting the full value of the efforts being made.
This concern boils down to one of efficiency. Usually, the asker assumes that the time to complete something is fixed and that the team’s primary goal is to squeeze out as much output as they can in that time frame.
In practice, though, effectiveness trumps efficiency. There is no point in doing something quickly if it doesn’t need to be done at all. To use the old Stephen Covey analogy, is the ladder leaning against the right wall?
And for better or worse, product teams of 2 or more people have exactly this problem. In addition to coordinating work among everyone. New product development is fraught with uncertainty, including technical uncertainty in many cases. So what the team investigates, validates, and builds in what order is critical. While efficiency is a concern, any time spent on making the new product development more efficient is likely to be thrown away if the goal changes. So efficiency won’t matter in that case anyway.
A pretty common trope nowadays is that larger companies have distributed teams, often across many locations and time zones. In the early days of agile, it was quite difficult to estimate under these conditions. A lot of the nuance and genuine discussion required was simply lost in the ether. And until recently, the tooling for distributed work didn’t exist. Well, that’s changed.
At most client sites, there will already be some kind of story tracking system in place such as Jira or Trello. It is possible to add a “story points” field to the template for a story. Typically, this involves speaking with the owner or local administrator of this system, but is relatively straightforward.
We don’t actually need any more functionality in the story tracking system. It’s nice to be able to display the story points where relevant afterwards. Jira, for example, has quite useful reporting based on story point estimates right out of the box. Trello has addons. Other agile management systems presumably have the same.
In a typical planning session, the product owner or business analyst (BA) spins up a session of planitpoker.com. One of the nice features of this system is that it doesn’t require you to set up an account or even really authenticate. Once a ‘room’ is set up, the BA shares out the link to everyone estimating. Everyone can sign onto the board without setting up an account.
At that point, I am typically sharing my screen with either the description of the story or any relevant collateral, such as a spreadsheet or a Miro board, or I am showing and explaining why the current version is missing a feature. I enter the story tracking ID into planitpoker.com after we’ve discussed what it is. And then everyone votes on the number of story points to estimate the complexity of the task. Once everyone has voted, we look at the distribution of votes. If it’s wide, we discuss and vote again. We keep doing the same until it’s pretty much the same across the team. If, for example, we decide that a particular story is 5 story points, we just type that value into Jira or Trello.
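The "is the distribution wide?" check is a judgment call in the room, but it can be sketched as a rule of thumb. The Python below assumes a card deck of 1 to 13 and treats votes as "pretty much the same" when they are at most one card apart. Both the deck and the threshold are assumptions for illustration, not something planitpoker.com enforces.

```python
# Hypothetical card deck used for voting.
CARDS = [1, 2, 3, 5, 8, 13]


def needs_another_round(votes, max_step_gap=1):
    """Return True if the votes are spread over more than `max_step_gap`
    adjacent cards, meaning the team should discuss and vote again."""
    positions = [CARDS.index(vote) for vote in votes]
    return max(positions) - min(positions) > max_step_gap


print(needs_another_round([2, 3, 13]))  # True: someone sees a much bigger story
print(needs_another_round([5, 5, 8]))   # False: close enough to settle
```

When the check comes back true, the useful part is asking the highest and lowest voters to explain their reasoning before voting again.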
You don’t need any fancy or even paid plugins to do this. It’s enough to be able to share your screen and have a discussion with everyone involved. In terms of estimation, the heavy lifting is done using planitpoker.com. Once a round of planning poker finishes, it can be discarded. And then you type in the estimates in the story tracking system, where you need them later.
When stumbling onto this approach, I think the key learning was that you don’t actually need everything to be done in your main story tracking system. All you really need to do is have a good debate, and then store the outcome of the estimation. And the conversations you have while estimating are the most important part of the process anyway.
In this scenario, we had a large team, a massive scope, and a lot of uncertainty around exactly how the product would work. I felt we just needed to get started on the work in order to reduce the technical uncertainty behind the work. So forcing the whole team to spend hours or days estimating something they don’t understand would have been a waste of everyone’s time. If they started the work, we could have more meaningful estimation discussions later. For context, this was an infrastructure project.
The architect involved had put together a big graph in draw.io for how the entire system would work. It had rapidly become a symbol in the company of something highly complicated that nobody, except for him, actually understood. What didn’t help was that developers in the company didn’t have much experience with cloud technology. And it was relatively new stuff, even for him.
Using this slightly circuitous route, we got to a story point estimate which I felt was accurate. It wasn’t precise, but that was acceptable at this stage. And the development team got started quickly on the work.
Sometimes there is pressure to plan out a big roadmap up front for a new product. The justification that I often hear is that we want to know when “we’re done”. Personally, I don’t feel knowing that is actually useful, especially from a business perspective. What I’d prefer to focus on is getting to first revenue and profitability. Estimating something a few years out in detail–when you barely understand it–is just a waste of time. And likely to be inaccurate.
All that said, it’s useful to know your options for the future, and to also have some idea how much effort adding any given option will take. At a high level, that is probably good enough.
To keep everyone happy, the simplest approach is to use T-Shirt sizing of epics. Some things are relatively small and quick. Some are huge. Some are in-between. The development team are the best judges of that, since they will need to do the work.
Clearly the T-Shirts for epics (groups of stories) will not be equivalent to the T-Shirts used for stories. If you are pressed to translate the epic T-Shirts into story points, I’d suggest providing a range estimate for each one. For example, an M size epic is 6-10 weeks. Range estimates can still be used for planning, but aren’t explicit commitments to deliver on a specific day. And they’re good enough to identify up front whether you’ll be late.
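Here is a rough Python sketch of how range estimates per epic can roll up into a range for a roadmap. Only the M range (6-10 weeks) comes from the example above. The S and L ranges, the function name, and the assumption that epics are delivered one after another are all made up for illustration.

```python
# Week ranges per epic size. Only "M" is taken from the text;
# "S" and "L" are hypothetical placeholders.
EPIC_RANGES_WEEKS = {"S": (2, 4), "M": (6, 10), "L": (12, 20)}


def roadmap_range(epic_sizes):
    """Sum the optimistic and pessimistic ends of each epic's range,
    assuming the epics are worked on sequentially."""
    low = sum(EPIC_RANGES_WEEKS[size][0] for size in epic_sizes)
    high = sum(EPIC_RANGES_WEEKS[size][1] for size in epic_sizes)
    return low, high


low, high = roadmap_range(["S", "M", "M"])
print(f"Roadmap estimate: {low}-{high} weeks")  # 14-24 weeks
```

The output stays a range on purpose: it is enough to spot up front whether a deadline is unrealistic, without pretending to commit to a specific day.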
Some estimation is better than nothing at all. It’s useful to just do a quick pass without going into all of the minute detail when brainstorming a roadmap. And T-Shirt epics give you the flexibility to do just that.
Geek out time, analog style. In one office! This experience, as well as similar ones, convinced me that estimating in person works best…if you can swing it. Fly people in if you have to. 🙂
In the early days of when I was a developer, my team and I inherited a codebase in C++ of 17 different components. Most of them were written using Borland C++, a compiler that was once cutting edge but had since largely fallen out of mainstream use. In addition to the compiler, the code used a lot of abstractions specific to the libraries shipped with the compiler. We decided that we needed to get into a more up-to-date environment, to take advantage of the significant performance gains in newer compilers and also so that it would be easier to work with (and probably recruit more help for).
With an architect and another developer, we booked a small meeting room for half a day, to plan out the work. First we talked about what needed to be done. As we discussed each story, we wrote them down on index cards. We stuck them on the table, because we wanted to see all of them at once.
Then, to estimate the work, we also went analog. Essentially, we took the business cards of a former colleague who’d left the company, and wrote the Fibonacci sequence on the back of a handful of cards. Then each of us took a set of the cards. We started discussing each index card, one by one. We each used one card to indicate how many story points worth of complexity lay behind each task, laying it face down on the table. Then we flipped over the cards. If there was a difference in the numbers, we got into technical discussions to convince each other that the story was simpler or harder than we expected.
After a few hours of this estimation boiler room, we had a lot of questions about one particular component which we didn’t know that well. So we waltzed back to our desks to spend the afternoon looking at the existing code, and to inform our estimates even further. Finally, the next day we reconvened and finished off the complete estimation session for the entire piece of work.
Once we had the aggregate story point estimate, we provided internal company stakeholders a relatively narrow elapsed time range estimate based on how long our story points typically took. This gave them enough information to be able to plan around our efforts, especially the pessimistic estimate. The range estimate was also intentionally not precise, even though we did feel it was accurate at the time.
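The conversion from an aggregate story point total to an elapsed time range can be sketched like this in Python. It assumes you track velocity as story points completed per sprint, and uses the best and worst recent sprints as the optimistic and pessimistic bounds. The function name and all of the numbers are hypothetical, not the figures from this project.

```python
import math


def elapsed_sprints_range(total_points, past_velocities):
    """Return (optimistic, pessimistic) sprint counts, based on the best
    and worst velocity the team has actually achieved."""
    optimistic = math.ceil(total_points / max(past_velocities))
    pessimistic = math.ceil(total_points / min(past_velocities))
    return optimistic, pessimistic


# 120 story points of work, with recent sprints completing 18, 24 and 20 points.
best, worst = elapsed_sprints_range(120, [18, 24, 20])
print(f"Between {best} and {worst} sprints")  # Between 5 and 7 sprints
```

Stakeholders can then plan around the pessimistic end of the range, which is usually the number that matters most to them.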
In practice, the estimate was correct, although the route to get there was kind of roundabout. Most of the tasks we did involved making a breaking change to the code in the new compiler. Then cleaning up the errors and warnings which resulted. The lists of problems were quite long, but then often making one fix suddenly fixed 30 errors in one go. So in short, we couldn’t have really known this up front anyway.
But the estimate was close enough to be useful for planning purposes.