A better way to predict a project's release date!
But aren't Story Points much better at predicting the release date and scope delivered than the method you propose?
Setting aside all of the other reasons not to use Story Points, I decided to tackle this question specifically. And the results are in! Story Points are less accurate at predicting the release date and scope delivered than just counting the number of stories (or items) delivered per iteration! This seems counter-intuitive, because we have less "detail" when we merely count the number of stories delivered. Many asked me:
But if you don't know the size of the work how can you predict when it is going to be done?
In God we trust, all others must bring data
Before speculating, let's look at the data! The case I want to present is a long project (24 iterations) for which we collected both the Story Points and the number of items completed per iteration. I had one question, with two sub-questions, in mind:
Which metric (SPs or # of items) was a more accurate predictor of the output of the whole project?
a) When we calculated based on the averages for the first 3 iterations
b) When we calculated based on the averages for the first 5(!) iterations
This question is important because, if we can predict the output of a project with high accuracy based on the first 3-5 sprints, we have a good case to stop doing up-front estimation altogether! After all, investing 3-10 weeks in actual development delivers much more information about the product than spending 2-4 weeks in Requirements/Architecture/Design discussions (not to mention that it bores people out of their minds!)
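To illustrate what predicting a release date from the first few sprints could look like, here is a minimal Python sketch. It assumes the remaining scope is known as a count of items and that the average number of items delivered per iteration stays constant. The helper names, the backlog size, the start date and the per-iteration counts are all hypothetical and are not taken from the project discussed here.

```python
import math
from datetime import date, timedelta
from statistics import mean


def iterations_to_finish(items_per_iteration, backlog_items):
    """Hypothetical helper: iterations needed to finish the backlog,
    assuming the average throughput observed so far holds."""
    if not items_per_iteration:
        raise ValueError("need at least one observed iteration")
    throughput = mean(items_per_iteration)  # average items completed per iteration
    if throughput <= 0:
        raise ValueError("observed throughput must be positive")
    # Round up: a partially used iteration still has to be run.
    return math.ceil(backlog_items / throughput)


def projected_release_date(start, iteration_weeks, items_per_iteration, backlog_items):
    """Hypothetical helper: turn the iteration count into a calendar date."""
    n = iterations_to_finish(items_per_iteration, backlog_items)
    return start + timedelta(weeks=n * iteration_weeks)


# Hypothetical counts of items completed in the first 3 two-week iterations.
first_sprints = [6, 8, 7]
print(iterations_to_finish(first_sprints, backlog_items=150))  # → 22
print(projected_release_date(date(2012, 1, 2), 2, first_sprints, 150))  # → 2012-11-05
```

A plain average keeps the sketch close to the "average of the first iterations" idea in the post; feeding in a range of observed throughput values instead would give a range of dates rather than a single one.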
So what were the results? First of all, a disclaimer: this is data from one single project; we do need more data to make a better case for not estimating at all! See below for more on how to contribute data to this project! The results are in, and counting the number of items is a better predictor than Story Points-based estimation!
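One way to write down the comparison, assuming the forecast simply extrapolates the average output of the first N iterations over the whole project (the post does not give the exact formula). Here x_i is the output of iteration i, measured either in Story Points or in items, and T = 24 is the total number of iterations:

```latex
\hat{O}_N = \frac{T}{N}\sum_{i=1}^{N} x_i, \qquad
O = \sum_{i=1}^{T} x_i, \qquad
e_N = \frac{\hat{O}_N - O}{O}
```

Under this reading, a positive e_N is an overestimate and a negative one an underestimate, so "overestimated the output by 20%" would correspond to e_3 = +0.20.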
When we tried to assess the release date and amount of scope delivered based on only the first 3 iterations, using Story Points overestimated the output by 20% (!) in this particular project, while counting the number of stories/items delivered underestimated the output by 4% (yes, four percent).
How about if we increase the sample and take into account the first 5 sprints? In this case the Story Point-based prediction was more accurate, but it still overestimated the delivered scope by 13%, while counting the number of stories/items underestimated the output by 4% (yes, four percent).
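The same kind of calculation as a small Python sketch, again assuming the forecast is the average of the first n iterations extrapolated over all iterations. The function name and the per-iteration numbers below are made up for illustration; they are not the data from this project and are not evidence for either metric.

```python
from statistics import mean


def forecast_error(per_iteration, n):
    """Hypothetical helper: relative error of extrapolating the average of
    the first n iterations over the whole project.
    Positive = overestimate, negative = underestimate."""
    if not 0 < n <= len(per_iteration):
        raise ValueError("n must be between 1 and the number of iterations")
    predicted = mean(per_iteration[:n]) * len(per_iteration)
    actual = sum(per_iteration)
    return (predicted - actual) / actual


# Made-up 8-iteration project, tracked in both units (NOT the post's data).
story_points = [30, 34, 32, 25, 20, 28, 22, 24]
items = [7, 8, 7, 7, 6, 8, 7, 7]

for n in (3, 5):
    print(f"first {n} iterations: "
          f"SP error {forecast_error(story_points, n):+.1%}, "
          f"items error {forecast_error(items, n):+.1%}")
```

Note that the first n iterations are counted both in the forecast and in the actual total; a stricter check would compare the forecast only against the iterations that were not used to make it.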
In this project, the answer to the question: "which metric is more accurate when compared to the actual output of the project?" is: Counting the number of stories/items delivered at the end of each iteration is a better predictor for the output of a project than estimating based on Story Points delivered!
Final note: how to contribute data to this study
The case I presented above is based on one single project. We currently have data for more than 20 projects and 14 different teams, but we need more data to investigate the claims I make here and in the previous post. I call upon the community to share the data they have. I have made my contribution by sharing the data I have collected over the last few years in a world-accessible spreadsheet that you can see and download here.
Please share the data for your projects in a Google Doc or similar world-accessible spreadsheet and leave a comment below with the link to the data. For us to learn more about how to better predict project outcomes, we need to be able to look at a large data set. Only then will we be able to either verify or destroy the claim that Story Points are useful for our projects. Thank you all in advance for your contribution!
Photo credit: NASA's Marshall Space Flight Center @ flickr
23 Comments:
This rather small data set reeks of confirmation bias. There is something to be said for the method, but it's almost irresponsible to sell it based on this meager and unbiased data set. Don't claim to be all about using data and then violate all good principles of data science.
By Derek Neighbors, at July 26, 2012 8:07 PM
Thanks for the call for data! I'd ask whether you chose the iteration based on story points or # of stories, as it seems a confounding factor.
By bonniea, at July 27, 2012 11:08 AM
Nice post, small sample size notwithstanding.
I'd like to understand what you mean by "output" here? Can you help me with that?
- Bob
By zx12bob, at July 27, 2012 12:11 PM
I have to agree with Derek here -- the level of confidence is pretty low given the amount of data: 1 project, 24 iterations. I have data from a larger/longer project that shows story points are _quite_ accurate. I have other data that shows # of stories to be a poor predictor, if the spread of story points is large. I can't share the data (yet?), but it'd be nice to collect more data from more projects.
By Ted M. Young, at July 28, 2012 12:44 AM
@Derek
I take issue with your language. Do you have data to publish?
In the data-set linked in the post there are 21 projects, this is not "only one project".
Regarding the issue of "accuracy", it is true that it is only one project, and I mention this in the post itself (which you ignored, making the same mistake you accuse me of making). However, one data point is enough to disprove the idea that Story Points are *necessary* for accurate prediction of project outcomes!
This is how science works! I found one (yes, one!) project where the idea that Story Points are *necessary* is proved false.
Now I expect the community to share more project data so that my hypothesis (# of items is at least as accurate, and likely more accurate) can be tested!
Please, share the data you have and engage in constructive examination of the data. :)
By Unknown, at July 28, 2012 10:19 PM
@bonniea The project that I used was selected because I had data for both SPs and # of stories, and it was long enough for the idea of using the first 3-5 iterations as a predicting data sample to be of use.
If the project was only 6-7 iterations, the predictability of the first 5 iterations would not be so relevant :)
Additionally the 24 iterations are *all* the data I have for that project, they were not "selected" for this study.
By Unknown, at July 28, 2012 10:21 PM
@zx12bob
"output" in the context of this experiment refers to either the number of Story Points or the # of stories delivered at the end of the project.
By Unknown, at July 28, 2012 10:22 PM
@Ted It would be very nice to get that data in the public :)
Also, although it is true that *one* project gives us a *very low* confidence that the hypothesis holds for all projects, it already disproves the previously valid theory that stated "to estimate accurately you *need* Story Points". :) Part I of the process is done; now we have a hypothesis that needs to be further tested.
It would be awesome if you could share your data. It is very easy to make anonymous as we only need iteration length (in weeks) plus story points delivered and # of stories completed for each iteration :)
By Unknown, at July 28, 2012 10:25 PM
It doesn't seem counterintuitive to me, because in effect you're tracking throughput. You just aren't using that name for it. Of course that will be more reliable than story points.
By Dave, at July 28, 2012 10:34 PM
I hope that people commenting on this post will come to What The Point of Story Points at Agile 2012 and give their opinions.
By George Dinwiddie, at July 28, 2012 10:35 PM
@Dave can you explain your statement in more detail? I may be missing some details of what you mean, but it sparked my curiosity :)
By Unknown, at July 28, 2012 10:38 PM
@George Definitely, people should attend that session! :)
Link us to the slides once you can share them, I'm interested in your session, but I will not be able to attend Agile2012 :)
By Unknown, at July 28, 2012 10:39 PM
Some people claim that relative estimation, as well as the most popular technique for creating these relative estimates, gives better prediction accuracy.
@vasco, did you try to investigate how teams actually did create their "story points"? Would your examinations be valid for a "state of the art" relative estimation team? I'm asking since most teams I meet actually do not compare stories; they do absolute estimations and call the result story points. It's still interesting that no teams in your study could make it work, but I was wondering what it is they actually did.
Best
Henrik
By Anonymous, at July 28, 2012 11:36 PM
@Vasco, throughput is "value units delivered per unit time." In your example, value unit is "story" & consistent unit of time is time-boxed iteration. Throughput is most direct measure of delivery. You make empirical observations of throughput and use them to project likely future performance. Clearly more effective than estimation.
By Dave, at July 29, 2012 1:24 AM
@Dave Now I understand what you mean.
The throughput idea is one that I also use in my presentation: Example: if the customer wants "email subscription for my blog", then the story is more relevant than "how many story points a team can deliver".
If we tell the customer "you can get 15 story points in this iteration", that will mean nothing to them. However if we tell them "you will get email subscription for your blog", then we will have something to convey that has "meaning" for the customer. That is much more useful than a "nebulous unit of time" like story points.
By Unknown, at August 01, 2012 8:49 AM
@Vasco One data point is NOT enough to disprove the idea that SPs are necessary. ;-) You may have lucked out and stumbled upon one team which happens to be on the right side of the Gauss curve, say in the 90th percentile. How about the other nine teams?
If you analyze a dozen teams from half a dozen organizations and arrive at the same conclusion, that could be seen as indicative and worthy of further study. That is how science works!
Along those lines, I really appreciate your call for data! Set up a Google Drive folder and invite people. I'll see what I can find in the remotest corners of my hard disk. :)
In my previous work at you-know-where I encountered teams that quite consistently produced N+2 stories per person (where N is the team size). Some stories were estimated large, others small and the velocity would be jumping wildly. Does this indicate that SPs suck and SCs (story counts) rule? No! It indicates that the stories were too large, and that the team didn't know how to split and prepare stories properly.
By Martin von Weissenberg, at August 17, 2012 11:26 AM
Oh, and another thing... you must use separate data for building your predictive model and for verifying the model. If you use the same data for both --- e.g. the first five sprints for prediction, and the full X sprints for verification --- the model will be biased towards the target. If the prediction and verification data sets are the same, both SPs and SCs will of course give 100% "accuracy". That's why the five-sprint prediction is closer to 100% than the three-sprint prediction.
By Martin von Weissenberg, at August 17, 2012 11:45 AM
@Martin On the contrary! One data point is indeed *enough* to disprove the necessity of Story Points.
The question it does not answer is "are the number of items enough to estimate a release date for a project?"
It is because of that question that we need even more data! ;)
By Unknown, at August 17, 2012 9:28 PM
I am an engineer and quite new to Agile/Scrum. Story points are an abstract concept and there is no way they can be measured. It has been said that story points are initial, uncertain numbers and therefore cannot be divided by task hours. I ask myself with what degree of confidence we can measure the size or duration of a project with something that is quite uncertain.
On the other hand the cone of uncertainty tells us that we are only certain of the results if we have completed something and have made sure that there is no need to roll back. In my opinion the number of stories completed provides a "clear" indicator of the ability of the team to complete a given task and should logically be a much better tool for estimation.
By Naveed, at September 11, 2012 3:48 AM
Project completion is predicted in many different ways, and hence there is no hard and fast rule; it mainly depends on the resources and management. This way of predicting the project release is interesting.
By software development services, at January 04, 2013 1:48 PM
I believe that in order to reach a conclusion, it is necessary to compare data presented by different companies in software development. This will help us identify the usefulness of Story Points in predicting the project release date and scope of the project. I agree with you totally when you say that design discussions should be limited and focus should be more on the development stage in the process.
By Stuart, at January 17, 2013 6:40 AM
@Stuart You are correct, and we already have that data. In my previous post I mention that I have a database of 20+ projects across many companies. http://softwaredevelopmenttoday.blogspot.com/2012/01/story-points-considered-harmful-or-why.html
All data points confirm that Story Points are at best useless.
By Unknown, at January 17, 2013 11:42 AM
Great discussion going on... glad to be a part of this.
By Unknown, at May 02, 2013 10:14 AM