Story Points: Not Just a Story of Planning
Story points are a widely used estimation method in Agile development, meant to measure the relative effort required to complete a task. Instead of estimating work in absolute hours, teams assess complexity, uncertainty, and scope. This approach is intended to improve forecasting and workload distribution.
A story point is a unit of measure that represents the perceived effort required to complete a user story or task. Unlike time-based estimates, story points account for factors such as complexity, dependencies, and risk, enabling teams to compare tasks on a relative basis rather than in exact hours or days.
In today’s industry, story points remain a fundamental yet controversial tool. While they help teams gauge effort without committing to strict time-based estimates, many organizations struggle to implement them. Misuse of story points can lead to excessive debate, artificial precision, and even reduced productivity. As Agile adoption grows across industries, it is increasingly important to assess whether story points genuinely add value or simply create additional bureaucracy.
Story points let teams gauge effort relative to one another rather than through absolute estimation: tasks are compared against each other instead of being pinned to exact completion times. To make this concrete, let’s walk through a painfully familiar use case. This example is based on a ‘true’ story.
Bureaucrasoft™: A Case Study in Process Overload
Bureaucrasoft™ is a leading enterprise software solutions provider specializing in workflow optimization, compliance management… and ensuring no decision is made in under six weeks.
The flagship product, ProcessPal™, is an end-to-end Bureaucracy-as-a-Service (BaaS) platform designed to help organizations "optimize" (read: slow down) their operations with mandatory checklists, automated permission gates, and cross-functional alignment reviews.
The Estimation Ritual: Story Points in Action
At Bureaucrasoft™, story point estimation is not just a meeting—it’s a ritual. Every Monday morning, the development team gathers in the Grand Conference Room, armed with coffee and resignation. The Product Owner, meticulously following Bureaucrasoft’s 173-page Estimation Best Practices Guide, kicks off the session.
"Alright, let's start with the first item: Refactoring the navigation bar in ProcessPal™. Thoughts?"
The lead frontend developer sighs. "The current navigation is a tangled mess of legacy code. If we touch one piece, the whole thing might break. This is major work."
"But we don’t have time for a full rewrite," counters the Product Owner. "We can just make some small improvements. Maybe an eight?"
Half an hour later, after reviewing past estimations, debating risk factors, and consulting Bureaucrasoft’s Compliance Team, they settle on eight story points.
The final item: Implementing a custom theme system. The room goes quiet. Everyone knows this is an ambitious task—users have been clamouring for customizable themes for months. It involves UI, backend APIs, and performance optimizations. The lead engineer sighs, "This is at least a thirteen."
"But what if we simplify the requirements?" asks the Scrum Master. "Maybe an eight?"
The debate continues. Finally, thirteen story points are assigned, and the meeting ends—three hours later. Exhausted, the team disperses, knowing that next week, they’ll be arguing about why their velocity doesn’t match last sprint’s.
As in our example, teams sometimes spend more time estimating than completing the actual task, leading to inefficiencies. Some Agile experts have discussed this issue, noting that excessive estimation often outweighs its benefits.
The Debate Around Story Points
Story points remain one of the most debated concepts in Agile development. Some believe they improve team collaboration and estimation accuracy, while others argue they introduce unnecessary complexity and can be misused by management. To illustrate both perspectives, we’ll look at two influential figures in Agile: Mike Cohn, a proponent of story points, and Ron Jeffries, a skeptic.
The Case for Story Points (Mike Cohn)
Mike Cohn, co-founder of the Scrum Alliance, argues that story points help teams focus on effort and complexity rather than deadlines, fostering better discussions and understanding.
“Story points are a unit of measure for expressing an estimate of the overall effort that will be required to fully implement a product backlog item.” – Mike Cohn
The Case Against Story Points (Ron Jeffries)
Ron Jeffries, co-creator of Extreme Programming (XP), regrets introducing story points, arguing they create artificial precision and are misused by management.
“I like to say that I may have invented story points, and if I did, I’m sorry now.” – Ron Jeffries
Other Voices in the Debate
Beyond Cohn and Jeffries, other Agile thought leaders have weighed in on the issue. Allen Holub strongly criticizes story points, arguing that they encourage unnecessary process overhead. Joshua Kerievsky, CEO of Industrial Logic, suggests that story points introduce unnecessary complexity that can be avoided by focusing on continuous delivery. On the other hand, Andrea Fryrear, an Agile marketing expert, emphasizes that story points can help teams visualize their workload more effectively. Dave West, CEO of Scrum.org, also acknowledges the educational value of story points, stating that they encourage teams to think critically about task complexity.
The Reality: Pros and Cons
Benefits
- Encourages Team Collaboration: According to GBH Tech, story points facilitate discussions among team members to reach consensus on task complexity, building a shared understanding.
- Facilitates Relative Estimation: Scrum.org notes that story points let teams assess tasks by complexity and effort relative to other tasks, rather than in absolute time.
- Decouples Estimation from Time*: Story points help teams focus on the effort and complexity of tasks without the pressure of time-based estimates, which vary from person to person.
*Story points are frequently misunderstood as a substitute for hours or days. Teams should reinforce their relative nature to prevent this common misuse.
Drawbacks
- Inconsistency Across Teams: Stack Exchange states that story points are subjective and can vary across teams, making it difficult to compare performance or progress across the organization.
- Potential for Misuse: According to Uplevel Team, organizations may misuse story points as a measure of individual performance, leading to unhealthy competition and stress among team members.
- False Sense of Predictability: According to Blue Yonder, relying on story points for long-term planning can create an illusion of predictability, which may not account for unforeseen complexities or changes.
Story points, like any Agile tool, are only as effective as their implementation. When used properly, they can guide development without micromanagement. When misused, they can become a bureaucratic nightmare that slows teams down. The key is understanding when and how to use them effectively.
Backlog Refinement and Why It Matters
Backlog refinement is the ongoing process of reviewing, updating, and prioritizing items in the product backlog to ensure they are clear, well-defined, and ready for implementation. It helps teams manage technical debt, uncover dependencies, and refine acceptance criteria, allowing for smoother development and better sprint execution.
Why “Backlog Refinement” Instead of “Grooming”?
The industry has largely retired the term “backlog grooming” due to its uncomfortable connotations. “Backlog refinement” is now preferred: it emphasizes continuously clarifying and improving work items rather than merely “tidying” them up.
What Backlog Refinement Really Delivers
- Sharper shared understanding: Product owners, developers, testers, and designers examine each item together, revealing hidden dependencies and surfacing missing acceptance criteria before work begins.
- Risk reduction early: Ambiguities and blockers are exposed long before sprint planning, preventing last‑minute surprises that can derail commitments.
- Right‑sizing of effort: Teams break large or vague items into smaller, testable slices and re‑estimate them, keeping the backlog actionable and ready for the next sprint.
- Just‑in‑time prioritization: Refinement provides a regular checkpoint to confirm that items remain aligned with product goals and stakeholder needs, so effort is always focused on the highest‑value work.
- Healthier sprint planning: A well‑refined backlog lets sprint planning focus on capacity and commitment rather than emergency clarification, shortening the meeting and improving forecast accuracy.
Good refinement feels lightweight. Most mature teams time‑box the session to 30–45 minutes per week, or 5–10 percent of total sprint capacity. If the meeting routinely overruns, it’s a signal that backlog items are too vague when they arrive or that facilitation needs tightening.
Practical Tips
- Prepare asynchronously. The product owner adds initial context (description, value, acceptance criteria) in advance; team members leave questions or spike results directly on the ticket.
- Use Definition‑of‑Ready as a checklist. Don’t move an item to “Ready for Sprint” until everyone agrees it meets the agreed criteria (value statement, test notes, small enough to finish in a sprint, etc.).
- Rotate facilitation. Have different team members lead the session to maintain energy and share ownership of backlog quality.
How to Foster Better Refinement Discussions
- Keep backlog refinement focused on complexity, not just numbers.
- Encourage comparative exercises to help break down large items.
- If a task seems too uncertain, reframe it as an exploration rather than forcing an estimate.
- Ensure refinement leads to action: clarified requirements, smaller tasks, or clear technical plans.
A well-run refinement session doesn’t just produce estimates; it makes work more predictable and achievable. When teams understand the scope of their work, planning becomes smoother, and execution improves.
Estimation vs. Sizing
Story points are often treated as estimates, but they are more accurately a relative sizing tool. Instead of predicting completion time, teams should focus on categorizing work by complexity.
- Estimates attempt to predict calendar time, making them inherently subjective.
- Sizing classifies items by relative effort or uncertainty, making workload distribution easier.
- Beware of the “sum‑it‑up” trap: Because story points are numeric, teams sometimes total them at the end of a sprint and judge success by hitting (or missing) a point target—e.g., “we only delivered 48 points; that’s not enough.” This converts sizing back into a proxy for velocity and reintroduces schedule pressure.
- Alternative scales help break the habit. T‑shirt sizes (S, M, L, XL) or Fibonacci labels without explicit numbers discourage adding and keep the conversation anchored in relative complexity rather than quotas.
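To see how an unnumbered scale resists the sum‑it‑up trap, here is a minimal sketch (all names and data are hypothetical): sizes carry no arithmetic, so the only natural summary is a distribution rather than a total.

```python
from collections import Counter
from enum import Enum

class Size(Enum):
    """T-shirt sizes: ordered labels with no numeric value to add up."""
    S = "S"
    M = "M"
    L = "L"
    XL = "XL"

# hypothetical sprint backlog, sized relatively during refinement
backlog = [Size.S, Size.M, Size.M, Size.L, Size.S, Size.XL]

# There is no meaningful "total" of these labels; the honest summary is
# a distribution, so there is no point target ("48 points") to chase.
distribution = Counter(item.value for item in backlog)
print(distribution)  # Counter({'S': 2, 'M': 2, 'L': 1, 'XL': 1})
```

Because `Size.S + Size.M` is simply a type error, the scale nudges conversations back toward relative complexity instead of quotas.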
Large organizations may still need forecasts for long‑range planning, but shifting day‑to‑day team conversations from estimation to sizing promotes healthier, evidence‑based delivery.
The 2020 Scrum Guide underscores this shift, emphasizing sizing and dropping references to time‑based estimation.
Alternative Metrics to Story Points
Instead of story points, teams can adopt alternative metrics:
- T-Shirt Sizing (S, M, L, XL): A simpler, category-based sizing method.
- Bucket System: Categorizing work into discrete size-based groupings.
- Risk-Based Categorization: Assigning effort based on uncertainty and potential blockers.
- Historical Data-Based Estimation: Using past trends to predict effort more accurately.

“Recent empirical work covering 19k+ Jira issues found that one in ten backlog items required a 58–100% uplift in story points after sprint planning, directly undermining the very forecasts the points were meant to support.”
Throughput: Measuring Work Done
Throughput measures the number of tasks completed over a given period. It provides insight into team delivery rates but is not an efficiency metric since it doesn’t account for task complexity or external factors. Instead, throughput should be used alongside other metrics to gain a complete view of team productivity.
Tools like Axify help track throughput and workflow efficiency.
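Throughput needs nothing more than completion dates from an issue tracker. A minimal sketch, assuming two‑week sprints and hypothetical data:

```python
from collections import Counter
from datetime import date

def throughput_per_sprint(completion_dates, sprint_start, sprint_days=14):
    """Count finished items per sprint window, keyed by sprint index."""
    counts = Counter(
        (d - sprint_start).days // sprint_days for d in completion_dates
    )
    return dict(sorted(counts.items()))

# hypothetical completion dates pulled from an issue tracker
done = [
    date(2024, 3, 4), date(2024, 3, 6), date(2024, 3, 13),    # sprint 0
    date(2024, 3, 19), date(2024, 3, 21), date(2024, 3, 27),  # sprint 1
]

print(throughput_per_sprint(done, sprint_start=date(2024, 3, 4)))
# {0: 3, 1: 3} -- three items per sprint, regardless of their point values
```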
DORA Metrics: A Better Efficiency Measure
For a more comprehensive view of efficiency, teams can rely on DORA Metrics (DevOps Research and Assessment). These metrics focus on software delivery performance and reliability:
- Deployment Frequency: How often a team successfully releases code.
- Lead Time for Changes: The time from a commit to production deployment.
- Change Failure Rate: The percentage of deployments that lead to failures.
- Mean Time to Recovery (MTTR): How long it takes to restore service after an incident.
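All four metrics can be derived from a simple deployment log. A minimal sketch with hypothetical records (commit time, deploy time, whether the deploy failed, minutes to recover):

```python
from datetime import datetime
from statistics import mean

# hypothetical log: (commit_time, deploy_time, failed, recovery_minutes)
deployments = [
    (datetime(2024, 3, 1, 9), datetime(2024, 3, 1, 15), False, 0),
    (datetime(2024, 3, 2, 10), datetime(2024, 3, 3, 11), True, 45),
    (datetime(2024, 3, 4, 8), datetime(2024, 3, 4, 12), False, 0),
    (datetime(2024, 3, 5, 9), datetime(2024, 3, 6, 9), True, 90),
]

def dora_metrics(log, period_days):
    """Compute the four DORA metrics over an observation window."""
    lead_hours = [(dep - com).total_seconds() / 3600 for com, dep, _, _ in log]
    recoveries = [rec for _, _, failed, rec in log if failed]
    return {
        "deployment_frequency_per_day": len(log) / period_days,
        "lead_time_for_changes_hours": mean(lead_hours),
        "change_failure_rate": len(recoveries) / len(log),
        "mttr_minutes": mean(recoveries) if recoveries else 0.0,
    }

print(dora_metrics(deployments, period_days=7))
```

Note that none of these inputs involve estimates: the metrics come entirely from observed events, which is what makes them harder to game than velocity.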
How to Act on the Numbers
High Lead Time? That’s often a sign that stories are oversized or poorly sliced. Re‑refining and slicing work into smaller, testable increments (see our guide on SPIDR story slicing) reduces complexity, shortens code review and deployment time, and therefore drives Lead Time down. The same practice also stabilizes story‑point sizing: smaller items are consistently easier to size.
DORA metrics provide teams with empirical ways to evaluate efficiency, enabling continuous improvement. Tools like Axify provide these metrics out of the box.

“Public velocity datasets (e.g., Spring XD, Mesos) show a much stronger correlation between simple item‑throughput and delivery time than between summed story points and delivery time.”


Nathen Harvey, who leads the DORA team at Google Cloud, emphasizes the importance of the DORA metrics in understanding and improving software delivery performance, underscoring their value for evaluating and enhancing the efficiency of development processes.
Compromising in Rigid Organizations
Some companies resist change and remain embedded in estimation-driven processes. Often, though, the rigidity itself is the real issue, and incremental compromise works better than outright confrontation.
At Bureaucrasoft™, any change to the estimation process requires a multi-step approval process. A senior developer, frustrated with the endless estimation debates, devises a plan.
Instead of proposing radical change outright, he starts with small experiments. For minor bug fixes and quick UI tweaks, his team skips story points altogether and just completes the work. For larger features, they replace detailed point-based estimation with T-Shirt Sizing: small, medium, large, and extra-large. The process is streamlined, but they keep the numbers vague enough to avoid immediate scrutiny from management.
The experiment works. Tasks move faster, and backlog refinement meetings are shorter. However, the real challenge is leadership, which still clings to velocity tracking.
To win them over, the team quietly tracks throughput—the number of completed tasks per sprint—and compares it with previous story-point-based estimations. The results are undeniable: work is getting done faster, and cycle times are improving. They present the findings in a well-structured report, showing that productivity has increased by 15% since reducing estimation overhead.
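A productivity figure like that is just a comparison of average throughput before and after the change. A minimal sketch with hypothetical numbers consistent with the story:

```python
from statistics import mean

def throughput_change_pct(before, after):
    """Percent change in average items completed per sprint."""
    return (mean(after) - mean(before)) / mean(before) * 100

# hypothetical items-completed-per-sprint, before and after dropping
# detailed point-based estimation
before = [10, 11, 9, 10]   # average 10.0
after = [12, 11, 12, 11]   # average 11.5

print(f"{throughput_change_pct(before, after):.0f}% improvement")
```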
After weeks of data-backed persuasion, leadership agrees to a hybrid approach: high-impact features still get story points, but smaller tasks move through the pipeline without debate. Gradually, more teams adopt this approach, and Bureaucrasoft™ starts shifting toward a leaner, outcome-driven workflow.
The lesson? Change in rigid organizations doesn’t happen overnight. By demonstrating efficiency through small wins, teams can nudge their companies toward better, more effective ways of working.
Conclusion: Moving Beyond Story Points
Story points may work for some teams, but they are not a perfect solution. The key takeaway is that planning and discussions matter more than arbitrary estimations. Teams should focus on:
- Sizing over estimation for better workload distribution.
- Throughput and DORA metrics to track actual performance.
- Reducing time spent estimating in favour of more efficient planning techniques.
The best approach depends on the team’s context and constraints, but a shift toward empirical, data-driven planning will improve delivery outcomes in the long run.
Additional Critiques
Joshua Kerievsky’s Critique: Kerievsky, CEO of Industrial Logic, argues that story points can introduce unnecessary complexity:
“Story points and velocity calculations are unnatural techniques that unnecessarily confuse teams at the beginning of their agile journey.”
Alternative Metrics Discussion: In a discussion on alternative metrics, a practitioner suggests focusing on predictability scales estimated by those completing the tasks, rather than traditional story point planning:
“Modern project management (agile included) places too much emphasis on homogeneity of data and trying to be as objective as possible. My idea is to move towards a more subjective approach where predictability scale is estimated by those who will be completing the task instead of planning it with a group.”
Brightball’s Analysis: An article from Brightball critiques the reliability of story points and suggests that they may not effectively measure team performance:
“Story points are completely unreliable, confusing and require constantly reminding everyone involved what they are and are not.”
“Story Points are Pointless, Measure Queues” by Brightball: This article argues that story points can lead to confusion, unreliable timelines, and demotivation within teams. It suggests that measuring queues and focusing on workflow efficiency may be more effective.
“Story Point Estimation Doesn’t Work. Here’s Why.” by Uplevel: Uplevel discusses the variability and biases inherent in story point estimation, noting that different teams often assign points inconsistently. The article highlights the inability of story points to capture unplanned work and suggests that they may not be effective for enterprise time allocation.
“Why Story Points Don’t Work” by Alejandro Wainzinger: This piece critiques the accuracy of estimating complexity using story points, arguing that such estimations are inherently flawed and should not be a primary goal.
“Agile Effort Estimation: Have We Solved the Problem Yet? Insights From A Replication Study”: This academic study examines the effectiveness of deep learning models for agile effort estimation and suggests that current methods, including story points, may not provide accurate estimates.