This standard ensures changes are introduced with minimal failures and maximum resilience by measuring and managing Change Failure Rate (CFR), a core DORA metric. It enables high-velocity delivery without compromising quality, stability, or trust.
Aligned to our "Resilience Over Uptime" and "Secure by Design" policies, this standard drives investment in robust testing, observability, and safe deployment practices. Without it, change introduces risk blindly, erodes confidence, and limits the ability to innovate at pace.
Level 1 – Initial: Change failures are not tracked systematically. Most issues are only identified post-release through user reports or major outages. Root cause analysis is rare or informal.
Level 2 – Managed: Some changes are linked to incidents or rollbacks, but criteria are unclear and processes are inconsistent. Post-incident reviews are ad hoc and learning is not widely shared.
Level 3 – Defined: Change Failure Rate is consistently tracked across systems. Teams agree on what constitutes a failed change and incorporate this into retrospectives and quality reviews.
Level 4 – Quantitatively Managed: CFR is a visible delivery health metric. Teams use CFR data to inform test coverage, deployment safety, and risk mitigation. Proactive techniques (e.g., automated rollbacks, staging validation) are applied.
Level 5 – Optimising: CFR insights drive systemic improvements in testing, observability, and architectural resilience. Failure trends shape platform capabilities, and shared learning reduces risk across the engineering organisation.
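As an illustration of what tracking CFR consistently (Level 3 and above) could look like, the sketch below computes CFR as failed changes divided by total changes, overall and per calendar month, so a trend is visible. It is a minimal sketch, not a prescribed implementation: the `Change` record, its field names, and both function names are hypothetical, and the `failed` flag is assumed to have already been set using the team's agreed definition of a failed change.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One production change. All field names here are hypothetical."""
    change_id: str
    deployed_on: date
    failed: bool  # assumed to follow the team's agreed failed-change definition


def change_failure_rate(changes: list[Change]) -> Optional[float]:
    """Return failed changes / total changes, or None if there were no changes."""
    if not changes:
        return None
    failed = sum(1 for c in changes if c.failed)
    return failed / len(changes)


def cfr_by_month(changes: list[Change]) -> dict[str, float]:
    """Group changes by deployment month (YYYY-MM) and compute CFR per month."""
    buckets: dict[str, list[Change]] = {}
    for c in changes:
        buckets.setdefault(c.deployed_on.strftime("%Y-%m"), []).append(c)
    # Every bucket holds at least one change, so the rate is always defined.
    return {
        month: sum(1 for c in group if c.failed) / len(group)
        for month, group in sorted(buckets.items())
    }


if __name__ == "__main__":
    sample = [
        Change("c1", date(2024, 1, 5), failed=False),
        Change("c2", date(2024, 1, 20), failed=True),
        Change("c3", date(2024, 2, 2), failed=False),
        Change("c4", date(2024, 2, 9), failed=False),
    ]
    print(change_failure_rate(sample))
    print(cfr_by_month(sample))
```

Returning `None` for an empty period, rather than zero, keeps "no changes shipped" distinct from "no changes failed" when the metric is reviewed.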
Specifically, a failed change is defined as one that introduces:
This definition applies equally across software releases, infrastructure rollouts, data platform updates, and operational configuration changes.
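Because one definition covers every kind of change, CFR can be broken down by change type without changing how a failure is counted. The sketch below shows one possible way to do that. The `ChangeType` enum mirrors the four categories named above, but its identifiers, the `cfr_by_type` function, and the `(type, failed)` record shape are assumptions made for illustration.

```python
from collections import Counter
from enum import Enum


class ChangeType(Enum):
    """Change categories named in this standard; identifiers are hypothetical."""
    SOFTWARE_RELEASE = "software_release"
    INFRASTRUCTURE_ROLLOUT = "infrastructure_rollout"
    DATA_PLATFORM_UPDATE = "data_platform_update"
    OPERATIONAL_CONFIG = "operational_config"


def cfr_by_type(records):
    """Compute CFR per change type.

    `records` is an iterable of (ChangeType, failed) pairs, where `failed`
    is a bool set using the same failed-change definition for every type.
    """
    totals = Counter()
    failures = Counter()
    for change_type, failed in records:
        totals[change_type] += 1
        if failed:
            failures[change_type] += 1
    # Only types with at least one recorded change appear in the result.
    return {t: failures[t] / totals[t] for t in totals}


if __name__ == "__main__":
    records = [
        (ChangeType.SOFTWARE_RELEASE, True),
        (ChangeType.SOFTWARE_RELEASE, False),
        (ChangeType.INFRASTRUCTURE_ROLLOUT, False),
        (ChangeType.OPERATIONAL_CONFIG, True),
    ]
    for change_type, rate in cfr_by_type(records).items():
        print(f"{change_type.value}: {rate:.0%}")
```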