Standard: Failure patterns are used to inform architectural investment

Purpose and Strategic Importance

This standard ensures that patterns from failures are systematically analysed and used to guide architectural decisions. It turns operational pain into long-term improvement, enabling teams to invest in resilience where it matters most.

Aligned with our "Post-Incident Learning Culture" policy, this standard promotes smarter design, reduces repeat failures, and supports continuous learning. Without it, teams risk fixing symptoms instead of causes, which slows progress and weakens system reliability.

Strategic Impact

Improved consistency and quality across teams
Reduced operational friction and delivery risks
Stronger ownership and autonomy in technical decision-making
More inclusive and sustainable engineering culture

Risks of Not Having This Standard

Slower time-to-value and increased rework
Accumulation of inconsistency and process debt
Reduced trust in engineering data, systems, or ownership
Loss of agility in the face of change or failure

CMMI Maturity Model

Level 1 – Initial: Failures are resolved in isolation without identifying patterns or informing architecture. Root causes are often undocumented or revisited multiple times.
Level 2 – Managed: Some teams capture learnings from incidents, but connections to architectural decisions are informal and inconsistent. Improvements are reactive and localised.
Level 3 – Defined: Teams systematically review failure patterns and integrate insights into architectural discussions and planning. Documentation and learning loops are standardised.
Level 4 – Quantitatively Managed: Failure trends are tracked and analysed across systems. Data informs prioritisation of architectural investments, and improvements are monitored for impact.
Level 5 – Optimising: Post-incident learning directly shapes platform and system architecture. Patterns drive long-term resilience strategies, and shared learning prevents recurrence across the organisation.
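At Level 4 and above, failure trends are tracked across systems and used to prioritise architectural investment. The following is a minimal sketch of that aggregation step. It assumes each incident record carries a system name and a list of contributing-factor tags. The `Incident` type, its fields, the `recurring_patterns` helper and the tag names are all hypothetical and are not part of any existing tooling.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Incident:
    # Hypothetical incident record; field names are illustrative only.
    incident_id: str
    system: str
    contributing_factors: list[str] = field(default_factory=list)


def recurring_patterns(incidents, min_occurrences=2):
    """Return (factor, incident_count, affected_systems) tuples for every
    contributing factor seen in at least `min_occurrences` incidents."""
    counts = Counter()
    systems_by_factor = {}
    for incident in incidents:
        # set() so a factor tagged twice on one incident is counted once.
        for factor in set(incident.contributing_factors):
            counts[factor] += 1
            systems_by_factor.setdefault(factor, set()).add(incident.system)
    ranked = [
        (factor, count, sorted(systems_by_factor[factor]))
        for factor, count in counts.items()
        if count >= min_occurrences
    ]
    # Most frequent first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Illustrative data only; tags and systems are made up.
sample_incidents = [
    Incident("INC-1", "payments", ["retry-storm", "missing-timeout"]),
    Incident("INC-2", "checkout", ["missing-timeout"]),
    Incident("INC-3", "payments", ["config-drift"]),
    Incident("INC-4", "search", ["missing-timeout", "retry-storm"]),
]

for factor, count, systems in recurring_patterns(sample_incidents):
    print(f"{factor}: {count} incidents across {systems}")
# missing-timeout: 3 incidents across ['checkout', 'payments', 'search']
# retry-storm: 2 incidents across ['payments', 'search']
```

Reporting the affected systems alongside the count is a deliberate choice. A factor that recurs across several systems may point to a shared architectural cause rather than a local defect, and that is the kind of signal this standard asks teams to act on.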

Key Measures

Adoption rates and coverage across teams
Impact on delivery metrics, quality, or team health
Evidence of ownership, governance, or learning loops
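These measures are stated broadly. The sketch below is one hedged illustration of how adoption coverage and evidence of learning loops could be made concrete. For each team, it computes two values: the share of incidents that received a post-incident review, and the share of reviewed incidents that led to at least one architectural follow-up action. The `IncidentReview` record, its fields and the `coverage_by_team` helper are hypothetical. This standard does not define target values for either rate.

```python
from dataclasses import dataclass


@dataclass
class IncidentReview:
    # Hypothetical record; field names are illustrative only.
    team: str
    reviewed: bool
    architectural_actions: int  # follow-up actions raised against the architecture


def coverage_by_team(records):
    """Map each team to (review_rate, action_rate), where action_rate is the
    share of *reviewed* incidents with at least one architectural action."""
    totals = {}
    for record in records:
        t = totals.setdefault(record.team, {"incidents": 0, "reviewed": 0, "with_action": 0})
        t["incidents"] += 1
        if record.reviewed:
            t["reviewed"] += 1
            if record.architectural_actions > 0:
                t["with_action"] += 1
    result = {}
    for team, t in totals.items():
        review_rate = t["reviewed"] / t["incidents"]
        # Avoid dividing by zero when a team has no reviewed incidents yet.
        action_rate = t["with_action"] / t["reviewed"] if t["reviewed"] else 0.0
        result[team] = (review_rate, action_rate)
    return result


# Illustrative data only.
sample_records = [
    IncidentReview("payments", True, 1),
    IncidentReview("payments", True, 0),
    IncidentReview("payments", False, 0),
    IncidentReview("search", True, 2),
]

for team, (review_rate, action_rate) in coverage_by_team(sample_records).items():
    print(f"{team}: {review_rate:.0%} reviewed, {action_rate:.0%} led to architectural action")
# payments: 67% reviewed, 50% led to architectural action
# search: 100% reviewed, 100% led to architectural action
```

A low action rate alongside a high review rate may indicate that reviews happen but are not feeding architectural decisions. That gap separates Level 2 from Level 3 in the maturity model.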