Standard : Failure patterns are used to inform architectural investment

Purpose and Strategic Importance

This standard ensures that patterns from failures are systematically analysed and used to guide architectural decisions. It turns operational pain into long-term improvement, enabling teams to invest in resilience where it matters most.

Aligned to our "Post-Incident Learning Culture" policy, this standard promotes smarter design, reduces repeat failures, and supports continuous learning. Without it, teams risk fixing symptoms instead of causes, which slows progress and weakens system reliability.

Strategic Impact

  • Improved consistency and quality across teams
  • Reduced operational friction and delivery risks
  • Stronger ownership and autonomy in technical decision-making
  • More inclusive and sustainable engineering culture

Risks of Not Having This Standard

  • Slower time-to-value and increased rework
  • Accumulation of inconsistency and process debt
  • Reduced trust in engineering data, systems, or ownership
  • Loss of agility in the face of change or failure

CMMI Maturity Model

  • Level 1 – Initial: Failures are resolved in isolation without identifying patterns or informing architecture. Root causes are often undocumented or revisited multiple times.

  • Level 2 – Managed: Some teams capture learnings from incidents, but connections to architectural decisions are informal and inconsistent. Improvements are reactive and localised.

  • Level 3 – Defined: Teams systematically review failure patterns and integrate insights into architectural discussions and planning. Documentation and learning loops are standardised.

  • Level 4 – Quantitatively Managed: Failure trends are tracked and analysed across systems. Data informs prioritisation of architectural investments, and improvements are monitored for impact.

  • Level 5 – Optimising: Post-incident learning directly shapes platform and system architecture. Patterns drive long-term resilience strategies, and shared learning prevents recurrence across the organisation.
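
As a rough illustration of the Level 4 behaviour (tracking failure trends across systems so that data can inform prioritisation), the sketch below groups incident records by a root-cause tag and ranks the resulting patterns by recurrence and total downtime. The `Incident` fields, the tag names, and the sample data are all hypothetical; a real implementation would read from whatever incident tracker the organisation uses and might weight patterns differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record; the field names are assumptions."""
    system: str
    root_cause: str          # tag agreed during the postmortem
    downtime_minutes: int


def rank_failure_patterns(incidents):
    """Group incidents by root cause and rank by recurrence, then downtime."""
    groups = defaultdict(list)
    for incident in incidents:
        groups[incident.root_cause].append(incident)

    patterns = []
    for cause, items in groups.items():
        patterns.append({
            "root_cause": cause,
            "incidents": len(items),
            "systems": sorted({i.system for i in items}),
            "downtime_minutes": sum(i.downtime_minutes for i in items),
        })

    # Most frequent pattern first; ties are broken by total downtime.
    patterns.sort(
        key=lambda p: (p["incidents"], p["downtime_minutes"]),
        reverse=True,
    )
    return patterns


# Illustrative data only, not taken from any real system.
SAMPLE_INCIDENTS = [
    Incident("checkout", "missing-timeout", 42),
    Incident("search", "missing-timeout", 15),
    Incident("checkout", "single-db-replica", 90),
    Incident("billing", "missing-timeout", 30),
]

if __name__ == "__main__":
    for pattern in rank_failure_patterns(SAMPLE_INCIDENTS):
        print(pattern)
```

A pattern that recurs across several systems, such as the hypothetical `missing-timeout` tag here, is a stronger candidate for a shared architectural fix than a single costly incident in one system, although both signals are worth reviewing.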


Key Measures

  • Adoption rates and coverage across teams
  • Impact on delivery metrics, quality, or team health
  • Evidence of ownership, governance, or learning loops

Associated Policies

  • Post-Incident Learning Culture

Associated Practices

  • Root Cause Analysis (RCA)
  • Blameless Postmortems
  • Behaviour-Driven Development (BDD)
  • Contract Testing
  • End-to-End (E2E) Testing
  • Exploratory Testing
  • Integration Testing
  • Mutation Testing
  • Non-functional Requirement Testing
  • Test-Driven Development (TDD)
  • Visual Regression Testing

Associated Measures

  • Percentage of Incidents Linked to Known Architectural Risks
  • Postmortem Action Completion Rate
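
The two associated measures could be computed along the following lines. This is a minimal sketch under assumed definitions: an incident counts as "linked" when its postmortem recorded a known architectural risk identifier, and an action counts as complete when its status is `"done"`. The dictionary keys (`known_risk_id`, `status`) and the sample records are hypothetical and are not defined by this standard.

```python
def percentage_linked_to_known_risks(incidents):
    """Percentage of incidents whose postmortem linked a known architectural risk."""
    if not incidents:
        return 0.0
    linked = sum(1 for i in incidents if i["known_risk_id"] is not None)
    return 100.0 * linked / len(incidents)


def postmortem_action_completion_rate(actions):
    """Percentage of postmortem actions marked as done."""
    if not actions:
        return 0.0
    done = sum(1 for a in actions if a["status"] == "done")
    return 100.0 * done / len(actions)


# Illustrative records only; the risk identifiers are made up.
SAMPLE_INCIDENT_LINKS = [
    {"id": "INC-1", "known_risk_id": "RISK-7"},
    {"id": "INC-2", "known_risk_id": None},
    {"id": "INC-3", "known_risk_id": "RISK-7"},
    {"id": "INC-4", "known_risk_id": "RISK-2"},
]

SAMPLE_ACTIONS = [
    {"id": "ACT-1", "status": "done"},
    {"id": "ACT-2", "status": "done"},
    {"id": "ACT-3", "status": "open"},
    {"id": "ACT-4", "status": "done"},
    {"id": "ACT-5", "status": "done"},
]

if __name__ == "__main__":
    print(percentage_linked_to_known_risks(SAMPLE_INCIDENT_LINKS))
    print(postmortem_action_completion_rate(SAMPLE_ACTIONS))
```

Returning `0.0` for an empty input is a design choice made for this sketch; a reporting pipeline might prefer to show "no data" so that an idle period is not mistaken for poor performance.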
