Standard: Percentage of Incidents Linked to Known Architectural Risks

Description

Percentage of Incidents Linked to Known Architectural Risks measures the proportion of production incidents that originate from components or patterns previously identified as architectural weaknesses or high-risk areas. It shows whether the system’s known weaknesses are being addressed or simply tolerated over time.

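A high percentage may indicate underinvestment in known problem areas. Conversely, a reduction in this number suggests that architectural investments are targeting and resolving systemic weaknesses. Because the metric is a ratio, read it alongside the absolute incident counts: the percentage also falls when incidents unrelated to known risks increase.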

How to Use

What to Measure

Total number of production incidents in a given period.
Number of those incidents linked to components or patterns on your architectural risk register.

Formula

% Linked to Architectural Risk = (Incidents linked to known risks / Total incidents) × 100
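
As a worked example, the formula above can be sketched in Python. The function name and the sample counts are illustrative assumptions, and returning None for a period with no incidents is one possible convention, not something this standard prescribes.

```python
from typing import Optional


def pct_linked_to_known_risks(linked_incidents: int, total_incidents: int) -> Optional[float]:
    """Return the percentage of incidents linked to known architectural risks."""
    if linked_incidents < 0 or total_incidents < 0:
        raise ValueError("counts must be non-negative")
    if linked_incidents > total_incidents:
        raise ValueError("linked incidents cannot exceed total incidents")
    if total_incidents == 0:
        # Assumption: treat the metric as undefined when there were no incidents.
        return None
    return 100 * linked_incidents / total_incidents


# Hypothetical quarter: 40 incidents, 14 of them linked to register entries.
print(pct_linked_to_known_risks(14, 40))  # 35.0
```

Reporting the two raw counts next to the percentage keeps a small denominator from being over-interpreted.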

Instrumentation Tips

Maintain an up-to-date architecture risk register with clear mappings to systems/components.
During postmortems or triage, classify incidents as related or unrelated to known risks.
Use tagging in your incident management system (e.g. "known-risk") to support automation.
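
The tips above might be combined as in the following minimal sketch. The register contents, the Incident fields, and the helper names are all hypothetical; a real incident management system will expose its own data model. The sketch treats a component match as a prompt for review only, and counts an incident as linked once the "known-risk" tag has been applied.

```python
from dataclasses import dataclass, field

# Hypothetical risk register: entry ID -> components that entry covers.
RISK_REGISTER = {
    "RISK-001": {"billing-db", "billing-api"},
    "RISK-002": {"legacy-auth"},
}

KNOWN_RISK_TAG = "known-risk"  # tag name taken from the tip above


@dataclass
class Incident:
    incident_id: str
    component: str
    tags: set = field(default_factory=set)


def candidate_risks(incident, register=RISK_REGISTER):
    """Return register entries that cover the incident's component.

    A match is only a prompt for the postmortem, not proof of root cause.
    """
    return sorted(
        risk_id for risk_id, components in register.items()
        if incident.component in components
    )


def is_linked(incident):
    """An incident counts as linked only once a reviewer has applied the tag."""
    return KNOWN_RISK_TAG in incident.tags


incidents = [
    Incident("INC-101", "billing-db", {"known-risk", "sev2"}),
    Incident("INC-102", "search", {"sev3"}),
    Incident("INC-103", "legacy-auth"),  # on the register, not yet reviewed
]

linked = sum(is_linked(i) for i in incidents)
needs_review = [
    i.incident_id for i in incidents
    if candidate_risks(i) and not is_linked(i)
]
print(linked, needs_review)
```

Keeping the automated match separate from the confirmed tag is a deliberate choice here: it surfaces incidents that may be underreported without attributing them to a known risk before the root cause is validated.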

Why It Matters

Informs architectural investment: Helps prioritize high-value refactoring work.
Improves resilience: Targeted fixes reduce recurring failures.
Drives accountability: Ensures known risks are not ignored indefinitely.
Supports learning culture: Encourages teams to reflect and improve based on incident trends.

Best Practices

Regularly review and update your architectural risk register.
Establish clear ownership of high-risk areas.
Include “known risk?” as a question in incident postmortems.
Use dashboards to visualize incident clustering by component or system boundary.
Align risk remediation efforts with OKRs or capacity planning.
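
For the dashboard practice above, a rough sketch of the underlying aggregation could look like this. The incident records and field names are made up for illustration; the output is a simple per-component table that a dashboard tool could chart.

```python
from collections import Counter

# Hypothetical postmortem export: one record per incident.
incidents = [
    {"id": "INC-101", "component": "billing-db", "known_risk": True},
    {"id": "INC-102", "component": "billing-db", "known_risk": True},
    {"id": "INC-103", "component": "billing-db", "known_risk": False},
    {"id": "INC-104", "component": "legacy-auth", "known_risk": True},
    {"id": "INC-105", "component": "legacy-auth", "known_risk": True},
    {"id": "INC-106", "component": "search", "known_risk": False},
]


def incidents_by_component(records):
    """Return (component, total, linked_to_known_risk) rows, busiest component first."""
    totals = Counter(r["component"] for r in records)
    linked = Counter(r["component"] for r in records if r["known_risk"])
    return [(component, total, linked[component]) for component, total in totals.most_common()]


for component, total, linked_count in incidents_by_component(incidents):
    print(f"{component:12} total={total} linked={linked_count}")
```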

Common Pitfalls

Risk register is outdated or too vague to be useful.
No clear link between incident root causes and architecture components.
Underreporting due to lack of consistent incident analysis.
Over-attribution to known risks without validating root cause.

Signals of Success

Decreasing % of incidents linked to known risks over time.
Architectural work is visibly informed by incident trends.
Fewer repeat incidents in high-risk components.
Engineering teams proactively discuss risk patterns during planning and retrospectives.
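
To check the first signal, the percentage can be tracked per period. The sketch below assumes incidents are available as (period, linked) pairs; the period labels and data are invented, and the strict period-over-period comparison is a simplification that ignores noise from small sample sizes.

```python
from collections import defaultdict


def pct_by_period(incidents):
    """Map each period to the percentage of its incidents linked to known risks."""
    totals = defaultdict(int)
    linked = defaultdict(int)
    for period, is_linked in incidents:
        totals[period] += 1
        linked[period] += int(is_linked)
    # Periods with no incidents never appear, so the division is always defined.
    return {p: 100 * linked[p] / totals[p] for p in sorted(totals)}


def is_decreasing(series):
    """True when every period's value is lower than the previous period's."""
    values = list(series.values())
    return all(later < earlier for earlier, later in zip(values, values[1:]))


# Hypothetical data: (period, linked to a known risk?)
incidents = (
    [("2024-Q1", True)] * 2 + [("2024-Q1", False)] * 2
    + [("2024-Q2", True)] * 2 + [("2024-Q2", False)] * 3
    + [("2024-Q3", True)] * 1 + [("2024-Q3", False)] * 3
)

trend = pct_by_period(incidents)
print(trend)                 # {'2024-Q1': 50.0, '2024-Q2': 40.0, '2024-Q3': 25.0}
print(is_decreasing(trend))  # True
```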

[[Recurring Incident Rate]]
[[Postmortem Completion Rate]]
[[Mean Time to Detect (MTTD)]]
[[Technical Debt Remediation Throughput]]
[[Architecture Refactoring Frequency]]