Invest in Proactive Incident Readiness
This standard mandates proactive investment in incident readiness to prepare teams for real-world failures and improve system resilience.
1. Invest in Proactive Incident Readiness:
Teams should not wait for failures to happen before improving system resilience. Practising failure scenarios ahead of time builds preparedness and reduces the impact of incidents when they occur.
- 1.1 Simulated Failure Testing:
- 1.1.1 Game Days and Chaos Engineering:
- Run game days, chaos engineering exercises, and disaster recovery tests to prepare for real-world failures.
- Automate the execution of simulated failure tests.
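One way to automate a simulated failure test is to wrap it in a small harness that checks a steady-state condition before, during, and after an injected fault. The sketch below is illustrative only: `run_experiment`, `ExperimentResult`, and the in-memory `replicas` dictionary are hypothetical names standing in for real health checks and fault-injection tooling, which this standard does not prescribe.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExperimentResult:
    name: str
    steady_before: bool
    steady_during: bool
    steady_after: bool

    @property
    def passed(self) -> bool:
        # The experiment passes only if the system stayed healthy throughout.
        return self.steady_before and self.steady_during and self.steady_after


def run_experiment(
    name: str,
    steady_state: Callable[[], bool],
    inject: Callable[[], None],
    rollback: Callable[[], None],
) -> ExperimentResult:
    before = steady_state()
    if not before:
        # Never inject a fault into a system that is already unhealthy.
        return ExperimentResult(name, False, False, False)
    inject()
    try:
        during = steady_state()
    finally:
        rollback()  # always undo the fault, even if the check raises
    after = steady_state()
    return ExperimentResult(name, before, during, after)


# Toy stand-in for a real service: two replicas, healthy if any one is up.
replicas = {"a": True, "b": True}

result = run_experiment(
    "kill-one-replica",
    steady_state=lambda: any(replicas.values()),
    inject=lambda: replicas.update(a=False),
    rollback=lambda: replicas.update(a=True),
)
print(result.name, "passed" if result.passed else "failed")
```

Refusing to start when the system is already unhealthy, and always rolling back the fault, keeps an automated exercise from turning into a real incident.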
- 1.1.2 Runbook Testing:
- Ensure all teams have documented and tested runbooks for common failure scenarios.
- Automate the validation of runbook effectiveness.
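Part of runbook validation can be automated as a periodic audit that flags runbooks that are empty, were never exercised, or were last tested too long ago. The sketch below assumes a hypothetical `Runbook` record and a 90-day freshness window. Neither comes from this standard, and a freshness check is only a proxy for effectiveness, which still has to be demonstrated by running the runbook in an exercise.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Runbook:
    scenario: str
    steps: List[str]
    last_tested: Optional[date]  # None means the runbook was never exercised


def audit_runbooks(
    runbooks: List[Runbook], today: date, max_age_days: int = 90
) -> Dict[str, List[str]]:
    """Return a mapping of scenario name to the issues found for it."""
    findings: Dict[str, List[str]] = {}
    for rb in runbooks:
        issues = []
        if not rb.steps:
            issues.append("no steps documented")
        if rb.last_tested is None:
            issues.append("never tested")
        elif (today - rb.last_tested).days > max_age_days:
            issues.append("last test is older than the allowed window")
        if issues:
            findings[rb.scenario] = issues
    return findings


# Illustrative data only.
runbooks = [
    Runbook("database failover", ["promote replica", "repoint clients"], date(2024, 5, 1)),
    Runbook("cache outage", [], None),
    Runbook("certificate expiry", ["rotate certificate"], date(2023, 11, 1)),
]

findings = audit_runbooks(runbooks, today=date(2024, 6, 1))
for scenario, issues in findings.items():
    print(f"{scenario}: {', '.join(issues)}")
```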
- 1.2 Incident Response Validation:
- 1.2.1 Escalation Path Validation:
- Validate escalation paths and incident response processes through live drills.
- Automate tracking of when each escalation path was last validated and whether the validation succeeded.
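Escalation paths can be checked in two stages: a static check that every tier is usable, and a simulated drill that records which tier would acknowledge a page. The sketch below is a minimal illustration under assumed names (`Tier`, `validate_path`, `simulate_drill`) and does not model any particular paging tool.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Tier:
    name: str
    contacts: List[str]
    ack_timeout_min: int  # minutes to wait before escalating to the next tier


def validate_path(tiers: List[Tier]) -> List[str]:
    """Return a list of structural problems in the escalation path."""
    if not tiers:
        return ["escalation path has no tiers"]
    errors = []
    for i, tier in enumerate(tiers, start=1):
        if not tier.contacts:
            errors.append(f"tier {i} ({tier.name}) has no contacts")
        if tier.ack_timeout_min <= 0:
            errors.append(f"tier {i} ({tier.name}) has a non-positive ack timeout")
    return errors


def simulate_drill(tiers: List[Tier], responders: Set[str]) -> Tuple[Optional[str], int]:
    """Walk the path in order; return (acknowledging tier, minutes elapsed).

    `responders` is the set of contacts who would acknowledge in this drill.
    Returns (None, total timeout) if nobody on the path acknowledges.
    """
    elapsed = 0
    for tier in tiers:
        if any(contact in responders for contact in tier.contacts):
            return tier.name, elapsed
        elapsed += tier.ack_timeout_min
    return None, elapsed


# Illustrative path: the last tier is deliberately misconfigured.
path = [
    Tier("primary on-call", ["alice"], 5),
    Tier("secondary on-call", ["bob"], 10),
    Tier("engineering manager", [], 15),
]

print(validate_path(path))
print(simulate_drill(path, responders={"bob"}))
```

Storing the output of each run alongside its date gives a simple record of when every path was last validated.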
- 1.2.2 Response Process Drills:
- Conduct regular incident response drills to ensure team readiness.
- Automate the scheduling and execution of drills.
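Drill scheduling can be as simple as generating dates at a fixed interval and rotating through the participating teams. The sketch below assumes a hypothetical `drill_schedule` helper, example team names, and a 14-day cadence. The standard itself does not specify how often drills should run.

```python
from datetime import date, timedelta
from itertools import cycle
from typing import List, Tuple


def drill_schedule(
    teams: List[str], start: date, interval_days: int, count: int
) -> List[Tuple[date, str]]:
    """Return `count` (date, team) pairs, rotating through `teams` in order."""
    dates = (start + timedelta(days=interval_days * i) for i in range(count))
    # zip stops when the finite date generator is exhausted.
    return list(zip(dates, cycle(teams)))


# Illustrative values only.
schedule = drill_schedule(["payments", "search"], date(2024, 1, 1), 14, 3)
for day, team in schedule:
    print(day.isoformat(), team)
```

The resulting pairs could then be fed into whatever calendar or ticketing system the organisation already uses to trigger each drill.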
By investing in proactive incident readiness, organisations can improve system resilience and minimise the impact of incidents.