Standard: Operational readiness is tested before every major release

Purpose and Strategic Importance

This standard ensures that operational readiness is tested before every major release. Readiness covers monitoring, alerting, rollback plans, and support handovers. Testing it builds confidence that systems will perform reliably under real-world conditions.

Aligned with our "Resilience Over Uptime" policy, this standard reduces post-release surprises and enables safer, faster delivery. Without it, releases carry hidden risks to users, teams, and operational stability.

Strategic Impact

  • Improved consistency and quality across teams
  • Reduced operational friction and delivery risks
  • Stronger ownership and autonomy in technical decision-making
  • More inclusive and sustainable engineering culture

Risks of Not Having This Standard

  • Slower time-to-value and increased rework
  • Accumulation of inconsistency and process debt
  • Reduced trust in engineering data, systems, or ownership
  • Loss of agility in the face of change or failure

CMMI Maturity Model

  • Level 1 – Initial: Operational readiness is unstructured or overlooked. Testing is ad hoc, with limited checks before release. Issues are often discovered in production.

  • Level 2 – Managed: Some teams conduct readiness checks (e.g., monitoring, alerting), but scope and rigour vary. Handover and rollback plans are inconsistent.

  • Level 3 – Defined: Readiness criteria are standardised and documented. All major releases include checks for monitoring, alerting, rollbacks, support readiness, and system health.

  • Level 4 – Quantitatively Managed: Readiness assessments are automated and auditable. Compliance is tracked and informs post-release reviews. Gaps are addressed before release is approved.

  • Level 5 – Optimising: Readiness practices continuously evolve through feedback and incident learnings. Teams rehearse operational scenarios and embed readiness as part of their definition of done, driving improved resilience and confidence.
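The Level 3 and Level 4 descriptions above can be illustrated with a small release gate. The following is a minimal Python sketch, not a prescribed tool. The check names in `REQUIRED_CHECKS` and the `assess` helper are hypothetical. The sketch also assumes that pass/fail evidence for each check is gathered elsewhere, for example by pipeline jobs that verify dashboards, alert rules, and a rehearsed rollback.

```python
from dataclasses import dataclass, field

# Hypothetical standard checklist, mirroring the Level 3 readiness criteria.
REQUIRED_CHECKS = (
    "monitoring",
    "alerting",
    "rollback",
    "support_handover",
    "system_health",
)


@dataclass
class ReadinessResult:
    """Outcome of one readiness assessment for a major release."""

    release: str
    passed: dict = field(default_factory=dict)

    @property
    def gaps(self) -> list:
        # A required check is a gap if it failed or no evidence was recorded.
        return [check for check in REQUIRED_CHECKS if not self.passed.get(check, False)]

    @property
    def approved(self) -> bool:
        # Level 4: the release is approved only when every gap is closed.
        return not self.gaps

    def to_record(self) -> dict:
        # Plain, serialisable record for audit trails and post-release reviews.
        return {"release": self.release, "approved": self.approved, "gaps": self.gaps}


def assess(release: str, evidence: dict) -> ReadinessResult:
    """Build an assessment from check evidence collected by other tooling (assumed)."""
    return ReadinessResult(release=release, passed=dict(evidence))


if __name__ == "__main__":
    result = assess("v2.0", {"monitoring": True, "alerting": True, "rollback": False})
    print(result.approved)  # False
    print(result.gaps)  # ['rollback', 'support_handover', 'system_health']
```

Holding the required checks as data, rather than in a free-form checklist, makes the assessment repeatable. It also makes the assessment auditable: the `to_record` output can be stored with the release and revisited in post-release reviews. A missing piece of evidence is treated as a gap, so the gate fails closed.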


Key Measures

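  • Adoption and coverage: the share of major releases, per team, that complete a readiness assessment before release
  • Impact on delivery metrics, release quality (e.g., post-release incidents and rollbacks), and team health
  • Evidence of ownership, governance, and learning loops (e.g., readiness gaps fed back from post-release reviews)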
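The coverage measure can be made concrete once releases and their readiness status are logged somewhere queryable. The Python sketch below assumes a hypothetical `ReleaseRecord` shape with a team, a release ID, and a flag for whether a readiness assessment was completed. How that log is produced is left open.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class ReleaseRecord(NamedTuple):
    """Hypothetical shape of one major release in a delivery log."""

    team: str
    release: str
    readiness_completed: bool


def coverage_by_team(records: Iterable[ReleaseRecord]) -> dict:
    """Share of each team's major releases that completed a readiness check."""
    totals = defaultdict(int)
    covered = defaultdict(int)
    for record in records:
        totals[record.team] += 1
        if record.readiness_completed:
            covered[record.team] += 1
    return {team: covered[team] / totals[team] for team in totals}


def overall_coverage(records: Iterable[ReleaseRecord]) -> float:
    """Share of all major releases, across teams, with a completed check."""
    records = list(records)
    if not records:
        return 0.0
    # Summing booleans counts the True values.
    return sum(r.readiness_completed for r in records) / len(records)


if __name__ == "__main__":
    log = [
        ReleaseRecord("payments", "r1", True),
        ReleaseRecord("payments", "r2", False),
        ReleaseRecord("search", "r3", True),
    ]
    print(coverage_by_team(log))  # {'payments': 0.5, 'search': 1.0}
```

Reporting coverage per team as well as overall helps surface where readiness practice is inconsistent, which is the Level 2 symptom this standard aims to move beyond.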
Associated Policies
  • Resilience Over Uptime

Associated Practices
  • Compliance-as-Code
  • Dependency Management Policies
  • Event Sourcing
  • Immutable Infrastructure
  • Secure Code Training
  • Operational KPIs for Dev Teams
  • Custom Metrics Instrumentation
  • Health Checks & Readiness Probes
  • Log Correlation for RCA
  • On-Call Rotation Health Checks
  • Runbooks and Playbooks
  • User Session Replay Tools
  • Feedback Loops from Ops to Dev
  • Incident Response Playbooks
  • Deployment Freeze Windows
  • Container Security Scanning
  • Data Encryption-in-Transit & at-Rest
  • Secure API Gateways
  • Threat Intelligence Feeds
  • Threat Modelling Workshops
  • Vulnerability Management Dashboards
  • End-to-End (E2E) Testing
  • Ensemble Testing
  • Load & Performance Testing
  • Shadow Testing in Production
  • Design for Failure
  • Observability-Driven Design
  • Sprint Demos for Stakeholders

Associated Measures
  • Mean Time to Detect (MTTD)
  • Error Budget Consumption
  • Service Availability (Uptime)
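These associated measures have widely used definitions, and readiness checks exist to protect them. The Python sketch below shows one common formulation. It assumes request-based availability (good events divided by total events) rather than time-based uptime. The function names are illustrative and are not taken from any particular monitoring tool. Time-based uptime follows the same ratio, using minutes available divided by minutes in the period.

```python
from datetime import datetime, timedelta


def mean_time_to_detect(incidents):
    """MTTD: average gap between when an incident started and when it was detected.

    `incidents` is a list of (started_at, detected_at) datetime pairs.
    """
    if not incidents:
        raise ValueError("MTTD is undefined without incidents")
    total = sum((detected - started for started, detected in incidents), timedelta())
    return total / len(incidents)


def availability(good_events: int, total_events: int) -> float:
    """Request-based availability: fraction of events that succeeded."""
    return good_events / total_events


def error_budget_consumed(good_events: int, total_events: int, slo: float) -> float:
    """Fraction of the error budget used; 1.0 means the budget is exhausted."""
    allowed_failures = (1 - slo) * total_events
    actual_failures = total_events - good_events
    return actual_failures / allowed_failures


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    incidents = [(t0, t0 + timedelta(minutes=4)), (t0, t0 + timedelta(minutes=10))]
    print(mean_time_to_detect(incidents))  # 0:07:00
    print(availability(999_500, 1_000_000))  # 0.9995
    print(round(error_budget_consumed(999_500, 1_000_000, slo=0.999), 3))  # 0.5
```

In this example, a 99.9% objective over one million requests allows 1,000 failures. The 500 observed failures therefore use half of the error budget.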
