Ragan McGill

Practice : Drift Detection & Correction

Purpose and Strategic Importance

Drift Detection & Correction is the practice of identifying and resolving differences between the declared state of infrastructure (e.g. in version-controlled code) and its actual state in the runtime environment. Left unchecked, drift can lead to inconsistencies, unexpected behaviour, security gaps, and deployment failures.

By detecting and correcting drift automatically, teams ensure infrastructure remains secure, predictable, and compliant - maintaining alignment between what’s intended and what’s actually running.

Description of the Practice

Drift detection tools monitor infrastructure and alert when real-world state deviates from declared configuration.
Correction mechanisms may auto-reconcile or guide teams to manually address issues.
Tools like Terraform, Pulumi, AWS Config, or GitOps agents (e.g. ArgoCD) are used to detect and resolve drift.
Drift insights are logged, visible, and traceable to encourage proactive remediation.
Remediation workflows are governed and reviewed, not automated blindly.

How to Practise It (Playbook)

1. Getting Started

Use your Infrastructure as Code tool (e.g. terraform plan, pulumi preview) to compare live and declared state.
Set up periodic scans or integrate drift detection into deployment pipelines.
Track findings and notify teams via Slack, dashboards, or alerting tools.
Agree on ownership for reviewing and resolving drift.

2. Scaling and Maturing

Automate reconciliation of non-critical drift (e.g. tagging, configuration updates).
Build dashboards showing drift by service, environment, or team.
Include drift review in incident retros and release reviews.
Tag drift root causes: config bypass, pipeline gaps, or misaligned controls.
Where appropriate, implement self-healing via GitOps tools (e.g. ArgoCD sync).

3. Team Behaviours to Encourage

Never patch infrastructure manually - always update through code.
Investigate root causes of drift and strengthen prevention mechanisms.
Share drift patterns across teams to avoid repeated issues.
Treat drift detection as a signal of maturity, not failure.

4. Watch Out For…

Excessive alerts that create noise and are ignored.
Auto-correction without audit trails or human verification.
Teams manually patching resources outside IaC or GitOps flows.
Assuming no drift means no issues - some drift is subtle or silent.

5. Signals of Success

Teams are aware of and respond to infrastructure drift alerts.
Drift is addressed through version control, not ad hoc changes.
Misalignments are rare, visible, and resolved promptly.
Platform and security teams have confidence in declared vs. actual state.
Reconciliation processes improve system stability and trust.