Practice: Synthetic Monitoring
Purpose and Strategic Importance
Synthetic Monitoring is the practice of simulating user interactions or requests against your system on a regular schedule to detect issues before real users are affected. It provides proactive visibility into performance, availability, and functionality across environments, including production.
By identifying issues early, synthetic monitoring enhances reliability, reduces incident response time, and builds confidence in both new releases and steady-state operations. It’s an essential part of a robust observability strategy.
Description of the Practice
- Predefined scripts or test journeys simulate user interactions or service calls.
- Synthetic checks run on a scheduled basis (e.g. every minute) from multiple regions or data centres.
- Checks validate availability, latency, transaction correctness, and critical workflows.
- Alerts are triggered when thresholds are breached or failures occur.
- Tools include Datadog Synthetics, Pingdom, New Relic Synthetics, AWS CloudWatch Synthetics, and custom scripts.
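A single synthetic check boils down to the pattern described above: call an endpoint, validate the response, and measure latency against a budget. The following is a minimal sketch in Python; the 500 ms budget and the stubbed fetchers are illustrative assumptions, not the behaviour of any particular tool (a real fetcher would issue an HTTP request and return the status code).

```python
import time
from dataclasses import dataclass


@dataclass
class CheckResult:
    ok: bool
    latency_ms: float
    detail: str


def run_check(fetch, expected_status=200, latency_budget_ms=500.0):
    """One synthetic check: call the endpoint, validate status and latency."""
    start = time.monotonic()
    try:
        status = fetch()
    except Exception as exc:  # network errors etc. count as failures
        return CheckResult(False, (time.monotonic() - start) * 1000, f"error: {exc}")
    latency_ms = (time.monotonic() - start) * 1000
    if status != expected_status:
        return CheckResult(False, latency_ms, f"unexpected status {status}")
    if latency_ms > latency_budget_ms:
        return CheckResult(False, latency_ms, f"latency {latency_ms:.0f}ms over budget")
    return CheckResult(True, latency_ms, "ok")


# Stubbed fetchers keep this sketch self-contained; in practice the fetch
# callable would perform the real request (e.g. via urllib.request).
healthy = run_check(lambda: 200)
broken = run_check(lambda: 503)
```

A scheduler (cron, or the monitoring tool itself) would invoke `run_check` every minute or so and feed the `CheckResult` into alerting.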
How to Practise It (Playbook)
1. Getting Started
- Identify critical user journeys (e.g. login, checkout, API ping) that should be monitored proactively.
- Use a synthetic monitoring tool to create scripted checks simulating those paths.
- Schedule checks from multiple regions to catch location-specific latency or availability problems.
- Integrate with alerting tools (e.g. PagerDuty, Slack) to route failures quickly.
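The scripted user journeys described above can be modelled as an ordered list of named steps that runs until the first failure, which is what you want to report to alerting. This is a sketch under assumed conventions; the journey and step names are hypothetical stubs, and a real check would drive a browser or HTTP client at each step.

```python
def run_journey(steps):
    """Execute ordered journey steps; report the first failing step, if any.

    steps: list of (name, callable) pairs, where each callable returns
    True on success. Returns (ok, failed_step_name).
    """
    for name, step in steps:
        try:
            if not step():
                return (False, name)
        except Exception:
            return (False, name)
    return (True, None)


# Hypothetical checkout journey with stubbed steps; the final step
# simulates a failure so the runner has something to report.
journey = [
    ("login", lambda: True),
    ("add_to_cart", lambda: True),
    ("checkout", lambda: False),
]
ok, failed_step = run_journey(journey)
```

Reporting the step name, not just pass/fail, is what makes the alert actionable: "checkout failed" routes very differently from "login failed".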
2. Scaling and Maturing
- Add synthetic checks for multiple personas, browsers, devices, and APIs.
- Correlate synthetic results with real-user monitoring (RUM) for holistic visibility.
- Use synthetic data in CI/CD pipelines for pre-deployment validation.
- Review synthetic failures regularly and refine scripts to match evolving UX or APIs.
- Define service-level objectives (SLOs) based on synthetic performance benchmarks.
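Defining SLOs on synthetic benchmarks, as the last point suggests, means aggregating a window of check results into availability and latency percentiles and comparing them to targets. A minimal sketch follows; the 99.9% availability and 400 ms p95 targets are illustrative assumptions, not recommendations.

```python
def meets_slo(samples, target_availability=0.999, p95_budget_ms=400.0):
    """Evaluate a window of synthetic results against an SLO.

    samples: list of (ok, latency_ms) tuples from scheduled synthetic checks.
    """
    if not samples:
        return False  # no data: treat the SLO as unmet
    availability = sum(1 for ok, _ in samples if ok) / len(samples)
    latencies = sorted(lat for _, lat in samples)
    p95 = latencies[min(len(latencies) - 1, int(0.95 * len(latencies)))]
    return availability >= target_availability and p95 <= p95_budget_ms


# 999 passing checks at 120 ms plus one failure: availability is exactly 99.9%.
window = [(True, 120.0)] * 999 + [(False, 150.0)]
within_slo = meets_slo(window)
```

Running this over a rolling window turns synthetic checks from a pass/fail alarm into an error-budget signal the team can plan releases around.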
3. Team Behaviours to Encourage
- Treat synthetic failures as serious signals, even if users aren’t impacted yet.
- Include synthetic coverage in test planning and release sign-offs.
- Collaborate with product and operations to ensure critical paths are represented.
- Review synthetic dashboards during on-call, retros, and incident postmortems.
4. Watch Out For…
- False positives from brittle scripts that break with minor changes.
- Infrequent checks that miss short outages or slowdowns.
- Neglecting to update synthetic scripts after product changes.
- Monitoring low-value or unimportant flows; focus coverage on the paths that deliver customer value.
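One common mitigation for the false-positive problem above is to re-run a failed check before paging anyone, so a single transient flake does not trigger an alert. A sketch, assuming a simple retry-before-alert policy (the retry count of 2 is an illustrative default):

```python
def confirmed_failure(check, retries=2):
    """Return True only if the check fails on every attempt.

    Re-running a failed check before alerting filters out transient flakes
    (network blips, slow cold starts) that would otherwise page the team.
    """
    for _ in range(retries + 1):
        if check():
            return False  # one success clears the alarm
    return True


# A flaky check that fails once, then recovers, should not page anyone.
attempts = iter([False, True, True])
flaky_pages = confirmed_failure(lambda: next(attempts))
```

The trade-off is detection delay: every retry postpones a genuine alert by one check interval, so keep retries low for high-severity flows.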
5. Signals of Success
- Teams detect issues proactively before they affect customers.
- Synthetic results align with system health and user experience.
- Release confidence improves due to pre- and post-deploy checks.
- Synthetic coverage is visible, reviewed, and maintained.
- Monitoring is used not just for alerting, but for learning and improving reliability.