Ragan McGill

Practice: SLOs, SLIs, and SLAs

Purpose and Strategic Importance

SLOs (Service-Level Objectives), SLIs (Service-Level Indicators), and SLAs (Service-Level Agreements) are foundational concepts in Site Reliability Engineering (SRE) that help teams define, measure, and manage service performance and reliability.

They provide a shared language and framework for aligning technical work with business expectations, balancing innovation and stability, and making data-driven decisions about system health and customer trust.

Description of the Practice

SLIs are quantitative measures of service performance (e.g. latency, availability, error rate).
SLOs define the acceptable target for an SLI over a given time window (e.g. 99.9% availability per month).
SLAs are formal agreements, usually contractual and with external customers, that include SLOs and define consequences for missing them (e.g. penalties, service credits, refunds).
Internally, SLOs and SLIs guide engineering decisions; externally, SLAs manage contractual expectations.
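To make the relationship concrete, here is a minimal Python sketch that treats an SLI as the proportion of valid events that were "good" and checks it against an SLO target. The `SLO` class, the `sli` and `meets` helpers and the request counts are illustrative assumptions, not a standard API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """A target for one SLI over a fixed window (hypothetical structure)."""
    name: str
    target: float     # e.g. 0.999 for 99.9%
    window_days: int  # e.g. 30 for a monthly window


def sli(good_events: int, valid_events: int) -> float:
    """SLI as the proportion of valid events that were good."""
    if valid_events <= 0:
        raise ValueError("need at least one valid event to compute an SLI")
    return good_events / valid_events


def meets(slo: SLO, good_events: int, valid_events: int) -> bool:
    """True if the measured SLI is at or above the SLO target."""
    return sli(good_events, valid_events) >= slo.target


availability = SLO(name="availability", target=0.999, window_days=30)
# Illustrative counts: 1,000,000 requests in the window, 600 of them failed.
print(meets(availability, good_events=999_400, valid_events=1_000_000))  # True
```

An SLA would wrap a target like this in a contract. A common pattern is to keep the internal SLO stricter than any SLA commitment, so the team sees trouble before a contractual threshold is at risk.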

How to Practise It (Playbook)

1. Getting Started

Identify critical user journeys and define SLIs that reflect real user experience (e.g. the proportion of logins that succeed within 500 ms).
Set realistic SLO targets based on historical data and business needs.
Publish and visualise SLO performance - ideally as a shared dashboard.
Clarify which SLOs are internal vs. tied to external SLAs.
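As a sketch of the first two steps, the code below computes the login SLI from request records and suggests a starting target from historical results. The `LoginAttempt` record schema is hypothetical, and the "no higher than the worst historical window" rule in `candidate_target` is an assumed heuristic rather than an established method.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LoginAttempt:
    """One login request as it might appear in request logs (hypothetical schema)."""
    succeeded: bool
    latency_ms: float


LATENCY_THRESHOLD_MS = 500  # threshold from the login example in the playbook


def login_sli(attempts: Sequence[LoginAttempt]) -> float:
    """Proportion of login attempts that succeeded within the latency threshold."""
    if not attempts:
        raise ValueError("no login attempts in the window")
    good = sum(1 for a in attempts if a.succeeded and a.latency_ms <= LATENCY_THRESHOLD_MS)
    return good / len(attempts)


def candidate_target(historical_slis: Sequence[float]) -> float:
    """Assumed heuristic: start no higher than the worst historical window,
    rounded down to the nearest 0.1 percentage point."""
    if not historical_slis:
        raise ValueError("need historical data to suggest a target")
    # round() first so float noise (e.g. 997.9999999) doesn't floor one step too low
    return math.floor(round(min(historical_slis) * 1000, 6)) / 1000


attempts = [
    LoginAttempt(succeeded=True, latency_ms=120),
    LoginAttempt(succeeded=True, latency_ms=640),  # succeeded but too slow: bad
    LoginAttempt(succeeded=False, latency_ms=90),  # fast but failed: bad
    LoginAttempt(succeeded=True, latency_ms=480),
]
print(login_sli(attempts))  # 0.5
print(candidate_target([0.9993, 0.9987, 0.9995]))  # 0.998
```

Counting a slow success as "bad" reflects the user's perspective: they waited too long either way. A real SLI would probably also decide whether user errors such as a wrong password belong in the denominator at all.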

2. Scaling and Maturing

Create a tiered reliability model - not every service needs the same SLOs.
Revisit and refine SLOs periodically as systems, usage, and priorities evolve.
Pair SLOs with error budgets (the amount of unreliability the SLO permits, i.e. 1 minus the target) and agree in advance what happens when a budget is depleted.
Align SLOs with product planning and prioritisation to guide technical investment.
Use SLI/SLO metrics in post-incident reviews and ops health reports.
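The sketch below combines a tiered model with an error budget and a pre-agreed depletion policy. The tier names, their targets and the 25% threshold in `budget_action` are hypothetical values chosen for illustration, not recommendations.

```python
from dataclasses import dataclass

# Hypothetical tiered reliability model: names and targets are illustrative only.
TIER_TARGETS = {
    "tier-1": 0.999,  # e.g. customer-facing journeys such as login
    "tier-2": 0.995,  # e.g. internal tools
    "tier-3": 0.99,   # e.g. batch reporting
}


@dataclass(frozen=True)
class BudgetStatus:
    allowed_bad: float  # bad events the SLO permits in the window
    actual_bad: int     # bad events observed in the window
    remaining: float    # fraction of the budget left; negative once overspent


def error_budget(target: float, valid_events: int, bad_events: int) -> BudgetStatus:
    """Error budget = (1 - target) x valid events in the window."""
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1; 100% leaves no budget")
    allowed = (1 - target) * valid_events
    if allowed <= 0:
        raise ValueError("no valid events in the window")
    return BudgetStatus(allowed, bad_events, 1 - bad_events / allowed)


def budget_action(status: BudgetStatus) -> str:
    """Map remaining budget to a pre-agreed action (hypothetical thresholds)."""
    if status.remaining <= 0:
        return "freeze feature releases; prioritise reliability work"
    if status.remaining < 0.25:
        return "extra review for risky changes"
    return "ship as normal"


status = error_budget(TIER_TARGETS["tier-1"], valid_events=2_000_000, bad_events=1_800)
print(budget_action(status))  # extra review for risky changes
```

Agreeing on the thresholds and actions before the budget runs out is the point: when it does, the response is a known policy rather than a fresh negotiation.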

3. Team Behaviours to Encourage

Treat SLOs as levers, not limits - use them to guide work, not punish failure.
Design SLIs from the user’s perspective - what would they consider success?
Make SLOs visible and meaningful to both engineering and product teams.
Balance reliability with velocity - perfection isn’t the goal, informed trade-offs are.

4. Watch Out For…

Setting SLOs too aggressively or arbitrarily - unrealistic targets are breached constantly and soon ignored.
Choosing SLIs that don’t reflect customer experience.
SLAs driving internal priorities too strongly - use internal SLOs for guidance.
Losing trust in the numbers - ensure telemetry and calculations are accurate.
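One way to check whether a target is too aggressive is to translate it into allowed downtime. This sketch assumes a 30-day window and a time-based availability SLI, where the service is either fully up or fully down. A request-based SLI would need request counts instead.

```python
WINDOW_MINUTES = 30 * 24 * 60  # 30-day window = 43,200 minutes


def allowed_downtime_minutes(target: float, window_minutes: int = WINDOW_MINUTES) -> float:
    """Minutes of full unavailability the target permits in the window."""
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    return (1 - target) * window_minutes


for target in (0.99, 0.999, 0.9999, 0.99999):
    print(f"{target:.3%}: {allowed_downtime_minutes(target):.1f} min")
```

Read this way, 99.9% allows about 43 minutes of full outage per 30 days. 99.99% allows about 4.3 minutes, which is often less than the time needed to detect and respond to a single incident.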

5. Signals of Success

Engineers and product teams speak a shared language around reliability.
SLOs are visible, used in planning, and updated as the system evolves.
Error budgets help balance shipping velocity with system health.
Incidents lead to SLO-informed learning and investment decisions.
SLAs are met consistently, with fewer surprises or escalations.