Practice: SLOs, SLIs, and SLAs
Purpose and Strategic Importance
SLOs (Service-Level Objectives), SLIs (Service-Level Indicators), and SLAs (Service-Level Agreements) are foundational concepts in Site Reliability Engineering (SRE) that help teams define, measure, and manage service performance and reliability.
They provide a shared language and framework for aligning technical work with business expectations, balancing innovation and stability, and making data-driven decisions about system health and customer trust.
Description of the Practice
- SLIs are quantitative measures of service performance (e.g. latency, availability, error rate).
- SLOs define the acceptable target for an SLI over a given time window (e.g. 99.9% availability per month).
- SLAs are formal agreements with customers, usually external and contractual, that set out committed service levels and the consequences of missing them (e.g. penalties, refunds).
- Internally, SLOs and SLIs guide engineering decisions; externally, SLAs manage contractual expectations.
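To make the relationship between an SLI and an SLO concrete, here is a minimal Python sketch. The names (Slo, availability_sli, meets_slo) and the request counts are illustrative assumptions, not part of any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical SLO: a target for one SLI over a time window."""
    name: str
    target: float      # e.g. 0.999 for 99.9%
    window_days: int   # e.g. 30 for a monthly window


def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests that were served successfully."""
    if total_requests == 0:
        # Assumption: a window with no traffic counts as fully available.
        return 1.0
    return good_requests / total_requests


def meets_slo(sli: float, slo: Slo) -> bool:
    """The SLO is met when the measured SLI is at or above the target."""
    return sli >= slo.target


# Illustrative numbers only.
monthly_availability = Slo(name="availability", target=0.999, window_days=30)
sli = availability_sli(good_requests=999_200, total_requests=1_000_000)
print(f"SLI = {sli:.4%}, SLO met: {meets_slo(sli, monthly_availability)}")
```

The SLI is the measurement and the SLO is the threshold applied to it. An SLA would sit on top of this as a contract, which is why it is not modelled in the code.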
How to Practise It (Playbook)
1. Getting Started
- Identify critical user journeys and define SLIs that reflect real user experience (e.g. “proportion of login attempts that succeed within 500 ms”).
- Set realistic SLO targets based on historical data and business needs.
- Publish and visualise SLO performance - ideally as a shared dashboard.
- Clarify which SLOs are internal vs. tied to external SLAs.
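The first two steps above can be sketched in Python. The LoginEvent record, its field names, and the "worst recent window" heuristic for picking a starting target are all assumptions made for illustration. In practice the target would also be adjusted for business needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoginEvent:
    """Hypothetical telemetry record for one login attempt."""
    succeeded: bool
    latency_ms: float


def login_sli(events: list[LoginEvent], threshold_ms: float = 500.0) -> float:
    """Fraction of login attempts that succeeded within the threshold."""
    if not events:
        raise ValueError("no events in the window")
    good = sum(1 for e in events if e.succeeded and e.latency_ms <= threshold_ms)
    return good / len(events)


def candidate_target(historical_slis: list[float]) -> float:
    """Assumed heuristic: start from the worst recent window, so the
    initial target is one the service has already shown it can meet."""
    if not historical_slis:
        raise ValueError("no historical data")
    return min(historical_slis)


# Illustrative data only.
events = [
    LoginEvent(succeeded=True, latency_ms=120.0),
    LoginEvent(succeeded=True, latency_ms=480.0),
    LoginEvent(succeeded=True, latency_ms=650.0),   # succeeded, but too slow
    LoginEvent(succeeded=False, latency_ms=90.0),   # fast, but failed
]
print(login_sli(events))
print(candidate_target([0.9971, 0.9988, 0.9993]))
```

An attempt counts as good only if it both succeeds and is fast enough, which keeps the SLI tied to what the user experiences rather than to server-side status alone.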
2. Scaling and Maturing
- Create a tiered reliability model - not every service needs the same SLOs.
- Revisit and refine SLOs periodically as systems, usage, and priorities evolve.
- Pair SLOs with error budgets (the amount of unreliability the SLO permits, i.e. 100% minus the target) and define the actions to take when a budget is depleted.
- Align SLOs with product planning and prioritisation to guide technical investment.
- Use SLI/SLO metrics in post-incident reviews and operational health reports.
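As a rough illustration of tiered targets, error budgets, and depletion actions, the sketch below computes how much of a budget remains and maps it to a response. The tier names, targets, thresholds, and actions are hypothetical placeholders. Each team would agree its own.

```python
# Hypothetical tiers: not every service needs the same target.
TIER_TARGETS = {"tier1": 0.999, "tier2": 0.995, "tier3": 0.99}


def error_budget_remaining(target: float, good: int, total: int) -> float:
    """Fraction of the window's error budget still unspent.

    The budget is the number of bad events the SLO permits, (1 - target) * total.
    A negative result means the budget is overspent.
    """
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    if total <= 0:
        raise ValueError("total must be positive")
    allowed_bad = (1 - target) * total
    actual_bad = total - good
    return 1 - actual_bad / allowed_bad


def depletion_action(remaining: float) -> str:
    """Assumed policy; the thresholds and wording are placeholders."""
    if remaining <= 0:
        return "pause risky releases and prioritise reliability work"
    if remaining < 0.25:
        return "slow rollouts and review recent changes"
    return "ship as normal"


# Illustrative numbers: a tier1 service with 600 bad requests out of 1,000,000.
remaining = error_budget_remaining(TIER_TARGETS["tier1"], good=999_400, total=1_000_000)
print(f"{remaining:.0%} of budget left: {depletion_action(remaining)}")
```

Writing the policy down as code, or as an equivalent agreed document, means the response to a depleted budget is decided in advance rather than negotiated during an incident.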
3. Team Behaviours to Encourage
- Treat SLOs as levers, not limits - use them to guide work, not punish failure.
- Design SLIs from the user’s perspective - what would they consider success?
- Make SLOs visible and meaningful to both engineering and product teams.
- Balance reliability with velocity - perfection isn’t the goal; informed trade-offs are.
4. Watch Out For…
- Setting SLOs too aggressively or arbitrarily - targets the service cannot realistically meet lead to constant breaches.
- Choosing SLIs that don’t reflect customer experience.
- Letting SLAs drive internal priorities too strongly - use internal SLOs, typically set stricter than the SLA, for day-to-day guidance.
- Losing trust in the numbers - ensure telemetry and calculations are accurate.
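A quick calculation shows why overly aggressive targets are hard to live with. This sketch assumes a time-based availability SLO over a 30-day window. The function name is illustrative.

```python
def allowed_downtime_minutes(target: float, window_days: int = 30) -> float:
    """Minutes of full downtime a time-based availability target permits."""
    if not 0 <= target <= 1:
        raise ValueError("target must be between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1 - target) * window_minutes


for target in (0.99, 0.999, 0.9999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target:.2%} -> {minutes:.1f} minutes per 30 days")
```

Each extra nine cuts the allowance by a factor of ten: about 432 minutes at 99%, 43.2 minutes at 99.9%, and roughly 4.3 minutes at 99.99%. A target chosen without checking it against historical performance can therefore be breached by a single routine incident.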
5. Signals of Success
- Engineers and product teams speak a shared language around reliability.
- SLOs are visible, used in planning, and updated as the system evolves.
- Error budgets help balance shipping velocity with system health.
- Incidents lead to SLO-informed learning and investment decisions.
- SLAs are met consistently, with fewer surprises or escalations.