Practice: SLOs, SLIs, and SLAs

Purpose and Strategic Importance

SLOs (Service-Level Objectives), SLIs (Service-Level Indicators), and SLAs (Service-Level Agreements) are foundational concepts in Site Reliability Engineering (SRE) that help teams define, measure, and manage service performance and reliability.

They provide a shared language and framework for aligning technical work with business expectations, balancing innovation and stability, and making data-driven decisions about system health and customer trust.


Description of the Practice

  • SLIs are quantitative measures of service performance (e.g. latency, availability, error rate).
  • SLOs define the acceptable target for an SLI over a given time window (e.g. 99.9% availability per month).
  • SLAs are formal agreements with customers, usually external ones, that include SLOs and set out the consequences of missing them (e.g. penalties, refunds).
  • Internally, SLOs and SLIs guide engineering decisions; externally, SLAs manage contractual expectations.
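
To make the relationship between an SLI and an SLO concrete, here is a minimal Python sketch. The `Slo` class, the function names, the service name, and the request counts are all hypothetical and chosen for illustration. Treating an empty window as meeting the objective is an assumption, not a rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """A target for one SLI over a time window (illustrative names only)."""
    name: str
    target: float      # e.g. 0.999 for 99.9%
    window_days: int   # e.g. 30 for a monthly window


def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests that succeeded."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic counts as meeting the objective
    return good_requests / total_requests


def allowed_downtime_minutes(slo: Slo) -> float:
    """Downtime the SLO tolerates when availability is measured by time."""
    window_minutes = slo.window_days * 24 * 60
    return (1.0 - slo.target) * window_minutes


# Made-up numbers for a 30-day window.
monthly = Slo(name="api-availability", target=0.999, window_days=30)
sli = availability_sli(good_requests=999_412, total_requests=1_000_000)
slo_met = sli >= monthly.target

print(f"SLI={sli:.4%} target={monthly.target:.1%} met={slo_met}")
print(f"Allowed downtime: {allowed_downtime_minutes(monthly):.1f} min")
```

With a 99.9% target over 30 days, the tolerated downtime works out to about 43 minutes, which is often a more intuitive figure to share with product teams than the percentage.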

How to Practise It (Playbook)

1. Getting Started

  • Identify critical user journeys and define SLIs that reflect real user experience (e.g. “successful logins within 500ms”).
  • Set realistic SLO targets based on historical data and business needs.
  • Publish and visualise SLO performance - ideally as a shared dashboard.
  • Clarify which SLOs are internal vs. tied to external SLAs.
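
As a sketch of the "successful logins within 500ms" example above, the SLI can be computed as the ratio of good events to all events. The `LoginEvent` type, the `login_sli` function, and the sample data are hypothetical. Counting a login at exactly 500 ms as good is an assumption.

```python
from typing import Iterable, NamedTuple


class LoginEvent(NamedTuple):
    succeeded: bool
    latency_ms: float


def login_sli(events: Iterable[LoginEvent], threshold_ms: float = 500.0) -> float:
    """Fraction of login attempts that succeeded within the latency threshold."""
    total = 0
    good = 0
    for event in events:
        total += 1
        # A "good" event must both succeed and be fast enough.
        if event.succeeded and event.latency_ms <= threshold_ms:
            good += 1
    if total == 0:
        raise ValueError("no events in the window; SLI is undefined")
    return good / total


sample = [
    LoginEvent(True, 120.0),
    LoginEvent(True, 480.0),
    LoginEvent(True, 910.0),   # succeeded, but too slow to count as good
    LoginEvent(False, 95.0),   # fast, but failed
]
print(f"login SLI: {login_sli(sample):.0%}")
```

Only two of the four sample events count as good. A slow success and a fast failure both fall short of what the user would call a successful login, which is the point of defining the SLI from the user's perspective.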

2. Scaling and Maturing

  • Create a tiered reliability model - not every service needs the same SLOs.
  • Revisit and refine SLOs periodically as systems, usage, and priorities evolve.
  • Pair SLOs with error budgets and define actions for depletion scenarios.
  • Align SLOs with product planning and prioritisation to guide technical investment.
  • Use SLI/SLO metrics in post-incident reviews and ops health reports.
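
The tiered model and error-budget pairing above can be sketched as follows. The tier names, targets, thresholds, and action wording are all hypothetical; real values would come from historical data and agreement with product teams.

```python
# Hypothetical tiers: not every service needs the same SLO.
TIER_TARGETS = {"tier-1": 0.999, "tier-2": 0.995, "tier-3": 0.99}


def error_budget_remaining(target: float, good: int, total: int) -> float:
    """Fraction of the window's error budget still unspent.

    1.0 means untouched, 0.0 means fully spent, negative means overspent.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    if total == 0:
        return 1.0  # assumption: no traffic spends no budget
    allowed_bad = (1.0 - target) * total
    actual_bad = total - good
    return 1.0 - actual_bad / allowed_bad


def depletion_action(remaining: float) -> str:
    """Map remaining budget to an agreed action (thresholds are illustrative)."""
    if remaining <= 0.0:
        return "freeze risky releases; prioritise reliability work"
    if remaining < 0.25:
        return "review upcoming changes; slow the rollout pace"
    return "ship as planned"


# Made-up window: 600 failed requests out of 1,000,000 for a tier-1 service.
remaining = error_budget_remaining(TIER_TARGETS["tier-1"], 999_400, 1_000_000)
print(f"budget remaining: {remaining:.0%} -> {depletion_action(remaining)}")
```

Agreeing the depletion actions in advance, while the budget is still healthy, keeps the conversation about trade-offs rather than blame when the budget does run out.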

3. Team Behaviours to Encourage

  • Treat SLOs as levers, not limits - use them to guide work, not punish failure.
  • Design SLIs from the user’s perspective - what would they consider success?
  • Make SLOs visible and meaningful to both engineering and product teams.
  • Balance reliability with velocity - perfection isn’t the goal, informed trade-offs are.

4. Watch Out For…

  • Setting SLOs too aggressively or arbitrarily - targets the system cannot realistically meet lead to constant failure.
  • Choosing SLIs that don’t reflect customer experience.
  • Letting SLAs drive internal priorities too strongly - use internal SLOs for day-to-day guidance.
  • Losing trust in the numbers - ensure telemetry and calculations are accurate.

5. Signals of Success

  • Engineers and product teams speak a shared language around reliability.
  • SLOs are visible, used in planning, and updated as the system evolves.
  • Error budgets help balance shipping velocity with system health.
  • Incidents lead to SLO-informed learning and investment decisions.
  • SLAs are met consistently, with fewer surprises or escalations.

Associated Standards

  • Changes are introduced into production frequently and sustainably (DF)
  • Delivery pace is sustainable and protects team wellbeing
  • Engineering lead time is minimised from start of work to safe deployment (LTFC)
  • Teams track time-in-status across their delivery flow
  • Work in progress reflects current business priorities
