DevOps

Site Reliability Engineering

Quick Definition

An engineering discipline that applies software engineering practices to infrastructure and operations problems to create scalable and reliable systems.

Detailed Explanation

SRE was pioneered by Google and bridges the gap between development and operations. SREs use software engineering to automate operational tasks, manage infrastructure, and keep systems reliable. Key SRE concepts include Service Level Objectives (SLOs, target levels of reliability for a service), error budgets (the amount of unreliability an SLO permits, i.e. 100% minus the SLO target), toil reduction (eliminating manual, repetitive work), and postmortems (blameless analysis of incidents). SRE is complementary to DevOps: both aim to improve reliability and delivery speed, but SRE provides more prescriptive practices for reliability engineering.
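The relationship between an SLO and its error budget is simple arithmetic. The minimal Python sketch below assumes a request-based availability SLO and a 30-day window. The function names (error_budget_ratio, allowed_downtime_minutes, budget_remaining) are hypothetical illustrations, not part of any particular SRE tool. For example, a 99.9% target leaves a budget of 0.1%, which is about 43.2 minutes of full outage over 30 days.

```python
def error_budget_ratio(slo_target: float) -> float:
    """Fraction of requests (or time) allowed to fail under the SLO: 1 - target."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Error budget expressed as minutes of full outage over the window (assumed 30 days)."""
    return error_budget_ratio(slo_target) * window_days * 24 * 60


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Share of the request-based budget still unspent; a negative value means the SLO was missed."""
    if total_requests <= 0:
        return 1.0  # no traffic yet, so none of the budget has been spent
    allowed_failures = error_budget_ratio(slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # 99.9% over 30 days: 0.001 * 43,200 minutes
    print(round(allowed_downtime_minutes(0.999), 1))  # prints 43.2
    # 250 failures out of 1,000,000 requests spends a quarter of the 1,000-failure budget
    print(round(budget_remaining(0.999, 1_000_000, 250), 2))  # prints 0.75
```

Expressing the budget as a ratio of requests rather than wall-clock time is one common choice. The downtime-in-minutes view is just the same ratio applied to the length of the window.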

Relevant Frameworks

DevOps, SRE