Site Reliability Engineering
Quick Definition
An engineering discipline that applies software engineering practices to infrastructure and operations problems to create scalable and reliable systems.
Detailed Explanation
Site Reliability Engineering (SRE) was pioneered at Google and bridges the gap between development and operations. SREs use software engineering to automate operational tasks, manage infrastructure, and keep systems reliable.
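Key SRE concepts include Service Level Objectives (SLOs, measurable reliability targets for a service), error budgets (the amount of unreliability an SLO permits over a given period), toil reduction (eliminating manual, repetitive work), and blameless postmortems (incident analysis that focuses on causes rather than individual fault).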
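To make the relationship between an SLO and its error budget concrete, here is a minimal Python sketch. The function names `error_budget` and `budget_remaining`, the 99.9% target, and the 30-day window are illustrative assumptions, not part of any standard SRE tooling.

```python
from datetime import timedelta


def error_budget(slo_target: float, window: timedelta) -> timedelta:
    """Allowed downtime within `window` for an availability SLO (e.g. 0.999).

    Hypothetical helper: the budget is simply the share of the window
    that the SLO does not require to be available.
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    return window * (1.0 - slo_target)


def budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Fraction of the error budget still unspent for a request-based SLO.

    Hypothetical helper: a negative result means the budget is overspent.
    """
    allowed_failures = total * (1.0 - slo_target)
    if allowed_failures == 0:
        # A 100% target (or no traffic) leaves no room for failures.
        return 0.0 if failed else 1.0
    return 1.0 - failed / allowed_failures


if __name__ == "__main__":
    # Assumed example: a 99.9% availability SLO over a 30-day window.
    budget = error_budget(0.999, timedelta(days=30))
    print(f"Allowed downtime: {budget.total_seconds() / 60:.1f} minutes")

    # Assumed example: 250 failed requests out of 1,000,000.
    print(f"Budget remaining: {budget_remaining(0.999, 1_000_000, 250):.0%}")
```

Under these assumptions, a 99.9% target over 30 days allows roughly 43 minutes of downtime. Teams commonly use the remaining budget to decide whether to keep shipping changes or to prioritize reliability work.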
SRE is complementary to DevOps: both aim to improve reliability and delivery speed, but SRE prescribes more specific practices for reliability engineering.
Relevant Frameworks
DevOps, SRE