For more than a decade, two closely related concepts have coexisted in the information technology world: DevOps and Site Reliability Engineering (SRE). At first glance, they may seem like rivals. But a closer look reveals that these supposed competitors are complementary pieces of the same picture.
This article defines DevOps and SRE, describing how they make it easy to build robust software, where they overlap, how they differ, and when they can work together effectively.
What is SRE?
The first question we are going to answer is “What does SRE mean?” SRE stands for Site Reliability Engineering. In a few words, SRE teams solve operational, scaling, and reliability issues.
This approach arose at Google in the early 2000s to keep a large, complex system serving over 100 billion queries a day up and running. According to Ben Treynor Sloss, Google’s VP of Engineering, who coined the term SRE, “SRE is what happens when you ask a software engineer to design an operations team”.
SRE focuses on system stability, which it regards as the most fundamental characteristic of any product. The pyramid below illustrates the elements that affect reliability, from the simplest (monitoring) to the most complex (reliable product launch).
As soon as the system becomes “sufficiently reliable”, SRE shifts its effort to adding new features or building new products. There is also a strong focus on tracking results, measurable productivity improvements, and automating operational tasks.
Let’s review the fundamental principles of SRE:
- Building CI/CD processes and automating infrastructure scaling;
- Capping operational load: no more than 50% of an SRE’s time should go to routine operational work, and at least 50% should be devoted to improving the system rather than fighting fires;
- Making the development team responsible for at least 5% of the Ops workload. If operational load grows because of problems the developers introduced, the excess work is redirected back to them;
- Defining a Service Level Agreement (SLA), Service Level Objectives (SLOs), and Service Level Indicators (SLIs) for your services and measuring system performance against them;
- Setting an error budget that controls how quickly changes are released to production, based on measured reliability;
- Implementing in-depth monitoring of latency, traffic, errors, and saturation;
- Writing incident response procedures triggered by clear, symptom-based alerts, and maintaining automated runbooks that cover each scenario and testing them regularly to keep the team’s skills sharp;
- Holding blameless postmortems and fixing the issues they uncover;
- Hiring SREs and software engineers from a common recruitment pool, and letting SREs grow into developers.
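The SLI, SLO, and error-budget principles fit together in a simple calculation. The minimal Python sketch below assumes an availability SLI (the fraction of successful requests over one window) and a purely count-based budget; the `SloWindow` class, the `release_allowed` gate, and the example numbers are hypothetical illustrations rather than any specific tool's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloWindow:
    """Request counts observed over one SLO window (for example, 30 days)."""

    total_requests: int
    failed_requests: int
    slo_target: float  # e.g. 0.999 means 99.9% of requests must succeed

    def sli(self) -> float:
        """Availability SLI: the fraction of requests that succeeded."""
        if self.total_requests == 0:
            return 1.0  # no traffic, so nothing has failed
        return 1.0 - self.failed_requests / self.total_requests

    def error_budget(self) -> float:
        """Number of failed requests the SLO tolerates in this window."""
        return (1.0 - self.slo_target) * self.total_requests

    def budget_remaining(self) -> float:
        """Share of the error budget still unspent; negative once overspent."""
        if self.total_requests == 0:
            return 1.0  # no traffic yet, so none of the budget is spent
        budget = self.error_budget()
        if budget <= 0:
            return 0.0  # a 100% SLO leaves no budget at all
        return 1.0 - self.failed_requests / budget


def release_allowed(window: SloWindow) -> bool:
    """Hypothetical release gate: ship new changes only while budget remains."""
    return window.budget_remaining() > 0


if __name__ == "__main__":
    month = SloWindow(total_requests=1_000_000, failed_requests=250, slo_target=0.999)
    print(f"SLI: {month.sli():.5f}")                            # SLI: 0.99975
    print(f"Error budget: {month.error_budget():.0f}")          # Error budget: 1000
    print(f"Budget remaining: {month.budget_remaining():.0%}")  # Budget remaining: 75%
    print("Release allowed:", release_allowed(month))           # Release allowed: True
```

With a 99.9% SLO over a million requests, the budget is 1,000 failed requests; 250 failures leave 75% of it, so releases continue. Once failures exceed 1,000, the gate closes and the team shifts effort from new changes to reliability work.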
What is DevOps?
DevOps is a set of cultural principles, approaches, and tools that improve a company’s ability to deliver applications and services at high speed. With DevOps, products are developed and improved faster than with traditional software development and infrastructure management processes.
Patrick Debois, a Belgian IT consultant and Agile practitioner, coined the term in 2009. Its main principles are similar to those of SRE: applying engineering methods to operations work, measuring outcomes, and automating tedious tasks. But its focus is much broader.
Five Key Pillars of DevOps:
- No more silos. A lack of cooperation and information flow between teams lowers performance;
- Failure is okay. DevOps calls for learning from mistakes rather than wasting resources on an unreachable goal: preventing all failures;
- Change must be gradual. Changes are most efficient and least risky when they are small and frequent. Combined with automated testing of small batches of code and rollback of failed ones, this principle is at the heart of continuous integration and continuous delivery (CI/CD);
- The more automation, the better. DevOps relies on automation to ship updates faster and save hours of monotonous work;
- Metrics are crucial. Every change should be measured to see if it delivers the desired results.
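The progressive-change and metrics pillars meet in canary releases: a change first reaches a small share of traffic and is rolled back automatically if it performs worse than the current version. The Python sketch below assumes the error rate is the only health signal and uses a hypothetical 1.5x tolerance over the baseline; `observe`, the stage percentages, and the simulated releases are illustrative stand-ins for what a real CI/CD or rollout tool would provide.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    """Request and error counts for one version of the service."""

    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(baseline: Metrics, canary: Metrics,
                   max_ratio: float = 1.5, floor: float = 0.001) -> str:
    """Return "rollback" if the canary's error rate is clearly worse than baseline.

    `floor` keeps a near-zero baseline from triggering a rollback on one stray error.
    """
    allowed = max(baseline.error_rate * max_ratio, floor)
    return "rollback" if canary.error_rate > allowed else "promote"


def progressive_rollout(stages: list[int],
                        observe: Callable[[int], tuple[Metrics, Metrics]]) -> str:
    """Shift traffic to a new version stage by stage; report a rollback at the first bad stage.

    `observe(percent)` is a hypothetical hook: it routes `percent`% of traffic to
    the new version and returns (baseline, canary) metrics for that stage.
    """
    for percent in stages:
        baseline, canary = observe(percent)
        if canary_verdict(baseline, canary) == "rollback":
            # A real pipeline would route all traffic back to the old version here.
            return f"rolled back at {percent}%"
    return "fully deployed"


def healthy_release(percent: int) -> tuple[Metrics, Metrics]:
    # Simulated stage: the canary's 0.2% error rate matches the baseline.
    return Metrics(10_000, 20), Metrics(1_000, 2)


def faulty_release(percent: int) -> tuple[Metrics, Metrics]:
    # Simulated stage: a bug surfaces once the canary gets 25% of traffic.
    errors = 50 if percent >= 25 else 2
    return Metrics(10_000, 20), Metrics(1_000, errors)


if __name__ == "__main__":
    stages = [1, 5, 25, 100]
    print(progressive_rollout(stages, healthy_release))  # fully deployed
    print(progressive_rollout(stages, faulty_release))   # rolled back at 25%
```

Keeping each stage small means a bad change is caught while it affects only a fraction of users, which is exactly why small, frequent changes are less risky.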
How is SRE Related to DevOps?
SRE and DevOps are not competitors: SRE provides a hands-on way to put most DevOps principles into practice.
Now, let’s take a look at the way teams use SRE to implement DevOps principles and philosophy:
Using Tools and Automation
Both DevOps and SRE use automation to improve processes and service delivery. While DevOps encourages the adoption of automation tools, SRE ensures that every team member has access to up-to-date automation tools and technologies. SRE also allows teams to share the same tools and services through scalable application programming interfaces (APIs).
Implementing Incremental Changes
DevOps relies on small, incremental changes to ensure continuous improvement. SRE supports this by letting teams perform small, frequent updates that reduce each change’s impact on application availability and stability.
Reducing Organizational Silos
DevOps aims to ensure that different software development departments and teams work together harmoniously toward a common goal. SRE achieves this by sharing ownership of projects between teams. With SRE, every team uses the same tools, methods, and codebase, which supports consistency and smooth cooperation.
Summing up, both can be viewed as leveraged resources: there is no 1:1 ratio of DevOps engineers or site reliability engineers to software engineers. Compared with the first edition of the Google SRE book, O’Reilly’s Building Secure and Reliable Systems discusses an organizational structure that positions SREs as consultants and experts.
Creating solutions at scale requires specialized engineers to help solve issues and expand opportunities. DevOps engineers, SREs, and other experts such as application security engineers fall into the category of specialized consultants. In its SRE book, Google describes the range of expertise across several areas required to deploy and support a product like Gmail.
SRE vs. DevOps Tools: The Same Solutions Fit Both
Matthew Flaming, vice president of software engineering at New Relic, an application performance monitoring company, describes site reliability engineering as “the clearest embodiment of DevOps principles in one role.” Accordingly, the SRE and DevOps toolboxes can be largely the same and typically include the following items:
- Containers and microservices make it easier to build a scalable system. Docker, for building and deploying containerized applications, and Kubernetes, for container orchestration, are essential pieces of the SRE/DevOps toolkit;
- CI/CD tools like Jenkins or CircleCI support the idea of incremental change, allowing teams to create, test, and deploy code faster;
- Infrastructure as code (IaC) puts “automate everything” into practice. Terraform, AWS CloudFormation, and Puppet are some of the most popular tools for automating infrastructure deployment and configuration;
- Automated functional and non-functional testing in production can be done using Selenium, Zephyr, Nexus Lifecycle, Veracode, and other instruments;
- Stability testing is necessary to ensure the system’s ability to withstand real-world conditions. Popular tools for this task are Netflix’s Chaos Monkey, ChaosIQ, and Gremlin;
- Monitoring systems play a critical role in SRE and DevOps environments. Prometheus, Datadog, Broadcom, PRTG Network Monitor, and many other systems enable continuous, metric-based monitoring of network and app performance in cloud environments.
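Tools such as Terraform, CloudFormation, and Puppet differ in syntax and scope, but they share a declarative model: you describe the infrastructure you want, and the tool works out and applies the changes needed to get there. The Python sketch below is a toy model of that reconciliation loop, assuming resources can be represented as plain dictionaries. The `plan` and `apply` names loosely echo Terraform's `plan`/`apply` workflow, but the code is a hypothetical illustration and does not reproduce any tool's actual behavior or API.

```python
# Hypothetical model: resource name -> declared settings.
Resources = dict[str, dict[str, object]]


def plan(desired: Resources, actual: Resources) -> list[tuple[str, str]]:
    """List the (action, resource) steps that would make `actual` match `desired`."""
    steps = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            steps.append(("create", name))
        elif name not in desired:
            steps.append(("delete", name))
        elif desired[name] != actual[name]:
            steps.append(("update", name))
    return steps


def apply(desired: Resources, actual: Resources) -> None:
    """Execute the plan against `actual` (a stand-in for real cloud API calls)."""
    for action, name in plan(desired, actual):
        if action == "delete":
            del actual[name]
        else:
            actual[name] = dict(desired[name])  # create or update to the declared spec


if __name__ == "__main__":
    desired = {
        "web": {"instances": 3, "size": "small"},
        "db": {"instances": 1, "size": "large"},
    }
    actual = {
        "web": {"instances": 2, "size": "small"},
        "cache": {"instances": 1, "size": "small"},
    }
    print(plan(desired, actual))
    # [('delete', 'cache'), ('create', 'db'), ('update', 'web')]
    apply(desired, actual)
    print(plan(desired, actual))  # [] (a second run has nothing to change)
```

Because `apply` converges on the declared state, a second run produces an empty plan. This idempotence is what makes it safe to run infrastructure changes automatically from a CI/CD pipeline.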
DevOps vs. SRE: Multi-Tasking Team vs. Cross-Functional Team
In recent years, SRE and DevOps roles have become critical in many organizations. But that doesn’t mean everyone agrees on exactly what SRE and DevOps teams do. Similarly, there is no one-size-fits-all job description for DevOps engineers and site reliability engineers. Below, we highlight the most important aspects of DevOps and SRE functions.
The SRE Role and the SRE Team
A typical SRE team consists of either software developers with operational experience or IT operations professionals with software development skills. At Google, these teams are commonly a balanced mix of people who are more software-savvy and people who are more systems-savvy. Other companies build SRE teams by adding software development skills and practices to existing operations teams and staff.
In addition to operations and software development, areas of expertise relevant to the SRE role include monitoring systems, production automation, and systems architecture.
So, what does a site reliability engineer do? All SRE team members share responsibility for code deployment, system support, automation, and change management. Over time, each site reliability engineer’s goals can shift depending on the team’s current focus: building new features or improving system reliability.
The DevOps Engineer Role and the DevOps Team
Unlike the SRE team, where each expert is a generalist, the DevOps team consists of various experts with specific duties.
The team structure varies from organization to organization and commonly includes (but is not limited to) the following experts:
- A product owner who understands exactly how the service should work to benefit customers;
- A team leader delegating tasks to other members;
- A cloud architect who creates a cloud infrastructure for the smooth operation of services in a production environment;
- A programmer for coding and testing;
- A tester who implements quality assurance methods for product development and delivery;
- A release manager who plans and coordinates releases;
- A system administrator who is responsible for cloud monitoring.
This is, of course, not a comprehensive list of DevOps roles. Such a cross-functional team commonly brings in an SRE to ensure the availability of services. Usually, when site reliability engineers work as part of a DevOps team, they have a narrower scope of duties than in fully dedicated SRE teams.
Difference Between Job Roles of SRE and DevOps
The differences between SRE and DevOps in terms of work roles are best explained by the day-to-day tasks of the people working in those roles.
An SRE job description usually includes these responsibilities:
- Writing code and managing configurations for automation;
- Monitoring software infrastructure, and tracking and resolving bug tickets;
- Planning for software deployment with a fixed infrastructure using CI/CD;
- Ensuring the binaries and configurations are suitable for integration and deployment in different environments.
On the other hand, DevOps engineers typically focus on:
- Simplifying the development and deployment of software for the development team as much as possible;
- Using tools such as Jenkins, Kubernetes, and Docker to automate building, testing, and deploying software in line with CI/CD practices;
- Setting up, maintaining, and documenting infrastructure components;
- Developing workflows to enable CI/CD for projects;
- Setting up and maintaining various virtual environments (VMs, Containers);
- Implementing and supporting cluster environments.
SRE vs. DevOps
Both approaches aim to minimize the separation between development and operations teams. But we can summarize the main distinction as follows: DevOps concentrates more on cultural and philosophical shifts, while SRE is more pragmatic and practical.
The table below highlights the key distinctions in the way the concepts operate:
| SRE | DevOps |
|---|---|
| SRE was created with a concrete purpose: to build a set of methods and metrics to improve cooperation and service delivery. | DevOps is a set of philosophies that foster a collaborative culture across disparate teams. |
| SRE includes prescriptive ways to achieve reliability. | DevOps works like a template that guides collaboration to bridge the gap between development and operations. |
| SRE’s responsibility is primarily focused on improving the availability and reliability of the system. | DevOps focuses on the speed of development and delivery while ensuring continuity. |
| The team consists of site reliability engineers who have both operational and development experience. | The DevOps team includes testers, programmers, engineers, SREs, and more. |
When Do Businesses Need DevOps and SRE?
Despite all the misunderstandings, one thing is clear: SRE and DevOps are not opposing factions but close relatives working toward the same goal with the same tools, just with slightly different focuses.
While SRE culture values reliability over the speed of change, DevOps instead emphasizes speed of delivery throughout the product development cycle. Still, both approaches try to strike a balance between these two poles and can complement each other in terms of methods, practices, and solutions.
Depending on their size and goals, businesses can adopt DevOps, SRE, or a combination of both in various forms.
SRE teams modeled after Google’s are well suited to large technology companies such as Adobe, Twitter, or Amazon that handle billions of requests daily and prioritize the availability of their services.
DevOps culture and cross-functional teams benefit any business operating in a highly competitive environment where even a slightly shorter time to market provides a major advantage. In addition, a DevOps team can be reinforced with a site reliability engineer to monitor system performance and ensure system stability.
Some companies have two teams, SRE and DevOps. The SRE team is responsible for supporting and maintaining existing services, while the DevOps team builds and delivers new applications.
Smaller businesses usually look for someone to operate cloud infrastructure and automate tasks, using various titles for the same duties — DevOps Engineer, DevOps Manager, SRE, Cloud Computing Engineer, or even CI/CD Engineer.
Whatever the size of your business, someone in your company is probably already doing SRE work, facilitating collaboration between developers and IT, or writing scripts to automate tedious tasks. If you find these people and formally recognize their work, they can form the foundation of a high-performing SRE or DevOps team, whichever you prefer.
Which Is Better for Your Career: DevOps or SRE?
According to PayScale, the average annual salary for a DevOps engineer is around $95,000, while the average salary for an SRE is $116,000. One likely reason for the gap is the breadth of SRE work: SREs handle a wide variety of hands-on activities every day. They monitor systems, write code, attend calls, resolve tickets, keep systems available, and plan ahead, which is why they are usually paid more than DevOps engineers.
With the fundamentals of SRE and DevOps in mind, engineers can make informed decisions about where to build their careers, which skill sets to improve, and which industry developments to learn about and follow. This helps them not only make better decisions but also build a stronger career path.
So, is there a difference between DevOps and SRE? Google, which coined the term SRE, has defined the role clearly, along with a straightforward set of expectations. DevOps seems to be more of a “free spirit” whose definition and perspective vary from organization to organization.
However, DevOps and SRE teams aren’t all that different. Both help bring development and operations teams together while sharing similar responsibilities and a focus on automation and reliability.