Automation, automation, automation. If Steve Ballmer were doing his developers dance today, instead of in 2001, those are probably the words he would be shouting. After all, we’re told that automation is the key to building agile, cloud-native, highly available, high-performing systems (to name just a handful of buzzwords that go hand-in-hand with automation today).
But the fact is, there are some things that shouldn’t be automated. To prove the point, let’s consider infrastructure automation. Although infrastructure automation is, in many cases, key to enabling predictable, efficient workflows, there are situations where it’s not the right approach. I’d like to discuss them here in order to make a bigger point about when, and when not, to embrace automation within modern software delivery workflows.
What is infrastructure automation?
When I talk about infrastructure automation, I mean the use of scripts or templates to provision, configure, or deploy the environments that host applications.
Infrastructure-as-Code (IaC), the approach behind tools like CloudFormation, Terraform, and Ansible, is one type of infrastructure automation solution (and the one that gets the most press these days). But infrastructure automation as a whole is a broader category covering any type of solution that automates the tasks required to prepare infrastructure to host an application. A script that automates operating system installation, or that customizes Kubernetes policies, is also a form of infrastructure automation.
Generally speaking, infrastructure automation offers a range of benefits, including:
- Less manual effort required to set up infrastructure
- The ability to replicate or copy environments easily
- Repeatability and consistency within configurations: your environments will always be set up the same way, no matter which engineer sets them up
When not to automate your infrastructure
However, this doesn’t mean that infrastructure automation is always the right solution. Here’s a look at several scenarios where it doesn’t make sense:
Your automation tools put critical data at risk
It’s one thing to use scripting or another automation agent to just deploy code. It’s another to use automation when dealing with data.
If you have a critical database that you can’t risk losing, using automation tools to provision the environment where the database lives is risky. A script that wipes a cluster clean as part of a provisioning operation could permanently delete the database along with it. Or, if the script doesn’t know where critical data lives, it could overlook that data and delete it.
Obviously, you can mitigate the risk by configuring your automation tools in such a way that critical data resources are protected from deletion (for example, by setting the DeletionPolicy attribute to “Retain” for such resources in CloudFormation, or by adding a lifecycle block with prevent_destroy set to true in Terraform). It’s also always a best practice to have critical data resources backed up so they’re protected against accidental deletion and other types of threats.
Still, there’s an argument to be made that when you’re dealing with truly critical data, it’s best to provision environments manually to reduce the risk of accidental deletion.
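If you do automate anyway, the deletion protection mentioned above is something you can check mechanically before a template is ever deployed. Here’s a minimal sketch in Python, assuming a CloudFormation template that has already been parsed into a dictionary. The function name, the example template, and the set of resource types treated as "critical data" are all my own illustrative assumptions, not part of any AWS tooling.

```python
# Resource types treated as "critical data" in this sketch. This set is an
# assumption for illustration; adjust it to what your stacks actually contain.
DATA_RESOURCE_TYPES = {
    "AWS::RDS::DBInstance",
    "AWS::DynamoDB::Table",
    "AWS::S3::Bucket",
}


def find_unprotected(template):
    """Return logical IDs of data resources that lack DeletionPolicy: Retain."""
    unprotected = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") not in DATA_RESOURCE_TYPES:
            continue  # not a data resource, so not checked here
        if resource.get("DeletionPolicy") != "Retain":
            unprotected.append(logical_id)
    return sorted(unprotected)


# Hypothetical template, already parsed from JSON or YAML into a dict.
example_template = {
    "Resources": {
        "OrdersDb": {"Type": "AWS::RDS::DBInstance", "DeletionPolicy": "Retain"},
        "SessionsTable": {"Type": "AWS::DynamoDB::Table"},
        "WebServer": {"Type": "AWS::EC2::Instance"},
    }
}

if __name__ == "__main__":
    print(find_unprotected(example_template))
```

A check like this could run in a review step and block any template that lists a data resource without protection. It only looks for "Retain", so it is deliberately stricter than CloudFormation itself, and it is no substitute for backups.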
Configuring infrastructure automation is not worth the effort
Before setting up your automation tools, it’s wise to do a cost-benefit analysis. The effort required to create playbook templates or write automation scripts is significant, and sometimes it’s just not worth it. If you will only be creating a handful of environments, you can spend less time overall if you set them up manually as opposed to configuring automation tools to set them up.
In other words, don’t automate just because you can. Make sure the automation serves the broader goal of saving time and effort in the long run.
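The cost-benefit analysis can be as simple as a break-even calculation. Here’s a rough sketch in Python; the function and parameter names are hypothetical, and the model assumes a one-time setup cost plus a fixed number of hours per environment, which real projects will only approximate.

```python
import math


def break_even_environments(setup_hours, manual_hours_per_env, automated_hours_per_env):
    """Smallest number of environments at which automating costs less in total.

    Returns None when automation never pays off, i.e. when an automated run
    is no faster than setting up the environment by hand.
    """
    saved_per_env = manual_hours_per_env - automated_hours_per_env
    if saved_per_env <= 0:
        return None
    # Automation wins once setup + n * automated < n * manual,
    # which rearranges to n > setup / saved_per_env.
    return math.floor(setup_hours / saved_per_env) + 1


if __name__ == "__main__":
    # Illustrative numbers only: 40 hours to write the templates,
    # 4 hours per manual setup, half an hour per automated setup.
    print(break_even_environments(40, 4, 0.5))
```

With those made-up numbers, automation starts saving time at the twelfth environment. If you only ever expect to build a handful, the manual route comes out ahead.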
More infrastructure automation means more code to manage
Likewise, it’s worth thinking about the extent to which infrastructure automation tools contribute to your overall incident management and operational maintenance burden before you set them up. The more templates and scripts you create, the more code you have to manage and keep updated over the long term. You also have to keep it secure and make sure that only the people who should have access to it have it, which is a challenge in its own right.
So, before automating everything just for automation’s sake, step back and assess whether having more DevOps tooling to maintain and secure is worth the time and effort that the tooling will save you. If you’re going to reuse your scripts or templates hundreds of times, then it’s well worth the trouble, but it may not be if you’ll only use them sporadically.
Infrastructure automation makes you less agile (in some ways)
Infrastructure automation makes you more agile in the sense that it makes it easy to spin up new environments quickly. But in another sense, it decreases agility. This is because changing an environment using automation tools often requires the existing environment to be scrapped completely and recreated from scratch.
In other words, you often can’t use automation tools to take an environment that’s already running and simply modify it or tweak it. In many cases, particularly in workflows that treat infrastructure as immutable, you’d have to wipe out everything you currently have in place and recreate it from the ground up. That’s the way many infrastructure automation workflows are designed to work.
From an agility standpoint, this means it can be harder to react quickly to subtle changes. Maybe you just want to add a few servers to your cluster in response to a spike in demand, or you want to modify a few configuration files in a host OS. If you rely on infrastructure automation too heavily, making these changes will likely be a much larger (and more resource-intensive) affair than it would be if you made them manually.
Your infrastructure isn’t consistent (or you plan to change it)
A final consideration that makes infrastructure automation a poor solution in some cases is the fact that infrastructures are not always consistent. Infrastructure automation works well if you need to spin up environments constantly on infrastructure composed of the same types of servers, clusters, operating systems and so on. Automation is less useful if your deployment environments vary – if you use multiple clouds at the same time, for example, or if you update your infrastructure frequently.
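One rough way to gauge how consistent your infrastructure is: count how many environments share each distinct configuration. The sketch below does that in Python. The inventory format and its field names (provider, os, instance_type) are assumptions I made up for illustration, and the ratio is only a crude indicator of how much reuse a template would get.

```python
from collections import Counter


def shape_counts(inventory):
    """Count environments per (provider, os, instance_type) combination."""
    return Counter(
        (env["provider"], env["os"], env["instance_type"]) for env in inventory
    )


def reuse_ratio(inventory):
    """Average number of environments served by each distinct configuration."""
    counts = shape_counts(inventory)
    return len(inventory) / len(counts) if counts else 0.0


# Hypothetical inventory: three identical environments and one outlier.
example_inventory = [
    {"provider": "aws", "os": "ubuntu-22.04", "instance_type": "m5.large"},
    {"provider": "aws", "os": "ubuntu-22.04", "instance_type": "m5.large"},
    {"provider": "aws", "os": "ubuntu-22.04", "instance_type": "m5.large"},
    {"provider": "gcp", "os": "debian-12", "instance_type": "e2-standard-4"},
]

if __name__ == "__main__":
    for shape, count in shape_counts(example_inventory).items():
        print(shape, count)
```

A ratio close to 1 means nearly every environment is a one-off, so each template you write would be used about once. The higher the ratio, the more each template gets reused.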
Conclusion: When you should use infrastructure automation
The point of this post is not to dismiss infrastructure automation. Instead, the goal is to highlight the situations where automation doesn’t make sense, offering a more grounded perspective on automation for DevOps folks.
The overriding lesson is that it’s a mistake to automate simply for automation’s sake. You should ensure the automation you implement serves larger goals of increased efficiency or reliability. In many situations, automation does just that. But when the effort required to automate outweighs the benefits, you shoot yourself in the foot by pursuing automation at all costs. That’s true of infrastructure automation as well as of any other type of automation in DevOps.