Infrastructure-as-Code (IaC) tools let developers and DevOps teams describe common infrastructure components such as servers, VPCs, IP addresses, or VMs in a configuration language. When they are ready to deploy, they use this configuration as a blueprint to provision the actual infrastructure on demand. The benefit is better control over the change process, along with more efficient and consistent deployments.
However, the devil is in the details, and implementing these tools in real life often comes with risks. In this article we explain the risks of using IaC from a developer’s point of view.
Biggest Risks of IaC
Steep learning curve
To start with, using IaC tools without dedicating time to learning their quirks and limitations can become not just an inconvenience but a serious problem.
Many of the most prevalent IaC tools, such as Terraform and CloudFormation, are simple to get started with but become complicated as your requirements change over time. For example, using custom resources in CloudFormation requires some insight into how CloudFormation works under the hood. Failing to learn these inner details puts your organization at increased risk of configuration drift, of deploying the wrong infrastructure components, or of leaving a system in an incomplete state.
Another example that requires deeper insight into how Terraform works is Terraform imports. The official documentation states in a warning box:
“If you import the same object multiple times, Terraform may exhibit unwanted behavior”
However, you will need to delve into the details in order to understand the nature of this unwanted behavior. What does it really mean? Does it create configuration drift? Does it break the local tfstate file? How can you prevent importing the same object multiple times? All of those questions need time and dedication to answer. As a developer, you should reserve that time as part of the process of learning the tool while using it for real projects.
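One way to start answering the last question is to look for remote objects that are bound to more than one resource address in the state. The following is a minimal sketch in Python, not an official Terraform feature. It assumes you have exported the state with `terraform show -json`, and that the `id` attribute identifies the remote object, which holds for many providers but not all. The function names and the sample data are hypothetical.

```python
from collections import defaultdict


def iter_resources(module):
    """Yield every resource in a module and, recursively, in its child modules."""
    yield from module.get("resources", [])
    for child in module.get("child_modules", []):
        yield from iter_resources(child)


def find_duplicate_bindings(state):
    """Return {(type, id): [addresses]} for objects bound to several addresses.

    `state` is the parsed JSON produced by `terraform show -json`.
    Assumption: the `id` attribute uniquely identifies the remote object.
    """
    root = state.get("values", {}).get("root_module", {})
    seen = defaultdict(list)
    for res in iter_resources(root):
        if res.get("mode") != "managed":
            continue  # skip data sources, they are never imported
        remote_id = (res.get("values") or {}).get("id")
        if remote_id is not None:
            seen[(res["type"], remote_id)].append(res["address"])
    return {key: addrs for key, addrs in seen.items() if len(addrs) > 1}


# Hypothetical state in which the same bucket was imported twice.
sample_state = {
    "values": {
        "root_module": {
            "resources": [
                {"address": "aws_s3_bucket.logs", "mode": "managed",
                 "type": "aws_s3_bucket", "values": {"id": "my-logs"}},
                {"address": "aws_s3_bucket.logs_copy", "mode": "managed",
                 "type": "aws_s3_bucket", "values": {"id": "my-logs"}},
            ],
            "child_modules": [
                {"resources": [
                    {"address": "module.net.aws_vpc.main", "mode": "managed",
                     "type": "aws_vpc", "values": {"id": "vpc-1"}},
                ]},
            ],
        }
    }
}

duplicates = find_duplicate_bindings(sample_state)
```

A check like this could run in CI after every import. It only reports the suspicious bindings; deciding which address should keep the object is still a manual step.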
Some IaC tools have a distinct set of phases that you need to perform in a specific order. For example, in Terraform we have the following steps:
- terraform plan : Creates an execution plan describing the changes Terraform will make to the infrastructure.
- terraform apply : Applies the plan.
Skipping the plan review before running apply can result in destructive changes, because reviewing the plan is your last chance to catch them before they reach real infrastructure. The caveat is that, by default, nothing forces you to follow this order. Applying changes directly, without a separate plan review, can lead to costly mistakes.
The repercussions of missing a destructive change can be painful. We recommend investing in automated safeguards around applying infrastructure changes. For example, instead of relying only on manual review of terraform plan output (which can be lengthy), consider PR (pull request) automation tools like Atlantis to help surface unintentional destructive changes early.
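As an illustration of what such an automated safeguard might look like, the sketch below flags destructive actions in a saved plan. It is not how Atlantis works internally. It assumes the plan was saved with `terraform plan -out=plan.out` and converted with `terraform show -json plan.out`, whose output lists each resource under `resource_changes` with a `change.actions` array. The function name and the sample plan are hypothetical.

```python
def destructive_changes(plan):
    """Return (address, kind) for every resource the plan would delete.

    `plan` is the parsed JSON from `terraform show -json <planfile>`.
    A replacement appears as both "delete" and "create" in `actions`.
    """
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if "delete" in actions:
            kind = "replace" if "create" in actions else "destroy"
            flagged.append((rc["address"], kind))
    return flagged


# Hypothetical plan with one update, one replacement, one destroy, one no-op.
sample_plan = {
    "resource_changes": [
        {"address": "aws_instance.web", "change": {"actions": ["update"]}},
        {"address": "aws_db_instance.main", "change": {"actions": ["delete", "create"]}},
        {"address": "aws_s3_bucket.old", "change": {"actions": ["delete"]}},
        {"address": "aws_vpc.main", "change": {"actions": ["no-op"]}},
    ]
}

flagged = destructive_changes(sample_plan)
for address, kind in flagged:
    print(f"WARNING: {address} will be {kind}d" if kind == "replace"
          else f"WARNING: {address} will be destroyed")
```

In a pipeline, a non-empty result could fail the build or require an explicit approval, so that a deletion never slips through inside a long plan output.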
Terraform and other tools can maintain infrastructure components and their attributes only as long as those components are managed solely by that tool. For example, Terraform cannot detect drift in resources that were changed using other tools like Chef or Puppet. This can be a major problem for organizations that use a variety of IaC tools that are unaware of each other or whose scopes overlap.
Scanning for drift helps you find deviations between what the IaC tool has recorded and the real infrastructure. Because each case is different, virtually no public or private service will do this for you accurately. The best way to manage configuration drift in your organization is to make sure that there is no overlap between the various IaC tools, that any change to the infrastructure is monitored by one IaC tool, and that some form of reporting and health checking is integrated. External checks can also help verify this.
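A simple form of the reporting mentioned above is a scheduled job that runs `terraform plan -detailed-exitcode` against unchanged configuration. With that flag, Terraform exits with 0 when the plan is empty, 1 on error, and 2 when there are pending changes. The sketch below is one possible wrapper, with hypothetical function names and status labels. Note that it only surfaces differences in attributes Terraform itself tracks, so changes made by other tools outside those attributes remain invisible.

```python
import subprocess


def classify_plan_exit_code(code):
    """Map the exit code of `terraform plan -detailed-exitcode` to a status."""
    if code == 0:
        return "in-sync"          # plan succeeded and the diff is empty
    if code == 2:
        return "changes-pending"  # plan succeeded and the diff is non-empty
    return "error"                # exit code 1 or anything unexpected


def check_drift(workdir):
    """Run a plan in `workdir` and report whether it found differences.

    Assumes the `terraform` binary is on PATH and the directory has
    already been initialised with `terraform init`.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit_code(result.returncode), result.stdout
```

If the configuration has not changed since the last apply, a "changes-pending" status suggests that the real infrastructure has drifted and is worth investigating.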
Difficulty protecting sensitive data and exposing ports
Although Terraform and CloudFormation can provision infrastructure components, there are some questions that need answering. For example, how would you know during the process if it leaked sensitive data? How would you know if the S3 bucket it manages in fact contains the security profiles you specified? Are you using a provider that leaks sensitive data to stdout? Did you unexpectedly open a port to the internet?
These are only a subset of the questions you need to answer when delegating your infrastructure management to IaC tools. It is easier to make a mistake and expose resources to the public web than the opposite, so a lot is at stake. In fact, according to an analysis done by Checkmarx in August 2021, one of the most common IaC misconfigurations is leaving HTTP port 80 and ELB ports open to the public.
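To make the open-port question concrete, here is a minimal sketch that scans a plan for security group rules open to the whole internet. It assumes the AWS provider, inline `ingress` blocks on `aws_security_group` resources, and plan JSON produced by `terraform show -json`. Standalone rule resources and other providers are not covered, and the function name and sample data are hypothetical. It is an illustration only, not a replacement for a dedicated scanner.

```python
PUBLIC_CIDRS = {"0.0.0.0/0", "::/0"}


def public_ingress_rules(plan):
    """Return (address, from_port, to_port) for ingress rules open to everyone.

    `plan` is the parsed JSON from `terraform show -json <planfile>`.
    Only inline `ingress` blocks of `aws_security_group` are inspected.
    """
    findings = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_security_group":
            continue
        # `after` is null when the resource is being destroyed.
        after = rc.get("change", {}).get("after") or {}
        for rule in after.get("ingress") or []:
            cidrs = set(rule.get("cidr_blocks") or [])
            cidrs |= set(rule.get("ipv6_cidr_blocks") or [])
            if cidrs & PUBLIC_CIDRS:
                findings.append(
                    (rc["address"], rule.get("from_port"), rule.get("to_port"))
                )
    return findings


# Hypothetical plan: one group exposes port 80 publicly, the others do not.
sample_sg_plan = {
    "resource_changes": [
        {"address": "aws_security_group.web", "type": "aws_security_group",
         "change": {"after": {"ingress": [
             {"from_port": 80, "to_port": 80, "cidr_blocks": ["0.0.0.0/0"]},
             {"from_port": 22, "to_port": 22, "cidr_blocks": ["10.0.0.0/8"]},
         ]}}},
        {"address": "aws_security_group.db", "type": "aws_security_group",
         "change": {"after": {"ingress": [
             {"from_port": 5432, "to_port": 5432, "cidr_blocks": ["10.0.0.0/8"]},
         ]}}},
        {"address": "aws_security_group.gone", "type": "aws_security_group",
         "change": {"after": None}},
    ]
}

open_rules = public_ingress_rules(sample_sg_plan)
```

Some public rules are intentional, such as a web server listening on port 80, so a check like this is best used to force a conscious review rather than to block every finding.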
How Should Developers Detect and Mitigate the Most Common IaC Risks?
Most of the common risks can be mitigated by adhering to some common security best practices, so it’s not all doom and gloom. Here is what you can start practicing:
- Spend some time with the tools: Take the time to fully understand how each tool works, including its quirks, open issues, and best practices. Participate in meetups and subscribe to events so you can learn from other industry experts. This will give you an advantage when figuring out how to perform tasks, both simple and advanced, without compromising your security posture.
- Establish common engineering processes and best practices: Practice peer code reviews, CI/CD checks, linting, and verification. This can reduce the number of common accidents and mistakes that happen when you rush things.
- Use purpose-built IaC security tools: Look for purpose-built tools like Checkmarx KICS (Keeping Infrastructure as Code Secure) that help you establish a secure and efficient infrastructure-as-code pipeline. Because it is extensible and covers multiple cloud providers, you can configure it to match your organization’s policies.
How and Where Can Developers and Security Teams Learn More About the Risks?
First and foremost, you can learn more about the risks of IaC by asking local IaC experts or official communities about their experiences with it. Hearing about the issues they have encountered can prepare you for what to expect and what to look out for.
Furthermore, you should subscribe to reputable email lists and blogs from vendors that specialize in these areas and offer tailored solutions, like KICS, for tackling most of the problems explored in this article.
Lastly, you should perform your own investigation to assess the pros and cons of each tool based on your own requirements and success criteria. The one-size-fits-all approach isn’t suitable for all lines of business. By performing all the necessary steps, you can hopefully mitigate most of the critical risks of using IaC tools.