How Feature Flags Support Incident Response and Management



Every year I get on some technical kick. These fascinations usually end up being some sort of design pattern or process. In 2020, I’m really into feature flags, and I'm a big fan. Feature flags are a relatively basic implementation of something we understand well: user functionality is composed of various technical components. In modern applications, those components have different lifecycles and different developers involved in creating them, so grouping them under a flag creates a highly useful management structure for the feature.

However, the market has been a bit duped into thinking feature flags are a newfangled thing that exists only to support testing in production and canary releases. They’re not. They go far beyond that, and one of the areas where they help is in how those who are on call support the application.

I look at feature flags like logic tags that spread across the entire delivery chain, from planning, where features are identified and described, all the way to production, where they’re either on or off. Feature flags are primarily sold from the perspective of the on/off switch, but the continuity is critically important for everyone.

When it comes to supporting the application, this continuity is a huge help to those on call (IT Ops, DevOps, SRE, etc.) in the following ways:

Turning it off but not on again

Greater control is the key value of feature flag design patterns and their associated management tools and processes. If a newly released feature or service is the cause of an incident, you can turn it off. And this ability to turn a feature off doesn’t need to involve developers at all: on-call engineers can do it themselves. Feature flags help solve a very specific type of issue, one that usually surfaces quickly after a release.
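To make the off switch concrete, here is a minimal sketch of what it might look like. Everything in it is an assumption for illustration: the `FlagStore` class, the `new-checkout` flag name, and the in-memory storage are hypothetical stand-ins for whatever flag management tool you actually use.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class FlagStore:
    """Hypothetical in-memory flag store (a real one would sit behind a flag service)."""
    flags: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so a missing entry never enables a feature.
        return self.flags.get(name, False)

    def kill(self, name: str, actor: str, reason: str) -> None:
        # Turn the feature off and record who did it and why,
        # so the team can follow up during work hours.
        self.flags[name] = False
        self.audit_log.append({
            "flag": name,
            "actor": actor,
            "reason": reason,
            "at": datetime.now(timezone.utc).isoformat(),
        })


def checkout(store: FlagStore) -> str:
    # Application code branches on the flag; the previous path stays available.
    if store.is_enabled("new-checkout"):
        return "new checkout flow"
    return "legacy checkout flow"


store = FlagStore(flags={"new-checkout": True})
before = checkout(store)  # flag is on, so the new path runs
store.kill("new-checkout", actor="on-call-sre", reason="error spike after release")
after = checkout(store)   # flag is off, so the legacy path runs
```

The point of the audit entry is that the person on call only has to flip the switch and leave a note. The investigation can wait for the developers who own the feature.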

Feature flags aren’t even necessarily only about features: the act of turning one off is a mental flag for the entire team to address the new feature during work hours. A good incident response tool will help you keep track of which features are causing issues. Then let the hunt begin as to what exactly in that feature caused the issue.

A quick assessment of blast radius

You likely won’t run into many incidents of the kind described above. If you’re on call and get an alert, it’s likely the incident isn’t directly related to a full feature release or toggle, so resolving it might not be as convenient as turning the feature off. But the feature tied to the incident should be within quick reach in the context of the alert. By knowing not only the impacted infrastructure but also the feature, the team can quickly assess the blast radius and how the incident impacts users.

It’s not always clear how technical issues manifest themselves from the user’s perspective. Feature flags can help you detect these problems. This assessment also helps determine severity and where to route the issue. Often, it’s easier to tie features to developers and teams than to tie them back to specific services.
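Here is a rough sketch of how an alert that carries a feature name could feed a quick blast radius and routing decision. The feature registry, the owner names, the rollout percentages, and the 50% severity threshold are all made up for illustration, and your alerting tool will have its own fields.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry: each flag mapped to its owning team and rollout percentage.
FEATURES = {
    "new-checkout": {"owner": "payments-team", "rollout_percent": 25},
    "search-suggest": {"owner": "search-team", "rollout_percent": 100},
}


@dataclass
class Alert:
    service: str
    feature: Optional[str]  # flag name attached to the alert, if one is known


def assess(alert: Alert, active_users: int) -> dict:
    """Estimate user exposure and pick a route from the feature on the alert."""
    info = FEATURES.get(alert.feature) if alert.feature else None
    if info is None:
        # No feature context: fall back to the general on-call rotation.
        return {"route_to": "on-call", "users_exposed": None, "severity": "unknown"}
    exposed = active_users * info["rollout_percent"] // 100
    # Assumed rule of thumb: half or more of users exposed counts as high severity.
    severity = "high" if info["rollout_percent"] >= 50 else "medium"
    return {"route_to": info["owner"], "users_exposed": exposed, "severity": severity}


tagged = assess(Alert(service="cart-api", feature="new-checkout"), active_users=40_000)
untagged = assess(Alert(service="cart-api", feature=None), active_users=40_000)
```

With the feature attached, the alert goes to the team that owns it along with a rough count of exposed users. Without it, the alert falls back to the general rotation and someone has to work that out by hand.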

Peeling back dependency layers

The continuity and visibility into impacted services can help during triage by giving a better understanding of feature-based incidents across all the application layers involved. This assumes that on-call personnel have a good understanding of how the application works, but in this case that is a reasonable assumption to make. When engineers do understand their services, they can use the feature to work out which backend processes and services, as well as which front-end applications, are involved, which helps in assessing root cause.
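As a sketch of that peeling-back, assume you keep a simple map from each feature to the services it enters through, plus a map of which services call which. Both maps below are invented for illustration, and the walk is a plain breadth-first traversal.

```python
from collections import deque

# Hypothetical maps: where a feature enters the system, and what each service calls.
FEATURE_ENTRYPOINTS = {"new-checkout": ["web-app", "cart-api"]}
SERVICE_DEPS = {
    "web-app": ["cart-api"],
    "cart-api": ["payments-svc", "inventory-svc"],
    "payments-svc": ["postgres"],
    "inventory-svc": ["postgres"],
}


def services_for_feature(feature: str) -> list:
    """Return every service a feature touches, nearest layers first (breadth-first)."""
    queue = deque(FEATURE_ENTRYPOINTS.get(feature, []))
    seen = set(queue)
    order = []
    while queue:
        service = queue.popleft()
        order.append(service)
        for dep in SERVICE_DEPS.get(service, []):
            if dep not in seen:  # skip services that are already queued or visited
                seen.add(dep)
                queue.append(dep)
    return order


layers = services_for_feature("new-checkout")
# layers == ["web-app", "cart-api", "payments-svc", "inventory-svc", "postgres"]
```

The ordering gives the on-call engineer a reasonable sequence to check: start at the front end the user sees, then work down toward shared infrastructure.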

Post-incident review

Finally, feature flags can be used alongside post-incident reviews to get a better understanding of how features relate to issues. There could easily be a trend of one feature being tied to the vast majority of issues. Or perhaps infrastructure planning is lagging relative to new feature releases. If a feature flag is correlated with numerous incidents, that can be a key indicator that the feature is unstable. And you can gain insight into how faster application velocity impacts issues in production and drives more efficient teams.
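If incidents are recorded with the feature they were tied to, spotting that kind of trend can be as simple as a tally. The incident records below are made-up sample data, and the `feature` field is an assumption about how your incident tool stores this.

```python
from collections import Counter

# Made-up incident records; "feature" is None when no flag was tied to the incident.
incidents = [
    {"id": 101, "feature": "new-checkout"},
    {"id": 102, "feature": "new-checkout"},
    {"id": 103, "feature": None},
    {"id": 104, "feature": "search-suggest"},
    {"id": 105, "feature": "new-checkout"},
]


def incidents_by_feature(records: list) -> Counter:
    """Count incidents per feature, ignoring incidents with no feature attached."""
    return Counter(r["feature"] for r in records if r["feature"])


counts = incidents_by_feature(incidents)
top_feature, top_count = counts.most_common(1)[0]
share = top_count / len(incidents)  # fraction of all incidents tied to the top feature
```

In this sample, `new-checkout` accounts for 3 of the 5 incidents, which is the sort of signal worth bringing to a post-incident review.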

All in all, I love feature flags! Development teams are so deep in the weeds of their specific technical implementation that they seldom think about how it relates to user functionality. Implementing feature flags as a standard practice forces the entire development organization to think about the end user. One of the key, often forgotten, benefits of feature flags is helping people who are on call do their jobs better, faster, and with greater understanding.

Chris is a bad-coder-turned-technology-advocate who understands the challenges and needs of modern engineers, as well as how technology fits into the business goals of companies in a demanding high-tech world. Chris speaks and engages with end users regularly in the areas of modern AppDev, Site Reliability Engineering, DevOps, and Developer Relations. He was one of the original founders of the developer marketing agency Fixate IO, and currently works as a Sr. Manager in HubSpot’s Developer Relations team.

