Best Practices for Status Pages

Status pages have become the end user’s window into your team’s operations. Companies with status pages are doing the right thing for their users: they build in transparency while reducing frustration and support contacts. For those benefits to pay off, organizations need to treat status pages as something more than wiki pages maintained by support.

Status page 101

The status page basics are simple: you have a publicly accessible page that lists the state of your application’s services and regions, usually color-coded green, yellow and red. The primary purpose of the page is to let users know when there is an issue on your side, or whether the problem is on their end. A status page also brings several added benefits:

1) Mitigate support tickets

If you have a status page, users can visit it whenever they suspect an issue or detect an incident. If they confirm that one of the application services on the page has an issue, they’re less likely to submit a support ticket for the impacted functionality. This avoids the flood of tickets that often comes with outages.

2) Support social communication

The first sign of an application issue usually surfaces on social media. Your social team can point to the status page when responding to people who report issues.

3) Demonstrate SLAs

Historical status page data can be used to validate your company’s SLAs and build trust with your product’s customers.

4) Troubleshooting

Status pages aren’t a complete troubleshooting tool, but when your support team gets a whiff of something going wrong, the status page can act as an initial indicator. This helps support teams anticipate what they will hear from customers. A little forewarning lets technical support engineers plan their outage communication and response protocol.

5) Build confidence with your end-users

Status pages are basically a requirement now. If you don’t have one, especially when your audience is technical, you’re falling short of what users have come to expect. Technical users know things break and appreciate transparency from companies about what broke, when, why, and what’s being done to prevent it in the future.

The biggest problem with status pages is that they still favor technical users and aren’t obvious or easy to get to. Typically, DevOps practitioners are the only ones willing to hunt down status pages. This means that if the organization doesn’t encourage status page adoption, which many people see as a risk rather than a benefit, the page is unlikely to be used or to demonstrate value. For many companies, status pages are just for vanity.

Making the status page a hero

The benefits are easy enough to understand, but putting them into practice can be surprisingly hard. If your organization keeps a status page only so that it can say it has one, nothing more can happen until that mentality goes away. Organizations should be more afraid of a lack of transparency than of too much visibility, and they will see the bottom-line benefit once they implement properly instrumented status pages.

Status pages need to be an automated part of your services

In the early days of status pages, the pages were updated manually. This made for low utility because, when something breaks, the last thing you can expect a support team to do is manually update a page. The service status shown on the page needs to be automated, meaning the dev and support teams need to agree on what shows up, when it’s updated and where the triggers are implemented.
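As a rough illustration of what "agreeing on the triggers" can look like, here is a minimal sketch that turns a batch of health-check results into a page color. The `Status` enum, the `derive_status` function and both threshold values are assumptions made up for this example, not part of any particular status page product.

```python
from enum import Enum


class Status(Enum):
    GREEN = "operational"
    YELLOW = "degraded"
    RED = "outage"


def derive_status(check_results, degraded_threshold=0.1, outage_threshold=0.5):
    """Map recent health-check results (True = passed) to a page status.

    The thresholds are hypothetical defaults; dev and support would
    agree on real values for each service.
    """
    if not check_results:
        # No data is reported as degraded rather than silently green.
        return Status.YELLOW
    failure_rate = check_results.count(False) / len(check_results)
    if failure_rate >= outage_threshold:
        return Status.RED
    if failure_rate >= degraded_threshold:
        return Status.YELLOW
    return Status.GREEN
```

A scheduler or monitoring hook would call `derive_status` with the latest results for each service and publish the outcome, so nobody has to remember to update the page by hand in the middle of an outage.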

Needs to have periodic written communication on the page

A status page needs to have more than the status of the service. When there’s degradation or a full-fledged incident, there should be accompanying text that explains what’s being done to resolve the incident and gives customers the crucial information they need about the issue.

The breakdown of services needs to be relevant to the users, not you

Vendors are often guilty of creating documentation and status pages that break down services based on how the vendor interacts with them, which for most organizations means how the team is organized. To the user, however, this categorization can be confusing and meaningless. The services should be broken down from the user’s perspective, usually by the individual components through which they consume functionality.
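One way to picture this is a mapping from what users see to the internal components behind it, with each user-facing entry showing the worst status among its components. The feature names, component names and the `roll_up` helper below are all hypothetical, chosen only to illustrate the idea.

```python
# Severity order: a higher number means a worse state.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}

# Hypothetical mapping from user-facing features to internal components.
USER_FACING = {
    "Sign in": ["auth-service", "session-cache"],
    "Notifications": ["push-gateway", "sms-provider", "email-relay"],
}


def roll_up(component_status, mapping=USER_FACING):
    """Return the worst component status for each user-facing feature."""
    result = {}
    for feature, components in mapping.items():
        # A component with no reported status is treated as degraded.
        statuses = [component_status.get(name, "yellow") for name in components]
        result[feature] = max(statuses, key=SEVERITY.__getitem__)
    return result
```

With this shape, the page can show "Notifications" in red when only the SMS provider is down, which tells users what they cannot do without exposing how the team or the architecture is organized.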

Historical data is quite beneficial

I won’t go so far as to say historical status page data is a must. But, having historical context, not just current status, helps calm the nerves of people when something’s wrong. Over the long-term, assuming end-users see more green than yellow or red, it gives the sense that incidents are not the norm. Otherwise, people could take a single service outage as a sign the entire application is flawed.
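To make the "more green than red" picture concrete, and to back up SLA claims with numbers, historical incident data can be reduced to an uptime percentage for a reporting window. The sketch below assumes incidents are stored as non-overlapping `(start, end)` pairs of downtime; the `uptime_percent` name and that storage format are assumptions for illustration.

```python
from datetime import datetime, timedelta


def uptime_percent(incidents, window_start, window_end):
    """Percentage of the window during which the service was up.

    incidents: list of (start, end) datetime pairs of downtime,
    assumed not to overlap one another.
    """
    window = (window_end - window_start).total_seconds()
    if window <= 0:
        raise ValueError("window_end must be after window_start")
    down = 0.0
    for start, end in incidents:
        # Count only the part of each incident inside the window.
        overlap_start = max(start, window_start)
        overlap_end = min(end, window_end)
        if overlap_end > overlap_start:
            down += (overlap_end - overlap_start).total_seconds()
    return 100.0 * (1 - down / window)


# Example: one outage of 43 minutes 12 seconds in a 30-day window.
month_start = datetime(2024, 1, 1)
month_end = month_start + timedelta(days=30)
outage_start = datetime(2024, 1, 10, 12, 0)
outage = (outage_start, outage_start + timedelta(minutes=43, seconds=12))
print(round(uptime_percent([outage], month_start, month_end), 3))
```

The example outage is one thousandth of the 30-day window, so it works out to roughly 99.9% uptime for the month.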

Include post-incident data

In addition to historical data, post-incident annotations and updates are great for transparency. And, more detailed supplemental blog posts or articles for critical (P1) outages are hugely beneficial for visibility and building customer trust.

Needs to be part of your incident response strategy

When status pages started becoming common, they were generally associated with a support activity. But, your status page is really a part of your incident response activity. When there are service issues or degradation, automation will alert those who are on-call and update the status page. When incidents are resolved, the status page will automatically update itself. And then, during post-incident reviews, status pages are an artifact for historical context.
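The flow described above (alert fires, on-call is notified and the page updates, resolution flips the page back, history remains for review) can be sketched with a small in-memory model. Everything here is an assumption for illustration: the `StatusPage` and `Incident` classes, the method names, and the `notify` callback standing in for whatever paging tool you use.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Incident:
    service: str
    status: str
    updates: list = field(default_factory=list)
    resolved: bool = False


class StatusPage:
    def __init__(self, services):
        self.current = {name: "green" for name in services}
        self.history = []  # kept as an artifact for post-incident reviews

    def on_alert(self, service, status, message, notify):
        """Open an incident, update the page and notify on-call together."""
        incident = Incident(service, status)
        self.current[service] = status
        self.history.append(incident)
        self.post_update(incident, message)
        notify(f"[{status}] {service}: {message}")
        return incident

    def post_update(self, incident, message):
        """Attach a timestamped written update to the incident."""
        incident.updates.append((datetime.now(timezone.utc), message))

    def on_resolve(self, incident, message):
        """Close the incident; go green only if nothing else is open."""
        self.post_update(incident, message)
        incident.resolved = True
        still_open = [
            other for other in self.history
            if other.service == incident.service and not other.resolved
        ]
        if not still_open:
            self.current[incident.service] = "green"
```

In practice `notify` would hand the message to your on-call tooling; passing `print` or a list's `append` method is enough to try the flow locally.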

VictorOps status page screenshot

The cost of transparency

A good status page strategy has a positive impact, though it costs time and money. That cost is negligible compared with the cost of doing nothing.

1) When service status is green, no one will pay attention. When a service’s status is yellow or red, everyone will, including your competition. And while competitors might try to use that information against you, the response is easy: just highlight your commitment to the customer.

2) Some will use the status page as a tool to complain. But let’s be clear: if they don’t use your status page as a complaint tool, they will use something far worse (e.g., Reddit or Twitter). The complaints that happen on social media, without a status page, are often more subjective, and that subjectivity can build long-term negative sentiment.

3) Status page transparency can expose internal challenges within the dev and ops teams. For example, a status page can paint a picture of systemic internal problems, where one service regularly has issues over time but others do not. Or customers can see that issues clearly correlate with releases, which could be a sign that something needs to be fixed in the team’s development and release process. Customer visibility does not change the severity of these long-term faults; they exist either way. If anything, being public may push you to address something that probably should have been addressed already.

For most organizations, a status page can be little more than a checklist item. Those that instead build theirs around incident management and automation see huge gains in customer satisfaction, transparency, and technical support cost reduction.


Chris Riley is a technologist and DevOps advocate who has spent 12 years helping organizations transition from traditional development practices to a modern set of culture, processes and tooling.

