It’s easy to talk about the importance of real-time communication when responding to an incident. It’s harder to implement real-time communication practices that work reliably and efficiently. Many of the strategies that seem like great ways to communicate in theory fall short once you attempt to use them in real-world circumstances.
To illustrate the point, here’s a look at five real-time communication mistakes that incident response teams commonly make.
1. Relying on Email for Incident Management
As mentioned in a recent blog on how the incident response stack has evolved, some incident response teams still use email to communicate in real time. Typically there’s a delay of only a few seconds between when an email is sent and when it arrives, especially if everyone is using the same email server or service (which they probably are if they are part of your incident response team). Time-wise, then, email is close enough to real time.
But email poses other problems that make it less than ideal for communicating about an incident quickly. For one, a team member may miss a critical email about an active incident in an inbox that is constantly receiving other messages. For another, it can be hard to sort through long email threads if you need to find information that someone sent in a separate email a few minutes ago. In addition, with most email tools there is no way to determine who is actively following the thread and who is offline.
And perhaps the biggest reason not to rely on email is that, even if it takes only a few seconds for an email to appear in a recipient’s inbox, the delay is long enough to create situations where someone asks a question in one email that is answered in another email that was already sent. And unlike instant-messaging tools, there is no way to see when someone is typing a new email to you so that you can wait for their response before continuing the conversation.
In short, even if email allows you to exchange information relatively quickly, email conversations about an ongoing incident can quickly become overwhelming or convoluted, hampering your team’s ability to communicate effectively.
2. Not Having a Dedicated Incident Communication Channel
Having no predefined communication channel, and instead leaving it up to your team to decide how to communicate on an ad hoc basis, is even worse than relying on an ineffective communication solution like email.
This approach is a recipe for scenarios wherein your team tries to use multiple communication methods at once, with no one being sure where to look for the latest information, or even with team members being left out of the conversation entirely. You end up with a tangled web of instant messages, emails, text messages, phone calls, and face-to-face meetings, making it very difficult to respond effectively to the incident at hand.
3. Relying on Human-Only Communications
IT incidents involve machines. You can’t resolve them effectively if you aren’t communicating with the machines to get all of the information you need as quickly as you need it.
Toward this end, your real-time communication strategy should allow you to easily incorporate data from machines into the conversation. If your monitoring tools detect new information about the state of a failed system, for example, that information should be automatically shared with everyone who is responding to the incident. If you rely on humans to detect and share this information manually, you run the risk that no one will notice it in time, or that only some people will see it. Either way, the team loses shared visibility and it becomes harder to collaborate.
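As a rough illustration of what sharing machine data automatically could look like, here is a minimal Python sketch. Everything in it is hypothetical: the Alert fields, the IncidentChannel class, and the on_alert hook are stand-ins invented for this example, and the in-memory inboxes take the place of whatever chat or paging tool your team actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Alert:
    """A machine-generated update from a monitoring tool (hypothetical shape)."""
    source: str   # which monitoring tool raised the alert
    system: str   # the affected system
    status: str   # e.g. "down", "degraded", "recovered"
    detail: str   # short human-readable description


def format_alert(alert: Alert) -> str:
    """Turn an alert into a single line that reads well in a chat channel."""
    return f"[{alert.source}] {alert.system} is {alert.status}: {alert.detail}"


@dataclass
class IncidentChannel:
    """In-memory stand-in for a shared incident channel (an assumption)."""
    name: str
    responders: List[str] = field(default_factory=list)
    inboxes: Dict[str, List[str]] = field(default_factory=dict)

    def join(self, responder: str) -> None:
        """Add a responder once and give them an empty inbox."""
        if responder not in self.responders:
            self.responders.append(responder)
            self.inboxes[responder] = []

    def broadcast(self, message: str) -> None:
        """Deliver the same message to every responder in the channel."""
        for responder in self.responders:
            self.inboxes[responder].append(message)


def on_alert(channel: IncidentChannel, alert: Alert) -> None:
    """Hook a monitoring tool would call; no human has to relay the update."""
    channel.broadcast(format_alert(alert))


channel = IncidentChannel(name="incident-db-outage")
channel.join("alice")
channel.join("bob")
on_alert(channel, Alert("monitor", "db-primary", "down", "connection refused"))
```

After the last line runs, both alice and bob hold the identical message in their inboxes, which is the point: everyone responding sees the same machine data at the same time.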
4. Lack of Defined Roles and Responsibilities
Defining who will respond to a given incident is one thing; ensuring that they are actually responding and actively following the conversation as the team handles the issue is another.
That’s why you should strive to make sure that your communication strategy allows you to clearly identify who is paying attention. If someone needs to sign off, there should be a protocol in place so that others know that is happening.
Without this type of visibility, you can end up sending requests to someone who is offline, or wasting time (and mental energy) sending out messages asking the team to check in so that you know who is available. Avoid these issues by automating visibility into the conversation state as much as possible.
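One way to automate that visibility is a simple heartbeat scheme, sketched below in Python. This is an assumption-laden illustration, not a description of any particular tool: the PresenceTracker class, its method names, and the 120-second timeout are all made up for the example, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
from typing import Dict, List


class PresenceTracker:
    """Tracks who is actively following an incident (hypothetical helper)."""

    def __init__(self, timeout_seconds: float = 120.0) -> None:
        # A responder counts as active if heard from within this window.
        self.timeout_seconds = timeout_seconds
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, responder: str, now: float) -> None:
        """Record that a responder was seen at time `now` (in seconds)."""
        self._last_seen[responder] = now

    def sign_off(self, responder: str) -> None:
        """Explicit sign-off protocol: remove the responder immediately."""
        self._last_seen.pop(responder, None)

    def active(self, now: float) -> List[str]:
        """Return responders seen within the timeout window, sorted by name."""
        return sorted(
            name
            for name, seen in self._last_seen.items()
            if now - seen <= self.timeout_seconds
        )


tracker = PresenceTracker(timeout_seconds=60.0)
tracker.heartbeat("alice", now=0.0)
tracker.heartbeat("bob", now=50.0)
active_at_70 = tracker.active(now=70.0)   # alice has timed out; bob has not
tracker.sign_off("bob")
active_after_sign_off = tracker.active(now=70.0)
```

Here active_at_70 contains only bob, because alice was last seen 70 seconds earlier, and the list is empty once bob signs off. In practice the heartbeat would come from the communication tool itself rather than from people checking in by hand.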
5. An Overly Rigid Real-Time Communication Strategy
Every incident is unique, and you shouldn’t expect the same communication strategy to allow you to respond to every issue effectively. You should have multiple processes and protocols on hand instead, with each one tailored to a different type or level of incident.
For example, communication surrounding a small-scale incident that requires a response from only two or three engineers could take place efficiently enough through text message. That approach would become unwieldy for a larger incident that demands the attention of a dozen team members. In the latter case, a Slack channel would be a better solution.
To ensure that your team knows which communication approach to take for incidents of different types, include these specifications in your incident response playbooks.
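A playbook entry of this kind can be as simple as a lookup from incident size to channel. The Python sketch below encodes the text-message and Slack example above; the PlaybookRule type, the choose_channel function, and the cutoff of three responders are assumptions made for illustration, so adjust them to whatever your own playbooks specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PlaybookRule:
    """Use `channel` when at most `max_responders` people are involved."""
    max_responders: int
    channel: str


# Assumed cutoff: up to three engineers can coordinate over text message.
PLAYBOOK: List[PlaybookRule] = [
    PlaybookRule(max_responders=3, channel="text message"),
]


def choose_channel(
    responder_count: int,
    rules: List[PlaybookRule] = PLAYBOOK,
    fallback: str = "Slack channel",
) -> str:
    """Pick the channel from the smallest rule that fits, else the fallback."""
    if responder_count < 1:
        raise ValueError("an incident needs at least one responder")
    for rule in sorted(rules, key=lambda r: r.max_responders):
        if responder_count <= rule.max_responders:
            return rule.channel
    return fallback


small = choose_channel(2)    # "text message"
large = choose_channel(12)   # "Slack channel"
```

Keeping the rules as data rather than burying them in code means the thresholds can be reviewed and changed alongside the rest of the playbook.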
Communicating in real time may sound simple, but it’s hard to achieve in practice. By selecting the right communication tools and channels, automating communication (along with other parts of your incident response process that lend themselves to automation) as much as possible, and tailoring communication strategies to the nature of the incident, you maximize your team’s ability to communicate effectively and, in turn, resolve issues faster.