8 Tips to Create an Accurate and Helpful Post-Mortem Incident Report
Stuff breaks; it’s inevitable. People make mistakes, technology breaks down, and processes aren’t infallible. But, when incidents happen, ..
When to Reject Builds Based on CVE Severity
In a perfect world, you would instantly fix your application every time a relevant CVE (Common Vulnerabilities and Exposures) was issued. (In an even ..
When and Why to Adopt Feature Flags
What if there was a way to deploy a new feature into production but not actually turn it on until you’re ready? Well, there is. These ..
Leveraging Incident Management Reporting to Improve Your Response Strategy
The discussion about incident management tends to focus on what happens in real-time, when an incident is actually occurring. To a degree, ..
The Importance of Log Monitoring for Incident Response
One aspect critical to a development organization’s application quality is the implementation of a high-functioning incident response ..
5 Incident Response Metrics and How to Use Them
Two categories a software organization should always strive to improve in are application quality and incident response. Data analysis ..
Evolving an Incident Response Strategy as Teams and Services Grow
The typical path for a growing software development organization involves, by definition, growth. In this context, this likely means providing ..
Zero Impact SQL Database Deployments
The connection between an application and SQL database raises some interesting and complicated challenges when the time comes to update ..
ITIL 4 and Incident Response
If the IT industry were a religion, ITIL would be its sacred text – or at least one of them. Like a sacred text, ITIL lays out the concepts ..
Leveraging Incident Response for Application Quality
Incident response tools are most often used for production applications. But, their benefits can extend far beyond that, well through the entire ..
Why It Should Be Service, Not Site Reliability
Making sure services, applications and infrastructure are up and running has always been a role in technical organizations. But, as application ..
Source Code Control: Trunk-Based Development vs. GitFlow
Managing source code with a defined method is one vital aspect of implementing an effective organizational application development lifecycle. ..
Preventing Incident Severity Progression
When service degradation and service outages go from bad to worse, it’s an awful feeling. It’s bad enough that something broke in general. ..
Best Practices for Status Pages
Status pages have become the end user’s window into your team’s operations. Companies with status pages are doing the right thing for their ..
When You Shouldn’t Use Infrastructure Automation
Automation, automation, automation. If Steve Ballmer were doing his developers dance today, instead of in 2001, those are probably the words ..
Living in an Interrupt Driven Culture
I don’t know how the term “shaving the yak” caught on, but I’m willing to embrace it for all of its weirdness and meaning. We all spend ..
How Bulkhead and Sidecar Patterns Support Incident Response
How you build your application absolutely impacts the lives of those in charge of supporting it. This isn’t a correlation we generally ..
Docker Swarm vs. Kubernetes for Single-Host Implementations
Most discussions about the merits of Docker Swarm vs. Kubernetes focus on large-scale deployments. But, what if you’re running your containerized ..
Securely Keeping Kubernetes Secrets in Git
At first glance, storing Kubernetes secrets seems simple enough. Kubernetes stores them automatically in etcd, a key-value store. And, Kubernetes can ..