Best Practices for Monitoring Containers Effectively


Containers are a great technology, but they create new challenges when it comes to monitoring.

Why? Because containers enable a microservices approach, where each application is a collection of many services, with each built, deployed, and managed individually. This facilitates more agile applications, but it also makes for more complex monitoring.

How can you adapt to meet the changing monitoring requirements of our increasingly containerized world? This article outlines best practices for container monitoring.

Monitor individual container metrics

It’s not just modern, cloud-native apps that are leveraging containers. “Traditional” on-premises enterprise apps also see big gains when they are containerized. These apps typically start as a single monolithic stack and gradually break out into microservices. During this transition, slowly but surely, everything changes: the kind of infrastructure used, the features that are added and removed, the team structure, and most of all, the monitoring tools needed to track all of this change.

The big change with containerization is that you now need to monitor not just physical hosts, but also the services that run on them and the containers that make up those services. Containers add more layers to monitoring. Keeping track of CPU, memory, I/O, and network stats for every container is a daunting task. At this scale, you can’t manually run commands or check error logs to spot anomalies. You need a smarter way to keep track of all this data.

Container monitoring solutions should be able to give you metrics at the top level, and allow you to drill down to the performance of each container. As you transition from monitoring a few servers to monitoring hundreds or thousands of containers, having the right mix of monitoring tools is key to ensuring a smooth transition.
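
To make the "top level, then drill down" idea concrete, here is a minimal Python sketch. It assumes you already have per-container samples in memory; the `ContainerSample` type, its field names, and the sample values are hypothetical and not tied to any particular monitoring tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ContainerSample:
    service: str        # logical service the container belongs to (assumed label)
    container_id: str
    cpu_percent: float
    memory_mb: float


def summarize_by_service(samples):
    """Top-level view: one row per service, aggregated over its containers."""
    grouped = defaultdict(list)
    for sample in samples:
        grouped[sample.service].append(sample)
    return {
        service: {
            "containers": len(group),
            "avg_cpu_percent": mean(s.cpu_percent for s in group),
            "total_memory_mb": sum(s.memory_mb for s in group),
        }
        for service, group in grouped.items()
    }


def drill_down(samples, service):
    """Per-container view for one service, busiest container first."""
    members = [s for s in samples if s.service == service]
    return sorted(members, key=lambda s: s.cpu_percent, reverse=True)


# Made-up sample data for illustration only.
samples = [
    ContainerSample("checkout", "c1", 20.0, 128.0),
    ContainerSample("checkout", "c2", 80.0, 256.0),
    ContainerSample("search", "c3", 10.0, 512.0),
]
summary = summarize_by_service(samples)
hottest = drill_down(samples, "checkout")[0]
```

The summary gives you one row per service, while `drill_down` shows which container inside a service is responsible when that row looks unhealthy.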

Leverage open-source tools and standards

When it comes to monitoring, you don’t want to be locked into proprietary workflows or integrations. The best way to avoid this is to develop a monitoring strategy based on open-source tools and standards.

There are lots of open-source monitoring tools available today, and it could take an entire series of blog posts to cover them all in detail. Prometheus, InfluxDB, and Graphite are among the most popular. So are Elasticsearch, Logstash, and Kibana, which make up the ELK stack.

Some of these tools, such as Fluentd and Filebeat, handle the first step of data collection and data transfer. Others, like InfluxDB, Graphite, and Elasticsearch, are powerful storage and analysis tools that can process large quantities of data in real time. And to complete the loop, there are visualization and UI tools like Kibana and Grafana that put more power in the hands of end users, allowing them to manipulate data according to their needs. Some tools, like Prometheus, take on more than one part of the monitoring mix.

Open-source monitoring tools can be used effectively to plug weak spots in your monitoring toolchain. They are extensible and can be easily integrated with other monitoring tools. Built to monitor modern cloud-based apps, they are an essential part of container monitoring.
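
One reason open standards matter is that the collection step can stay independent of the storage and visualization steps. As an illustration, the sketch below renders a gauge in the plain-text exposition format that Prometheus scrapes. The metric name, labels, and values are made up, and the helper functions are hypothetical; a real service would more likely use an official client library than format lines by hand.

```python
def escape_label_value(value):
    """Escape backslashes, double quotes, and newlines inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name, help_text, samples):
    """Render (labels, value) pairs as one gauge in text exposition format.

    Assumes help_text contains no backslashes or newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"'
            for key, val in sorted(labels.items())
        )
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric and labels, for illustration only.
text = render_gauge(
    "container_memory_mb",
    "Memory in use per container.",
    [({"service": "checkout", "container": "c1"}, 128.0)],
)
```

Because the output is plain text over HTTP, any collector that understands the format can read it, which is the kind of loose coupling that helps you avoid lock-in.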

Automate service discovery

In microservices apps, services are added and removed all the time. You move containers between hosts; autoscaling groups add and remove instances dynamically; then, there’s failover and auto replication. All this makes for infrastructure that’s always changing.

Manually connecting services every time their network location changes is not feasible, and isn’t natural to container-native infrastructure. You need to automate service discovery to ensure your monitoring evolves along with your stack.

Some monitoring tools have service discovery built in, while others require you to manage service discovery using a tool like etcd. Service discovery plays an important part in scaling clusters of containers. When v3 of etcd was released last year, it helped Kubernetes, which stores its cluster state (including service and endpoint records) in etcd, scale to 5,000 nodes and 150,000 pods.
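
The core idea behind automated discovery can be sketched in a few lines of Python: each instance renews a short-lived lease, and the monitoring system only watches instances whose lease is still valid. The `ServiceRegistry` class below is a hypothetical in-memory stand-in for a registry such as etcd. It does not use etcd's real API, and the addresses and TTL are made up.

```python
import time


class ServiceRegistry:
    """In-memory stand-in for a service registry (not a real etcd client)."""

    def __init__(self, ttl_seconds=10.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at = {}  # (service, address) -> lease expiry time

    def heartbeat(self, service, address):
        """Register an instance, or renew its lease if it is already known."""
        self._expires_at[(service, address)] = self._clock() + self._ttl

    def deregister(self, service, address):
        """Remove an instance immediately, for example on clean shutdown."""
        self._expires_at.pop((service, address), None)

    def targets(self, service):
        """Addresses with an unexpired lease: the instances to monitor."""
        now = self._clock()
        return sorted(
            addr
            for (svc, addr), expiry in self._expires_at.items()
            if svc == service and expiry > now
        )


# A fake clock keeps the example deterministic.
now = [0.0]
registry = ServiceRegistry(ttl_seconds=10.0, clock=lambda: now[0])
registry.heartbeat("checkout", "10.0.0.5:8080")
registry.heartbeat("checkout", "10.0.0.6:8080")
now[0] = 8.0
registry.heartbeat("checkout", "10.0.0.6:8080")  # only this instance renews
now[0] = 12.0
live = registry.targets("checkout")  # the instance that stopped renewing is gone
```

With leases, a container that crashes or is moved simply stops renewing and drops out of the target list, so nobody has to update the monitoring configuration by hand.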

Act on data in real time

Containers are ephemeral, and a lot of the performance data they produce loses its value after a few minutes. Monitoring tools need to adapt to this reality. Many of today’s tools aim to spot issues before they escalate, and they do this using predictive analytics.

Log data, for example, is monotonous to read manually, and requires automated analysis by algorithms to make it actionable. One way to reduce the time it takes to respond to data is to use data visualization. It’s a lot easier to spot a spike or drop in a chart than it is to spot the same pattern in a series of thousands of lines of log data.
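
The same "spot the spike" judgment your eye makes on a chart can be approximated in code. The sketch below flags values that sit far from the mean of a short trailing window. This is a deliberately simple z-score check, not the predictive analytics that commercial tools offer, and the window size, threshold, and sample latencies are arbitrary assumptions.

```python
from collections import deque
from statistics import mean, pstdev


def find_spikes(values, window=5, threshold=3.0):
    """Return indexes of values far from the mean of the preceding window.

    A value is flagged when it lies more than `threshold` standard
    deviations from the mean of the previous `window` values.
    """
    history = deque(maxlen=window)
    spikes = []
    for index, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma == 0:
                # Perfectly flat history: any change counts as a spike.
                is_spike = value != mu
            else:
                is_spike = abs(value - mu) / sigma > threshold
            if is_spike:
                spikes.append(index)
        history.append(value)
    return spikes


# Made-up request latencies in milliseconds.
latencies_ms = [100, 102, 98, 101, 99, 100, 400, 101]
spike_indexes = find_spikes(latencies_ms)
```

A check like this is cheap enough to run continuously, which matters when the data is only useful for a few minutes.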

A powerful machine learning algorithm needs to be backed up by a meticulous alerting system. The goal is to avoid alert fatigue, and ensure the right people are notified about the right incidents.
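
Two simple mechanisms go a long way toward reducing alert fatigue: suppressing repeats of an alert that has already fired, and routing each alert to the team that owns the service. The `AlertRouter` class below is a hypothetical sketch of both. The team names, the five-minute cooldown, and the method names are assumptions, not the behavior of any specific alerting product.

```python
class AlertRouter:
    """Routes alerts to owners and suppresses repeats during a cooldown."""

    def __init__(self, routes, default_owner, cooldown_seconds=300.0):
        self._routes = routes                # service name -> owning team
        self._default_owner = default_owner  # fallback when no route matches
        self._cooldown = cooldown_seconds
        self._last_sent = {}                 # (service, alert) -> last send time

    def notify(self, service, alert_name, now):
        """Return the owner to notify, or None if the alert is in cooldown."""
        key = (service, alert_name)
        last = self._last_sent.get(key)
        if last is not None and now - last < self._cooldown:
            return None
        self._last_sent[key] = now
        return self._routes.get(service, self._default_owner)


# Hypothetical teams and timestamps (in seconds).
router = AlertRouter({"checkout": "payments-oncall"}, "platform-oncall")
first = router.notify("checkout", "high_cpu", now=0.0)
repeat = router.notify("checkout", "high_cpu", now=60.0)
later = router.notify("checkout", "high_cpu", now=400.0)
other = router.notify("search", "high_cpu", now=60.0)
```

Here the repeat at 60 seconds is suppressed, the alert fires again once the cooldown has passed, and the unrouted `search` service falls back to the default owner.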

Docker provides metrics, events, and logs. These, however, are basic performance data for containers, and don’t come with the analytics chops of advanced monitoring tools. It makes sense to push this data out to a third-party monitoring tool where you can derive more value from the data. Having a powerful platform to interact with data can make all the difference in how quickly you can respond to issues.
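
As an example of how raw that data is, the Docker stats endpoint reports cumulative CPU counters rather than a ready-made percentage. The sketch below derives a CPU percentage from one stats sample, following the calculation described in the Docker Engine API documentation as I understand it. The field names reflect that API, but treat them as assumptions to verify against your Docker version; the sample numbers are made up.

```python
def cpu_percent(stats):
    """Estimate container CPU usage (%) from one Docker stats sample.

    `stats` is the decoded JSON for a single sample, which carries both
    the current counters (`cpu_stats`) and the previous ones (`precpu_stats`).
    """
    cpu = stats["cpu_stats"]
    prev = stats["precpu_stats"]
    cpu_delta = cpu["cpu_usage"]["total_usage"] - prev["cpu_usage"]["total_usage"]
    system_delta = cpu["system_cpu_usage"] - prev["system_cpu_usage"]
    if cpu_delta <= 0 or system_delta <= 0:
        return 0.0  # no usable interval between the two readings
    return cpu_delta / system_delta * cpu["online_cpus"] * 100.0


# Made-up counters (nanoseconds of CPU time) for illustration.
sample = {
    "cpu_stats": {
        "cpu_usage": {"total_usage": 400_000_000},
        "system_cpu_usage": 20_000_000_000,
        "online_cpus": 4,
    },
    "precpu_stats": {
        "cpu_usage": {"total_usage": 200_000_000},
        "system_cpu_usage": 10_000_000_000,
    },
}
usage = cpu_percent(sample)  # about 8% for these made-up numbers
```

Doing this by hand for every container does not scale, which is why it makes sense to let a dedicated monitoring tool do the collection and the arithmetic for you.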

Containers bring new challenges to monitoring. However, there are many solutions available today to meet the demands of containerized apps. Open-source tools are an essential addition to any monitoring stack. Service discovery is a challenge with containers, and monitoring tools should leverage a capable service registry like etcd to ensure monitoring doesn’t stop when services change.

Finally, you often have just a few minutes to act on performance data for containers. You need a tool that is equally intelligent and quick at notifying you of issues. Container monitoring is not easy, but with the right approach, and the right toolset, any team can have a smooth transition to containerized applications.

Twain began his career at Google, where, among other things, he was involved in technical support for the AdWords team. Today, as a technology journalist, he helps IT magazines and startups change the way teams build and ship applications. Twain is a regular contributor at Fixate IO.

