The advent of microservice architectures has had a marked effect on the development, deployment, and maintenance of complex systems. Keeping these systems healthy, however, depends on having a robust monitoring system in place, one that notifies you when problems arise.
In this article, I’m going to introduce you to the monitoring tools available from LogDNA, and walk you through how to set up a basic monitoring solution for your environment. We’ll discuss why monitoring is essential, what elements a good monitoring plan should have, and then get our hands dirty with an example.
Complexities of Monitoring When You Have Microservices
Not long ago, I was on a team that regularly deployed updated versions of a web application. For the first few hours after each deployment, we watched the server logs scroll through a terminal window on our screens, and when problems arose, we could download the logs and grep through them to figure out what went wrong.
With microservice architecture, the servers can be smaller, but there are significantly more of them—especially on a platform which might automatically scale the number of instances up and down, and replace downed instances. Trying to determine which server has an issue and trying to locate the logs before the server terminates can become an exercise in futility.
Aggregation services provide engineering teams with a single access point for all logs and include the ability to search, identify potential problems before they cause outages, and automate notifications when errors are detected.
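To make the idea of a single access point concrete, here is a minimal sketch of what aggregation does conceptually: it takes logs from several instances and merges them into one time-ordered stream, tagging each entry with its source. This is an illustration only, not how LogDNA is implemented. The `LogEntry` type, the `aggregate` function, the host names, and the assumed line format (`<ISO-8601 timestamp> <message>`) are all hypothetical.

```python
import heapq
from datetime import datetime
from typing import Dict, Iterator, List, NamedTuple


class LogEntry(NamedTuple):
    timestamp: datetime
    host: str
    message: str


def parse_line(host: str, line: str) -> LogEntry:
    """Parse a line of the assumed form '<ISO-8601 timestamp> <message>'."""
    stamp, _, message = line.partition(" ")
    return LogEntry(datetime.fromisoformat(stamp), host, message)


def aggregate(logs_by_host: Dict[str, List[str]]) -> Iterator[LogEntry]:
    """Merge per-host logs into a single stream ordered by timestamp.

    Assumes each host's lines are already in chronological order,
    which is what heapq.merge requires of its inputs.
    """
    streams = (
        [parse_line(host, line) for line in lines]
        for host, lines in logs_by_host.items()
    )
    return heapq.merge(*streams)


if __name__ == "__main__":
    # Hypothetical sample data from two instances.
    logs = {
        "web-1": [
            "2021-03-01T12:00:01 INFO service started",
            "2021-03-01T12:00:05 ERROR database timeout",
        ],
        "web-2": ["2021-03-01T12:00:03 INFO service started"],
    }
    for entry in aggregate(logs):
        print(entry.timestamp.isoformat(), entry.host, entry.message)
```

Because every entry carries its host, you can see at a glance which instance produced a given line, even when instances come and go.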
Implementing a Monitoring Plan with LogDNA
In this article, we’re going to examine LogDNA, which is a platform that allows you to aggregate and analyze logs from microservices. If you don’t already have an account with LogDNA, they offer a 14-day free trial which grants you access to all of the features you’ll need to work through this demo with your microservices.
For this demo, I used a microservice which I’ve developed and deployed on an AWS EC2 instance. I’m going to walk through the steps to install the LogDNA agent directly on my microservice. If you are supporting an enterprise microservice ecosystem, I would recommend adding the agent to your base image, so that all services built using that image have the agent by default.
Log into your account, and let’s get going.
Installing the LogDNA Agent
There are several ways to retrieve the logs from your microservices. The first thing we’ll walk through is installing the agent on a Linux system. Navigate to your dashboard and look for the Getting Started information. Click on the Add log sources link.
Figure 1. Adding Log Sources from the LogDNA Dashboard
Your account is assigned a unique ingestion key. This key ensures that logs which are ingested by LogDNA are directed to your account. Fortunately, each of the installation scripts includes this unique key, so you don’t have to worry about copying it into the right place within each script.
I’ll be installing the Amazon Linux agent, but there are agents available for multiple Linux distributions, Windows, and macOS. LogDNA also supports log aggregation from Kubernetes, Docker, and other container platforms. You’ll see a sampling of the options below.
Figure 2. Agents and Platform Integrations Available
The Linux installation I selected required me to SSH into the server and copy and paste several command-line instructions. The entire installation was complete within a minute or two, and logs began appearing in my LogDNA account within a few minutes of installation and startup.
Working with Your Logs
In the screenshot below, you can see a couple of actions displayed. Once the logs began to display in the dashboard, I performed a search for any log entry which contained the word ERROR. The list of log entries was filtered to contain only those matching my request.
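If you want a mental model for that kind of search, the sketch below filters a list of log lines down to those containing a given word. It is a rough local equivalent for illustration and makes no claim about how LogDNA's search works. The `search` function and the sample lines are hypothetical, and I've assumed a case-sensitive, whole-word match.

```python
import re
from typing import Iterable, List


def search(lines: Iterable[str], word: str) -> List[str]:
    """Return the lines that contain `word` as a whole word (case-sensitive)."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return [line for line in lines if pattern.search(line)]


if __name__ == "__main__":
    # Hypothetical sample log lines.
    sample = [
        "12:00:01 INFO service started",
        "12:00:05 ERROR database timeout",
        "12:00:06 INFO no errors found",
    ]
    for line in search(sample, "ERROR"):
        print(line)
```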
When I clicked on the arrow next to one of the entries, I was able to see more detailed information about the error in question. With a few clicks, I could tell which instance generated the error, what the error message was, the IP address of the machine, and which log captured the message.
At this point, even if the server terminates, I can still view the logs, including log entries within the same time frame, allowing me to reconstruct what might have caused the error, and provide enough information for an engineer to prevent the error from reoccurring.
Figure 3. Analyzing System Logs Within LogDNA
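Looking at entries from the same time frame as an error amounts to a simple time-window filter. Here is a minimal sketch of that idea, assuming each entry is a `(timestamp, message)` pair. The `entries_near` function, the 30-second window, and the sample data are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Entry = Tuple[datetime, str]


def entries_near(entries: List[Entry], moment: datetime,
                 window: timedelta) -> List[Entry]:
    """Return entries whose timestamp is within `window` of `moment`."""
    return [(ts, msg) for ts, msg in entries if abs(ts - moment) <= window]


if __name__ == "__main__":
    # Hypothetical sample entries around a single error.
    start = datetime(2021, 3, 1, 12, 0, 0)
    entries = [
        (start, "INFO request received"),
        (start + timedelta(seconds=50), "WARN connection pool exhausted"),
        (start + timedelta(seconds=60), "ERROR database timeout"),
        (start + timedelta(minutes=10), "INFO request received"),
    ]
    error_time = start + timedelta(seconds=60)
    for ts, msg in entries_near(entries, error_time, timedelta(seconds=30)):
        print(ts.isoformat(), msg)
```

In this sample, the warning logged ten seconds before the error falls inside the window, which is the kind of nearby clue that helps explain what went wrong.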
Once you have your log data flowing into LogDNA, you can use the data to create dashboards for specific metrics, add other members of your organization to your account, and set up alerts for specific conditions. You can learn more about all of these features in the extensive LogDNA documentation.