Application monitoring is a well-established discipline that dates back decades and remains a pillar of software management strategies today. However, as software environments and architectures have evolved, monitoring techniques have needed to evolve along with them.
That’s why many teams today rely on distributed tracing to glean insights that they can’t gather from application monitoring alone. Distributed tracing provides a deeper level of visibility into complex distributed environments than application monitoring can achieve.
Here’s a breakdown of the similarities and differences between distributed tracing and application monitoring, along with tips on which technique to use when.
What Is Application Monitoring?
Application monitoring refers to the monitoring of an entire application as a single unit. Application monitoring tracks metrics like overall application availability and response time.
When you perform application monitoring, you don’t pay much attention to what happens inside the application, such as which processes are running within it. You monitor only what it looks like from the surface.
How Does Application Monitoring Work?
There are a variety of application monitoring techniques, but the most common approach is to gather data from log files in order to monitor how long the application takes to respond to requests, how response rates vary over time and for different groups of users, and so on.
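As a rough illustration of the log-based approach, the sketch below computes availability and average response time for an application treated as a single unit. The log format (timestamp, HTTP status, response time in milliseconds) and the sample lines are assumptions made up for this example, not the output of any particular server.

```python
from statistics import mean

# Hypothetical access-log lines: timestamp, HTTP status, response time in ms.
LOG_LINES = [
    "2024-01-01T00:00:01Z 200 120",
    "2024-01-01T00:00:02Z 200 95",
    "2024-01-01T00:00:03Z 500 310",
    "2024-01-01T00:00:04Z 200 105",
]


def summarize(lines):
    """Return surface-level metrics for the application as a whole."""
    statuses, durations = [], []
    for line in lines:
        _timestamp, status, duration_ms = line.split()
        statuses.append(int(status))
        durations.append(float(duration_ms))
    # Assumption: any 5xx status counts as a failed request.
    succeeded = sum(1 for status in statuses if status < 500)
    return {
        "requests": len(lines),
        "availability": succeeded / len(lines),
        "avg_response_ms": mean(durations),
    }


summary = summarize(LOG_LINES)
print(summary)
```

Note that nothing in this summary says which part of the application produced the slow or failed response; it only describes what the application looks like from the outside.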
In situations where you don’t have access to conventional log files – which is often the case when applications are deployed in the cloud – application monitoring typically depends on collecting metrics that are exposed by the services that power your application. Practically speaking, however, the process of collecting and analyzing these metrics is more or less the same as working with traditional log files, especially if you use a log aggregator that can handle both on-prem and cloud-based metrics.
What Is Distributed Tracing?
Distributed tracing is the tracking of performance and availability across the various components that make up an application. When you perform distributed tracing, you monitor how requests are handled as they “flow” between different services that run inside the application.
Thus, distributed tracing goes deeper than surface-level monitoring. The primary goal of distributed tracing is to understand how each individual component of the application performs, rather than only monitoring the application as a whole.
How Does Distributed Tracing Work?
Distributed tracing works by collecting data from each component of an application as it responds to a request. Typically, the application components have to be programmed to expose this data in some way, either by writing it to log files or reporting it to a tracing agent that runs alongside the application to collect data about traces.
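One way to picture this instrumentation is a small helper that times a unit of work and reports it along with a shared trace ID. The sketch below is a simplified assumption of how that might look: the `span` helper, the `COLLECTED_SPANS` list (standing in for a tracing agent or log file), and the service names are all hypothetical, and real tracing libraries handle this in a more complete way.

```python
import time
import uuid
from contextlib import contextmanager

# In-memory stand-in for a tracing agent or log file (an assumption for this sketch).
COLLECTED_SPANS = []


@contextmanager
def span(trace_id, service, operation):
    """Time a unit of work and report it as a span tagged with the trace ID."""
    start = time.perf_counter()
    try:
        yield
    finally:
        COLLECTED_SPANS.append({
            "trace_id": trace_id,
            "service": service,
            "operation": operation,
            "duration_ms": (time.perf_counter() - start) * 1000,
        })


def handle_request():
    # One trace ID is created where the request enters and reused downstream.
    trace_id = uuid.uuid4().hex
    with span(trace_id, "frontend", "handle_request"):
        with span(trace_id, "backend", "load_data"):
            time.sleep(0.01)  # simulated work
    return trace_id


trace_id = handle_request()
for record in COLLECTED_SPANS:
    print(record["service"], record["operation"], round(record["duration_ms"], 1))
```

The inner span is reported first because it finishes first. In a real deployment the two spans would be emitted by separate processes, and the trace ID would travel with the request, for example in a request header.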
Once data from each individual application service or component has been collected, it is aggregated and correlated in order to allow engineers to understand how each component behaved as it responded to a given request. In this way, the team can identify which specific part of the application is causing a performance bottleneck or downtime, and focus their efforts on fixing that component.
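The aggregation and correlation step can be sketched as grouping span records by a shared trace ID and ordering them by start time. The record fields (`trace_id`, `service`, `start_ms`, `duration_ms`) and the sample values below are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical span records arriving from different services, out of order.
SPANS = [
    {"trace_id": "a1", "service": "backend", "start_ms": 5, "duration_ms": 40},
    {"trace_id": "b2", "service": "frontend", "start_ms": 0, "duration_ms": 20},
    {"trace_id": "a1", "service": "frontend", "start_ms": 0, "duration_ms": 50},
    {"trace_id": "a1", "service": "database", "start_ms": 10, "duration_ms": 30},
]


def correlate(spans):
    """Group spans by trace ID and sort each group by start time."""
    traces = defaultdict(list)
    for record in spans:
        traces[record["trace_id"]].append(record)
    for trace_spans in traces.values():
        trace_spans.sort(key=lambda record: record["start_ms"])
    return dict(traces)


traces = correlate(SPANS)
for record in traces["a1"]:
    print(record["service"], record["start_ms"], record["duration_ms"])
```

Once the spans for a single request sit together in order, an engineer can read off how each component behaved while handling that request.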
Distributed Tracing Example
To understand how distributed tracing works in practice, consider an application that consists of three basic services: a frontend interface that accepts user input, a database that stores and serves data, and a backend service that connects the database to the frontend.
Now, imagine a user who submits a request to the frontend that requires data to be pulled out of the database. The request takes longer than expected, and the engineering team needs to figure out why.
With application monitoring alone, it would be difficult to pinpoint the source of the problem. Application monitoring would tell the team that the overall response is slow, but it wouldn’t reveal which individual service was causing the bottleneck.
A distributed trace, however, would follow the request as it flows across the three services that compose the application. In other words, it would track how long the frontend service takes to process the request and pass it on to the backend service, and then how long the backend service takes to pass the request to the database service. It would also track how long the database service takes to serve the data and send it back to the backend service, and how long the backend service takes to return it to the frontend service, which finally serves it to the user.
By tracking the response time of each individual service, it would be obvious if, for example, the backend service was experiencing a performance degradation that was causing a bottleneck. In that case, the team would know that it needs to fix whatever is wrong with the backend service (which could be a bug in the code, a lack of capacity, or something else).
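To make the three-service example concrete, the sketch below works out how much time each service spent on its own work, excluding time spent waiting on the service it called. The durations and the `parent` field are invented for illustration, and subtracting child durations is only one simple way to attribute time.

```python
# Hypothetical spans for one slow request; each duration includes time spent
# waiting on the downstream service named as a child.
SPANS = [
    {"service": "frontend", "parent": None, "duration_ms": 930},
    {"service": "backend", "parent": "frontend", "duration_ms": 900},
    {"service": "database", "parent": "backend", "duration_ms": 40},
]


def self_times(spans):
    """Return each service's own time: its duration minus its children's."""
    child_total = {}
    for record in spans:
        parent = record["parent"]
        if parent is not None:
            child_total[parent] = child_total.get(parent, 0) + record["duration_ms"]
    return {
        record["service"]: record["duration_ms"] - child_total.get(record["service"], 0)
        for record in spans
    }


times = self_times(SPANS)
bottleneck = max(times, key=times.get)
print(times, bottleneck)
```

With these made-up numbers, the frontend and database account for only a small share of the total, so the backend stands out as the service to investigate.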
This is a simplistic example of distributed tracing. In the real world, modern applications typically consist of a dozen or more services, not just three. There may also be multiple instances of the same service running in each environment, and it’s not always possible to know ahead of time which instances will handle a request. Thus, there are more variables to track and account for in real-world distributed tracing than the example above would suggest.
When to Use Distributed Tracing
Because distributed tracing provides visibility into the individual components of an application, it is typically deployed to monitor distributed, microservices-based applications. If your application runs as a series of microservices rather than as a monolith, distributed tracing is vital for monitoring the application in an efficient way.
Again, without distributed tracing, you would only be able to track performance at the level of the application as a whole, making it difficult to pinpoint exactly where problems lie. As a result, troubleshooting would mean investigating the application as a whole, rather than taking the more efficient approach of fixing just the problematic component.
Challenges of Distributed Tracing
That said, it’s important to consider the challenges of distributed tracing when incorporating it into your monitoring strategy. Those challenges include:
Instrumentation: You need a way to collect data from individual services, which is more complicated than simply reading logs written by an application as a whole.
Dependency mapping: Making the most of distributed tracing requires an understanding of the dependencies between different services. This means that you need to understand the application architecture as a whole, in addition to tracking the performance of individual components.
Service instances: As noted above, there may be multiple instances of each service within your application. You’ll need to track the identity of each instance in order to determine whether a problem you detect affects all instances of the service, or just a particular one.
Service deployments: Similarly, services may be updated frequently as developers deploy new versions. For the best results, you’ll have to keep track of which version each service is running and monitor how performance trends change following new deployments.
Aggregation: Gaining the full benefits of distributed tracing requires aggregating and correlating the performance data from all of your services in a single place.
These challenges can all be addressed, but it’s important to plan for them. Distributed tracing requires more effort to implement than application monitoring, which often involves little more than collecting and analyzing log data.
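As one hedged illustration of the service instance and deployment challenges, the sketch below averages span durations per instance and version to show whether a slowdown affects every instance of a service or only one. The instance names, version strings, and durations are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical spans tagged with the instance and version that handled them.
SPANS = [
    {"service": "backend", "instance": "backend-1", "version": "1.4.0", "duration_ms": 40},
    {"service": "backend", "instance": "backend-1", "version": "1.4.0", "duration_ms": 44},
    {"service": "backend", "instance": "backend-2", "version": "1.5.0", "duration_ms": 410},
    {"service": "backend", "instance": "backend-2", "version": "1.5.0", "duration_ms": 390},
]


def average_by(spans, *keys):
    """Average span duration for each combination of the given fields."""
    groups = defaultdict(list)
    for record in spans:
        groups[tuple(record[key] for key in keys)].append(record["duration_ms"])
    return {group: mean(values) for group, values in groups.items()}


by_instance = average_by(SPANS, "service", "instance", "version")
for group, avg_ms in by_instance.items():
    print(group, avg_ms)
```

In this made-up data, only the instance running the newer version is slow, which would point toward the deployment rather than the service as a whole.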
When to Use Application Monitoring
Application monitoring is most commonly associated with monolithic applications, meaning those that are developed as a single codebase and deployed as a single process (or, sometimes, a handful of different processes). Monoliths typically don’t have discrete, independently deployed internal components whose performance you can track individually, so distributed tracing has much less to offer them.
That said, application monitoring can be useful for microservices applications, too. It’s still important with a microservices app to know the overall response time and availability of your application, which are insights that application monitoring reveals. Many teams use application monitoring to track the overall performance of their microservices applications, while also deploying distributed tracing to glean deeper, more granular insight into performance problems when they arise.
In other words, distributed tracing and application monitoring are not mutually exclusive. In modern environments, you’ll typically want to use both techniques to understand what is happening in your software environment.