As organizations have moved toward a microservices design pattern, the need for reliable and performant solutions that enable decoupled services to communicate with one another has grown. RabbitMQ is an open-source message broker designed for this purpose. We’ll discuss what RabbitMQ is, how it works, why it needs to be monitored, and how Sumo Logic can help you monitor it effectively.
What is RabbitMQ and How Does It Work?
RabbitMQ is a messaging platform that allows for effective communication between decoupled services in a scalable and performant manner. RabbitMQ acts as a broker, receiving messages from producing services and providing functionality that enables subscribing services to consume these messages as needed.
Before diving into the monitoring aspect of RabbitMQ, let’s take a look at how it works. At a high level, RabbitMQ leverages the following concepts and components to accomplish its task:
Producer – In the context of RabbitMQ, the term producer refers to a service that writes messages to RabbitMQ.
Exchange – An exchange acts as a receiver of messages within RabbitMQ. It distributes the received messages to a queue or set of queues according to its exchange type and bindings, both discussed below.
Queue – A queue is the location from which consumers read messages. Messages are routed to a queue via an exchange.
Consumer – The term consumer refers to a service that needs to consume messages from RabbitMQ. Consumers subscribe to a particular queue or set of queues from which they require messages.
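To make the relationship between these four components concrete, here is a minimal in-memory sketch in Python. It is not RabbitMQ client code and does not talk to a broker; the `Exchange` and `Queue` classes, the queue names, and the routing keys are all hypothetical and exist only to show the flow from producer to exchange to queue to consumer, using exact-match routing for simplicity.

```python
from collections import deque


class Queue:
    """Holds messages until a consumer reads them."""

    def __init__(self, name):
        self.name = name
        self._messages = deque()

    def put(self, message):
        self._messages.append(message)

    def get(self):
        # Return the oldest message, or None when the queue is empty.
        return self._messages.popleft() if self._messages else None


class Exchange:
    """Receives messages from producers and copies them to bound queues."""

    def __init__(self, name):
        self.name = name
        self._bindings = []  # list of (binding_key, queue) pairs

    def bind(self, queue, binding_key):
        self._bindings.append((binding_key, queue))

    def publish(self, routing_key, message):
        # Exact-match routing only; returns how many queues got the message.
        delivered = 0
        for binding_key, queue in self._bindings:
            if binding_key == routing_key:
                queue.put(message)
                delivered += 1
        return delivered


# Producer side: messages go to the exchange, not straight to a queue.
orders = Exchange("orders")
billing_queue = Queue("billing")
shipping_queue = Queue("shipping")
orders.bind(billing_queue, "order.created")
orders.bind(shipping_queue, "order.paid")

delivered = orders.publish("order.created", {"order_id": 1})

# Consumer side: each consumer reads only from the queue it subscribed to.
billing_message = billing_queue.get()
shipping_message = shipping_queue.get()
print(delivered, billing_message, shipping_message)
```

In this sketch the producer never needs to know which consumers exist, which is the decoupling the broker is meant to provide.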
There are a variety of different methods by which messages are routed from an exchange to a particular queue or set of queues. These methods are known as exchange types. The basic exchange types are defined as follows:
Direct exchange – The direct exchange type routes messages based upon the routing key defined for the message. The exchange will do a direct comparison of the routing key and the key defined for the bindings that connect queues to the exchange. If the routing key and binding key are a perfect match, it will route the message to the queue that is connected to the exchange via that binding.
Fanout exchange – The fanout exchange type routes all messages to all queues bound to the exchange.
Topic exchange – The topic exchange type is similar to the direct exchange type in that it compares the routing key on the message with the binding keys that connect queues to the exchange. However, instead of requiring a perfect match, it delivers a message to every queue whose binding pattern the routing key satisfies. Routing keys are dot-separated words, and a binding pattern may use * to stand for exactly one word and # to stand for zero or more words. The topics tutorial on the RabbitMQ website covers this in more depth.
Headers exchange – With the headers exchange type, the routing key is ignored. Instead, a message is routed to a queue or set of queues by matching the header properties of the message emitted to the exchange against the arguments of the bindings that tie queues to the exchange.
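The four exchange types can be compared side by side with a small routing sketch. This is an illustrative Python model rather than RabbitMQ's implementation: the `route`, `topic_matches`, and `headers_match` helpers and all queue names are hypothetical. It assumes the standard topic wildcards (`*` for exactly one word, `#` for zero or more words) and the `x-match` binding argument (`all` or `any`) for header matching, and it simplifies header matching by ignoring only the `x-match` key itself.

```python
def topic_matches(pattern, routing_key):
    """Return True if a dot-separated routing key satisfies a topic pattern."""

    def match(p, k):
        if not p:
            return not k
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb zero words, or absorb one word and try again.
            return match(rest, k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        return (head == "*" or head == k[0]) and match(rest, k[1:])

    return match(pattern.split("."), routing_key.split("."))


def headers_match(binding_args, headers):
    """Compare message headers with binding arguments using x-match."""
    mode = binding_args.get("x-match", "all")
    wanted = {k: v for k, v in binding_args.items() if k != "x-match"}
    hits = [headers.get(k) == v for k, v in wanted.items()]
    return any(hits) if mode == "any" else all(hits)


def route(exchange_type, bindings, routing_key="", headers=None):
    """Return the names of the queues that should receive a message.

    bindings is a list of (queue_name, binding_key, binding_args) tuples.
    """
    headers = headers or {}
    matched = []
    for queue_name, binding_key, binding_args in bindings:
        if exchange_type == "fanout":
            ok = True  # every bound queue receives every message
        elif exchange_type == "direct":
            ok = binding_key == routing_key
        elif exchange_type == "topic":
            ok = topic_matches(binding_key, routing_key)
        elif exchange_type == "headers":
            ok = headers_match(binding_args, headers)
        else:
            raise ValueError("unknown exchange type: " + exchange_type)
        if ok and queue_name not in matched:
            matched.append(queue_name)
    return matched


key_bindings = [
    ("errors", "logs.error", {}),
    ("all_logs", "logs.#", {}),
    ("eu_only", "*.eu", {}),
]
header_bindings = [
    ("eu_pdf", "", {"x-match": "all", "region": "eu", "format": "pdf"}),
    ("eu_or_pdf", "", {"x-match": "any", "region": "eu", "format": "pdf"}),
]

direct_result = route("direct", key_bindings, "logs.error")
fanout_result = route("fanout", key_bindings, "ignored")
topic_result = route("topic", key_bindings, "logs.error")
headers_result = route(
    "headers", header_bindings, headers={"region": "eu", "format": "csv"}
)
print(direct_result, fanout_result, topic_result, headers_result)
```

The same routing key reaches one queue through a direct exchange, two through a topic exchange, and all three through a fanout exchange, which is why the choice of exchange type matters when interpreting per-queue metrics later on.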
Consuming services subscribe to queues. This enables them to consume the messages that they care about from the producing services that emit messages to the exchange. In this way, organizations can ensure effective communication between their systems without needing to adopt a more tightly coupled architecture.
The Importance of Monitoring RabbitMQ
For applications and services that rely upon RabbitMQ to provide a means of communication that drives critical functionality, problems related to poor performance, issues with availability, and so on can have a devastating impact. For instance, if messages are failing to be processed by the exchange and routed to the correct queues, it becomes likely that downstream systems will not have the information they need to function as designed. RabbitMQ must be monitored effectively to provide teams with insight into its performance, so they know when a problem exists. Furthermore, capabilities that make it easier to analyze this data, such as visualizations, can help teams attain much-needed context to identify the root cause and get the problem resolved quickly.
Sumo Logic’s RabbitMQ Overview dashboard gives you insights into performance across brokers, queues, exchanges and messages.
Sumo Logic is a log and metrics monitoring and analysis platform that provides these capabilities. Let’s dig deeper into how Sumo Logic can assist in the effort to comprehensively monitor RabbitMQ clusters.
Monitoring RabbitMQ Logs and Metrics With Sumo Logic
The first action to take in monitoring RabbitMQ with Sumo Logic is to configure log and metric collection for RabbitMQ clusters. Let’s take a deeper look at how this is accomplished.
Configuring log and metric collection for RabbitMQ in Sumo Logic is a two-step process. The first step is to configure fields within the Sumo Logic UI. These fields attach relevant metadata to your RabbitMQ logs and metrics, and the dashboards in the RabbitMQ app need this metadata to populate correctly.
When creating these fields, it’s important to consider the environment in which RabbitMQ is running. A Kubernetes environment requires different metadata fields than a non-Kubernetes environment. The Sumo Logic RabbitMQ documentation lists the fields required for each environment type, and the Sumo Logic documentation on fields covers general field configuration.
After configuring the appropriate fields, it’s time to configure collection for RabbitMQ logs and metrics. Similar to field configuration, this process differs depending upon the type of environment in which you are working. Sumo Logic provides detailed instructions for configuring log and metric collection for RabbitMQ running in non-Kubernetes environments as well as in Kubernetes environments. To better understand the collection configuration in each environment, consider the following as a high-level explanation.
In non-Kubernetes environments, Telegraf is used for metrics collection. Telegraf is installed on the same host as RabbitMQ and collects metrics using the RabbitMQ input plugin.
Upon collecting metrics data, Telegraf sends this data to Sumo Logic using the Sumo Logic output plugin. The metrics are sent to an HTTP logs and metrics source, which is a URL endpoint that enables a Sumo Logic Hosted Collector to consume the data.
Log collection is accomplished using an Installed Collector. The Installed Collector runs on the RabbitMQ host machine and is configured to collect logs and send them on to Sumo Logic.
The components of the RabbitMQ collection in a non-Kubernetes environment to send metrics to Sumo Logic
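The metrics path described above (collect queue statistics, then send them to an HTTP source) can be illustrated with a short Python sketch. This is not how Telegraf is implemented and is not a substitute for it. The sample payload is a trimmed, made-up example shaped like the RabbitMQ management API's queue listing, the `rabbitmq_queue_*` metric names are hypothetical rather than the names Telegraf emits, the URL is a placeholder, and the Prometheus-format content type is our understanding of what a Sumo Logic HTTP source accepts, which should be checked against the Sumo Logic documentation. The request is built but deliberately not sent.

```python
import urllib.request

# Trimmed, made-up sample shaped like the management API's queue listing.
SAMPLE_QUEUES = [
    {"name": "billing", "vhost": "/", "messages": 12, "messages_ready": 10,
     "messages_unacknowledged": 2, "consumers": 1},
    {"name": "shipping", "vhost": "/", "messages": 0, "messages_ready": 0,
     "messages_unacknowledged": 0, "consumers": 3},
]

FIELDS = ("messages", "messages_ready", "messages_unacknowledged", "consumers")


def to_prometheus_lines(queues):
    """Turn queue statistics into Prometheus-style text lines."""
    lines = []
    for q in queues:
        labels = 'queue="{}",vhost="{}"'.format(q["name"], q["vhost"])
        for field in FIELDS:
            # Hypothetical metric names, chosen for this illustration only.
            lines.append(
                "rabbitmq_queue_{}{{{}}} {}".format(field, labels, q[field])
            )
    return lines


def build_request(source_url, lines):
    """Build (but do not send) a POST carrying the metric lines."""
    body = ("\n".join(lines) + "\n").encode("utf-8")
    return urllib.request.Request(
        source_url,
        data=body,
        # Assumed content type for Prometheus-format metrics.
        headers={"Content-Type": "application/vnd.sumologic.prometheus"},
        method="POST",
    )


lines = to_prometheus_lines(SAMPLE_QUEUES)
request = build_request(
    "https://collector.example.invalid/hypothetical-http-source", lines
)
# urllib.request.urlopen(request) would send it; omitted so this runs offline.
print(lines[0])
```

In a real deployment, Telegraf's RabbitMQ input plugin and Sumo Logic output plugin handle collection, batching, and delivery, so code like this is useful mainly for understanding what travels over the wire.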
In Kubernetes environments, Telegraf is again used for metrics collection, with the Telegraf operator injecting Telegraf sidecar containers into every pod from which metrics need to be collected. Telegraf uses the RabbitMQ input plugin to collect the metrics.
Each Telegraf container is scraped by Prometheus. Metrics are then sent on to Fluentd, where they are enriched and forwarded to an HTTP source (the Sumo Logic HTTP endpoint that consumes the data).
Fluent Bit is responsible for log collection in Kubernetes environments. As with metrics data, logs are sent to Fluentd and forwarded to an HTTP source.
The metric collection pipeline in a Kubernetes environment to get data to Sumo Logic.
Getting the most from your data with Sumo Logic Monitors and the RabbitMQ App
To help organizations attain deeper visibility into the health of their RabbitMQ clusters, Sumo Logic has built a custom RabbitMQ app. This app can be installed via the Sumo Logic app catalog. It comes with preconfigured dashboards that help teams visualize logs and metrics and get more value from this data.
Furthermore, Sumo Logic offers monitors, which enable alerting so that critical personnel are notified when problems occur. In the case of RabbitMQ, Sumo Logic provides pre-packaged alerts (available through the installation of monitors) with thresholds that mirror best practices to help ensure optimal performance. When utilized, this alert functionality can reduce time to acknowledgment, allowing remediation to begin as early as possible.
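The idea behind a threshold-based alert can be sketched in a few lines of Python. This is a generic illustration, not Sumo Logic's monitor engine: the `evaluate` function, the state names, the window size, and the threshold values are all hypothetical and are not the thresholds shipped with the pre-packaged RabbitMQ monitors.

```python
def evaluate(samples, warning, critical, window):
    """Classify the most recent `window` samples of a metric.

    Returns "critical" or "warning" only when every recent sample is at or
    above the corresponding threshold, so a single spike does not alert.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = samples[-window:]
    if len(recent) < window:
        return "missing_data"  # not enough samples to decide
    if all(value >= critical for value in recent):
        return "critical"
    if all(value >= warning for value in recent):
        return "warning"
    return "ok"


# Hypothetical unacknowledged-message counts sampled once a minute.
unacked = [5, 40, 120, 150, 180]

state_now = evaluate(unacked, warning=100, critical=500, window=3)
state_earlier = evaluate(unacked[:3], warning=100, critical=500, window=3)
print(state_earlier, state_now)
```

Requiring several consecutive samples above a threshold is one common way to trade a little detection delay for fewer false alarms.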
Using the RabbitMQ dashboards
After installing the RabbitMQ app, users gain access to a variety of out-of-the-box dashboards. As logs and metrics are delivered to Sumo Logic, these dashboards begin to populate with easy-to-interpret and valuable visualizations that provide increased insight into the health and performance of the RabbitMQ clusters.
Consider the following dashboards and how they might assist an organization in monitoring RabbitMQ.
The Logs dashboard provides teams with the ability to gain a holistic view of log events across RabbitMQ clusters. This dashboard allows teams to more easily identify spikes in errors, to determine the frequency with which problematic events are occurring, and more.
The Queue dashboard enables teams to track data related to queues across RabbitMQ clusters. With this dashboard, teams can track message publish and delivery rate over time, memory usage, and other useful metrics to help infer the state of configured queues.
Sumo Logic’s at-a-glance view of the state of your queues in your RabbitMQ clusters.
The Exchange dashboard provides visualizations of metrics related to exchanges. These include the total number of messages published in and out of exchanges, as well as the message publish rates both in and out of exchanges. Drastic changes in publish rates may be indicative of problems within RabbitMQ that need to be resolved.
With just a few steps, you can maximize the value of the data gleaned from monitoring RabbitMQ logs and metrics with Sumo Logic. First, log and metric collection needs to be configured for the RabbitMQ clusters. After this, monitors can be installed to enable alerting, and the RabbitMQ app can be installed to provide dashboards that help teams visualize the state of their RabbitMQ instances.
For a deeper dive into how Sumo Logic works with RabbitMQ, please visit the Sumo Logic RabbitMQ documentation pages for a full walkthrough of each step required to take full advantage of this functionality.