Fault diagnosis on Local Area Networks (LANs) can be tricky at the best of times. Things get even more complex when you’re dealing with Wide Area Networks (WANs).
Smokeping is one great tool that can help you deal with the complexity. In this article, we’ll learn how to use Smokeping to identify and fix network problems.
So, what does Smokeping do and how does it work, you ask? Primarily, Smokeping is used to measure jitter and latency using ICMP echos (in other words, pings). It then uses rrdtool to graph the information for later analysis. You can choose how often you want to ping a destination and how many pings you want to send when you do.
Smokeping’s Key Features
- Excellent latency visualisation
- Interactive graph frontend
- Plugins to allow measurement of latency for other protocols
- Master/Slave system for distributed measurement (i.e. multiple locations)
- Alerting system
- Latency charts with the most interesting graphs
- Free and Open-Source
Smokeping Configuration and Best Practices
I recommend these best practices based on my own experience having used Smokeping over a number of years to analyze network latency and illustrate packet loss.
I suggest starting by making sure that 20 pings are sent every 60 seconds. It’s important to consider the relevance of this when you first deploy Smokeping as changes to these parameters will break your RRD graphs and require you to start from scratch. To set this rate, edit the Database file located in Smokeping’s config.d directory:
Next, configure the owner and contact e-mail address of your Smokeping installation by editing the General file in the same directory:
Configuring Ping Targets
Now onto the fun stuff!
When using Smokeping, you’re going to want to start graphing the latency to a multitude of destinations. I tend to pick servers that I want to monitor and then a few “control” subjects, like Google’s DNS servers or my ISP’s DNS servers. This will allow you to compare any lossy graphs to see if there is a common issue related to your Smokeping network connection or if it’s a specific destination. Indeed, you can also see if there are common issues on certain routed paths by doing this.
To configure your ping targets, open up the Targets file in the same directory as mentioned above. There is some default config in here to get you started and it’s quite straightforward to configure. As an example, here is my configuration for BBC News and Google U.K.:
To make this config live, simply restart Smokeping, then head over to your browser to see the changes.
Understanding the Graphs
Assuming you have configured Smokeping correctly, you’ll see something similar to the following (minus the populated graphs, of course). Allow a few minutes for a line measurement to begin to appear.
To further understand what we’re looking at, click into one of your graphs and you’ll see latency information over the last 3 hours, 30 hours, 10 days and 360 days. The data will be averaged over these time periods.
The graph below indicates excellent latency with a consistent ping time of around 12ms and no packet loss.
If there were packet loss, you’d see a percentage figure mentioned alongside the “packet loss” side heading and a colour relative to the percentage on the graph. An example illustrating some packet loss is shown below (note the bits on the graph in blue):
Smokeping is a useful tool that can aid DevOps Engineers and Network Engineers alike in diagnosing packet loss on their networks, or on their ISP’s networks. It can illustrate congestion issues, contention issues and poor routing. Having graphs to provide to your ISP or other service provider when raising a complaint can be very useful point of information to bolster your case, and to assist them in diagnosing a problem.
Although I have tried many other tools for graphing latency, I still find Smokeping to be the best around and would highly recommend it.
- Smokeping: http://oss.oetiker.ch/smokeping/
- Different probes: http://oss.oetiker.ch/smokeping/probe/index.en.html
- Online demo: http://oss.oetiker.ch/smokeping-demo/?target=Customers.OP
- Usage statistics for Smokeping: http://oss.oetiker.ch/smokeping/stats.en.html