Machine Learning for DevOps: Getting LogReduce Signatures



This blog post is the first of a two-part tutorial on using LogReduce, a Sumo Logic feature, for pattern recognition in machine data. This first part introduces LogReduce and what you need to set up before you can use it on your own machine data. The forthcoming second part will discuss how to interpret the results that LogReduce returns.

As a DevOps engineer, I am very curious about machine data.
A lot of this data is found in logs.
But there are a lot of logs, and the logs are very big.
How can I analyze these logs efficiently so that I can find what I am looking for (e.g. the root cause of a bug)?

Well, logs have a common characteristic. They are repetitious.
This means the same kinds of messages show up in the logs again and again: patterns.
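
To make the idea of patterns concrete, here is a minimal Python sketch. It is not how Sumo Logic finds patterns internally; it simply masks the variable parts of each line (numbers, IP addresses, hex values) and counts the templates that remain. The sample lines, the `MASKS` list, and the function names `to_template` and `count_templates` are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical sample lines, invented for this illustration
# (they are not real macOS or Sumo Logic output).
SAMPLE_LINES = [
    "worker[101]: connection from 10.0.0.5 closed after 12 ms",
    "worker[102]: connection from 10.0.0.9 closed after 48 ms",
    "worker[101]: cache miss for key 0x1f4a",
    "worker[103]: connection from 10.0.0.7 closed after 7 ms",
    "worker[102]: cache miss for key 0x9bc2",
]

# Order matters: hex values and IP addresses are masked before plain numbers,
# otherwise the generic number rule would break them apart first.
MASKS = [
    (re.compile(r"0x[0-9a-fA-F]+"), "<HEX>"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\d+"), "<NUM>"),
]


def to_template(line: str) -> str:
    """Replace the variable parts of a log line with placeholders."""
    for pattern, placeholder in MASKS:
        line = pattern.sub(placeholder, line)
    return line


def count_templates(lines):
    """Count how many lines share each masked template."""
    return Counter(to_template(line) for line in lines)


if __name__ == "__main__":
    for template, count in count_templates(SAMPLE_LINES).most_common():
        print(f"{count:>3}  {template}")
```

In this toy example, five lines collapse into two templates. Real logs are far messier, which is exactly why doing this by hand does not scale.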

This makes my life as a DevOps engineer easier, but manually finding those patterns costs a lot of time, which I do not have.

Here machine learning comes to the rescue.
According to Arthur Samuel, machine learning is the subfield of computer science that gives computers the ability to learn without being explicitly programmed. With the development of algorithms, computers can learn from and make predictions on data.

Knowing this, I bet there is software built to address machine data pattern analysis with machine learning.
There is!
An example is Sumo Logic, a cloud-based machine log management and data analytics platform giving data insight to its customers.
I assume you have basic knowledge of the Sumo Logic platform. Otherwise, you can quickly get up to speed with this tutorial.

Machine learning and Sumo Logic

One of the features of this platform is that it uses machine learning (via fuzzy logic and soft matching) to find patterns in the client’s data.
I will not tire you with the theory behind fuzzy logic and soft matching.
Let’s get practical!

A cool thing about Sumo Logic is that you can perform pattern recognition of your (big) machine data with the click of just one button: the LogReduce button.
Let’s see how this works.

Testing the LogReduce feature

In order to show how LogReduce works, I need a dataset and a reason to use LogReduce for pattern recognition.

Say I want to analyze the logs on my macOS workstation for a given time period and see whether they show any patterns.
Mind you, I am not a macOS log expert. I'm just curious whether there are any log patterns to find.

Prerequisites for my macOS log analysis with Sumo Logic:

  1. Register as a free user on the Sumo Logic platform and log in
  2. Enable the streaming data connection between my macOS workstation and the online Sumo Logic analytics platform (use the log location /var/log/system.log and choose to use the time zone from the log file)

If all goes well, you will receive an email saying that your data is ready, with a link to start analyzing your logs.

Clicking the link will open a Sumo Logic search screen, which does not show any results yet in the graph and messages. Click the Start button in the upper right of the screen.

Now the graph and the message list will be filled in. Right above them, you can also select the time range of your search. Here, it is set to the last 60 minutes.

Back to what we see on the screen.
Basically, you’ll see a graph of the number of messages logged over the selected time range and a list of the corresponding messages (26,730 in total for this experiment).
You will also notice two buttons are now present: LogReduce and LogCompare.
Both are very important for log analysis.
To keep things focused, we will only discuss LogReduce here.

Clicking the LogReduce button will result in a new tab: Signatures.

This tab shows the signatures (aka patterns), each of which groups similar log messages.
You can also see that the LogReduce button is absent here, because the pattern search has already been done.
As you can see, LogReduce generated 622 signatures from the 26,730 messages, and the Count column shows how many messages fall under each signature.

Great! This simplifies my search.
I now have a list of 622 signatures representing groups of a total of 26,730 messages found in the vast log of my macOS system for a given time period.
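
To put those two numbers in perspective, here is a small Python sketch of the arithmetic. The helper name `summarize_reduction` is my own invention for this post; only the two counts come from the experiment above.

```python
def summarize_reduction(messages: int, signatures: int):
    """Return (average messages per signature, fraction of rows saved)."""
    average_per_signature = messages / signatures
    fraction_saved = 1 - signatures / messages
    return average_per_signature, fraction_saved


if __name__ == "__main__":
    # Counts taken from the LogReduce run described above.
    average, saved = summarize_reduction(26_730, 622)
    print(f"{average:.1f} messages per signature on average")
    print(f"{saved:.1%} fewer rows to read")
```

On average, each signature stands for about 43 messages, so I have to read almost 98% fewer rows. Keep in mind this is only an average: some signatures will cover many more messages than others.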

In my next post on LogReduce, I will discuss how we can interpret these signatures for further pattern analysis of my macOS log.

Cordny Nederkoorn is a software test engineer with over 10 years of experience in finance, e-commerce and web development. He is also the founder of TestingSaaS, a social network about researching cloud applications with a focus on forensics, software testing and security. Cordny is a regular contributor at Fixate IO.

