Apache Log Analysis: Get It Visualized Using R Functions



In a recent blog post, I described a fictional scenario in which “Geordie” structures Apache error logs using R. Now, I want to show how Geordie can visualize the data in a graph. In this new scenario, clients are complaining that they suddenly can’t reach Koala because of an HTTP 500 error, and Geordie’s goal is to visualize the logged HTTP status codes using R.

A graph says more than a table with 1,000 values!

Geordie can run an R function in the RStudio console, as described in my previous blog post. An R package is a collection of functions and datasets developed by the community to tackle a specific computational problem (in this article’s example, visualization). Once a package is loaded, its functions can be run in the RStudio console in the same way.
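As a small illustration of the difference between a function and a package, the sketch below checks where the two plotting functions used in this post come from. barplot and pie belong to the graphics package, which ships with R and is attached by default in a standard session, so the explicit library() call is only there to show how a package is loaded. The variable names barplot_home and pie_home are made up for this example.

```r
# Load a package. graphics is normally attached already,
# so this call is purely illustrative.
library(graphics)

# Both plotting helpers are ordinary R functions
is.function(barplot)
is.function(pie)

# Ask R which package each function lives in
barplot_home <- environmentName(environment(barplot))
pie_home <- environmentName(environment(pie))
print(c(barplot = barplot_home, pie = pie_home))
```

For a community package that is not bundled with R, the same library() call would be needed after installing the package once.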

Now, let’s see how Geordie uses R functions to visualize his data.

R functions for visualization

First, Geordie wants to use generic R functions to visualize the data he has via basic plots.
(Examples are plot, boxplot or barplot.)

Geordie is interested in the frequency of HTTP 500 errors in the given period. This frequency, together with the frequencies of the other HTTP status codes, can be shown in a table, but a bar plot is even better.


This can be accomplished with the function barplot(height), where height is a vector or matrix.

Every function call in R has the form function(arguments), as shown below.

# Simple Bar Plot: barplot(height) function
counts <- table(df$status)
barplot(counts, main="HTTP status Distribution", xlab="Number of HTTP status")

In the last blog post, Geordie built a data frame, df, from the dataset.
df$status refers to the column status in the data frame df.

The function table(df$status) counts how often each distinct value occurs in df$status, and the result is stored in the variable counts. So counts holds a table with the frequency of each HTTP status code.
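To make the shape of counts concrete, here is a minimal sketch. The real df comes from the previous post, so the tiny data frame below is a hypothetical stand-in with made-up status codes.

```r
# Hypothetical stand-in for the df built in the previous post
df <- data.frame(status = c(200, 200, 404, 500, 200, 500, 408))

# One entry per distinct status code, holding its number of occurrences
counts <- table(df$status)
print(counts)

# The status codes are stored as the names of the table ...
status_codes <- names(counts)
# ... and the frequencies are the values
frequencies <- as.vector(counts)
```

With this sample data, status_codes holds the four distinct codes as character strings and frequencies holds how often each one occurs, in the same order.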

barplot(counts, main="HTTP status Distribution", xlab="Number of HTTP status") plots the variable counts as a bar plot with the title “HTTP status Distribution”, with the HTTP status codes on the x-axis and their frequencies on the y-axis.

Here is the result of the barplot “HTTP status Distribution”:

Great! So now Geordie can see the status code HTTP 500 occurred a couple of times during the given time period.
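If Geordie wants the exact number instead of reading it off the bar, he can pull it out of the frequency table by name. This is a sketch on hypothetical sample data, since the real df comes from the previous post. The names n_500 and share_500 are invented for this example.

```r
# Hypothetical stand-in for the df built in the previous post
df <- data.frame(status = c(200, 200, 404, 500, 200, 500, 408))
counts <- table(df$status)

# Look up the HTTP 500 entry by its name in the table
n_500 <- as.integer(counts["500"])

# Share of HTTP 500 responses among all logged requests
share_500 <- n_500 / sum(counts)

cat("HTTP 500 occurred", n_500, "times,",
    round(100 * share_500, 1), "percent of all requests\n")
```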

Pie chart

Can Geordie also visualize this in a pie chart with an R function?
Yes, by using the function pie(x, labels=), where x is a vector of non-negative numbers that determine the area of each slice, and labels= takes a character vector of names for the slices.
For this pie chart, x is the already created variable counts, and he can build the labels with the paste function.

# create labels
labs <- paste(counts, "\n", sep="")

Now Geordie has a label for every slice of the pie chart: the number of occurrences of each HTTP status code.
Can he also colorize the slices?
Yes, he can.
From the barplot, Geordie knows there are seven HTTP status codes, so seven slices and seven labels.
With the following vector of seven colors, Geordie can give every slice its own color.

colors <- c("blue", "yellow", "green", "violet", "orange", "red", "cyan")
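A hand-written color vector only fits as long as the log contains exactly seven status codes. As an alternative sketch (not what Geordie does in this post), the colors can be generated from the length of counts with rainbow(), a palette function from the grDevices package that ships with R. The sample data is hypothetical.

```r
# Hypothetical stand-in for the df built in the previous post
df <- data.frame(status = c(200, 200, 404, 500, 200, 500, 408))
counts <- table(df$status)

# One color per distinct status code, however many there are
colors <- rainbow(length(counts))

# Naming the vector makes the code-to-color mapping easy to inspect
names(colors) <- names(counts)
print(colors)
```

The trade-off is that Geordie no longer picks the individual colors himself, for example red for a particular status code.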

Now Geordie has enough information to set up a pie chart with the R function pie.

pie(counts,labels=labs, main="HTTP status Distribution", col=colors)

Mind you, there are seven labels, but only six are visible. HTTP status code 408 occurs only once in the log, so its slice is too small to show in the pie chart.
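One way to see how small such a slice really is would be to compute each status code's share of the total with prop.table(). The sketch below uses hypothetical numbers in which 408 occurs once in 100 requests. The names shares and labs_pct are invented for this example.

```r
# Hypothetical sample: 100 requests, only one of them a 408
df <- data.frame(status = c(rep(200, 90), rep(404, 6), rep(500, 3), 408))
counts <- table(df$status)

# Share of each status code in percent, rounded to one decimal
shares <- round(100 * as.vector(prop.table(counts)), 1)

# Labels that combine the status code with its percentage
labs_pct <- paste0(names(counts), ": ", shares, "%")
print(labs_pct)
```

Labels like these could be passed to pie() through its labels argument in place of the raw counts.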

Geordie can also define a legend for his pie chart with the following R code:

# define a legend for the pie chart
legend("topright", names(counts), cex=0.8, fill=colors)

names(counts) returns the names of the frequency table stored in counts, which are the HTTP status codes discussed before.
cex scales the text size of the legend (0.8 means 80% of the default size), and fill=colors draws a colored box next to each HTTP status code, matching the colors of the slices.
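Note that legend() draws onto the plot that is currently open, so it has to be called after pie(). The sketch below puts both calls together and writes the result to a PDF file, so that the chart and its legend end up in one image. The sample data, the file name http_status_pie.pdf and the choice of the pdf device are assumptions for this example.

```r
# Hypothetical stand-in for the df built in the previous post
df <- data.frame(status = c(200, 200, 404, 500, 200, 500, 408))
counts <- table(df$status)
labs <- paste(counts, "\n", sep = "")
colors <- rainbow(length(counts))

# Hypothetical output file in the session's temporary directory
out_file <- file.path(tempdir(), "http_status_pie.pdf")

pdf(out_file)                      # open the file as the plotting device
pie(counts, labels = labs, main = "HTTP status Distribution", col = colors)
# legend() adds to the pie chart drawn just above
legend("topright", names(counts), cex = 0.8, fill = colors)
invisible(dev.off())               # close the device so the file is written
```

Because names(counts) and colors are in the same order as the slices, each legend entry lines up with the slice of the same color.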

Fantastic! Now Geordie has used two different R functions (barplot and pie) to visualize the HTTP status codes from the Apache logs.

What have we learned?

In this blog post, I have shown you how Geordie used R to visualize his Apache access logs in graphs with the specific R functions barplot and pie.
Here are the steps repeated:

# Simple Bar Plot: barplot(height) function
counts <- table(df$status)
barplot(counts, main="HTTP status Distribution", xlab="Number of HTTP status")
# Simple pie chart with the pie function
counts <- table(df$status)
# create labels
labs <- paste(counts, "\n", sep="")
# colorize the pie chart
# from counts we know we have seven HTTP status codes
# with the following vector we can give each slice its own color
colors <- c("blue", "yellow", "green", "violet", "orange", "red", "cyan")
pie(counts, labels=labs, main="HTTP status Distribution", col=colors)
# the slice for HTTP status code 408 is not visible, because it's too small to draw
# define a legend for the pie chart
legend("topright", names(counts), cex=0.8, fill=colors)


R function barplot: https://www.statmethods.net/graphs/bar.html
R function pie: http://www.theanalysisfactor.com/r-tutorial-part-14/

Cordny Nederkoorn is a software test engineer with over 10 years of experience in finance, e-commerce and web development. He is also the founder of TestingSaaS, a social network about researching cloud applications with a focus on forensics, software testing and security. Cordny is a regular contributor at Fixate IO.

