How to Visualize an Apache Log with R Package ggvis

Monitoring Apache logs is a daily, and tedious, task for a DevOps engineer. Visualizing the logs can make that monitoring more efficient, and one way to do it is with R, the statistical computing language and environment. In this blog post I will demonstrate how you can visualize an Apache log with the R package ggvis.

Imagine you are a DevOps engineer for a SaaS provider. It is Friday afternoon, almost the end of the work week, and you have just finished your last meeting of the day. Then the program manager comes to your desk and asks you to analyze the Apache log from the last 24 hours. One of the SaaS clients reported blocking issues (HTTP 500), and the program manager wants the frequency of all HTTP status codes of that day visualized. Nasty: that means overtime! Unfortunately, you do not have ELK implemented. ELK is the combination of Elasticsearch, Logstash and Kibana, which lets you stash, search and visualize logging data. One of the data scientists is still around and advises you to use R. No worries: with the help of R, you can visualize your Apache logging data.

R

R is an open source statistical computing language and environment, widely used by data scientists to analyze data. With RStudio, an integrated development environment for R, you can edit and debug your R code. The following R code reads the Apache log into a dataframe and is a prerequisite for the ggvis steps later on; for a detailed explanation, see the Sweetcode post R for Apache Log Analysis: Get It Structured.

 # read the log
df <- read.table('access_log')

# add column names
colnames(df) <- c('host', 'ident', 'authuser', 'date', 'time', 'request', 'status', 'bytes')

# to see the column names and first few rows of our dataframe
head(df)
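To see what read.table() does with individual log lines, the sketch below parses three made-up lines in Apache Common Log Format (hypothetical sample data; the first line is modelled on the example in the Apache httpd documentation) and applies the same column names. It assumes every line has the eight whitespace-separated fields of the Common Log Format.

```r
# Three hypothetical log lines in Apache Common Log Format:
# host ident authuser [timestamp timezone] "request" status bytes
sample_log <- c(
  '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
  '127.0.0.1 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.0" 200 1043',
  '10.0.0.7 - - [10/Oct/2000:13:57:12 -0700] "POST /api/orders HTTP/1.1" 500 512'
)

# read.table() splits on whitespace but keeps the double-quoted request together
df <- read.table(text = sample_log, stringsAsFactors = FALSE)
colnames(df) <- c('host', 'ident', 'authuser', 'date', 'time', 'request', 'status', 'bytes')

str(df)
# 'date' holds "[10/Oct/2000:13:55:36" and 'time' holds the timezone "-0700]",
# because the space inside the brackets also acts as a field separator
```

If your server writes the Combined Log Format instead, each line has two more quoted fields (referer and user agent), so read.table() returns ten columns and colnames() needs ten names. Responses without a body log "-" as the byte count, in which case the bytes column is read as text rather than numbers.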

Once the log is in a dataframe, R packages such as ggvis can visualize it.

R package ggvis

With the R package ggvis, we are now going to visualize the Apache logging data. I will not bother you with the details of ggvis; instead, I will start immediately with the implementation. First, install ggvis from CRAN (the Comprehensive R Archive Network) by running the following in the R console or the RStudio editor:

 # install package ggvis from CRAN into your default library
# (add lib = "..." only if you need a specific library folder)
install.packages("ggvis")

Now, you have to load it in the current session:

 # load package
library(ggvis)

Every ggvis graphic starts with a call to the function ggvis().

 # template: replace data, x_variable and y_variable with your dataframe and its columns
p <- ggvis(data, x = ~x_variable, y = ~y_variable)

The first argument is the dataset you want to plot; the other arguments describe how to map variables to visual properties, such as the x and y positions in an XY plot. The tilde (~) tells ggvis to take the values from a column of the dataset.

OK, so we want the frequency of the HTTP status codes, to see how many times HTTP 500 occurred over the last 24 hours.

Our dataframe has a column named ‘status’ that holds the HTTP status code of each request.
To retrieve the frequencies, we build a frequency table of the status column with table(), which counts how often each code occurs:

 TABLEstatus <- table(df$status)

The function as.data.frame.table() converts the table TABLEstatus into a dataframe, so that the frequency of each status code becomes a column we can plot:

 DATAFRAMEstatus <- as.data.frame.table(TABLEstatus)
DATAFRAMEstatus

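The output is a dataframe with one row per status code and two columns: Var1 holds the status code and Freq its number of occurrences in the log.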

Great! Now, for readability, let’s name the columns:

colnames(DATAFRAMEstatus) <- c('HTTPSTATUS', 'Frequency')
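Putting the table(), as.data.frame.table() and colnames() steps together on a small, hypothetical set of status codes shows the shape of the data that ggvis will plot, and how to read off the number of HTTP 500 responses directly:

```r
# Hypothetical status codes standing in for df$status from the real log
df <- data.frame(status = c(200L, 200L, 500L, 404L, 200L, 500L))

# Count how often each status code occurs
TABLEstatus <- table(df$status)

# Convert the counts to a dataframe (columns Var1 and Freq) and rename the columns
DATAFRAMEstatus <- as.data.frame.table(TABLEstatus)
colnames(DATAFRAMEstatus) <- c('HTTPSTATUS', 'Frequency')

print(DATAFRAMEstatus)  # one row per code: 200 -> 3, 404 -> 1, 500 -> 2

# HTTPSTATUS is a factor, so compare it with the status code as text
errors_500 <- DATAFRAMEstatus$Frequency[DATAFRAMEstatus$HTTPSTATUS == "500"]
errors_500  # 2
```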

Now we can use this dataframe in the function ggvis:

p <- ggvis(DATAFRAMEstatus, x = ~HTTPSTATUS, y = ~Frequency)
layer_points(p)

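The layer_points() function adds a layer of points to the plot, so each status code is drawn as a point at the height of its frequency. A ggvis graphic is built from such layers, and you can add more than one to the same plot.

If you leave out the layer, ggvis guesses one for you and reports its guess in a message, so it is clearer to add the layer explicitly.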

The pipe operator

With the pipe operator %>% (from the R package magrittr, and available as soon as ggvis is loaded), you can rewrite the above code as a left-to-right chain, which makes it more readable:

 DATAFRAMEstatus %>%
  ggvis(x = ~HTTPSTATUS, y = ~Frequency) %>%
  layer_points()

This produces the same plot as the earlier ggvis() call.
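The pipe passes the value on its left as the first argument of the call on its right, so x %>% f(y) is evaluated as f(x, y). A minimal sketch with base R functions and a few hypothetical status codes, assuming the magrittr package (which ggvis depends on) is installed:

```r
library(magrittr)  # provides the %>% operator

status <- c(200L, 200L, 500L, 404L, 200L, 500L)  # hypothetical status codes

# Nested calls are read from the inside out
nested <- sort(unique(status))

# The same steps with the pipe, read from left to right
piped <- status %>% unique() %>% sort()

# Extra arguments follow the piped value: this is head(sort(status), 2)
first_two <- status %>% sort() %>% head(2)

identical(nested, piped)  # TRUE
```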

Give your graph a color

Now we have a graph.
Can we also use colors in the graph?
Yes, by using the following:

DATAFRAMEstatus %>%
  ggvis(~HTTPSTATUS, ~Frequency, fill := "green", stroke := "yellow") %>%
  layer_points()


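The fill property colors the inside of each point and stroke colors its border. The := operator sets a property to a fixed value, here a color name, instead of mapping it to a variable in the data.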
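To contrast := with =, the sketch below maps fill to the HTTPSTATUS column with = and a formula, so ggvis picks a color per status code through a scale, while := still fixes the point size. It rebuilds a small, hypothetical DATAFRAMEstatus so it can run on its own, and it only creates the plot object; print it in an interactive R session to see the graph.

```r
library(ggvis)  # also makes %>% available

# Hypothetical frequency table, shaped like DATAFRAMEstatus above
DATAFRAMEstatus <- data.frame(
  HTTPSTATUS = factor(c("200", "404", "500")),
  Frequency  = c(3L, 1L, 2L)
)

p_mapped <- DATAFRAMEstatus %>%
  ggvis(~HTTPSTATUS, ~Frequency,
        fill = ~HTTPSTATUS,  # '=' with '~': map the column through a color scale
        size := 200) %>%     # ':=' sets the same fixed size for every point
  layer_points()

# p_mapped  # print in an interactive R session to display the plot
```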

Interaction

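Beyond static visualization, ggvis graphics can also be interactive, for example with tooltips or input controls such as sliders. These interactive features need a running R session, because ggvis relies on the Shiny package for them.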

Imagine you want a tooltip that shows the exact frequency of an HTTP status code when you hover over its point:

 DATAFRAMEstatus %>% ggvis(~HTTPSTATUS, ~Frequency) %>%
  layer_points() %>%
  # the tooltip function receives the data of the point under the mouse pointer
  add_tooltip(function(df) df$Frequency)
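A tooltip function can also return formatted text and guard against a NULL argument, as the examples in the ggvis documentation do. The sketch below uses a hypothetical helper, status_tooltip(), with add_tooltip(); because the helper is plain R, it can be checked without drawing anything. The data frame is again a small hypothetical stand-in for DATAFRAMEstatus.

```r
library(ggvis)

# Hypothetical helper: builds the tooltip text for the point under the pointer.
# x holds that point's values (columns named after the plotted variables);
# return NULL when there is nothing to show.
status_tooltip <- function(x) {
  if (is.null(x)) return(NULL)
  paste0("HTTP ", x$HTTPSTATUS, ": ", x$Frequency, " requests")
}

# Hypothetical frequency table, shaped like DATAFRAMEstatus above
DATAFRAMEstatus <- data.frame(
  HTTPSTATUS = factor(c("200", "404", "500")),
  Frequency  = c(3L, 1L, 2L)
)

p_tooltip <- DATAFRAMEstatus %>%
  ggvis(~HTTPSTATUS, ~Frequency) %>%
  layer_points() %>%
  add_tooltip(status_tooltip, "hover")

status_tooltip(data.frame(HTTPSTATUS = "500", Frequency = 2L))  # "HTTP 500: 2 requests"
```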


Eureka! We’ve fulfilled the program manager’s request: “I want the frequency of all HTTP status codes of that day visualized.” Now we only have to make a copy of the last graph, email it to program management, and we can go home.

Weekend!

What did you learn?

  1. the package ggvis to visualize an Apache log (graph, color and interaction)
  2. the pipe operator %>% to make the R code more readable

Cordny Nederkoorn is a software test engineer with over 10 years of experience in finance, e-commerce and web development. He is also the founder of TestingSaaS, a social network about researching cloud applications with a focus on forensics, software testing and security. Cordny is a regular contributor at Fixate IO.
