Plotting in R: ggplot2



ggplot2 is a data visualization package for R. It is one of the Tidyverse packages, a collection built for data science work. It is designed to make visualizations easy to write and easy to interpret, and it lets a programmer create sophisticated visualizations with minimal code while remaining flexible. The package can be downloaded here, where you will also find additional documentation and examples.

A Grammar of Graphics

The structure of ggplot2 is based on the philosophy outlined by Hadley Wickham (an author of the package) in his article A Layered Grammar of Graphics. At the philosophy’s center is the concept of a grammar, or basic syntactic structure, that defines aesthetically pleasing and statistically correct graphs. The syntax of ggplot2 reflects this structure. Every graph is built from a small set of elements: the geometries (dots, bars, lines, etc.), the scales (units, span), the text elements (such as the title and axis names), the data itself, and the mapping (which variables are plotted against which). ggplot2 uses these independent elements to define a graph, with each element representing a layer and the final graph a series of superimposed layers.
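To make the layering idea concrete, here is a minimal sketch. It assumes ggplot2 is already installed, and it uses R's built-in iris data set as a stand-in (my choice, so that the snippet runs on its own). The object names base, with_points, and with_title are illustrative only.

```r
library(ggplot2)

# Data and mapping only: no geometry has been added, so there are no layers yet
base <- ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length))
n_layers_base <- length(base$layers)

# Each `+` adds one more element of the grammar to the plot object
with_points <- base + geom_point()
with_title <- with_points + ggtitle("Iris measurements")
n_layers_points <- length(with_points$layers)

# Printing a plot object is what actually draws it
print(with_title)
```

Because every intermediate result is an ordinary R object, a plot can be built up step by step and reused as the starting point for several variants.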

Basic Example

First off, to install ggplot2:
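A minimal sketch, assuming the package is installed from CRAN with the standard install.packages() call and then attached with library(). The requireNamespace() check is my addition so that the install is skipped when the package is already present.

```r
# Install from CRAN only if ggplot2 is not already available
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Attach the package so ggplot(), aes(), and the geom_*() functions are in scope
library(ggplot2)
```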


And to import some data:

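 # Column names for the wine data (assuming the raw UCI file, which has no header row)
 wine.names <- c('Class', 'Alcohol', 'Malic acid', 'Ash', 'Alcalinity of ash',
                 'Magnesium', 'Total phenols', 'Flavanoids', 'Nonflavanoid phenols',
                 'Proanthocyanins', 'Color intensity', 'Hue', 'OD280/OD315', 'Proline')
 # header = FALSE keeps the first data row from being consumed as a header
 wine <- read.csv('', header = FALSE, col.names = wine.names)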

In this article, I’m using data from the UCI Machine Learning Repository. The data contains attributes from a chemical analysis of three different types of wine grown in the same region of Italy. To create a plot, we must first tell ggplot2 which data we want to use and how we want to map it. Mapping is done with the aes() function.

 ggplot(data = wine, aes(x = Alcohol, y = Proline))

Here we map Alcohol to the x-axis and Proline to the y-axis. The output we obtain:

Notice there is nothing on the plot, only the scales for each variable. This is because we have given ggplot2 only the data and the variable mapping. (If the scales are unspecified, it adjusts them automatically to fit each variable’s range.) In order to actually plot the data, we must also add a geometric layer. Here, I want to create a scatterplot, which uses points to display the data.

 ggplot(data = wine, aes(x = Alcohol, y = Proline)) + geom_point()

Using points, we obtain a scatterplot of Proline against Alcohol. There are many other types of geometries that you can use within ggplot2. (For a comprehensive list, see here.)
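As a rough sketch of how the geometry can be swapped while the rest of the call keeps the same shape, here are three other geoms. I use the built-in iris data set in place of the wine data so the snippet is self-contained. The plot names are illustrative.

```r
library(ggplot2)

# A histogram needs only an x mapping; bins sets the number of bars
hist_plot <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(bins = 20)

# A box plot summarises a numeric variable within each group
box_plot <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()

# Geometries stack: points with a fitted straight line drawn on top
fit_plot <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

print(fit_plot)
```

Only the geom_*() call changes between the three plots. The data and aes() parts follow the same pattern as the scatterplot.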

We can also change the axis names and add a title.

 ggplot(data = wine, aes(x = Alcohol, y = Proline)) +
   geom_point() +
   xlab('Percentage Alcohol') +
   ylab('Proline Content (mg)') +
   ggtitle('UCI Wine Data')

Or adjust the scales:

 ggplot(data = wine, aes(x = Alcohol, y = Proline)) +
   geom_point() +
   xlab('Percentage Alcohol') +
   ylab('Proline Content (mg)') +
   ggtitle('UCI Wine Data') +
   xlim(10, 20)
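One thing to keep in mind with xlim(): it sets the limits of the scale itself, so observations that fall outside the range are dropped rather than merely hidden. The sketch below contrasts it with coord_cartesian(), which only zooms the view. It uses the built-in iris data set and limits of 5 to 7, both chosen by me for illustration.

```r
library(ggplot2)

base <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point()

# xlim() limits the scale: x values outside [5, 7] become NA and are not drawn
clipped <- base + xlim(5, 7)

# coord_cartesian() zooms the view only; every observation is kept
zoomed <- base + coord_cartesian(xlim = c(5, 7))

# Count the points that still have an x position once each plot is built
kept_clipped <- sum(!is.na(layer_data(clipped)$x))
kept_zoomed <- sum(!is.na(layer_data(zoomed)$x))
```

For a plain scatterplot the two look alike, but the difference matters once a layer computes something from the data, such as a fitted line or a box plot.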

One of the advantages of ggplot2 is the simplicity of syntax, even when visualizing a large number of attributes. This is particularly useful to discover correlations among variables quickly and efficiently.

 ggplot(data = wine, aes(x = Alcohol, y = Proline)) +
   geom_point() +
   xlab('Percentage Alcohol') +
   ylab('Proline Content (mg)') +
   ggtitle('UCI Wine Data') +
   facet_grid(Class ~ .)

This is done by faceting, which breaks the graph into a grid of plots, one for each subset of the data. The layout of the grid can be specified, or determined automatically by ggplot2. In our case, we divide the data by type of wine (the Class column), so we obtain three separate plots.
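Here is a small sketch of the two faceting functions side by side, again using the built-in iris data set (with Species standing in for the wine Class) so that it runs without the wine file. The variable names are illustrative.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point()

# facet_grid(rows ~ cols): one row of panels per species, like Class ~ . above
by_row <- p + facet_grid(Species ~ .)

# facet_wrap() arranges the panels for you; ncol fixes the number of columns
wrapped <- p + facet_wrap(~ Species, ncol = 3)

# Number of panels in the built plot
n_panels <- length(unique(layer_data(wrapped)$PANEL))

print(wrapped)
```

facet_grid() suits cases where rows and columns each correspond to a variable, while facet_wrap() is handy when a single variable has many levels.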

In this article, I’ve shared a relatively simple example using ggplot2. For something more sophisticated, check out this example that reproduces a clever graphical representation of Napoleon’s 1812 campaign to invade Russia.

Dante is a physicist currently pursuing a PhD in Physics at École Polytechnique Fédérale de Lausanne. He has a Master's in Data Science, and continues to experiment with and find novel applications for machine learning algorithms. He lives in Lausanne, Switzerland. Dante is a regular contributor at Fixate IO.

