Plotting in R: ggplot2



ggplot2 is a data visualization package for R. It is part of the Tidyverse, a collection of packages designed for data science. It aims to make visualizations easy to write and easy to interpret, letting a programmer create sophisticated, flexible graphics with minimal code. The package is available on CRAN, and the ggplot2 website offers additional documentation and examples.

A Grammar of Graphics

The structure of ggplot2 is based on the philosophy outlined by Hadley Wickham (an author of the package) in his article "A Layered Grammar of Graphics". At the center of this philosophy is the concept of a grammar, or basic syntactic structure, that defines aesthetically pleasing and statistically correct graphs. The syntax of ggplot2 reflects this structure. Every graph is built from a small set of elements: the geometries (dots, bars, lines, etc.), the scales (units, span), the text elements (title, axis names, etc.), the data itself, and the mapping that relates the variables to one another. ggplot2 uses these independent elements to define a graph: each element is a layer, and the final graph is the series of superimposed layers.

Basic Example

First off, to install ggplot2:
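A minimal sketch of this step, assuming the package is installed from CRAN. The guard is an addition for convenience: it skips the download when ggplot2 is already present.

```r
# Install ggplot2 from CRAN only if it is not already available.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Attach the package so ggplot(), aes(), etc. can be called directly.
library(ggplot2)
```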

And to import some data:
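One way this import could look, assuming a local copy of the UCI wine file, which is comma-separated and has no header row. The helper `read_wine()`, the file name `wine.data`, and the exact column names are assumptions made for illustration. Only `Alcohol` and `Proline` are named in the text; the other names follow the dataset description, and `Type` stands for the type of wine.

```r
# Column names are an assumption; the raw file carries no header row.
wine_columns <- c(
  "Type", "Alcohol", "MalicAcid", "Ash", "AshAlcalinity", "Magnesium",
  "TotalPhenols", "Flavanoids", "NonflavanoidPhenols", "Proanthocyanins",
  "ColorIntensity", "Hue", "OD280_OD315", "Proline"
)

# Hypothetical helper: read a local, headerless, comma-separated copy of the data.
read_wine <- function(path) {
  wine <- read.csv(path, header = FALSE, col.names = wine_columns)
  wine$Type <- factor(wine$Type)  # the wine type is a category, not a quantity
  wine
}

# Usage, assuming the file was saved as "wine.data" in the working directory:
# wine <- read_wine("wine.data")
```

Storing the wine type as a factor matters later on, because ggplot2 treats factors as discrete groups rather than as numbers.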

In this article, I’m using data from the UCI Machine Learning Repository. The data contains attributes from a chemical analysis of three different types of wine grown in the same region of Italy. To create a plot, we must first tell ggplot2 which data we want to use and how we want to map it. Mapping is done with the aes() function.
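A minimal sketch of the mapping step. To keep it runnable on its own, it uses a tiny stand-in data frame with made-up values in place of the real wine data. Putting Alcohol on the x-axis and Proline on the y-axis is also an assumption.

```r
library(ggplot2)

# Stand-in for the wine data: made-up values, only so the sketch runs on its own.
wine <- data.frame(
  Type    = factor(c(1, 1, 2, 2, 3, 3)),
  Alcohol = c(14.2, 13.2, 12.4, 12.3, 12.9, 13.7),
  Proline = c(1065, 1050, 520, 680, 630, 740)
)

# Declare the data and the variable mapping; no geometry is added yet.
p <- ggplot(data = wine, mapping = aes(x = Alcohol, y = Proline))
p
```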

Here we map the variable Proline onto Alcohol. The output we obtain:

Notice that there is nothing on the plot, only the scales for each variable. This is because we have only given ggplot2 the data and the variable mapping. (If no scales are specified, ggplot2 adjusts them automatically to fit the range of each variable.) In order to actually plot the data, we must also add a geometric layer. Here, I want to create a scatterplot, which uses points to display the data.
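A sketch of adding the geometric layer with `geom_point()`, again on a small stand-in data frame with made-up values rather than the real dataset:

```r
library(ggplot2)

# Stand-in for the wine data: made-up values, only so the sketch runs on its own.
wine <- data.frame(
  Type    = factor(c(1, 1, 2, 2, 3, 3)),
  Alcohol = c(14.2, 13.2, 12.4, 12.3, 12.9, 13.7),
  Proline = c(1065, 1050, 520, 680, 630, 740)
)

# Layers are combined with `+`: the mapping first, then the point geometry.
p <- ggplot(wine, aes(x = Alcohol, y = Proline)) +
  geom_point()
p
```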

There are many different types of geometries that you can use within ggplot2. (For a comprehensive list, see the ggplot2 reference documentation.) Using points, we obtain:

We can also change the axis names and add a title.
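One way to do this is with `labs()`. The label and title strings below are illustrative, and the data frame is a stand-in with made-up values.

```r
library(ggplot2)

# Stand-in for the wine data: made-up values, only so the sketch runs on its own.
wine <- data.frame(
  Type    = factor(c(1, 1, 2, 2, 3, 3)),
  Alcohol = c(14.2, 13.2, 12.4, 12.3, 12.9, 13.7),
  Proline = c(1065, 1050, 520, 680, 630, 740)
)

# labs() sets the title and the axis names in a single layer.
p <- ggplot(wine, aes(x = Alcohol, y = Proline)) +
  geom_point() +
  labs(
    title = "Proline versus alcohol content",  # illustrative wording
    x = "Alcohol",
    y = "Proline"
  )
p
```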

Or adjust the scales:
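A sketch using `scale_x_continuous()` and `scale_y_continuous()` to fix the axis limits. The limits are arbitrary values chosen to fit the made-up stand-in data. Observations that fall outside the limits are dropped, with a warning.

```r
library(ggplot2)

# Stand-in for the wine data: made-up values, only so the sketch runs on its own.
wine <- data.frame(
  Type    = factor(c(1, 1, 2, 2, 3, 3)),
  Alcohol = c(14.2, 13.2, 12.4, 12.3, 12.9, 13.7),
  Proline = c(1065, 1050, 520, 680, 630, 740)
)

# Override the automatic scales with explicit (illustrative) limits.
p <- ggplot(wine, aes(x = Alcohol, y = Proline)) +
  geom_point() +
  scale_x_continuous(limits = c(11, 15)) +
  scale_y_continuous(limits = c(0, 2000))
p
```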

One of the advantages of ggplot2 is the simplicity of its syntax, even when visualizing a large number of attributes. This is particularly useful for discovering correlations among variables quickly and efficiently.

This is done by specifying facets, which break the graph into a grid of plots. The number of plots can be specified, or determined automatically by ggplot2. In our case, we’ll divide the data by type of wine, so we obtain three separate plots.
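A minimal sketch of faceting by wine type with `facet_wrap()`. It assumes the type is stored in a factor column named `Type`, and uses a stand-in data frame with made-up values.

```r
library(ggplot2)

# Stand-in for the wine data: made-up values, only so the sketch runs on its own.
wine <- data.frame(
  Type    = factor(c(1, 1, 2, 2, 3, 3)),
  Alcohol = c(14.2, 13.2, 12.4, 12.3, 12.9, 13.7),
  Proline = c(1065, 1050, 520, 680, 630, 740)
)

# facet_wrap() draws one panel per level of Type (three here).
p <- ggplot(wine, aes(x = Alcohol, y = Proline)) +
  geom_point() +
  facet_wrap(~ Type)
p
```

The grid layout is chosen automatically here; passing `ncol` or `nrow` to `facet_wrap()` fixes it explicitly.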

In this article, I’ve shared a relatively simple example using ggplot2. For something more sophisticated, check out this example that reproduces a clever graphical representation of Napoleon’s 1812 campaign to invade Russia.


Dante is a former physicist who left the laboratory to join the ranks of scientists and engineers turning to techniques within data science to solve today’s problems. He is currently pursuing a Master's degree in Data Science for Complex Economic Systems. He lives in Turin, Italy.

