How to Create a Simple Graph With R

In the process of doing an ROI analysis, I wanted a simple area chart showing the negative burn down of our costs while the project was in development, through to the positive savings of the project after completion. Excel didn’t do a great job for me, so I thought I’d give R a try.

As you can see in the image below, Excel put the x axis right through the center of the chart and since I had to use a column chart, it’s more blocky/chunky than I liked. With R and ggplot2, I was able to smooth out those lines and put the x axis in the right place. I also had a little more control over the labels on the x axis, so I could show the intervals at 12/24/… months instead of 1/13/….. Of course it’s super-easy to create charts in Excel, but with a little extra effort in R, you can have a much better final product.

The code should be fairly well commented, but here’s the general idea:

  1. Install ggplot2 and scales to build the chart and format it with dollar signs.
  2. Load data from a csv file. Ours simply contains months as numbers, and the total savings.
  3. Interpolate our 60 months into 1000 separate values to reduce the choppines.
  4. Add a valence column to indicate which values are positive or negative.
  5. Plot the chart.

Area Chart in R vs Bar Chart in Excel

Download the spreadsheets here:
ROI_Calculator (For coming up with the necessary numbers.)
SampleROI.csv (Easy format for importing into R.)


### Begin R Code ###
# If you haven't already, install the packages below.
library(ggplot2) # ggplot2 for creating the charts
library(scales) # scales for the dollar formatting of the axis.

#Load the csv file containing the values:
SampleROI <- read.csv("C:/Users/convalytics/Documents/R/SampleROI.csv") # Make sure to use your own path *** # Interpolate the data into 1000 separate values. (to smooth out the choppiness of having only a few values) interp <- approx(SampleROI$Month, SampleROI$TotalCostVsSavings, n=1000) #Rebuild the data frame with the interpolated values. roi <- data.frame(Month=interp$x, Savings=interp$y) #Add a "valence" column to indicate positive vs negative values. #Essentially selects the not-yet-existing "valence" column on a # subset of the positive or negative values in the "Savings" column, # and inserts a value of "pos" or "neg" accordingly. roi$valence[roi$Savings >= 0] <- "pos" roi$valence[roi$Savings < 0] <- "neg" # Plot the chart ggplot(roi, aes(x=Month, y=Savings, width=1)) + geom_area(aes(fill=valence, alpha=.8,stat="identity")) + scale_x_continuous(breaks=seq(0,60,12),expand=c(0,0)) + scale_y_continuous(breaks=seq(-4000000,6000000,1000000),labels=dollar) + scale_fill_manual(values=c("darkred","darkgreen")) + labs(title="Sample ROI",x="Months from Project Start",y="Running Cost vs. Savings") + theme(plot.title=element_text(face="bold")) # R Code by Jason Green : Convalytics.com # ### End R Code ###

I hope this makes for a good example that you can use to tweak for your own uses. If you have any questions or suggestions, please leave us a comment!