How to Create a Simple Graph With R

In the process of doing an ROI analysis, I wanted a simple area chart showing the negative burn down of our costs while the project was in development, through to the positive savings of the project after completion. Excel didn’t do a great job for me, so I thought I’d give R a try.

As you can see in the image below, Excel put the x axis right through the center of the chart, and since I had to use a column chart, the result was blockier than I liked. With R and ggplot2, I was able to smooth out those lines and put the x axis in the right place. I also had a little more control over the labels on the x axis, so I could show the intervals at 12/24/… months instead of 1/13/…. Of course it’s super-easy to create charts in Excel, but with a little extra effort in R, you can have a much better final product.

The code should be fairly well commented, but here’s the general idea:

  1. Install and load ggplot2 and scales to build the chart and format the y axis with dollar signs.
  2. Load data from a csv file. Ours simply contains months as numbers, and the total savings.
  3. Interpolate our 60 months into 1000 separate values to reduce the choppiness.
  4. Add a valence column to indicate which values are positive or negative.
  5. Plot the chart.

Download the spreadsheets here:
ROI_Calculator (For coming up with the necessary numbers.)
SampleROI.csv (Easy format for importing into R.)
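
If you'd rather try steps 2 through 4 before downloading anything, here's a minimal sketch that swaps the CSV for a small made-up data frame. The numbers are invented purely for illustration and are not the values in SampleROI.csv; only the column names (Month and TotalCostVsSavings) match what the script below expects. It also builds the valence column with ifelse(), which is just one alternative to the two subset assignments used in the full script.

```r
# Hypothetical stand-in for SampleROI.csv. These values are made up
# for illustration only; they are NOT the numbers from the real file.
SampleROI <- data.frame(
  Month = c(0, 12, 24, 36, 48, 60),
  TotalCostVsSavings = c(0, -3000000, -1500000, 500000, 2500000, 5000000)
)

# Step 3: linearly interpolate onto 1000 evenly spaced points
# between the first and last month.
interp <- approx(SampleROI$Month, SampleROI$TotalCostVsSavings, n = 1000)
roi <- data.frame(Month = interp$x, Savings = interp$y)

# Step 4: label each row "pos" or "neg" by the sign of Savings.
roi$valence <- ifelse(roi$Savings >= 0, "pos", "neg")

# Quick sanity checks on the result.
print(nrow(roi))
print(range(roi$Month))
print(table(roi$valence))
```

Note that approx() draws straight lines between the original points, so the extra rows don't add any information. They only give the chart enough points to look smooth and to switch color close to where the savings cross zero.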

### Begin R Code ###
# If you haven't already, install the packages first by running:
# install.packages(c("ggplot2", "scales"))
library(ggplot2) # ggplot2 for creating the charts
library(scales) # scales for the dollar formatting of the axis.

# Load the csv file containing the values:
SampleROI <- read.csv("C:/Users/convalytics/Documents/R/SampleROI.csv") # Make sure to use your own path ***

# Interpolate the data into 1000 separate values. (to smooth out the choppiness of having only a few values)
interp <- approx(SampleROI$Month, SampleROI$TotalCostVsSavings, n=1000)

#Rebuild the data frame with the interpolated values.
roi <- data.frame(Month=interp$x, Savings=interp$y)

# Add a "valence" column to indicate positive vs negative values.
# Essentially selects the not-yet-existing "valence" column on a
# subset of the positive or negative values in the "Savings" column,
# and inserts a value of "pos" or "neg" accordingly.
roi$valence[roi$Savings >= 0] <- "pos"
roi$valence[roi$Savings < 0] <- "neg"

# Plot the chart
# Note: stat and alpha are fixed settings, so they belong outside aes().
ggplot(roi, aes(x=Month, y=Savings)) +
  geom_area(aes(fill=valence), stat="identity", alpha=.8) +
  scale_x_continuous(breaks=seq(0,60,12), expand=c(0,0)) +
  scale_y_continuous(breaks=seq(-4000000,6000000,1000000), labels=dollar) +
  # Naming the colors ties each one to its valence value explicitly.
  scale_fill_manual(values=c("neg"="darkred", "pos"="darkgreen")) +
  labs(title="Sample ROI", x="Months from Project Start", y="Running Cost vs. Savings")

# R Code by Jason Green #
### End R Code ###
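
One tweak worth considering if you adapt the code above to your own data: the axis breaks are hard-coded to 60 months and a range of -$4M to $6M. Here's a rough sketch of computing them from the data instead. The helper names month_breaks and dollar_breaks are my own invention for illustration, not part of ggplot2 or scales, and the month helper assumes your timeline starts at month 0.

```r
# Hypothetical helpers (names are illustrative, not from any package).

# Breaks every `step` months, from 0 up to the first multiple of
# `step` that covers the largest month in the data.
month_breaks <- function(months, step = 12) {
  seq(0, ceiling(max(months) / step) * step, by = step)
}

# Breaks every `step` dollars, widened outward to whole multiples
# of `step` so the lowest and highest values both fall inside.
dollar_breaks <- function(values, step = 1e6) {
  lower <- floor(min(values) / step) * step
  upper <- ceiling(max(values) / step) * step
  seq(lower, upper, by = step)
}

# Example with made-up inputs: months 0 to 60, savings -3.2M to 5.4M.
print(month_breaks(c(0, 60)))
print(dollar_breaks(c(-3.2e6, 5.4e6)))
```

To use them, you would pass breaks=month_breaks(roi$Month) to scale_x_continuous() and breaks=dollar_breaks(roi$Savings) to scale_y_continuous() in place of the two seq() calls.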

I hope this makes for a good example that you can use to tweak for your own uses. If you have any questions or suggestions, please leave us a comment!
