How to Create a Simple Graph With R

In the process of doing an ROI analysis, I wanted a simple area chart showing the negative burn down of our costs while the project was in development, through to the positive savings of the project after completion. Excel didn’t do a great job for me, so I thought I’d give R a try.

As you can see in the image below, Excel put the x axis right through the center of the chart and since I had to use a column chart, it’s more blocky/chunky than I liked. With R and ggplot2, I was able to smooth out those lines and put the x axis in the right place. I also had a little more control over the labels on the x axis, so I could show the intervals at 12/24/… months instead of 1/13/….. Of course it’s super-easy to create charts in Excel, but with a little extra effort in R, you can have a much better final product.

The code should be fairly well commented, but here’s the general idea:

1. Install ggplot2 and scales to build the chart and format it with dollar signs.
2. Load data from a csv file. Ours simply contains months as numbers, and the total savings.
3. Interpolate our 60 months into 1000 separate values to reduce the choppines.
4. Add a valence column to indicate which values are positive or negative.
5. Plot the chart. ``` ### Begin R Code ### # If you haven't already, install the packages below. library(ggplot2) # ggplot2 for creating the charts library(scales) # scales for the dollar formatting of the axis.```
``` #Load the csv file containing the values: SampleROI <- read.csv("C:/Users/convalytics/Documents/R/SampleROI.csv") # Make sure to use your own path *** # Interpolate the data into 1000 separate values. (to smooth out the choppiness of having only a few values) interp <- approx(SampleROI\$Month, SampleROI\$TotalCostVsSavings, n=1000) #Rebuild the data frame with the interpolated values. roi <- data.frame(Month=interp\$x, Savings=interp\$y) #Add a "valence" column to indicate positive vs negative values. #Essentially selects the not-yet-existing "valence" column on a # subset of the positive or negative values in the "Savings" column, # and inserts a value of "pos" or "neg" accordingly. roi\$valence[roi\$Savings >= 0] <- "pos" roi\$valence[roi\$Savings < 0] <- "neg" # Plot the chart ggplot(roi, aes(x=Month, y=Savings, width=1)) + geom_area(aes(fill=valence, alpha=.8,stat="identity")) + scale_x_continuous(breaks=seq(0,60,12),expand=c(0,0)) + scale_y_continuous(breaks=seq(-4000000,6000000,1000000),labels=dollar) + scale_fill_manual(values=c("darkred","darkgreen")) + labs(title="Sample ROI",x="Months from Project Start",y="Running Cost vs. Savings") + theme(plot.title=element_text(face="bold")) ```
```# R Code by Jason Green : Convalytics.com # ### End R Code ### ```