First Impressions: ProjectTemplate For R-Projects
Order, Order, Order
If you have worked on any major R project, you quickly reach the point where it is hard to keep everything in order: your scripts, your data, your output, your tests… If you have worked on several major R projects, you know how hard it is to keep a similar structure and workflow from one project to the next. The end result can be projects that are hard to reproduce, because others are confused by the lack of order within each project and by a different order (to the degree there is any order at all) from one project to the next.
There are free and open-source resources that can help you keep order within a project and keep the same structure and workflow in every project you do. One such resource is ProjectTemplate.
How It Works
ProjectTemplate automates the grunt work in your project: it organizes your files into predefined folders, loads the R packages you need, loads your data sets into memory, and pre-processes (munges) your data into a form suitable for analysis. All of this is configured in a single configuration file.
Beyond automating the drudgery, the ProjectTemplate team wants to promote better coding and analysis practices by:
- Curating the best R packages
- Providing simple tools for keeping a log of your work
- Providing template code for:
  - Data diagnostics
  - Data munging
  - Code profiling
  - Unit testing
ProjectTemplate creates a series of folders in the project. Folders of particular importance are:
- data: the original data and any processed data
- src: all R scripts created for the project
- figure: all graphical output
- reports: all project reports
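To make the layout concrete, here is a small base-R sketch that builds the four folders named above, each with its own README.md, in a temporary directory. The helper make_skeleton() and the README texts are hypothetical. In practice, ProjectTemplate's create.project() (for example `create.project('my-project')`) generates the full set of folders for you, including more than the four listed here.

```r
# Hypothetical helper: a base-R imitation of the folder-plus-README layout
# described above. It is NOT ProjectTemplate's create.project().
make_skeleton <- function(root, folders) {
  for (name in names(folders)) {
    path <- file.path(root, name)
    # Create the folder (and the project root, if needed).
    dir.create(path, recursive = TRUE, showWarnings = FALSE)
    # Each folder documents its own purpose in a README.md.
    writeLines(folders[[name]], file.path(path, "README.md"))
  }
  invisible(root)
}

# Build the skeleton in a temporary directory so nothing real is touched.
root <- file.path(tempdir(), "my-project")
make_skeleton(root, c(
  data    = "Original data and any processed data.",
  src     = "Analysis scripts. Start each with library('ProjectTemplate'); load.project()",
  figure  = "All graphical output.",
  reports = "All project reports."
))
list.files(root)
```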
Each directory contains a README.md file that explains what the directory is for. For example, the README.md for the src folder reads:
Here you’ll store your final statistical analysis scripts. You should add the following piece of code to the start of each analysis script: `library('ProjectTemplate'); load.project()`. You should also do your best to ensure that any code that’s shared between the analyses in `src` is moved into the `munge` directory; if you do that, you can execute all of the analyses in the `src` directory in parallel. A future release of ProjectTemplate will provide tools to automatically execute every individual analysis from `src` in parallel.
These README files help you use the template consistently, both within a project and across projects.
To load the project, first use setwd() to move into the root directory of your project. Next, run two lines of R code: `library('ProjectTemplate')`, then `load.project()`.
After you enter the second line of code, you’ll see a series of automated messages as ProjectTemplate goes about doing its work. This work involves:
- Reading in the global configuration file contained in the config folder.
- Loading any R packages listed in the configuration file.
- Reading in any data sets stored in the data folder.
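The three steps above can be approximated in a few lines of base R. The sketch below is only meant to show the idea, not ProjectTemplate's actual code: load_project_sketch() is a hypothetical helper, it reads only CSV files, and naming each data set after its file is an assumption of the sketch.

```r
# A base-R sketch of the load steps listed above. This is NOT ProjectTemplate's
# implementation: load_project_sketch() is hypothetical and only reads CSV files.
load_project_sketch <- function(root, envir = globalenv()) {
  # Step 1: read the global configuration file (DCF format).
  config <- read.dcf(file.path(root, "config", "global.dcf"))
  # Step 2: load every package named in the comma-separated 'libraries' field.
  if ("libraries" %in% colnames(config)) {
    libs <- trimws(strsplit(config[1, "libraries"], ",")[[1]])
    for (pkg in libs) library(pkg, character.only = TRUE)
  }
  # Step 3: read each CSV in data/ into a variable named after its file
  # (an assumed naming rule for this sketch).
  csvs <- list.files(file.path(root, "data"), pattern = "\\.csv$", full.names = TRUE)
  for (f in csvs) {
    assign(tools::file_path_sans_ext(basename(f)), read.csv(f), envir = envir)
  }
  # Return the names of the loaded objects, invisibly.
  invisible(ls(envir))
}

# Usage: build a throwaway project in tempdir() and load it into a fresh environment.
root <- file.path(tempdir(), "sketch-project")
dir.create(file.path(root, "config"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "data"), recursive = TRUE, showWarnings = FALSE)
writeLines("libraries: stats, utils", file.path(root, "config", "global.dcf"))
write.csv(data.frame(id = 1:3, score = c(10, 20, 30)),
          file.path(root, "data", "scores.csv"), row.names = FALSE)
env <- new.env()
load_project_sketch(root, envir = env)
env$scores
```

The real load.project() handles many more file formats and also runs the munging step described below.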
The package's behavior is set in the configuration file config/global.dcf, relative to the project root. A typical setup might look like this:
libraries: reshape, plyr, dplyr, ggplot2, stringr, lubridate, Hmisc
This setup loads the data, runs all the scripts in the munge folder, and loads all the libraries listed above every time you call load.project().
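Because global.dcf is a plain-text DCF file (the same format R uses for package DESCRIPTION files), you can inspect it with base R's read.dcf(). The sketch below writes a configuration like the one above to a temporary file and parses it. The data_loading, munging and load_libraries flags are not shown in the setup above; they are assumed field names, taken from ProjectTemplate's default configuration, for the three behaviors just described, so check them against the global.dcf your version generates.

```r
# Write an example configuration to a temporary file. The three flag names
# are assumptions based on ProjectTemplate's default global.dcf.
cfg_file <- tempfile(fileext = ".dcf")
writeLines(c(
  "data_loading: TRUE",
  "munging: TRUE",
  "load_libraries: TRUE",
  "libraries: reshape, plyr, dplyr, ggplot2, stringr, lubridate, Hmisc"
), cfg_file)

# read.dcf() returns a one-row character matrix with one column per field.
cfg <- read.dcf(cfg_file)

# Convert the "TRUE"/"FALSE" strings to logicals, keeping the field names.
flags <- vapply(c("data_loading", "munging", "load_libraries"),
                function(field) as.logical(cfg[1, field]), logical(1))

# Split the comma-separated library list into a character vector.
libs <- trimws(strsplit(cfg[1, "libraries"], ",")[[1]])
flags
libs
```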
Have a look at ProjectTemplate's Getting Started Guide for more details on how to get started. It is one of the better-written getting-started guides I have seen.
ProjectTemplate is a good aid for making your R projects reproducible. It helps you maintain a good structure within each project and the same structure across projects. The same goes, perhaps even more importantly, for workflow. Note that I have only used it for a couple of relatively simple projects so far; I will write a more extensive review once I have more experience with it.
ProjectTemplate website, http://projecttemplate.net/ (accessed January 20, 2018).
ProjectTemplate Getting Started Guide, http://projecttemplate.net/getting_started.html (accessed January 20, 2018).