Order, Order, Order

If you have done any major R project, you quickly reach the point where it is hard to keep everything in order: your scripts, your data, your output, your tests. If you have done several major R projects, you also know how hard it is to keep a similar structure and workflow between them. The end result can be projects that are hard to reproduce, because each project confuses others with its lack of order, and because whatever order there is differs from one project to the next.

There are free and open-source resources available that can help you keep order within a project and keep a similar structure and workflow across every project you do. One such resource is ProjectTemplate [1].

How It Works

ProjectTemplate helps you automate the routine chores of a project by organizing its files in predefined folders, loading all the R packages you need, loading your data-sets into memory, and pre-processing (munging) your data into a form suitable for analysis. The package's behavior is set in a single configuration file.

In addition to automating the drudgery, the ProjectTemplate team wants to promote better coding and analysis practices by [1]:

  • Curating the best R packages.
  • Providing simple tools for keeping a log of your work.
  • Providing template code for:
    • Data diagnostics
    • Data munging
    • Code profiling
    • Unit testing

ProjectTemplate creates a series of folders in the project. Folders of particular importance are:

data - Both original and eventual processed data
src - All R-scripts created for the project
graphs - All graphical output
reports - All project reports
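Since the whole point is a consistent layout, it can be handy to check that a project directory still has the folders you expect. Below is a minimal sketch in base R. The helper name missing_project_dirs and the default folder list are my own illustration, not part of ProjectTemplate, and the demo runs in a temporary directory.

```r
# Hypothetical helper (not part of ProjectTemplate): report which of the
# expected project folders are absent under `root`.
missing_project_dirs <- function(root,
                                 expected = c("data", "src", "munge", "reports")) {
  present <- dir.exists(file.path(root, expected))
  expected[!present]
}

# Demonstration in a throwaway directory, so no real project is touched.
demo_root <- file.path(tempdir(), "demo-project")
dir.create(file.path(demo_root, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo_root, "src"), showWarnings = FALSE)

# Only data and src were created, so the other expected folders are reported.
missing_here <- missing_project_dirs(demo_root)
print(missing_here)
```

Passing the folder list as an argument keeps the check usable if you trim the template down for smaller projects.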

In each directory, there is a README.md file that explains what the directory is for. For example, the README.md for the src folder looks like this:

Here you’ll store your final statistical analysis scripts. You should add the following piece of code to the start of each analysis script: `library('ProjectTemplate'); load.project()`. You should also do your best to ensure that any code that’s shared between the analyses in `src` is moved into the `munge` directory; if you do that, you can execute all of the analyses in the `src` directory in parallel. A future release of ProjectTemplate will provide tools to automatically execute every individual analysis from `src` in parallel.

This is helpful for ensuring a consistent use of the template throughout the project and between different projects.

Getting Started

To load the project, you’ll first need to setwd() into the root directory of your project. Next, run the following lines of R code (the install step is only needed once per machine):

install.packages('ProjectTemplate')
library('ProjectTemplate')
load.project()

After you enter the last line, you’ll see a series of automated messages as ProjectTemplate goes about doing its work. This work involves:

  • Reading in the global configuration file contained in config.
  • Loading any R packages listed in the configuration file.
  • Reading in any data-sets stored in data or cache.
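The steps above assume that the project skeleton already exists. Here is a rough sketch of the whole round trip, from creating a skeleton with the package's create.project() to loading it with load.project(). The project name my-analysis is made up for illustration, the demo runs under tempdir(), and the code skips itself if ProjectTemplate is not installed.

```r
# Sketch only: "my-analysis" is a hypothetical project name, and everything
# happens under tempdir() so no real project is touched.
has_pt <- requireNamespace("ProjectTemplate", quietly = TRUE)
project_root <- file.path(tempdir(), "my-analysis")

if (has_pt) {
  library("ProjectTemplate")
  old_wd <- setwd(tempdir())
  create.project("my-analysis")  # writes the folder skeleton and config file
  setwd("my-analysis")           # load.project() expects the project root
  load.project()                 # reads the config, then packages and data
  setwd(old_wd)                  # restore the previous working directory
} else {
  message("ProjectTemplate is not installed; skipping the demo.")
}
```

In a real project you would run create.project() once, and from then on only setwd() and load.project() at the start of each session.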

The workings of the package are set in the configuration file config/global.dcf, located relative to the project root. A typical setup might look like this:

version: 0.8
data_loading: TRUE
data_loading_header: TRUE
data_ignore:
cache_loading: TRUE
recursive_loading: FALSE
munging: TRUE
logging: FALSE
logging_level: INFO
load_libraries: TRUE
libraries: reshape, plyr, dplyr, ggplot2, stringr, lubridate, Hmisc
as_factors: TRUE
data_tables: FALSE
attach_internal_libraries: FALSE
cache_loaded_data: TRUE
sticky_variables: NONE

With this setup, every call to load.project() loads the data, runs all the scripts in the munge folder, and loads all the libraries listed above.
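The .dcf extension stands for Debian Control File, a plain "key: value" format that base R can read with read.dcf(). The sketch below shows how such a file can be inspected by hand. It is my own illustration of the file format, not a description of how ProjectTemplate parses its configuration internally, and it uses a shortened copy of the settings written to a temporary file.

```r
# A trimmed-down copy of the settings, written to a temporary file so the
# example does not depend on a real project.
dcf_path <- tempfile(fileext = ".dcf")
writeLines(c(
  "data_loading: TRUE",
  "munging: TRUE",
  "logging: FALSE",
  "libraries: reshape, plyr, dplyr"
), dcf_path)

# read.dcf() returns a character matrix: one row per record, one column per key.
config <- read.dcf(dcf_path)

# Every value arrives as text, so convert explicitly where needed.
munging_on <- as.logical(config[1, "munging"])
logging_on <- as.logical(config[1, "logging"])
libraries  <- strsplit(config[1, "libraries"], ",\\s*")[[1]]

print(munging_on)
print(libraries)
```

Reading the file this way is a quick check that a setting such as munging is really what you think it is before you call load.project().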

Have a look at the ProjectTemplate Getting Started Guide [2] for more details on how to get started. It is one of the better-written getting-started guides I’ve seen.

Preliminary Conclusion

ProjectTemplate is a good aid for making your R projects reproducible. It helps you maintain a good structure, and a similar structure across projects. The same goes, perhaps more importantly, for workflow. Note that I have only used it for a couple of relatively simple projects so far; I will write a more extensive review once I have more experience with it.

References


[1] The ProjectTemplate Website – http://projecttemplate.net/ – Downloaded January 20, 2018.

[2] ProjectTemplate Getting Started Guide – http://projecttemplate.net/getting_started.html – Downloaded January 20, 2018.

 
