Welcome to my blog on the following topics: speeding up R code using C++, building packages with R and C++, and parallel computing with R and C++. These tools can help with simulations in daily research. There are also some of my troubleshooting notes, which might help with future debugging.

Speeding up R code using C++

It is well known that R is built around vectorized operations and is slow at executing explicit loops. When loops cannot be avoided, C++ can serve as a remedy. The R package Rcpp by Dirk Eddelbuettel et al. provides nice integration of R and C++.

  1. Installation of Rcpp
    • (On Windows) Install Rtools in a folder whose path doesn’t contain spaces or tabs
    • Install package Rcpp
  2. Some Basic Syntax
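    • Non-reusable functions: Write the C++ function directly in the R console using cppFunction(). The depends argument is only needed when the function uses another package's headers (e.g. depends = "RcppArmadillo"); a plain function like this one does not need it. E.g.
        Rcpp::cppFunction("double foo(double x) { return x + 1.0; }")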
      
    • Reusable functions: Prepare a C++ file someFile.cpp, write the functions in it, and source it with Rcpp::sourceCpp("someFile.cpp") or with the Source button in RStudio.
    • In C++ files, insert // [[Rcpp::export]] immediately before the definition of each function that you want to expose to R. Otherwise the function cannot be called from R.
    • Functions written in someFile.cpp are only usable in the current R session and cannot be saved. If a new R session is started, someFile.cpp needs to be sourced again.
    • Reference: Blog, Gallery, Chinese-language reference
  3. Another useful package: RcppArmadillo. It wraps the Armadillo linear algebra library and provides some R-like functions, including sampling functions. Reference: click here
  4. One more useful package RcppEigen: click here.
  5. Yet another helpful package, RcppNumerical: see here
  6. Some special topics:
    • Passing a C++ function as an argument into another C++ function in Rcpp:
      • To do the “passing” from the R console: declare the callee argument as Rcpp::Function or SEXP, and convert the callee's return value with as<double>() before assigning it to any local variable. This way the “passing” can only be done from the R console, not from within the C++ file. The caller can be exported.
      • To do the “passing” inside the C++ file, from other C++ functions: the callee is passed as a function pointer, so a new type has to be declared for the argument. The caller cannot be exported.
      • To do the “passing” in both: this does not seem easy. Adding // [[Rcpp::export]] before the caller automatically makes the argument type SEXP. An ad hoc remedy: use a “wrapper” that performs the call in C++, and export the wrapper to R.
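
The function-pointer route and the wrapper remedy above can be sketched in plain C++ as follows. This is only a minimal illustration, not Rcpp-specific code: the names funcPtr, square, applyTwice and applyTwiceSquare are hypothetical, and the Rcpp include and export attribute are left out so that the snippet compiles on its own.

```cpp
// Hypothetical names throughout; nothing here is part of the Rcpp API.

// The "new type" for the argument: a pointer to a function that
// takes a double and returns a double.
typedef double (*funcPtr)(double);

// The callee: an ordinary C++ function.
double square(double x) {
    return x * x;
}

// The caller: receives the callee as a function pointer and applies it twice.
// Its first argument is a raw function pointer, which is why (per the notes
// above) it is kept internal rather than exported to R.
double applyTwice(funcPtr f, double x) {
    return f(f(x));
}

// The wrapper: only plain doubles appear in its signature, so in an Rcpp
// source file this is the function that would carry the export attribute.
double applyTwiceSquare(double x) {
    return applyTwice(square, x);
}
```

In an Rcpp file one would, under these assumptions, export only applyTwiceSquare and call it from R, while applyTwice stays available to other C++ functions in the same file.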

Speeding up R code: other methods

Sometimes a simple improvement suffices.

  1. Parallel computing: with the help of the cluster system in CU, we can use around 30 cores for one task.
  2. Parallel computing & C++: Note that C++ functions cannot be parallelized in R unless they are built into an R package. (Please refer to the package-building sections below.)
  3. Some tricks:
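    - C++: Use pointers (and references) properly, so that large objects are not copied when passed around
    - C++: Reduce the number of local variables (declaring and copying large objects is slow in C++)
    - C++: Use a pipe operator. I do not know the details yet…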
    - R: Use the Forward-Pipe Operator %>% from package magrittr.
    - R: Vectorize, use apply, etc.
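
As a rough illustration of the two C++ tricks on pointers and copying, the sketch below contrasts three ways of handing a vector to a function. The function names are hypothetical and the example is plain C++ rather than Rcpp code; how much time is actually saved depends on the size of the object and on the compiler.

```cpp
#include <cstddef>
#include <vector>

// Pass by value: calling this with a named vector copies all of its elements.
double sumByValue(std::vector<double> v) {
    double total = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
    return total;
}

// Pass by const reference: no copy is made, and the function cannot
// modify the caller's vector.
double sumByReference(const std::vector<double>& v) {
    double total = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
    return total;
}

// Pass by pointer: also avoids the copy, but the pointer may be null,
// so it is checked before use.
double sumByPointer(const std::vector<double>* v) {
    if (v == 0) return 0.0;
    double total = 0.0;
    for (std::size_t i = 0; i < v->size(); ++i) total += (*v)[i];
    return total;
}
```

All three return the same sum; only the first one duplicates the data on the way in.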

Building packages purely in R

To build an R package, just prepare all the source code and use the Build pane in RStudio. Reference: click here

Building packages with R and C++

To make your code distributable and parallelizable, it is a good idea to build a package for it. The following are the steps for writing, building and updating a package.

  1. Write a package with Rcpp
    a. Write the .cpp source code
    b. Create a package skeleton: I prefer the following command
      RcppArmadillo::RcppArmadillo.package.skeleton("yourPackageName")
    

    One can also use

      Rcpp::Rcpp.package.skeleton("yourPackageName", cpp_files = c("convolve.cpp"), example_code = FALSE)
    

    A folder named yourPackageName will be created in the working directory.
    c. Copy all .cpp source files directly into the ./src folder, and any .R source files into the ./R folder.
    d. Some notes:

    • The created package includes: the DESCRIPTION and NAMESPACE files, and the man, src and R folders (src and R contain the RcppExports files). We normally only touch the src folder (plus the R folder when there are .R files).
    • The RcppArmadillo skeleton takes care of converting between C++ and R data types and fills in DESCRIPTION and NAMESPACE automatically (LinkingTo etc.). If one uses Rcpp.package.skeleton instead, one still needs to edit Depends/Imports and LinkingTo in DESCRIPTION, correct the NAMESPACE file, and add Makevars files where needed.
    • Compared to Rcpp, RcppArmadillo will create additional Makevars and Makevars.win files in the src folder. No need to modify them.
    • If the package is created using the RStudio pane button, one needs to add a Makevars file in ./src (or directly choose the package type “with RcppArmadillo”) and edit the documentation file cppFileName.Rd in ./man. It is really tedious.
  2. Build the package: execute the following commands in R console
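    a. setwd('./yourPackageName') (compileAttributes() works on the current directory by default, so move into the package folder first)
    b. Rcpp::compileAttributes() (to regenerate the RcppExports.cpp and RcppExports.R files)
    c. devtools::check() (optional)
    d. devtools::build() (creates a someName.tar.gz file; this is your package, which you can upload to the cluster or send to others)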

  3. Use the package
    a. Install: there are several ways to install
    • install.packages('someName.tar.gz', repos = NULL, type = 'source')
    • devtools::install('yourPackageName')
    • setwd('./yourPackageName'); devtools::install()
    b. Load: in an R session, run library('yourPackageName')
    • Use ls('package:yourPackageName') to check which functions are loaded from the package
  4. Update the package
    a. Modify the code in ./src
    b. Repeat Step 2: Build the package and Step 3: Use the package

Parallel computing with R and C++

Recall that C++ functions cannot be parallelized in R unless they are built into an R package. Moreover, once the package is ready, it must be loaded on the workers through the .packages argument of the parallel computing function foreach(). For example,

foreach(iRep = 1:nRep, .combine = 'c', .packages = c('magrittr','yourPackageName')) %dopar% { someSimulation(iRep) }

Otherwise there will be an error Package not found.

Troubleshooting

Problems to be solved