Building an `R` Package and Integrating `C++`
Welcome to my blog on the following topics: speeding up R code using C++, building packages using R and C++, and parallel computing with R and C++. These tools can help with simulations in day-to-day research. I have also included some troubleshooting notes, which might help with future debugging.
Speed up R code using C++
It is well known that R is designed around vectorized operations and is slow at executing explicit loops. To speed up execution, C++ can serve as a remedy. The R package Rcpp by Dirk Eddelbuettel et al. provides nice integration of R and C++.
- Installation of Rcpp
  - (On Windows) Install Rtools in a folder whose path doesn’t contain spaces or tabs
  - Install the package `Rcpp`
- Some basic syntax
  - Non-reusable functions: write `C++` functions directly in the RStudio console using `cppFunction()`. E.g. `cppFunction("double foo(double x){return x+1.0;}", depends="RcppArmadillo")`
  - Reusable functions: prepare a `C++` file `someFile.cpp`, write the functions in it, and source the file `someFile.cpp` directly in RStudio (for example with `Rcpp::sourceCpp("someFile.cpp")`).
  - In `C++` files, insert `// [[Rcpp::export]]` before the definition of each function that you want to pass to `R`. Otherwise you cannot call it in `R`.
  - Functions written in `someFile.cpp` are only usable in the current `R` session and cannot be saved. If a new `R` session is started, we need to source `someFile.cpp` again.
  - Reference: Blog, Gallery, Chinese-language reference
- Another useful package: `RcppArmadillo`. It provides some `R`-like functions, including sampling functions. Reference: click here
- One more useful package: `RcppEigen`: click here.
- Yet another helpful package: `RcppNumerical`: see here
- Some special topics:
  - Passing a `C++` function as an argument into another `C++` function in Rcpp:
    - To do the “passing” in the R console: the data type of the callee is `Rcpp::Function` or `SEXP`. Pass the output of this callee to `as<double>()` before assigning it to any local variable. This way you can only do the “passing” from the R console, not from within the `C++` file. It is possible to export the caller.
    - To do the “passing” in a `C++` file from other `C++` functions: the data type of the callee is a function pointer. You need to declare a new type for the argument. The caller cannot be exported.
    - To do the “passing” in both: this seems not easy. By adding `// [[Rcpp::export]]` before the caller, the type is automatically `SEXP`. Ad hoc remedy: use a “wrapper” to perform the call in `C++` and export the wrapper to R.
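The function-pointer case and the wrapper remedy can be sketched in plain C++ (no Rcpp headers, so it compiles on its own). This is only an illustration under assumptions: the names `funcPtr`, `square`, `cube`, `applyTwice` and `applyTwiceByName` are hypothetical and do not come from Rcpp or any other package.

```cpp
#include <stdexcept>
#include <string>

// New type for the callee: pointer to a function taking and returning double.
typedef double (*funcPtr)(double);

// Two hypothetical callees.
double square(double x) { return x * x; }
double cube(double x) { return x * x * x; }

// Caller: receives the callee as a function pointer. Its signature contains
// a pointer type, so this is the function that is not exported to R.
double applyTwice(funcPtr f, double x) { return f(f(x)); }

// Wrapper: takes only plain types, selects the callee by name, and performs
// the call inside C++. In an Rcpp source file, the export attribute would be
// placed before this function instead of before applyTwice (assumption:
// selecting by name is just one possible way to design the wrapper).
double applyTwiceByName(const std::string& name, double x) {
    if (name == "square") return applyTwice(square, x);
    if (name == "cube") return applyTwice(cube, x);
    throw std::invalid_argument("unknown function name: " + name);
}
```

With this layout, R only ever sees a string and a number, while the pointer-based call stays entirely on the C++ side.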
Speeding up R code: other methods
Sometimes a simple improvement suffices.
- Parallel computing: with the help of the cluster system in CU, we can use around 30 cores for one task.
- Parallel computing & `C++`: Notice that `C++` functions cannot be parallelized in `R` unless they are built into an `R` package. (Please refer to the package-building sections below.)
- Some tricks:
  - `C++`: Use pointers properly
  - `C++`: Reduce the number of local variables (declaring and copying large objects is costly in `C++`)
  - `C++`: Use pipe operator. For details, IDK…
  - `R`: Use the forward-pipe operator `%>%` from the package `magrittr`.
  - `R`: Vectorize, use `apply`, etc.
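As a rough illustration of the two `C++` tricks about pointers and copying, here is a minimal sketch using only the standard library. The function names `sumByValue`, `sumByReference` and `sumByPointer` are hypothetical; the point is only how the argument is passed, not the summation itself.

```cpp
#include <cstddef>
#include <vector>

// Pass by value: calling this with a named vector copies all its elements.
double sumByValue(std::vector<double> v) {
    double total = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
    return total;
}

// Pass by const reference: no copy, the caller's data is read in place.
double sumByReference(const std::vector<double>& v) {
    double total = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
    return total;
}

// Pass a pointer plus a length: also no copy, but the caller must
// guarantee that data points to at least n valid elements.
double sumByPointer(const double* data, std::size_t n) {
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}
```

All three return the same value; the difference is that the first one duplicates the input on each call, which adds up inside a simulation loop.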
Building packages purely by R
To build an R package, just prepare all the source code and use the Build pane in RStudio. Reference: click here
Building packages by R and C++
To make your code distributable and parallelizable, it is a good idea to build a package for it. The following are the steps for writing, building, using and updating a package.
1. Write a package with `Rcpp`
   a. Write the source `.cpp` code.
   b. Create a package skeleton. I prefer the following command: `RcppArmadillo::RcppArmadillo.package.skeleton("yourPackageName")`. One can also use `Rcpp.package.skeleton("yourPackageName", cpp_files = c("convolve.cpp"), example_code = FALSE)`. A folder named `yourPackageName` will be created in the working directory.
   c. Copy all `.cpp` source files to the `./src` folder directly (any `.R` source files belong in the `./R` folder).
   d. Some notes:
      - The created package includes: `DESCRIPTION`, the `man` folder, `NAMESPACE`, and the `src` and `R` folders (which include the `RcppExports` files). For `C++` code we only touch the `src` folder.
      - `RcppArmadillo` converts `C++` data types to R data types and automates `DESCRIPTION` and `NAMESPACE` (`LinkingTo` etc.). If one uses `Rcpp.package.skeleton` instead, one still needs to adjust `Depends`/`Imports` and `LinkingTo` in `DESCRIPTION`, and provide a correct `NAMESPACE` file and `Makevars`.
      - Compared to `Rcpp`, `RcppArmadillo` will create additional `Makevars` and `Makevars.win` files in the `src` folder. No need to modify them.
      - If the package is built using the RStudio pane button, one needs to add a `Makevars` file in `./src` (or directly choose the package type “with RcppArmadillo”) and change the documentation `cppFileName.Rd` in `./man`. It is really tedious.
2. Build the package: execute the following commands in the `R` console
   a. `Rcpp::compileAttributes('./yourPackageName')` (to update the `RcppExports` files)
   b. `setwd('./yourPackageName')`
   c. `devtools::check()` (optional)
   d. `devtools::build()` (creates a `someName.tar.gz` file; that is your package, and you can upload it to the cluster or send it to others)
3. Use the package
   a. Install: there are many ways to install
      - `install.packages('someName.tar.gz', repos = NULL, type = 'source')`
      - `devtools::install('yourPackageName')`
      - `setwd('./yourPackageName'); devtools::install()`
   b. Load: in an R session, run `library('yourPackageName')`
      - Use `ls('package:yourPackageName')` to check what functions are loaded from the package
4. Update the package
   a. Modify the code in `./src`
   b. Repeat Step 2 (Build the package) and Step 3 (Use the package)
Parallel computing with R and C++
Recall that C++ functions cannot be parallelized in R unless they are built into an R package. Moreover, once the package is ready, we must load it on the workers through the `.packages` argument of the parallel computing function `foreach()`. For example,
foreach(iRep = 1:nRep, .combine = 'c', .packages = c('magrittr','yourPackageName')) %dopar% { someSimulation(iRep) }
Otherwise there will be an error: `Package not found`.
Troubleshooting
- Use `R` 3.6.0 for parallel computing with `C++`
- `install error: install.packages from source gives no functions while devtools::install from folder succeeds`: On Windows 10, use `devtools::install('yourPackageName')` instead of installing from source using `someName.tar.gz`
- In my experience only: if the package is to be installed on the cluster at CUHK, use the following command: `R CMD INSTALL 'someName.tar.gz' --no-lock` (type it directly in the terminal; no need to start `R`)
- Note 1: `not declared in this scope` or `no matching function`: Check data types! It solves 95% of the problems.
- Note 2: `C++` checks types when you source the file. Make sure data types are properly declared, functions are applied to variables with matching types (functions from different packages may have the same name but accept different types of arguments), and arithmetic operators are applied to `double` (if you perform division on `int`, the simulation results could be hugely different)
- Note 3: e.g., `std::max(a,b)` is different from `x.max()` where `x` is of type `arma::mat`.
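Notes 1 and 2 can be illustrated with a small standard-library sketch. The function names `ratioInt`, `ratioDouble` and `largerOf` are hypothetical and chosen only for this example.

```cpp
#include <algorithm>

// Integer division: both operands are int, so the fractional part is
// discarded before the result is converted to double.
double ratioInt(int successes, int trials) {
    return successes / trials;
}

// Floating-point division: converting one operand to double first
// keeps the fractional part.
double ratioDouble(int successes, int trials) {
    return static_cast<double>(successes) / trials;
}

// std::max needs both arguments to have the same type. Calling
// std::max(n, x) with an int and a double fails to compile with a
// "no matching function" error; an explicit conversion resolves it.
double largerOf(int n, double x) {
    return std::max(static_cast<double>(n), x);
}
```

For instance, `ratioInt(1, 2)` gives `0` while `ratioDouble(1, 2)` gives `0.5`, which is the kind of silent difference that can change simulation results.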
Problems to be solved
- How to use `wrap()`