Building an `R` Package and Integrating `C++`
Welcome to my blog on the following topics: speeding up `R` code using `C++`, building packages using `R` and `C++`, and parallel computing with `R` and `C++`. These tools can help with simulations in daily research. There are also some of my troubleshooting notes, which might help with future debugging.
Speed up `R` code using `C++`
It is well known that `R` is fast for vectorized operations but slow at executing explicit loops. To speed up such code, `C++` can serve as a remedy. The `R` package `Rcpp` by Dirk Eddelbuettel et al. provides a nice integration of `R` and `C++`.
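As a minimal sketch of the kind of code where `C++` pays off, consider a loop whose iterations depend on each other. Such a loop cannot be vectorized in `R`. The function name `ar1Filter` and the recursion x[t] = phi * x[t-1] + e[t] are my own illustrative choices, not part of `Rcpp`. I use `std::vector<double>`, which `Rcpp` can convert to and from `R` numeric vectors.

```cpp
// Hypothetical example: the recursive filter x[t] = phi * x[t-1] + e[t].
// Each step depends on the previous one, so it cannot be vectorized in R,
// and an R for-loop over a long input is slow; in C++ the loop is cheap.
// In an actual Rcpp source file you would also add: #include <Rcpp.h>
#include <cstddef>
#include <vector>

// [[Rcpp::export]]
std::vector<double> ar1Filter(const std::vector<double>& e, double phi) {
  std::vector<double> x(e.size());
  if (e.empty()) return x;   // nothing to filter
  x[0] = e[0];               // start the recursion from the first innovation
  for (std::size_t t = 1; t < e.size(); ++t) {
    x[t] = phi * x[t - 1] + e[t];
  }
  return x;
}
```

Assuming the file also contains `#include <Rcpp.h>` and is compiled with `Rcpp::sourceCpp()`, it could be called from `R` as `ar1Filter(rnorm(1e6), 0.9)`.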
- Installation of `Rcpp`
  - (On Windows) Install Rtools in a folder whose path does not contain spaces or tabs
  - Install the package `Rcpp`
- Some basic syntax
  - Non-reusable functions: write `C++` functions directly in the RStudio console using `cppFunction()`. E.g., `cppFunction("double foo(double x){return x+1.0;}", depends="RcppArmadillo")`
  - Reusable functions: prepare a `C++` file `someFile.cpp` and write the functions in it. Then source `someFile.cpp` directly in RStudio (the Source button, or `Rcpp::sourceCpp("someFile.cpp")`).
    - In `C++` files, insert `// [[Rcpp::export]]` before the declaration of every function that you want to call from `R`. Otherwise you cannot call it in `R`.
    - Functions written in `someFile.cpp` are only usable in the current `R` session and cannot be saved. If a new `R` session is started, we need to source `someFile.cpp` again.
  - Reference: Blog, Gallery, Chinese-language reference
- Another useful package: `RcppArmadillo`. It provides some `R`-like functions, including sampling functions. Reference: click here
- One more useful package: `RcppEigen`: click here.
- Yet another helpful package: `RcppNumerical`: see here
- Some special topics:
  - Passing a `C++` function as an argument into another `C++` function in `Rcpp`:
    - To do the “passing” in the R console: the data type of the callee is `Rcpp::Function` or `SEXP`. Pass the output of this callee to `as<double>()` before assigning it to any local variable. But you can only do the “passing” in the R console, not in the `C++` file. It is possible to export the caller.
    - To do the “passing” in the `C++` file by other `C++` functions: the data type of this callee is a function pointer, so you need to declare a new type for the argument. The caller cannot be exported.
    - To do the “passing” in both: this does not seem easy. By adding `//[[Rcpp::export]]` before the caller, the type is automatically `SEXP`. An ad hoc remedy: use a “wrapper” to perform the call in `C++` and export the wrapper to `R`.
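The function-pointer case and the wrapper remedy from the list above can be sketched in plain `C++`. This is a minimal illustration under my own assumptions. The names `funcPtr`, `square`, `halve`, `applyTwice` and `applyTwiceByName` are hypothetical, and only the wrapper carries `// [[Rcpp::export]]`, because only plain types cross the `R`/`C++` boundary there.

```cpp
// Hypothetical example: passing one C++ function into another via a pointer,
// plus an exportable wrapper. In an Rcpp source file, also add #include <Rcpp.h>
#include <string>

// New type for the argument: pointer to a function taking and returning double.
typedef double (*funcPtr)(double);

// Two candidate callees.
double square(double x) { return x * x; }
double halve(double x) { return x / 2.0; }

// Caller: takes a function pointer, so it is only called from other C++
// functions; Rcpp cannot turn an R object into a funcPtr, so it is not exported.
double applyTwice(funcPtr f, double x) {
  return f(f(x));
}

// Wrapper: only a string and a double cross the R/C++ boundary, so it can be
// exported; the actual function passing happens entirely inside C++.
// [[Rcpp::export]]
double applyTwiceByName(std::string name, double x) {
  funcPtr f = (name == "square") ? &square : &halve;  // any other name -> halve
  return applyTwice(f, x);
}
```

From `R`, after sourcing, one would call e.g. `applyTwiceByName("square", 3)`. The pointer-based `applyTwice` stays internal to `C++`.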
Speeding up `R` code: other methods
Sometimes a simple improvement suffices.
- Parallel computing: with the help of the cluster system in CU, we can use around 30 cores for one task.
- Parallel computing & `C++`: note that `C++` functions cannot be parallelized in `R` unless they are built into an `R` package (please refer to the package-building sections below).
- Some tricks:
  - `C++`: use pointers properly.
  - `C++`: reduce the number of local variables (declaring and copying objects, especially large ones, takes time in `C++`).
  - `C++`: use the pipe operator. For details, I don't know yet…
  - `R`: use the forward-pipe operator `%>%` from the package `magrittr`.
  - `R`: vectorize, use `apply`, etc.
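The first two `C++` tricks can be illustrated with a small sketch. The function names are hypothetical. I use references here because they play the same role as pointers (no copy of the underlying data) with simpler syntax.

```cpp
// Sketch of the "use pointers/references" and "fewer local copies" tips.
#include <cstddef>
#include <vector>

// Passing by value copies the whole vector on every call.
double sumByValue(std::vector<double> x) {
  double s = 0.0;
  for (double v : x) s += v;
  return s;
}

// Passing by const reference avoids the copy; const marks x as read-only.
double sumByRef(const std::vector<double>& x) {
  double s = 0.0;
  for (double v : x) s += v;
  return s;
}

// Writing into a caller-owned buffer through a reference avoids allocating
// a fresh local vector on every call inside a simulation loop.
void fillSquares(const std::vector<double>& x, std::vector<double>& out) {
  out.resize(x.size());  // no reallocation once out's capacity is large enough
  for (std::size_t i = 0; i < x.size(); ++i) out[i] = x[i] * x[i];
}
```

With `Rcpp::NumericVector` the trade-off is different. It is a thin wrapper around the `R` object, so even passing it by value does not deep-copy the data (use `Rcpp::clone()` when a real copy is needed).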
Building packages purely with `R`
To build an `R` package, just prepare all the source code and use the Build pane in RStudio. Reference: click here
Building packages with `R` and `C++`
To make your code distributable and parallelizable, it is a good choice to build a package for it. The following are the steps for writing, building and updating a package.
1. Write a package with `Rcpp`
   a. Write the source `.cpp` files.
   b. Create a package skeleton. I prefer the following command: `RcppArmadillo::RcppArmadillo.package.skeleton("yourPackageName")`
      One can also use `Rcpp::Rcpp.package.skeleton("yourPackageName", cpp_files = c("convolve.cpp"), example_code = FALSE)`
      A folder named `yourPackageName` will be created in the working directory.
   c. Copy all `.cpp` source files directly into the `./src` folder (and any `.R` source files into the `./R` folder).
   d. Some notes:
      - The created package includes the `DESCRIPTION` file, the `man` folder, the `NAMESPACE` file, the `src` folder and the `R` folder (which includes `RcppExports.R`). We mostly only touch the `src` folder.
      - `RcppArmadillo` handles the conversion between Armadillo `C++` data types and `R` data types, and its skeleton function automates `DESCRIPTION` and `NAMESPACE` (`LinkingTo` etc.). If instead one uses `Rcpp.package.skeleton`, one still needs to edit `Depends`/`Imports` and `LinkingTo` in `DESCRIPTION`, correct the `NAMESPACE` file, and add the `Makevars` files.
      - Compared to `Rcpp.package.skeleton`, `RcppArmadillo.package.skeleton` creates additional `Makevars` and `Makevars.win` files in the `src` folder. There is no need to modify them.
      - If the package is built using the RStudio pane button, one needs to add a `Makevars` file in `./src` (or directly choose the package type "with RcppArmadillo") and change the documentation file `cppFileName.Rd` in `./man`. It is really tedious.
2. Build the package: execute the following commands in the `R` console
   a. `setwd('./yourPackageName')`
   b. `Rcpp::compileAttributes()` (to regenerate the `RcppExports.R` and `RcppExports.cpp` files; it must be run inside the package folder)
   c. `devtools::check()` (optional)
   d. `devtools::build()` (creates a `someName.tar.gz` file in the parent directory; that is your package, and you can upload it to the cluster or send it to others)
3. Use the package
   a. Install: there are many ways to install it:
      - `install.packages('someName.tar.gz', repos = NULL, type = 'source')`
      - `devtools::install('yourPackageName')`
      - `setwd('./yourPackageName'); devtools::install()`
   b. Load: in an `R` session, run `library('yourPackageName')`
      - Use `ls('package:yourPackageName')` to check which functions are loaded from the package
4. Update the package
   a. Modify the code in `./src`
   b. Repeat Step 2 (Build the package) and Step 3 (Use the package)
Parallel computing with `R` and `C++`
Recall that `C++` functions cannot be parallelized in `R` unless they are built into an `R` package. Moreover, once the package is ready, we must load it inside the parallel computing function `foreach()` through its `.packages` argument, since the workers typically run in separate `R` processes. For example,
`foreach(iRep = 1:nRep, .combine = 'c', .packages = c('magrittr', 'yourPackageName')) %dopar% { someSimulation(iRep) }`
Otherwise there will be a `Package not found` error.
Troubleshooting
- Use `R` 3.6.0 for parallel computing with `C++`.
- Install error: `install.packages` from source gives no functions, while `devtools::install` from the folder succeeds. In Windows 10, use `devtools::install('yourPackageName')` instead of installing from source using `someName.tar.gz`.
- Based solely on my own experience, if the package is to be installed on the cluster at CUHK, use the following command: `R CMD INSTALL 'someName.tar.gz' --no-lock` (type it directly in the terminal; there is no need to start `R`).
- Note 1: `... not declared in this scope` or `no matching function`: check data types! This solves 95% of the problems.
- Note 2: `C++` checks types at compile time, i.e., when you source the file. Make sure that data types are properly declared, that functions are applied to variables with matching types (functions from different packages may have the same name but accept different types of arguments), and that arithmetic operators are applied to `double` (division on `int` truncates the result, so the simulation results could be hugely different).
- Note 3: e.g., `std::max(a, b)`, which returns the larger of two values, is different from `x.max()`, which returns the largest element of `x`, where `x` is an object of type `arma::mat`.
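Notes 2 and 3 can be checked with a few lines of standard `C++`. The function names are hypothetical. `std::max_element` over a `std::vector` stands in for `arma::mat::max()` so that the sketch does not depend on Armadillo.

```cpp
// Illustrations of Note 2 (integer division) and Note 3 (std::max).
#include <algorithm>
#include <vector>

// Integer division truncates toward zero: 1 / 2 is 0, not 0.5.
double meanWrong(int total, int n) {
  return total / n;  // int / int is computed first, then converted to double
}

double meanRight(int total, int n) {
  return static_cast<double>(total) / n;  // promote to double before dividing
}

// std::max(a, b) compares exactly two values of the SAME type;
// std::max(1, 2.5) would not compile because int and double differ.
double largerOfTwo(double a, double b) {
  return std::max(a, b);
}

// The largest element of a container needs a different call
// (for an arma::mat one would use the member function x.max()).
double largestElement(const std::vector<double>& x) {
  return *std::max_element(x.begin(), x.end());  // x must be non-empty
}
```
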
Problems to be solved
- How to use `wrap()`