Temi
September 3, 2023
When running computationally heavy tasks in R, it can be useful to parallelize your code. In that vein, this is a really good blog post to read to understand when and how to parallelize. And here is another one.
R offers many ways to do this. Usually, I prefer a few libraries: parallel, foreach, and doParallel.
```r
library(doParallel)
library(foreach)
library(parallel)
```

```
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
```
Suppose we want to apply a function over the rows of a matrix. The function takes each row, divides the numbers in that row by that row's index in the matrix, and returns a newly created matrix of the same shape.
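The setup chunk is folded in the rendered post; here is a minimal sketch consistent with the output below. The names mt and divide_row, and the use of rnorm, are assumptions:

```r
set.seed(2023)

# a 300 x 500 matrix of random values (dimensions inferred from the dim() output below)
mt <- matrix(rnorm(300 * 500), nrow = 300)

# divide a row's values by that row's index (assumed helper name)
divide_row <- function(i) mt[i, ] / i

dim(mt)
```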
```
[1] 300 500
```
lapply

First, I will time this function with lapply. lapply ships with base R and is the functional equivalent of a regular for loop; it runs sequentially.
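The timing chunk itself is folded; a sketch of the sequential run, assuming the divide_row helper above:

```r
system.time({
  # apply divide_row to every row index, one after another
  res_seq <- lapply(seq_len(nrow(mt)), divide_row)
  res_seq <- do.call(rbind, res_seq)  # stack the rows back into a matrix
})
```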
mclapply

Next, let’s take advantage of the cores, this time using mclapply, the fork-based parallel counterpart of lapply from the parallel package.
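A sketch of the parallel run; the choice of 7 cores mirrors the number registered later, and note that mclapply relies on forking, so it falls back to sequential execution on Windows:

```r
system.time({
  # same computation, but rows are distributed across 7 forked worker processes
  res_mc <- mclapply(seq_len(nrow(mt)), divide_row, mc.cores = 7)
  res_mc <- do.call(rbind, res_mc)
})
```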
```
   user  system elapsed
  0.043   0.084   0.032
```
We see that lapply is faster. This is because of the overhead of distributing the runs to worker processes and collecting their results when using mclapply; for a computation this cheap, that overhead dominates.
foreach & %do%

Here, I will use foreach, but without a parallel back-end. With the %do% operator this is a sequential run, just like lapply or a regular for loop.
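A sketch of the sequential foreach version; the use of .combine = rbind to reassemble the rows is an assumption:

```r
system.time({
  # %do% evaluates each iteration in the current R session, in order
  res_do <- foreach(i = seq_len(nrow(mt)), .combine = rbind) %do% divide_row(i)
})
```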
foreach & %dopar%

Here, I will use foreach with a parallel back-end. The parallel back-end is a cluster of cores. If you are familiar with multiprocessing in Python, it is roughly equivalent to multiprocessing.Pool.
First, we need to register a parallel back-end.
We can query how many cores we have on this computer:
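The folded chunk presumably calls detectCores() from the parallel package:

```r
# number of logical cores available on this machine
detectCores()
```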
I will register 7 cores.
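A sketch of the registration step, using makeCluster() and registerDoParallel() from doParallel; the message shown below suggests the original chunk also announces the cluster size, so a cat() call is assumed here:

```r
n_cores <- 7
cl <- makeCluster(n_cores)   # spin up 7 worker processes
registerDoParallel(cl)       # make them the back-end for %dopar%
cat("Registering", n_cores, "clusters for a parallel run\n")
```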
```
Registering 7 clusters for a parallel run
```
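Then the timed parallel run, again assuming .combine = rbind; the row division is inlined so that foreach automatically exports mt to the workers:

```r
system.time({
  # %dopar% ships iterations to the registered cluster workers
  res_dopar <- foreach(i = seq_len(nrow(mt)), .combine = rbind) %dopar% {
    mt[i, ] / i
  }
})

stopCluster(cl)  # release the workers once we are done
```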
```
   user  system elapsed
  0.085   0.070   0.081
```
Here, again, we see that lapply is much faster. As before, the parallel run pays the cost of sending data to the workers and collecting their results, and for a task this small that overhead outweighs the computation itself.
Next, I will show how to use the various makeCluster() options.