vignettes/05-Simulation_Experiments_Replication.Rmd
05-Simulation_Experiments_Replication.Rmd
spades
So far, we have run models with the spades
function.
spades()
only has a few arguments:
- debug
- .plotInitialTime
- .saveInitialTime
- progress
- cache
What if we want to do more stuff?
What if we have a stochastic model and need many independent (Monte Carlo) runs?
The experiment
function does this.
Generally, however, this creates problems because of the computational time required …
The replicates
argument
It is as easy as replicates = 2
to do two replicates:
?experiment # for details
# See Example 5
# ... run mySim from top of help file
sims <- experiment(mySim, replicates = 2)
attr(sims, "experiment")$expDesign # shows 2 reps of same expt
What does it return?
What is the attr
?
Replication is necessary but it multiplies the time it takes to run any model.
Fortunately, replication can use parallel processing:
cl
) argument.SpaDES
functions that are parallel aware:
experiment
, POM
, splitRaster
In each case, there are two ways:
## option 1:
raster::beginCluster() # make a cluster silently in background
## option 2:
cl <- parallel::makeCluster() # need to explicitly pass the
# cluster object into functions
# using cl argument
Try it!
experiment
: Saving filesOften, a spades
call will generate output files. Where do they go when using experiment
?
experiment
Exercise:
Pick a few examples and try to understand them. (NOTE: they get more complicated as you get towards the bottom.)
Clearly, using experiment
can take a lot of resources, but there are a few options:
Use your own computer, a local cluster, a server, cloud and more;
Access Compute Canada for free (though sometimes the queues are long):
There is a different example here: ssh among Linux machines
Ecological models use equations and functions
These have parameters
Parameters can be estimated from:
These methods often have predict methods associated with them
So, instead of using “fixed” parameters in simulation models
We can use these methods which give more accurate prediction:
# predict with new data
predVals <- predict(glmObj, newdata = data.frame(
diamCM = rlnorm(nTrees, meanlog = 2, sdlog = 0.7)
))
In R, everything is an object
The glmObj
and predictVals
are objects
These have many features that can be used in simulations
For the first 25 years of simulation modeling in ecology, parameters were copied into simulation models and hard coded
This made it difficult to update the simulation model if there are new data and the regression needs to be re-fit
Using SpaDES
we can now fit the data and use the parameters all inside the simulation
DEoptim
seems to be the best optimization package for heuristic optimizationPOM
function in SpaDES
uses this packagePOM
- example 3Often we want to run a simulation with a variety of parameter values as inputs. This is sometimes called a simulation experiment. The experiment
function gives us an easy interface to this.
We will use the establishment
module that we created in exercise 4a.
library(SpaDES)
module.path <- file.path(dirname(tempdir()), "modules")
downloadModule("establishment", path = module.path)
setPaths(modulePath = module.path)
library(ggplot2)
modules <- list("establishment")
mySim <- simInit(modules = modules)
# make sure the plotting device is clear
clearPlot()
spades(mySim)
?experiment
, use experiment
to run a spades
call for each parameter set (note that experiment
is just a wrapper around spades
).attributes(mySim)
.sum(mySim$establish[])
, i.e., the number of pixels that had an establishment event.