vignettes/09c-CachingAndDebugging.Rmd
Caching means saving the output of an operation and retrieving it when the operation is repeated in the same way, that is, when both the inputs of the operation and the tasks it performs are unchanged.
Caching becomes fundamental when we expect to re-run operations several times, particularly if they take a while to compute each time. Some examples of these operations are:
- downloading data
- (spatial) data processing/munging
- fitting statistical models to large datasets, or models that are complex in nature
- running simulations with no stochasticity
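To make the idea concrete, here is a minimal base-R sketch of a file-based cache. It is only an illustration of the principle (same function and same inputs means the saved output is re-used) and is not how Cache is implemented; toyCache, slowSquare and nCalls are hypothetical names made up for this example.

```r
# Toy file-based cache (illustration only; NOT the reproducible::Cache implementation)
toyCache <- function(FUN, ..., cacheDir) {
  dir.create(cacheDir, showWarnings = FALSE, recursive = TRUE)
  # Build a key from the function's code and its inputs
  keyFile <- tempfile()
  on.exit(unlink(keyFile))
  writeLines(c(deparse(FUN), deparse(list(...))), keyFile)
  key <- unname(tools::md5sum(keyFile))
  cacheFile <- file.path(cacheDir, paste0(key, ".rds"))
  if (file.exists(cacheFile)) {
    message("loading cached result")
    return(readRDS(cacheFile))
  }
  out <- FUN(...)          # nothing cached yet: do the real work
  saveRDS(out, cacheFile)  # and save the output for next time
  out
}

nCalls <- 0
slowSquare <- function(x) {
  nCalls <<- nCalls + 1    # count how often the real work happens
  Sys.sleep(0.2)           # pretend this is slow
  x^2
}

cacheDir <- tempfile("toyCache")
res1 <- toyCache(slowSquare, 1:5, cacheDir = cacheDir)
res2 <- toyCache(slowSquare, 1:5, cacheDir = cacheDir)  # same inputs: read from disk
res3 <- toyCache(slowSquare, 1:6, cacheDir = cacheDir)  # changed input: recomputed
```

After these three calls, nCalls shows how many times slowSquare actually ran, and wrapping each call in system.time() shows the speed difference.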
SpaDES (via the reproducible package) offers a number of functions that make caching these operations a lot easier for non-programmers. Two fundamental ones are Cache and prepInputs.
library(reproducible)
library(sp)
library(raster)
## from ?prepInputs
dPath <- file.path("modules/Biomass_borealDataPrep/data")
url <- file.path("ftp://ftp.ccrs.nrcan.gc.ca/ad/NLCCLandCover",
"LandcoverCanada2005_250m/LandCoverOfCanada2005_V1_4.zip")
landCover <- prepInputs(url = url,
destinationPath = asPath(dPath))
Now do it again. Notice any difference?
landCover <- prepInputs(url = url,
destinationPath = asPath(dPath))
Now try wrapping the previous operation in a Cache call, and run it twice. Notice the difference in speed.
landCover <- Cache(prepInputs,
url = url,
destinationPath = asPath(dPath))
landCover <- Cache(prepInputs,
url = url,
destinationPath = asPath(dPath))
The previous code is great, but we don’t have as much control as we’d like over where Cache is storing cached objects. To gain that control, we can explicitly provide a cache folder and add tags to the object so that we can find it more easily if we ever need to “clean it”.
cPath <- file.path(tempdir(), "cache")
## run this twice
landCover <- Cache(prepInputs,
url = url,
destinationPath = asPath(dPath),
cacheRepo = cPath,
userTags = "landCover")
showCache(x = cPath, userTags = "landCover")
reproducible::clearCache(x = cPath, userTags = "landCover")
## notice how Cache needs to re-do things
landCover <- Cache(prepInputs,
url = url,
destinationPath = asPath(dPath),
cacheRepo = cPath, ## same cache folder that was just cleared
userTags = "landCover")
We can also force Cache to redo operations and re-cache, or simply to ignore caching altogether. See more options for Cache(useCache) in ?Cache.
landCover <- Cache(prepInputs,
url = url,
destinationPath = asPath(dPath),
cacheRepo = cPath,
userTags = "landCover",
useCache = "overwrite")
options("reproducible.useCache" = FALSE)
## no useCache argument here: an explicit useCache = TRUE would take precedence over the option
landCover <- Cache(prepInputs,
url = url,
destinationPath = asPath(dPath),
cacheRepo = cPath,
userTags = "landCover")
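An argument whose default is read from an option only follows that option when the argument is left out. The base-R sketch below shows this with a hypothetical function (toyRun) and a hypothetical option (toy.useCache); it assumes Cache takes its useCache default from the reproducible.useCache option in the same way, so check ?Cache to confirm.

```r
# Hypothetical function whose default for useCache is read from an option
toyRun <- function(useCache = getOption("toy.useCache", TRUE)) {
  if (isTRUE(useCache)) "used cache" else "ignored cache"
}

old <- options(toy.useCache = FALSE)
r1 <- toyRun()                 # no argument: the option (FALSE) applies
r2 <- toyRun(useCache = TRUE)  # explicit argument wins over the option
options(old)                   # restore the previous setting
r3 <- toyRun()                 # option unset again: the fallback (TRUE) applies
```

So to switch caching off through the option, leave the argument out of the call.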
Try to provide a study area now. Hints:
- see ?reproducible::prepInputs and ?reproducible::postProcess
- see ?SpaDES.tools::randomStudyArea
- what happens when you run Cache(prepInputs(...)) with the new study area(s)?
library(SpaDES.tools)
StudyArea <- randomStudyArea(size = 100000^2)
## cheating to visualise beforehand
if (!identical(crs(StudyArea), crs(landCover))) ## only reproject when the projections differ
StudyArea <- spTransform(StudyArea, CRSobj = crs(landCover))
plot(landCover)
plot(StudyArea, add = TRUE, col = "red")
options(reproducible.useCache = TRUE)
landCover <- Cache(prepInputs,
url = url,
destinationPath = asPath(dPath),
useSAcrs = TRUE,
overwrite = TRUE,
studyArea = StudyArea,
cacheRepo = cPath,
userTags = "landCover")
plot(landCover)
What if my study area is a raster? Assuming you don’t have a raster at hand, try using a raster from the SpaDESInAction example (“inputs/rasterToMatch.rds”).
## assuming you're in the SpaDESInAction2 project folder
templateRaster <- readRDS(file.path(getwd(), "inputs/rasterToMatch.rds"))
landCover <- Cache(prepInputs,
url = url,
destinationPath = asPath(dPath),
# useSAcrs = TRUE, ## we don't need this anymore
rasterToMatch = templateRaster, ## a reproducible::postProcess argument
maskWithRTM = TRUE, ## a reproducible::maskInputs argument
overwrite = TRUE,
cacheRepo = cPath,
userTags = "landCover",
useCache = TRUE)
plot(landCover)
What if I have both? In some cases your study area may be defined by a polygon, but you may have a raster that dictates, for example, the projection and resolution of the output (remember, polygons have no resolution).
landCover <- Cache(prepInputs,
url = url,
destinationPath = asPath(dPath),
studyArea = StudyArea,
useSAcrs = FALSE, ## use the template raster projection
rasterToMatch = templateRaster,
maskWithRTM = FALSE, ## mask using the study area
overwrite = TRUE,
cacheRepo = cPath,
userTags = "landCover",
useCache = TRUE)
Now imagine someone told you there is a more up-to-date land cover map for Canada, and told you where to find the .zip file: the Canadian Government open data portal.
url <- "http://ftp.maps.canada.ca/pub/nrcan_rncan/Land-cover_Couverture-du-sol/canada-landcover_canada-couverture-du-sol/CanadaLandcover2010.zip"
landCover <- Cache(prepInputs,
targetFile = "CAN_LC_2010_CAL.tif",
url = url,
destinationPath = asPath(dPath),
studyArea = StudyArea,
overwrite = TRUE,
cacheRepo = cPath,
userTags = "landCover",
useCache = TRUE)
plot(landCover)
Learn more about caching
Now that you have a flavour of caching, we’re going to explore debugging a bit and put our new “caching skills” into practice in a SpaDES modelling context.
We’re going to run the caribouRSF module by itself.
## Restart your R session if not running from a "clean" environment
library("reproducible")
library(SpaDES)
library(LandR)
library(raster)
library(data.table)
options(
"spades.recoveryMode" = 2,
"spades.lowMemory" = TRUE,
"LandR.assertions" = FALSE,
"LandR.verbose" = 1,
"reproducible.useMemoise" = TRUE, # Brings cached stuff to memory during the second run
"reproducible.useNewDigestAlgorithm" = TRUE, # use the new less strict hashing algo
"reproducible.useCache" = TRUE,
"pemisc.useParallel" = FALSE
)
## assuming you're in the SpaDESInAction2 project folder
inputDirectory <- checkPath(file.path(getwd(), "inputs"), create = TRUE)
outputDirectory <- checkPath(file.path(getwd(), "outputs"), create = TRUE)
modulesDirectory <- checkPath(file.path(getwd(), "modules"), create = TRUE)
cacheDirectory <- checkPath(file.path(getwd(), "cache"), create = TRUE)
setPaths(cachePath = cacheDirectory,
modulePath = c(modulesDirectory,
file.path(modulesDirectory, "scfm/modules")),
inputPath = inputDirectory,
outputPath = outputDirectory)
times <- list(start = 0, end = 10)
successionTimestep <- 1L
parameters <- list(
caribouRSF = list(
"decidousSp" = c("Betu_Pap", "Popu_Tre", "Popu_Bal"),
"predictionInterval" = 20
)
)
# load studyArea
studyArea <- readRDS(file.path(getPaths()$inputPath, "studyArea.rds"))
objects <- list(
"studyArea" = studyArea
)
caribou <- simInitAndSpades(times = times,
objects = objects,
params = parameters,
modules = as.list("caribouRSF"),
paths = getPaths(),
debug = 1)
Oops, something doesn’t seem to be right! We start by looking carefully at the printed output, then we use traceback to help us locate the problem. In this case, it seems to be a particular line of caribouRSF.R.
traceback()
# 11: stop("This module does not work without data. Please provide the necessary layers") at caribouRSF.R#154
# 10: get(moduleCall, envir = fnEnv)(sim, cur[["eventTime"]], cur[["eventType"]])
# 9: eval(fnCallAsExpr)
# 8: eval(fnCallAsExpr)
# (...)
file.edit("modules/caribouRSF/caribouRSF.R") ## go to caribouRSF.R#154
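traceback() is used interactively, right after an error. When running a script, one option is to record the call stack at the moment the error is signalled, using base R's withCallingHandlers and sys.calls. The sketch below uses two hypothetical functions (checkData and runEvent) as stand-ins for module code; it does not involve SpaDES at all.

```r
# Hypothetical nested functions standing in for module code
checkData <- function(x) {
  if (is.null(x)) stop("This module does not work without data.")
  x
}
runEvent <- function(x) checkData(x)

stack <- NULL
result <- tryCatch(
  withCallingHandlers(
    runEvent(NULL),
    error = function(e) {
      # the full call stack is still available while the error is being signalled
      stack <<- vapply(sys.calls(), function(cl) deparse(cl)[1], character(1))
    }
  ),
  error = function(e) conditionMessage(e)  # then recover and keep the message
)

print(result)
print(stack)
```

The recorded stack lists the calls from the outermost to the innermost, so the function that raised the error appears near the end.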
- Add browser() before the line with the stop(). Save and re-run. OR
- Use the debug option in simInitAndSpades or spades to go into “browser mode”.
- Check ?browser, while you’re at it ;)
- What is the value of P(sim)$.useDummyData? Where does its value come from?
# Browse[1]> P(sim)$.useDummyData
# [1] TRUE
# Browse[1]> mod$pixelGroupMap
# NULL
# Browse[1]> mod$cohortData
# NULL
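Checking objects one by one in the browser works, but with many inputs it can help to list everything that is still NULL in one go. A minimal sketch, assuming a plain list stands in for the module's objects (findMissing is a hypothetical helper, not a SpaDES function):

```r
# Hypothetical helper: return the names of the objects that are still NULL
findMissing <- function(objs) {
  names(objs)[vapply(objs, is.null, logical(1))]
}

# A plain list standing in for the module's objects (not a real simList)
mod <- list(pixelGroupMap = NULL, cohortData = NULL, studyArea = "polygon")

missingObjs <- findMissing(mod)
if (length(missingObjs) > 0) {
  message("Missing inputs: ", paste(missingObjs, collapse = ", "))
}
```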
You can also try to enter the “browser mode” in specific events or all events of a module via the simInitAndSpades or spades functions.
## browse at the init event(s) - if you had more than one module it would stop at each
caribou <- simInitAndSpades(times = times,
objects = objects,
params = parameters,
modules = as.list("caribouRSF"),
paths = getPaths(),
debug = "init")
## browse at the event that triggered the error
caribou <- simInitAndSpades(times = times,
objects = objects,
params = parameters,
modules = as.list("caribouRSF"),
paths = getPaths(),
debug = "lookingForCaribou")
## "browse" at each event of the caribouRSF module
caribou <- simInitAndSpades(times = times,
objects = objects,
params = parameters,
modules = as.list("caribouRSF"),
paths = getPaths(),
debug = "caribouRSF")
We are going to supply these objects - note that the dynamic part will not be simulated.
- Look at the .inputObjects function and the metadata for inputs. Can you see a pattern in how prepInputs gets data from online sources? Try to do the same for pixelGroupMap and cohortData.
- How are sources for objects given? Try adding the following sources:
  - for pixelGroupMap: “https://drive.google.com/open?id=1IUEuH55su8X7JCWt8LXy_hTAQz0cfCmU”
  - for cohortData: “https://drive.google.com/open?id=1R_wGGvzUI0gGZ5NOs2KmT2KrmXaTm4NS”
## add sourceURL to pixelGroupMap and cohortData
expectsInput(objectName = "pixelGroupMap", objectClass = "RasterLayer",
desc = paste0("Map of groups of pixels that share the same info from cohortData (sp, age, biomass, etc).",
" Here it is mainly used to determine old and recent burns based on tree age,",
" and if deciduous by species"),
sourceURL = "https://drive.google.com/open?id=1IUEuH55su8X7JCWt8LXy_hTAQz0cfCmU")
expectsInput(objectName = "cohortData", objectClass = "data.table",
desc = paste0("data.table with information by pixel group of sp, age, biomass, etc"),
sourceURL = "https://drive.google.com/open?id=1R_wGGvzUI0gGZ5NOs2KmT2KrmXaTm4NS")
## add defaults for these objects in .inputObjects, so that the module can get them if they are not supplied
if (!suppliedElsewhere("pixelGroupMap", sim = sim, where = "sim")) {
sim$pixelGroupMap <- Cache(prepInputs, targetFile = "pixelGroupMapCaribouEg.rds",
fun = "readRDS",
url = extractURL("pixelGroupMap"), studyArea = sim$studyArea,
destinationPath = dataPath(sim), filename2 = NULL,
rasterToMatch = sim$rasterToMatch)
}
if (!suppliedElsewhere("cohortData", sim = sim, where = "sim")) {
sim$cohortData <- Cache(prepInputs, targetFile = "cohortDataCaribouEg.rds",
fun = "readRDS",
url = extractURL("cohortData"),
destinationPath = dataPath(sim))
}
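The pattern in .inputObjects is "only create a default when the object was not supplied". Here is a base-R sketch of that idea, with an environment standing in for the simList and a hypothetical helper, setDefault. The real suppliedElsewhere does more than this (for example, it can also check whether another module will create the object), so treat this purely as an illustration.

```r
# An environment standing in for a simList (hypothetical; not the SpaDES class)
sim <- new.env()
sim$cohortData <- data.frame(pixelGroup = 1:2, age = c(10, 50))  # user-supplied

# Hypothetical helper: fill in `name` only when it has not been supplied
setDefault <- function(env, name, makeDefault) {
  if (is.null(env[[name]])) {
    env[[name]] <- makeDefault()  # the default is only built when needed
  }
  invisible(env)
}

setDefault(sim, "cohortData", function() data.frame(pixelGroup = 99L, age = 0))
setDefault(sim, "pixelGroupMap", function() matrix(1:4, nrow = 2))
```

The user-supplied cohortData is left untouched, while pixelGroupMap gets its default because nothing was supplied for it.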
You can now use restartSpades or simply re-run simInitAndSpades.
caribou <- restartSpades()
## or
caribou <- simInitAndSpades(times = times,
objects = objects,
params = parameters,
modules = as.list("caribouRSF"),
paths = getPaths(),
debug = 1)
There seem to be no more issues. Maybe we don’t need to print all those debug messages anymore - less verbose - next time we run.
caribou <- simInitAndSpades(times = times,
objects = objects,
params = parameters,
modules = as.list("caribouRSF"),
paths = getPaths(),
debug = FALSE)
Or maybe we don’t want to see them printed, but want to keep a log of all messages to check later. See ?spades.
caribou <- simInitAndSpades(times = times,
objects = objects,
params = parameters,
modules = as.list("caribouRSF"),
paths = getPaths(),
debug = list(console = list(level = 40),
file = list(append = FALSE,
file = "logCaribou.txt",
level = 0),
debug = TRUE)
)
## notice how different the file is from what was printed on the console.
file.edit("logCaribou.txt")
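The console and the file can show different things because each destination has its own level threshold. The sketch below mimics that idea in base R with a hypothetical logger (makeLogger), assuming that higher level numbers mean more severe messages; it is not the logging mechanism SpaDES uses internally.

```r
# Hypothetical two-destination logger: each destination has its own threshold
makeLogger <- function(logFile, consoleLevel = 40, fileLevel = 0) {
  if (file.exists(logFile)) file.remove(logFile)  # start fresh, like append = FALSE
  function(level, text) {
    line <- sprintf("[%s] %s", level, text)
    if (level >= fileLevel) cat(line, "\n", sep = "", file = logFile, append = TRUE)
    if (level >= consoleLevel) message(line)
    invisible(line)
  }
}

logFile <- tempfile(fileext = ".txt")
logMsg <- makeLogger(logFile)
logMsg(20, "init event started")      # below the console threshold: file only
logMsg(40, "required layer missing")  # reaches both the console and the file

fileLines <- readLines(logFile)
print(fileLines)
```

With a console threshold of 40 and a file threshold of 0, the file keeps everything while the console only shows the most severe messages.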
What about all that purple text? That’s the module code checking. It’s helpful, but not always accurate - meant to be informative rather than enforced.
options("spades.moduleCodeChecks" = FALSE)
caribou <- simInitAndSpades(times = times,
objects = objects,
params = parameters,
modules = as.list("caribouRSF"),
paths = getPaths(),
debug = list(console = list(level = 40),
file = list(append = FALSE,
file = "logCaribou.txt",
level = 0),
debug = TRUE)
)
## notice how different the file is from what was printed on the console.
file.edit("logCaribou.txt")
Learn about debugging in SpaDES and with RStudio