Some basic Caching lessons

Caching is the ability to save some sort of output from an operation, and then retrieve these outputs when the operation is repeated in the same way - meaning the inputs of this operation and the actual tasks it performs are unchanged.

Caching becomes fundamental when we can expect to re-run operations several times, particularly if they take a while to compute each time. Some examples of these operations are:

  • downloading data
  • (spatial) data processing/munging
  • fitting statistical models to large datasets, or that are complex in nature
  • running simulations with no stochasticity

SpaDES (via the reproducible package) offers a number of functions that make caching these operations a lot easier for non-programmers. Two fundamental ones are Cache and prepInputs.

Now do it again. Notice any difference?

Now try wrapping the previous operation in a Cache call, and run it twice. Notice the difference in speed.
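As a rough illustration of what to expect, here is a minimal sketch. It does not reproduce the tutorial's own operation: slowMean is a hypothetical stand-in for any slow function, and the sketch assumes the reproducible package is installed.

```r
library(reproducible)

# Hypothetical stand-in for a slow operation (not part of the tutorial)
slowMean <- function(x) {
  Sys.sleep(2)  # pretend this takes a while
  mean(x)
}

# First call: the function is actually run and its output is saved
system.time(res1 <- Cache(slowMean, 1:10))

# Second call with identical inputs: the output is read back from the cache
system.time(res2 <- Cache(slowMean, 1:10))
```

The second timing should be noticeably shorter, because the function body (including the Sys.sleep) is skipped. Cached objects may carry extra attributes, so compare values rather than relying on identical().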

The previous code is great but we don’t have as much control as we’d like on where Cache is storing cached objects. To do that, we can explicitly provide a cache folder and add tags to the object so that we can find it more easily if we ever need to “clean it”.
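One way this might look is sketched below. The folder name, the tag and slowMean are all hypothetical, and the cachePath argument name assumes a recent version of reproducible (older versions used a different argument name for the cache folder).

```r
library(reproducible)

# Hypothetical stand-in for a slow operation
slowMean <- function(x) {
  Sys.sleep(2)
  mean(x)
}

# Hypothetical cache folder; any writable directory would do
myCache <- file.path(tempdir(), "workshopCache")

# Store the cached output in our own folder and label it with a tag
res <- Cache(slowMean, 1:10,
             cachePath = myCache,
             userTags = "slowMeanExample")

# Later, list or remove only the entries carrying that tag
showCache(myCache, userTags = "slowMeanExample")
clearCache(myCache, userTags = "slowMeanExample", ask = FALSE)
```

Tagging is what makes selective cleaning practical: without a tag, the choice is usually between clearing the whole folder or hunting for the right entry by hand.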

We can also force Cache to redo operations and re-cache, or simply to ignore caching altogether. See more options for the useCache argument in ?Cache.
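The two behaviours mentioned above could be sketched as follows, again with the hypothetical slowMean function and assuming the reproducible package is installed. Check ?Cache for the full set of values that useCache accepts in your installed version.

```r
library(reproducible)

# Hypothetical stand-in for a slow operation
slowMean <- function(x) {
  Sys.sleep(1)
  mean(x)
}

# Normal behaviour: run once, then reuse the cached output
res <- Cache(slowMean, 1:10)

# Force the operation to be re-run and the cached copy to be replaced
resOverwrite <- Cache(slowMean, 1:10, useCache = "overwrite")

# Ignore caching altogether: the function is simply run
resNoCache <- Cache(slowMean, 1:10, useCache = FALSE)
```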

Try to provide a study area now. Hints:

What if my study area is a raster? Assuming you don’t have a raster at hand, try using a raster from the SpaDESInAction example (“inputs/rasterToMatch.rds”).

What if I have both? In some cases your study area may be defined by a polygon, but you may have a raster that will dictate, e.g., the projection and resolution of the output (remember, polygons have no resolution).

Now imagine someone told you there is a more up-to-date Land Cover map for Canada, and told you where to look for the .zip file: the Canadian Government open data portal.

  • Try right-clicking on “Access” for the TIF file, copy the link, and use it to replace the “old” URL
  • Had an error? The messages are helpful!

Learn more about caching

Some basic Debugging lessons

Now that you have a flavour of caching, we’re going to explore debugging a bit and put our new “caching skills” in practice in a SpaDES modelling context.

We’re going to run the caribouRSF module by itself.

Oops, something doesn’t seem to be right! We start by looking carefully at the printed output, then we use traceback to help us locate the problem. In this case, it seems to be a particular line of caribouRSF.R.

  1. Insert a browser() before the line with the stop(). Save and re-run. OR
  2. Use the debug option in simInitAndSpades or spades to go into “browser mode”
  • Check ?browser, while you’re at it ;)
  1. What is P(sim)$.useDummyData? Where does its value come from?
  2. Which data objects are missing? Why?
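The first option above can be sketched in plain R, outside of any SpaDES module. The function and the object name someInput are hypothetical and only mimic the "missing input, then stop()" pattern. The interactive() guard is an addition so the sketch also runs in non-interactive scripts.

```r
# Hypothetical function that fails when an expected input is missing
checkInputs <- function(inputs) {
  if (is.null(inputs$someInput)) {  # someInput is a made-up object name
    # Pause just before the error so the local environment can be inspected;
    # only triggered in an interactive session
    if (interactive()) browser()
    stop("someInput is missing")
  }
  invisible(TRUE)
}

# Capture the error instead of halting, so the script keeps running
res <- try(checkInputs(list()), silent = TRUE)
print(res)
```

Once inside the browser, typing an object's name prints it, n steps to the next line, c continues, and Q quits.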

We are going to supply these objects - note that the dynamic part will not be simulated.

  1. Check the .inputObjects function and the metadata for inputs.

You can now use restartSpades or simply re-run simInitAndSpades.

There seem to be no more issues. Maybe we don’t need to print all those debug messages anymore - less verbose - next time we run.

Or maybe, we don’t want to see them printed, but want to keep a log of all messages to check later.

  • Check the “debug” section of ?spades

What about all that purple text? That’s the module code checking. It’s helpful, but not always accurate - meant to be informative rather than enforced.
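If the code-checking messages get in the way, they can probably be switched off through a package option. The sketch below assumes the spades.moduleCodeChecks option described in the SpaDES.core options help page; confirm the name in ?spadesOptions for your installed version before relying on it.

```r
# Assumed option name: see ?spadesOptions in SpaDES.core to confirm
options(spades.moduleCodeChecks = FALSE)

# Check what is currently set
getOption("spades.moduleCodeChecks")
```

Set this before calling simInit or simInitAndSpades, since the checks run while modules are being loaded.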

Learn about debugging in SpaDES and with RStudio