Need for speed

R is widely seen as being ‘slow’ (see the Julia web page)

But if you use a few specific tools, this becomes largely irrelevant, thanks to the powerful tools available in various R packages

An aside

Pure R, when the most efficient vectorized code is used, appears to be about half the speed of the most efficient C++.

See Hadley Wickham’s page on Rcpp (scroll down to “Vector input, vector output”…), noting that if it took 10 minutes to write the C++ code, it would have to be run about 150,000 times to make the rewrite worth it.
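The break-even arithmetic behind that figure can be sketched as follows. The per-call saving of 4 ms is an assumption chosen for illustration (it is not stated on this slide); only the 10 minutes and the 150,000 come from the text above.

```r
# Time invested in writing the C++ version (from the slide): 10 minutes
devTimeSec <- 10 * 60

# ASSUMPTION (hypothetical): the C++ version saves 4 ms per call
savedPerRunSec <- 0.004

# Number of calls needed before the time saved repays the time invested
breakEvenRuns <- devTimeSec / savedPerRunSec
breakEvenRuns
```

If the function is called fewer times than this over its lifetime, the vectorized R version was the cheaper choice.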

Need for speed

Spatial simulation means doing the same thing over and over and over … so we need speed

We will show how to profile your code at the end of this section.

“Vectorization”

  • This is at the core of making R fast. If you don’t do this, then it is probably not useful to use R as a simulation engine.
# Instead of growing a vector one element at a time in a loop
a <- numeric()
for (i in 1:1000) {
  a[i] <- rnorm(1)
}

# use the vectorized version, which is built into the function
a <- rnorm(1000)
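To see the difference on your own machine, the two versions can be timed with `system.time()`. This is a minimal sketch; the vector length `n` and the object names are chosen here for illustration, and the timings will vary between machines.

```r
n <- 1e5

# Loop version: one call to rnorm() per element, growing the vector each time
loopTime <- system.time({
  aLoop <- numeric(0)
  for (i in seq_len(n)) {
    aLoop[i] <- rnorm(1)
  }
})

# Vectorized version: a single call generates all n values
vecTime <- system.time({
  aVec <- rnorm(n)
})

# Elapsed seconds for each approach
c(loop = loopTime[["elapsed"]], vectorized = vecTime[["elapsed"]])
```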

Vectors and Matrices

  • These are as fast as you can get in R
  • Fast numerical operations
  • Faster than data.frame
  • Anything that is in pure vectors or matrices is ‘fast enough’
  • It is always a challenge to keep all code in vectors and matrices
  • Thus the following packages…
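A rough way to check the matrix versus data.frame claim is to run the same arithmetic on both. This is only a sketch with made-up data; the size of the gap depends on the operation and the machine.

```r
set.seed(1)
m  <- matrix(runif(1e6), ncol = 10)  # numeric matrix
df <- as.data.frame(m)               # same values stored as a data.frame

# Same element-wise arithmetic on each object
tMatrix <- system.time(m2  <- m * 2 + 1)[["elapsed"]]
tDf     <- system.time(df2 <- df * 2 + 1)[["elapsed"]]

# Elapsed seconds for each data structure
c(matrix = tMatrix, data.frame = tDf)
```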

Spatial simulation

  • Working with spatial simulation (e.g., time and space) requires more than just spatial data manipulation
  • Sometimes it is just base R stuff
  • Need to learn how to make functions (allows reusability)
  • Need to learn a few key packages that are critical for speed
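As an illustration of wrapping a simulation step in a reusable function, here is a minimal sketch. The function `moveAgents` and its arguments are hypothetical (they are not part of any package); the point is that the function is vectorized over agents, so the only loop is over time.

```r
# Hypothetical helper: move every agent one step in a random direction
moveAgents <- function(x, y, stepLength = 1) {
  angle <- runif(length(x), 0, 2 * pi)   # one random direction per agent
  list(x = x + stepLength * cos(angle),
       y = y + stepLength * sin(angle))
}

set.seed(42)
pos <- list(x = rep(0, 100), y = rep(0, 100))  # 100 agents starting at the origin

for (step in 1:10) {                     # loop over time steps, not over agents
  pos <- moveAgents(pos$x, pos$y)
}
```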

Key packages for spatial simulation

  • base package – everything matrix or vector is ‘fast’
  • raster - for spatially referenced matrices

    • not always fast enough; sometimes we copy the data into a matrix, manipulate it there, then return the data to the raster object
  • sp - equivalent of vector shapefiles in a GIS

    • Polygons, Points, Lines
    • Not always fast, but essential to have
  • see also sf
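The raster-to-matrix round trip mentioned above might look like the following. This is a sketch on a tiny made-up raster, assuming the raster package is installed; the manipulation in the middle is arbitrary.

```r
library(raster)

# Small example raster with known values (cells are filled row by row)
r <- raster(nrows = 5, ncols = 5, xmn = 0, xmx = 5, ymn = 0, ymx = 5)
r[] <- 1:25

m <- as.matrix(r)      # copy the cell values into a plain matrix
m[m > 20] <- 0         # manipulate with base R matrix operations
m <- m * 2

# Return the data to a raster; raster cells run row-wise while R matrices
# are stored column-wise, hence the transpose
r2 <- setValues(r, as.vector(t(m)))
```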

Key packages for spatial simulation

  • data.table

    • For data.frame type data (i.e., columns of data)
    • Very fast when the object gets large, but can actually be slower than a data.frame when the data are small (<100,000 rows)
  • SpaDES – many functions; will be moved into a separate package soon

  • Rcpp

    • R interface to C++. When you need something fast, and you can’t get it fast enough with existing tools/packages, you can create your own (we will not go further into this here)

What we will do here

  • We will go through SpaDES functions quickly, because there are fewer tutorials online for these
  • We will show links to various tutorials for raster, sp, data.table, Rcpp
  • Each person should decide which tool is the most useful to them
  • Put something into practice

SpaDES functions

  • These are all potentially useful for building spatio-temporal models
?`spades-package` # section 2 shows many functions

# e.g.,
?spread
?move
?cir
?adj
?distanceFromEachPoint

Working with spatial data

raster

sp

sf

  • Relatively new
  • Implements latest GIS data standards
  • Very fast, especially reading/writing large data
  • CRAN
  • GitHub

The data.table package

From every data.table user ever:

WOW that’s fast!


install.packages('data.table')

(at least for large tables!)
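A quick way to try this yourself is a grouped summary on a large table, done once with data.table and once with base R. The data and column names below are invented for illustration; wrap either call in `system.time()` to compare speeds on your machine.

```r
library(data.table)

set.seed(123)
n  <- 2e5
dt <- data.table(id = sample(1000, n, replace = TRUE), value = runif(n))
df <- as.data.frame(dt)

# data.table syntax: dt[i, j, by] ; here, mean of value within each id
dtMeans <- dt[, .(meanValue = mean(value)), by = id][order(id)]

# The same summary with base R on a data.frame
dfMeans <- aggregate(value ~ id, data = df, FUN = mean)
```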

data.table tutorials

raster and data.table together

  • The current implementation of LANDIS-SpaDES uses a “reduced” data structure throughout

  • Instead of keeping rasters of everything (one can imagine that there is redundancy, e.g., two pixels next to each other may be identical)

  • We make one raster of “id” and one data.table with a column called “id”

  • Then we can have as many columns as we want of information about each of these places

  • Like “polygons”, but for rasters, and dynamic… can change over time

  • This may be useful for your own module
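The “reduced” structure described above can be sketched by hand as follows. The object and column names (`idMap`, `cohorts`, `biomass`, `age`) are hypothetical, and the final step uses a plain `match()` rather than any SpaDES function, so treat it as an illustration of the idea and not as the LANDIS-SpaDES implementation.

```r
library(raster)
library(data.table)

# One raster holding only ids: each cell points at a row of the table
idMap <- raster(nrows = 4, ncols = 4, xmn = 0, xmx = 4, ymn = 0, ymx = 4)
idMap[] <- rep(1:4, each = 4)

# One data.table with an "id" column and as many attribute columns as needed
cohorts <- data.table(id = 1:4,
                      biomass = c(10, 25, 5, 40),
                      age = c(3, 12, 1, 30))

# Expand a single column back to a full raster when a map is needed
cellRows   <- match(getValues(idMap), cohorts$id)
biomassMap <- setValues(idMap, cohorts$biomass[cellRows])
```

Updating `cohorts` over time changes what every matching pixel represents without rewriting any raster until a map is actually required.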

raster and data.table together

  • There is a key helper function:
?rasterizeReduced

What does this do?

The Rcpp package

From every Rcpp user ever:

WOW! Just wow.


install.packages('Rcpp')
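For a first taste, `cppFunction()` compiles a C++ function from a string and makes it callable from R. This is a minimal sketch, assuming Rcpp and a working C++ compiler are installed; the function name `sumC` is chosen here for illustration.

```r
library(Rcpp)

# Compile a small C++ function that sums a numeric vector
cppFunction('
double sumC(NumericVector x) {
  double total = 0;
  for (int i = 0; i < x.size(); ++i) {
    total += x[i];
  }
  return total;
}')

x <- runif(1e6)

# The compiled function should agree with the built-in sum()
all.equal(sumC(x), sum(x))
```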