Need for speed

R is widely seen as being ‘slow’ (see the Julia web page).

But with a few specific tools, available in various R packages, this becomes largely irrelevant.

An aside

Pure R, when the most efficient vectorized code is used, appears to be half the speed of the most efficient C++.

See Hadley Wickham’s page on Rcpp (scroll down to “Vector input, vector output”), noting that if it took 10 minutes to write the C++ code, you would need to run it about 150,000 times to make the rewrite worthwhile.
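The break-even arithmetic behind the 150,000 figure can be sketched as follows. The per-call timings are illustrative assumptions (chosen so that C++ is twice as fast as vectorized R), not measurements made here.

```r
# Illustrative per-call timings (assumptions, not measured here)
r_time_sec     <- 0.008    # vectorized pure R version
cpp_time_sec   <- 0.004    # C++ version, twice as fast
write_time_sec <- 10 * 60  # 10 minutes spent writing the C++ code

# Time saved each time the C++ version is called instead of the R version
saving_per_call <- r_time_sec - cpp_time_sec

# Number of calls needed before the time saved repays the time spent writing
break_even_runs <- write_time_sec / saving_per_call
print(break_even_runs)
```

With a smaller speed-up or a longer writing time, the break-even point moves further away.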

Need for speed

Spatial simulation means doing the same thing over and over and over … so we need speed

We will show how to profile your code at the end of this section.

“Vectorization”

This is at the core of making R fast. If you don’t do this, then it is probably not useful to use R as a simulation engine.

# Instead of 
a <- vector()
for (i in 1:1000) {
  a[i] <- rnorm(1)
}

# use vectorized version, which is built into the functions
a <- rnorm(1000)
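A quick way to see the difference is to time both versions with system.time(). This is a minimal sketch; the vector length n is an arbitrary choice and the timings will vary by machine.

```r
n <- 1e4

# Loop version: draws one value at a time and grows the vector as it goes
loop_time <- system.time({
  a_loop <- vector()
  for (i in seq_len(n)) {
    a_loop[i] <- rnorm(1)
  }
})

# Vectorized version: a single call draws all n values
vec_time <- system.time({
  a_vec <- rnorm(n)
})

print(loop_time["elapsed"])
print(vec_time["elapsed"])
```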

Vectors and Matrices

  • These are as fast as you can get in R
  • Fast numerical operations
  • Faster than data.frame
  • Anything that is in pure vectors or matrices is ‘fast enough’
  • It is always a challenge to keep all code in vectors and matrices
  • Thus the following packages…
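As a rough illustration of the matrix versus data.frame point, the sketch below runs the same arithmetic on both and times it. The dimensions and the number of repetitions are arbitrary assumptions; actual timings depend on the machine and the operation.

```r
set.seed(1)
n_row <- 1000
n_col <- 50

# The same numbers stored as a matrix and as a data.frame
m  <- matrix(runif(n_row * n_col), nrow = n_row, ncol = n_col)
df <- as.data.frame(m)

# Identical arithmetic on both objects, repeated 100 times
m_time  <- system.time(for (i in 1:100) m_out  <- m * 2 + 1)
df_time <- system.time(for (i in 1:100) df_out <- df * 2 + 1)

print(m_time["elapsed"])
print(df_time["elapsed"])
```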

Spatial simulation

  • Spatial simulation (e.g., time and space) requires more than just spatial data manipulation
  • Sometimes it is just base R stuff
  • Need to learn how to make functions (allows reusability)
  • Need to learn a few key packages that are critical for speed
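A minimal sketch of a reusable function operating on a whole matrix at once. The function name growStep, its parameters, and the logistic growth rule are hypothetical, chosen only for illustration.

```r
# Hypothetical example: one time step of logistic growth,
# applied to every cell of a matrix of abundances at once
growStep <- function(abundance, rate = 0.1, capacity = 100) {
  abundance + rate * abundance * (1 - abundance / capacity)
}

# A small landscape with the same starting abundance in every cell
landscape <- matrix(10, nrow = 5, ncol = 5)

# Reuse the same function at every time step
for (i in 1:20) {
  landscape <- growStep(landscape)
}
```

Because the function body is vectorized, the same code works unchanged for a 5 x 5 or a 5000 x 5000 landscape.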

Key packages for spatial simulation

  • base package – everything matrix or vector is ‘fast’

  • raster - for spatially referenced matrices

    • not always fast enough; sometimes we copy the data into a matrix, manipulate it, then return the data to the raster object
  • sp - equivalent of vector shapefiles in a GIS

    • Polygons, Points, Lines
    • Not always fast, but essential to have
  • see also sf
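The raster-to-matrix round trip can look something like this. It is a sketch assuming the raster package is installed; the raster size and values are arbitrary.

```r
library(raster)

# A small raster with arbitrary (illustrative) values
r <- raster(nrows = 10, ncols = 10, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
r <- setValues(r, 1:100)

# Copy the cell values into a plain matrix (same rows and columns as the raster)
m <- as.matrix(r)

# Manipulate with fast base R matrix operations
m <- m * 2

# Return the values to a raster object; raster cells are ordered row by row,
# so transpose before flattening the matrix
r2 <- setValues(r, as.vector(t(m)))
```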

Key packages for spatial simulation

  • data.table

    • For data.frame type data (i.e., columns of data)
    • Very fast when the object gets large, but can actually be slower than data.frame when the table is small (<100,000 rows)
  • SpaDES – many functions; will be moved into a separate package soon

  • Rcpp

    • R interface to C++. When you need something fast and cannot get it fast enough with existing tools/packages, you can write your own (we will not go further into this here)

What we will do here

  • We will go through SpaDES functions quickly, because there are fewer tutorials online for these
  • We will show links to various tutorials for raster, sp, data.table, Rcpp
  • Each person should decide which tool is the most useful to them
  • Put something into practice

SpaDES functions

  • These are all potentially useful for building spatio-temporal models
?`spades-package` # section 2 shows many functions

# e.g.,
?spread
?move
?cir
?adj
?distanceFromEachPoint

Working with spatial data

raster

sp

sf

  • Relatively new
  • Implements latest GIS data standards
  • Very fast, especially reading/writing large data
  • CRAN
  • GitHub

The data.table package

From every data.table user ever:

WOW that’s fast!


install.packages('data.table')

(at least for large tables!)
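A small taste of the syntax, assuming data.table is installed. The column names (pixelId, cover, biomass) and the table size are made up for illustration.

```r
library(data.table)

set.seed(42)
n <- 1e6

# Illustrative data: one row per pixel, with a hypothetical cover class and biomass
dt <- data.table(
  pixelId = seq_len(n),
  cover   = sample(c("conifer", "deciduous", "mixed"), n, replace = TRUE),
  biomass = runif(n, min = 0, max = 100)
)

# Grouped summary: mean biomass and number of pixels per cover class
summaryDT <- dt[, .(meanBiomass = mean(biomass), nPixels = .N), by = cover]

# Update by reference: adds a column without copying the whole table
dt[, biomassScaled := biomass / max(biomass)]

print(summaryDT)
```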

data.table tutorials

raster and data.table together

  • The current implementation of LandR (based on LANDIS-II) uses a “reduced” data structure throughout

  • Keeping a raster for every variable is redundant (e.g., 2 pixels next to each other may be identical)

  • Instead, we make one raster of id and one data.table with a column called id

  • Then we can have as many columns as we want of information about each of these places

  • Like “polygons”, but for rasters, and dynamic… can change over time

  • This may be useful for your own module
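The idea can be sketched with a plain matrix standing in for the id raster. This is a conceptual illustration only, not the LandR or SpaDES implementation; the ids, column names, and values are invented.

```r
library(data.table)

# A small "raster" of ids, held as a plain matrix for this sketch
idMap <- matrix(c(1, 1, 2,
                  1, 2, 2,
                  3, 3, 2), nrow = 3, byrow = TRUE)

# One row per unique id, with as many attribute columns as needed
reduced <- data.table(
  id      = c(1, 2, 3),
  biomass = c(120, 300, 45),
  age     = c(10, 80, 3)
)

# Expand one column back to a full map by looking up each pixel's id
biomassMap <- matrix(reduced$biomass[match(idMap, reduced$id)],
                     nrow = nrow(idMap), ncol = ncol(idMap))

# Changing an attribute over time only touches the small table
reduced[id == 3, biomass := biomass + 10]
```

Nine pixels are described by three rows here; on a real landscape the saving can be much larger when many pixels share the same id.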

raster and data.table together

  • There is a key helper function:
?rasterizeReduced

What does this do?

The Rcpp package

From every Rcpp user ever:

WOW! Just wow.