sdmTMB (sdmtmb/sdmtmb)

sdmTMB

https://github.com/sdmtmb/sdmtmb
Admin
sdmTMB is an R package for fitting spatial and spatiotemporal generalized linear mixed models using...

Tokens:14,685
Snippets:111
Trust Score:5.8
Update:1 week ago
Show doc for...
Context Summary (auto-generated)
Raw
# sdmTMB

sdmTMB is an R package for fitting spatial and spatiotemporal generalized linear mixed effects models (GLMMs) using Template Model Builder (TMB), fmesher, and the SPDE (Stochastic Partial Differential Equation) approach to approximate Gaussian random fields. The package provides a fast, flexible, and user-friendly interface similar to glmmTMB but focused on geostatistical spatial and spatiotemporal models. Common applications include species distribution models (SDMs), index standardization for fisheries surveys, and analyzing any coordinate-referenced observations from underlying spatial processes.

sdmTMB extends standard GLMMs with spatial random fields that represent consistent spatial patterns, spatiotemporal random fields that can be independent (IID), autoregressive (AR1), or random walks, smooth terms using mgcv notation, time-varying coefficients, spatially varying coefficients, and a wide range of families including Tweedie, negative binomial, lognormal, Student-t, delta/hurdle models, and more. Estimation is via maximum marginal likelihood with random effects integrated out using the Laplace approximation, and models can also be passed to Stan for Bayesian estimation.

## make_mesh - Create SPDE Mesh for Spatial Correlation

Creates a triangulated mesh that approximates spatial correlation using the SPDE approach. The mesh defines knot locations where spatial random effects are estimated, with bilinear interpolation to other locations. Smaller cutoff values create finer meshes (more detail, slower fitting) while larger values create coarser meshes (faster, less detail).

```r
library(sdmTMB)

# Create mesh using cutoff (minimum triangle edge length in coordinate units)
mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10)
plot(mesh)
mesh$mesh$n # number of vertices/knots

# Create mesh using k-means clustering for specific number of knots
mesh_kmeans <- make_mesh(pcod, c("X", "Y"), n_knots = 100, type = "kmeans")

# Create mesh with fmesher for more control
mesh_custom <- make_mesh(
  pcod, c("X", "Y"),
  fmesher_func = fmesher::fm_mesh_2d_inla,
  cutoff = 8,
  max.edge = c(20, 40),  # inner and outer max triangle lengths
  offset = c(5, 40)      # inner and outer border widths
)

# Use existing fmesher mesh
inla_mesh <- fmesher::fm_mesh_2d_inla(
  loc = cbind(pcod$X, pcod$Y),
  max.edge = c(25, 50),
  offset = c(5, 25),
  cutoff = 5
)
mesh_from_inla <- make_mesh(pcod, c("X", "Y"), mesh = inla_mesh)
```

## sdmTMB - Fit Spatial/Spatiotemporal GLMMs

The main model fitting function that fits spatial and/or spatiotemporal GLMMs. Supports a wide range of families, random fields, smoothers, time-varying effects, and spatially varying coefficients. Returns an object containing parameter estimates, random effects, and model diagnostics.

```r
library(sdmTMB)
library(dplyr)

# Basic spatial model (no time)
mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10)
fit_spatial <- sdmTMB(
  density ~ depth_scaled + depth_scaled2,
  data = pcod,
  mesh = mesh,
  family = tweedie(link = "log"),
  spatial = "on"
)
print(fit_spatial)

# Spatiotemporal model with IID random fields
fit_st <- sdmTMB(
  density ~ s(depth, k = 5),
  data = pcod,
  mesh = mesh,
  time = "year",
  family = tweedie(link = "log"),
  spatial = "on",
  spatiotemporal = "iid"  # or "ar1" or "rw"
)

# Delta (hurdle) model for zero-inflated data
fit_delta <- sdmTMB(
  density ~ s(depth),
  data = pcod,
  mesh = mesh,
  time = "year",
  family = delta_gamma()  # binomial + Gamma components
)

# Model with time-varying depth effect (AR1 process)
fit_tv <- sdmTMB(
  density ~ 1 + depth_scaled + depth_scaled2,
  time_varying = ~ 1 + depth_scaled + depth_scaled2,
  time_varying_type = "ar1",
  data = pcod,
  mesh = mesh,
  time = "year",
  family = tweedie(link = "log"),
  spatial = "on",
  spatiotemporal = "off"
)

# Model with spatially varying coefficient (local trends)
pcod$year_scaled <- as.numeric(scale(pcod$year))
fit_svc <- sdmTMB(
  density ~ s(depth, k = 5) + year_scaled,
  spatial_varying = ~ year_scaled,
  data = pcod,
  mesh = mesh,
  time = "year",
  family = tweedie(link = "log"),
  spatiotemporal = "off"
)

# Model with random intercepts
pcod$fyear <- as.factor(pcod$year)
fit_re <- sdmTMB(
  density ~ s(depth, k = 5) + (1 | fyear),
  data = pcod,
  mesh = mesh,
  family = tweedie(link = "log")
)

# Model with priors (penalized complexity)
fit_priors <- sdmTMB(
  density ~ depth_scaled,
  data = pcod,
  mesh = mesh,
  family = tweedie(),
  priors = sdmTMBpriors(
    matern_s = pc_matern(range_gt = 10, sigma_lt = 5),
    b = normal(c(0, 0), c(1, 10)),
    phi = halfnormal(0, 15)
  )
)
```

## predict.sdmTMB - Generate Predictions from Fitted Models

Generates predictions from fitted sdmTMB models on original or new data. Returns estimates in link space by default, with options for standard errors, simulation-based uncertainty, and separating fixed vs random effect contributions.

```r
library(sdmTMB)
library(ggplot2)

# Fit model
mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10)
fit <- sdmTMB(
  density ~ s(depth, k = 5),
  data = pcod,
  mesh = mesh,
  time = "year",
  family = tweedie(link = "log"),
  spatial = "on",
  spatiotemporal = "iid"
)

# Predictions at original data locations
pred_original <- predict(fit)
head(pred_original)
# Columns: est (total), est_non_rf (fixed effects), omega_s (spatial), epsilon_st (spatiotemporal)

# Predictions on new grid data
grid_yrs <- replicate_df(qcs_grid, "year", unique(pcod$year))
pred_grid <- predict(fit, newdata = grid_yrs)

# Plot predictions (exponentiate from log link to response scale)
ggplot(pred_grid, aes(X, Y, fill = exp(est))) +
  geom_raster() +
  facet_wrap(~year) +
  scale_fill_viridis_c(trans = "sqrt") +
  coord_fixed()

# Plot spatial random effects
ggplot(pred_grid, aes(X, Y, fill = omega_s)) +
  geom_raster() +
  scale_fill_gradient2() +
  coord_fixed()

# Predictions with standard errors (population-level, excluding random fields)
nd <- data.frame(
  depth = seq(min(pcod$depth), max(pcod$depth), length.out = 100),
  year = 2015L
)
pred_se <- predict(fit, newdata = nd, se_fit = TRUE, re_form = NA)
ggplot(pred_se, aes(depth, exp(est),
  ymin = exp(est - 1.96 * est_se),
  ymax = exp(est + 1.96 * est_se))) +
  geom_line() +
  geom_ribbon(alpha = 0.4)

# Simulation-based uncertainty (faster than se_fit for spatial predictions)
pred_sims <- predict(fit, newdata = grid_yrs, nsim = 100)
grid_yrs$cv <- apply(exp(pred_sims), 1, function(x) sd(x) / mean(x))

# Return TMB object for index calculations
pred_tmb <- predict(fit, newdata = grid_yrs, return_tmb_object = TRUE)
```

## get_index - Calculate Standardized Abundance Index

Calculates area-weighted standardized population index (time series of total abundance/biomass) from model predictions. Uses bias correction by default to account for non-linear transformation of random effects. Essential for fisheries stock assessment applications.

```r
library(sdmTMB)
library(ggplot2)

# Fit model for index standardization
mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10)
fit <- sdmTMB(
  density ~ 0 + as.factor(year),  # year factors for index
  data = pcod,
  mesh = mesh,
  time = "year",
  family = tweedie(link = "log")
)

# Predict on grid with return_tmb_object = TRUE
grid_yrs <- replicate_df(qcs_grid, "year", unique(pcod$year))
predictions <- predict(fit, newdata = grid_yrs, return_tmb_object = TRUE)

# Calculate biomass index (area = 4 for 2km x 2km grid cells)
index <- get_index(predictions, area = 4, bias_correct = TRUE)
print(index)

# Plot index time series
ggplot(index, aes(year, est)) +
  geom_line() +
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.4) +
  xlab("Year") + ylab("Biomass estimate (kg)") +
  ylim(0, NA)

# Index for spatial subset (e.g., southern region)
grid_south <- grid_yrs[grid_yrs$Y < 5700, ]
pred_south <- predict(fit, newdata = grid_south, return_tmb_object = TRUE)
index_south <- get_index(pred_south, area = 4, bias_correct = TRUE)

# Memory-efficient index calculation for large grids
index_split <- get_index_split(fit, newdata = grid_yrs, nsplit = 4,
                                area = 4, bias_correct = TRUE)
```

## get_cog - Calculate Center of Gravity

Computes the center of gravity (abundance-weighted mean location) over time, useful for detecting distributional shifts in species populations. Returns coordinates with uncertainty estimates.

```r
library(sdmTMB)
library(ggplot2)

# Fit spatiotemporal model
mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10)
fit <- sdmTMB(
  density ~ 0 + as.factor(year),
  data = pcod,
  mesh = mesh,
  time = "year",
  family = tweedie(link = "log")
)

# Get predictions
grid_yrs <- replicate_df(qcs_grid, "year", unique(pcod$year))
predictions <- predict(fit, newdata = grid_yrs, return_tmb_object = TRUE)

# Calculate center of gravity (wide format for plotting)
cog <- get_cog(predictions, format = "wide", bias_correct = TRUE)
print(cog)

# Plot center of gravity shifts over time
ggplot(cog, aes(est_x, est_y, colour = year)) +
  geom_point(size = 3) +
  geom_linerange(aes(xmin = lwr_x, xmax = upr_x)) +
  geom_linerange(aes(ymin = lwr_y, ymax = upr_y)) +
  scale_colour_viridis_c() +
  labs(x = "Eastings (km)", y = "Northings (km)")

# Long format for separate x/y analysis
cog_long <- get_cog(predictions, format = "long", bias_correct = TRUE)
```

## tidy.sdmTMB - Extract Model Parameters as Data Frame

Extracts model parameters, random effects, and variance components into tidy data frames following broom/broom.mixed conventions. Essential for summarizing and comparing models.

```r
library(sdmTMB)

# Fit model
mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10)
fit <- sdmTMB(
  density ~ poly(depth_scaled, 2),
  data = pcod,
  mesh = mesh,
  family = tweedie()
)

# Extract fixed effects with confidence intervals
tidy(fit, conf.int = TRUE)
#>   term                    estimate  std.error  conf.low  conf.high
#>   (Intercept)             2.37      0.21       1.95      2.79
#>   poly(depth_scaled, 2)1  -31.7     3.03       -37.6     -25.8

# Extract random effect parameters (range, sigma, phi, etc.)
tidy(fit, effects = "ran_pars", conf.int = TRUE)
#>   term        estimate  std.error  conf.low  conf.high
#>   range       16.4      NA         9.60      28.0
#>   phi         12.7      NA         11.9      13.5
#>   sigma_O     1.86      NA         1.48      2.34
#>   tweedie_p   1.58      NA         1.56      1.60

# For models with random intercepts, extract random effect values
pcod$fyear <- as.factor(pcod$year)
fit_re <- sdmTMB(
  density ~ s(depth, k = 5) + (1 | fyear),
  data = pcod,
  mesh = mesh,
  family = tweedie()
)
tidy(fit_re, effects = "ran_vals")

# For delta models, specify which component
fit_delta <- sdmTMB(
  density ~ s(depth),
  data = pcod,
  mesh = mesh,
  family = delta_gamma()
)
tidy(fit_delta, model = 1)  # binomial component
tidy(fit_delta, model = 2)  # Gamma component
tidy(fit_delta, model = 2, effects = "ran_pars", conf.int = TRUE)
```

## sdmTMB_cv - Cross-Validation for Model Comparison

Performs k-fold or leave-future-out cross-validation to evaluate predictive performance. Returns log-likelihood of held-out data for model comparison (higher values = better prediction).

```r
library(sdmTMB)
library(future)

# Enable parallel processing
plan(multisession, workers = 2)

# Basic k-fold cross validation
mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 25)
pcod$fyear <- as.factor(pcod$year)

cv_result <- sdmTMB_cv(
  density ~ 0 + s(depth_scaled) + fyear,
  data = pcod,
  mesh = mesh,
  family = tweedie(link = "log"),
  k_folds = 4
)

# Compare models using sum of log-likelihoods (higher = better)
cv_result$sum_loglik
cv_result$fold_loglik  # per-fold values

# Calculate RMSE and MAE from cross-validation predictions
sqrt(mean((cv_result$data$density - cv_result$data$cv_predicted)^2))  # RMSE
mean(abs(cv_result$data$density - cv_result$data$cv_predicted))       # MAE

# Spatial cross-validation using k-means clusters
clust <- kmeans(pcod[, c("X", "Y")], 10)$cluster
cv_spatial <- sdmTMB_cv(
  density ~ 0 + s(depth_scaled) + fyear,
  data = pcod,
  mesh = mesh,
  fold_ids = clust,
  family = tweedie(link = "log")
)

# Compare two models
m1 <- sdmTMB_cv(density ~ 0 + fyear, data = pcod, mesh = mesh,
                fold_ids = clust, family = tweedie())
m2 <- sdmTMB_cv(density ~ 0 + fyear + s(depth_scaled), data = pcod,
                mesh = mesh, fold_ids = clust, family = tweedie())
m1$sum_loglik  # compare these values
m2$sum_loglik  # higher is better

# Model stacking weights
weights <- sdmTMB_stacking(list(m1, m2))
```

## residuals.sdmTMB - Model Diagnostics with Residuals

Generates randomized quantile residuals for model diagnostics. Multiple types available including simulation-based DHARMa residuals for more thorough checking.

```r
library(sdmTMB)
library(ggplot2)

# Fit model
mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10)
fit <- sdmTMB(
  density ~ s(depth, k = 5),
  data = pcod,
  mesh = mesh,
  time = "year",
  family = tweedie(link = "log"),
  spatial = "on",
  spatiotemporal = "iid"
)

# Extract randomized quantile residuals (MLE-MVN type recommended)
pcod$resids <- residuals(fit, type = "mle-mvn")

# QQ plot - should follow 1:1 line
qqnorm(pcod$resids)
abline(0, 1)

# Spatial residual patterns - look for no clustering
ggplot(pcod, aes(X, Y, col = resids)) +
  geom_point() +
  scale_colour_gradient2() +
  facet_wrap(~year) +
  coord_fixed()

# Histogram - should look normal
hist(pcod$resids, breaks = 30)

# Simulation-based DHARMa residuals (more thorough)
set.seed(123)
s <- simulate(fit, nsim = 300, type = "mle-mvn")
dharma_residuals(s, fit)

# Deviance residuals
resid_dev <- residuals(fit, type = "deviance")

# Run sanity checks on model fit
sanity(fit)
#> Non-linear minimizer suggests successful convergence
#> Hessian matrix is positive definite
#> No extreme or very small eigenvalues detected
#> No gradients with respect to fixed effects are >= 0.001
#> No fixed-effect standard errors are NA
```

## families - Distribution Families for Response Variables

sdmTMB supports many distribution families beyond standard GLM families, including Tweedie for zero-inflated continuous data, negative binomial variants, lognormal, Student-t, and delta/hurdle models for separate zero vs. positive components.

```r
library(sdmTMB)

mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10)

# Tweedie - zero-inflated continuous data (biomass/density)
fit_tw <- sdmTMB(density ~ s(depth), data = pcod, mesh = mesh,
                  family = tweedie(link = "log"))

# Negative binomial for count data
fit_nb2 <- sdmTMB(round(density) ~ s(depth), data = pcod, mesh = mesh,
                   family = nbinom2(link = "log"))

# Lognormal for positive continuous data
pcod_pos <- subset(pcod, density > 0)
fit_ln <- sdmTMB(density ~ s(depth), data = pcod_pos,
                  mesh = make_mesh(pcod_pos, c("X", "Y"), cutoff = 10),
                  family = lognormal(link = "log"))

# Student-t for heavy-tailed data (df estimated by default)
fit_t <- sdmTMB(density ~ s(depth), data = pcod_pos,
                 mesh = make_mesh(pcod_pos, c("X", "Y"), cutoff = 10),
                 family = student(link = "log"))

# Delta-gamma (hurdle model): binomial + Gamma
fit_dg <- sdmTMB(density ~ s(depth), data = pcod, mesh = mesh,
                  time = "year", family = delta_gamma())

# Delta-lognormal
fit_dln <- sdmTMB(density ~ s(depth), data = pcod, mesh = mesh,
                   family = delta_lognormal())

# Poisson-link delta model (ensures predictions sum correctly)
fit_pl <- sdmTMB(density ~ s(depth), data = pcod, mesh = mesh,
                  family = delta_gamma(type = "poisson-link"))

# Binomial for presence/absence
fit_bin <- sdmTMB(present ~ s(depth), data = pcod, mesh = mesh,
                   family = binomial(link = "logit"))

# Beta for proportions (0-1)
# Beta(link = "logit")

# Gamma mixture for bimodal positive data
# gamma_mix(link = "log")
```

## sdmTMBpriors - Set Priors/Penalties on Parameters

Allows specifying prior distributions on model parameters for regularization or Bayesian inference via tmbstan. Includes penalized complexity (PC) priors for Matérn random field parameters.

```r
library(sdmTMB)

mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10)

# PC prior on spatial random field (recommended starting point)
# range_gt: P(range < range_gt) = 0.05
# sigma_lt: P(sigma > sigma_lt) = 0.05
fit_pc <- sdmTMB(
  density ~ depth_scaled,
  data = pcod,
  mesh = mesh,
  family = tweedie(),
  priors = sdmTMBpriors(
    matern_s = pc_matern(range_gt = 10, sigma_lt = 5)
  )
)

# Visualize PC Matern prior
plot_pc_matern(range_gt = 10, sigma_lt = 5)

# Multiple priors
fit_priors <- sdmTMB(
  density ~ depth_scaled + depth_scaled2,
  data = pcod,
  mesh = mesh,
  family = tweedie(),
  priors = sdmTMBpriors(
    matern_s = pc_matern(range_gt = 10, sigma_lt = 5),
    b = normal(c(0, 0, 0), c(10, 1, 1)),  # priors on intercept and depth coefficients
    phi = halfnormal(0, 15),               # prior on dispersion
    tweedie_p = normal(1.5, 0.1)           # prior on Tweedie power
  )
)

# For Bayesian inference with tmbstan, set bayesian = TRUE
fit_bayes <- sdmTMB(
  density ~ depth_scaled,
  data = pcod,
  mesh = mesh,
  family = tweedie(),
  bayesian = TRUE,  # applies Jacobian adjustments
  priors = sdmTMBpriors(
    matern_s = pc_matern(range_gt = 10, sigma_lt = 5),
    b = normal(c(0, 0), c(2, 2))
  )
)
# Then pass to tmbstan::tmbstan(fit_bayes$tmb_obj)
```

## simulate.sdmTMB - Simulate Data from Fitted Models

Generates simulated response data from fitted models for model checking, power analysis, or creating simulation-based residuals. Can simulate at original or new data locations.

```r
library(sdmTMB)

# Fit model
mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10)
fit <- sdmTMB(
  density ~ s(depth, k = 5),
  data = pcod,
  mesh = mesh,
  time = "year",
  family = tweedie(link = "log")
)

# Simulate from fitted model (multiple replicates)
set.seed(123)
sims <- simulate(fit, nsim = 500)
dim(sims)  # rows = observations, columns = simulations

# Use for DHARMa residual diagnostics
dharma_residuals(sims, fit)

# Simulate without observation error (just process model)
sims_no_obs <- simulate(fit, nsim = 100, observation_error = FALSE)

# Simulate on new data (e.g., prediction grid)
grid_yrs <- replicate_df(qcs_grid, "year", unique(pcod$year))
sims_grid <- simulate(fit, nsim = 10, newdata = grid_yrs)

# Simulate from scratch with sdmTMB_simulate
predictor_dat <- expand.grid(
  X = seq(0, 1, length.out = 100),
  Y = seq(0, 1, length.out = 100)
)
sim_mesh <- make_mesh(predictor_dat, c("X", "Y"), cutoff = 0.05)

sim_dat <- sdmTMB_simulate(
  formula = ~ 1,
  data = predictor_dat,
  mesh = sim_mesh,
  family = poisson(link = "log"),
  range = 0.3,      # spatial range
  sigma_O = 0.4,    # spatial SD
  seed = 1,
  B = 1             # intercept value
)
head(sim_dat)
```

## Summary

sdmTMB is designed for researchers and practitioners working with spatially and spatiotemporally correlated data, particularly in ecology, fisheries science, and environmental monitoring. The primary use cases include: species distribution modeling where spatial correlation must be accounted for, fisheries survey index standardization with proper uncertainty quantification, detecting distributional shifts through center of gravity calculations, understanding habitat relationships while controlling for spatial autocorrelation, and forecasting species distributions under changing conditions.

The typical workflow involves: (1) creating a mesh with `make_mesh()` appropriate for your spatial domain, (2) fitting models with `sdmTMB()` iterating on covariates, families, and random field structures, (3) checking model diagnostics with `residuals()`, `sanity()`, and `dharma_residuals()`, (4) comparing models using `sdmTMB_cv()` or AIC, (5) generating predictions with `predict()` on appropriate grids, and (6) calculating derived quantities like indices (`get_index()`), centers of gravity (`get_cog()`), or effective area occupied (`get_eao()`). The package integrates well with tidyverse workflows through `tidy()`, supports visualization with ggplot2 and visreg/ggeffects, and can pass models to tmbstan for fully Bayesian inference.