Try Live
Add Docs
Rankings
Pricing
Docs
Install
Install
Docs
Pricing
More...
More...
Try Live
Rankings
Enterprise
Create API Key
Add Docs
sdmTMB
https://github.com/sdmtmb/sdmtmb
Admin
sdmTMB is an R package for fitting spatial and spatiotemporal generalized linear mixed models using
...
Tokens:
14,685
Snippets:
111
Trust Score:
5.8
Update:
1 week ago
Context
Skills
Chat
Benchmark
91.8
Suggestions
Latest
Show doc for...
Code
Info
Show Results
Context Summary (auto-generated)
Raw
Copy
Link
# sdmTMB sdmTMB is an R package for fitting spatial and spatiotemporal generalized linear mixed effects models (GLMMs) using Template Model Builder (TMB), fmesher, and the SPDE (Stochastic Partial Differential Equation) approach to approximate Gaussian random fields. The package provides a fast, flexible, and user-friendly interface similar to glmmTMB but focused on geostatistical spatial and spatiotemporal models. Common applications include species distribution models (SDMs), index standardization for fisheries surveys, and analyzing any coordinate-referenced observations from underlying spatial processes. sdmTMB extends standard GLMMs with spatial random fields that represent consistent spatial patterns, spatiotemporal random fields that can be independent (IID), autoregressive (AR1), or random walks, smooth terms using mgcv notation, time-varying coefficients, spatially varying coefficients, and a wide range of families including Tweedie, negative binomial, lognormal, Student-t, delta/hurdle models, and more. Estimation is via maximum marginal likelihood with random effects integrated out using the Laplace approximation, and models can also be passed to Stan for Bayesian estimation. ## make_mesh - Create SPDE Mesh for Spatial Correlation Creates a triangulated mesh that approximates spatial correlation using the SPDE approach. The mesh defines knot locations where spatial random effects are estimated, with bilinear interpolation to other locations. Smaller cutoff values create finer meshes (more detail, slower fitting) while larger values create coarser meshes (faster, less detail). ```r library(sdmTMB) # Create mesh using cutoff (minimum triangle edge length in coordinate units) mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10) plot(mesh) mesh$mesh$n # number of vertices/knots # Create mesh using k-means clustering for specific number of knots mesh_kmeans <- make_mesh(pcod, c("X", "Y"), n_knots = 100, type = "kmeans") # Create mesh with fmesher for more control mesh_custom <- make_mesh( pcod, c("X", "Y"), fmesher_func = fmesher::fm_mesh_2d_inla, cutoff = 8, max.edge = c(20, 40), # inner and outer max triangle lengths offset = c(5, 40) # inner and outer border widths ) # Use existing fmesher mesh inla_mesh <- fmesher::fm_mesh_2d_inla( loc = cbind(pcod$X, pcod$Y), max.edge = c(25, 50), offset = c(5, 25), cutoff = 5 ) mesh_from_inla <- make_mesh(pcod, c("X", "Y"), mesh = inla_mesh) ``` ## sdmTMB - Fit Spatial/Spatiotemporal GLMMs The main model fitting function that fits spatial and/or spatiotemporal GLMMs. Supports a wide range of families, random fields, smoothers, time-varying effects, and spatially varying coefficients. Returns an object containing parameter estimates, random effects, and model diagnostics. ```r library(sdmTMB) library(dplyr) # Basic spatial model (no time) mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10) fit_spatial <- sdmTMB( density ~ depth_scaled + depth_scaled2, data = pcod, mesh = mesh, family = tweedie(link = "log"), spatial = "on" ) print(fit_spatial) # Spatiotemporal model with IID random fields fit_st <- sdmTMB( density ~ s(depth, k = 5), data = pcod, mesh = mesh, time = "year", family = tweedie(link = "log"), spatial = "on", spatiotemporal = "iid" # or "ar1" or "rw" ) # Delta (hurdle) model for zero-inflated data fit_delta <- sdmTMB( density ~ s(depth), data = pcod, mesh = mesh, time = "year", family = delta_gamma() # binomial + Gamma components ) # Model with time-varying depth effect (AR1 process) fit_tv <- sdmTMB( density ~ 1 + depth_scaled + depth_scaled2, time_varying = ~ 1 + depth_scaled + depth_scaled2, time_varying_type = "ar1", data = pcod, mesh = mesh, time = "year", family = tweedie(link = "log"), spatial = "on", spatiotemporal = "off" ) # Model with spatially varying coefficient (local trends) pcod$year_scaled <- as.numeric(scale(pcod$year)) fit_svc <- sdmTMB( density ~ s(depth, k = 5) + year_scaled, spatial_varying = ~ year_scaled, data = pcod, mesh = mesh, time = "year", family = tweedie(link = "log"), spatiotemporal = "off" ) # Model with random intercepts pcod$fyear <- as.factor(pcod$year) fit_re <- sdmTMB( density ~ s(depth, k = 5) + (1 | fyear), data = pcod, mesh = mesh, family = tweedie(link = "log") ) # Model with priors (penalized complexity) fit_priors <- sdmTMB( density ~ depth_scaled, data = pcod, mesh = mesh, family = tweedie(), priors = sdmTMBpriors( matern_s = pc_matern(range_gt = 10, sigma_lt = 5), b = normal(c(0, 0), c(1, 10)), phi = halfnormal(0, 15) ) ) ``` ## predict.sdmTMB - Generate Predictions from Fitted Models Generates predictions from fitted sdmTMB models on original or new data. Returns estimates in link space by default, with options for standard errors, simulation-based uncertainty, and separating fixed vs random effect contributions. ```r library(sdmTMB) library(ggplot2) # Fit model mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10) fit <- sdmTMB( density ~ s(depth, k = 5), data = pcod, mesh = mesh, time = "year", family = tweedie(link = "log"), spatial = "on", spatiotemporal = "iid" ) # Predictions at original data locations pred_original <- predict(fit) head(pred_original) # Columns: est (total), est_non_rf (fixed effects), omega_s (spatial), epsilon_st (spatiotemporal) # Predictions on new grid data grid_yrs <- replicate_df(qcs_grid, "year", unique(pcod$year)) pred_grid <- predict(fit, newdata = grid_yrs) # Plot predictions (exponentiate from log link to response scale) ggplot(pred_grid, aes(X, Y, fill = exp(est))) + geom_raster() + facet_wrap(~year) + scale_fill_viridis_c(trans = "sqrt") + coord_fixed() # Plot spatial random effects ggplot(pred_grid, aes(X, Y, fill = omega_s)) + geom_raster() + scale_fill_gradient2() + coord_fixed() # Predictions with standard errors (population-level, excluding random fields) nd <- data.frame( depth = seq(min(pcod$depth), max(pcod$depth), length.out = 100), year = 2015L ) pred_se <- predict(fit, newdata = nd, se_fit = TRUE, re_form = NA) ggplot(pred_se, aes(depth, exp(est), ymin = exp(est - 1.96 * est_se), ymax = exp(est + 1.96 * est_se))) + geom_line() + geom_ribbon(alpha = 0.4) # Simulation-based uncertainty (faster than se_fit for spatial predictions) pred_sims <- predict(fit, newdata = grid_yrs, nsim = 100) grid_yrs$cv <- apply(exp(pred_sims), 1, function(x) sd(x) / mean(x)) # Return TMB object for index calculations pred_tmb <- predict(fit, newdata = grid_yrs, return_tmb_object = TRUE) ``` ## get_index - Calculate Standardized Abundance Index Calculates area-weighted standardized population index (time series of total abundance/biomass) from model predictions. Uses bias correction by default to account for non-linear transformation of random effects. Essential for fisheries stock assessment applications. ```r library(sdmTMB) library(ggplot2) # Fit model for index standardization mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10) fit <- sdmTMB( density ~ 0 + as.factor(year), # year factors for index data = pcod, mesh = mesh, time = "year", family = tweedie(link = "log") ) # Predict on grid with return_tmb_object = TRUE grid_yrs <- replicate_df(qcs_grid, "year", unique(pcod$year)) predictions <- predict(fit, newdata = grid_yrs, return_tmb_object = TRUE) # Calculate biomass index (area = 4 for 2km x 2km grid cells) index <- get_index(predictions, area = 4, bias_correct = TRUE) print(index) # Plot index time series ggplot(index, aes(year, est)) + geom_line() + geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.4) + xlab("Year") + ylab("Biomass estimate (kg)") + ylim(0, NA) # Index for spatial subset (e.g., southern region) grid_south <- grid_yrs[grid_yrs$Y < 5700, ] pred_south <- predict(fit, newdata = grid_south, return_tmb_object = TRUE) index_south <- get_index(pred_south, area = 4, bias_correct = TRUE) # Memory-efficient index calculation for large grids index_split <- get_index_split(fit, newdata = grid_yrs, nsplit = 4, area = 4, bias_correct = TRUE) ``` ## get_cog - Calculate Center of Gravity Computes the center of gravity (abundance-weighted mean location) over time, useful for detecting distributional shifts in species populations. Returns coordinates with uncertainty estimates. ```r library(sdmTMB) library(ggplot2) # Fit spatiotemporal model mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10) fit <- sdmTMB( density ~ 0 + as.factor(year), data = pcod, mesh = mesh, time = "year", family = tweedie(link = "log") ) # Get predictions grid_yrs <- replicate_df(qcs_grid, "year", unique(pcod$year)) predictions <- predict(fit, newdata = grid_yrs, return_tmb_object = TRUE) # Calculate center of gravity (wide format for plotting) cog <- get_cog(predictions, format = "wide", bias_correct = TRUE) print(cog) # Plot center of gravity shifts over time ggplot(cog, aes(est_x, est_y, colour = year)) + geom_point(size = 3) + geom_linerange(aes(xmin = lwr_x, xmax = upr_x)) + geom_linerange(aes(ymin = lwr_y, ymax = upr_y)) + scale_colour_viridis_c() + labs(x = "Eastings (km)", y = "Northings (km)") # Long format for separate x/y analysis cog_long <- get_cog(predictions, format = "long", bias_correct = TRUE) ``` ## tidy.sdmTMB - Extract Model Parameters as Data Frame Extracts model parameters, random effects, and variance components into tidy data frames following broom/broom.mixed conventions. Essential for summarizing and comparing models. ```r library(sdmTMB) # Fit model mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10) fit <- sdmTMB( density ~ poly(depth_scaled, 2), data = pcod, mesh = mesh, family = tweedie() ) # Extract fixed effects with confidence intervals tidy(fit, conf.int = TRUE) #> term estimate std.error conf.low conf.high #> (Intercept) 2.37 0.21 1.95 2.79 #> poly(depth_scaled, 2)1 -31.7 3.03 -37.6 -25.8 # Extract random effect parameters (range, sigma, phi, etc.) tidy(fit, effects = "ran_pars", conf.int = TRUE) #> term estimate std.error conf.low conf.high #> range 16.4 NA 9.60 28.0 #> phi 12.7 NA 11.9 13.5 #> sigma_O 1.86 NA 1.48 2.34 #> tweedie_p 1.58 NA 1.56 1.60 # For models with random intercepts, extract random effect values pcod$fyear <- as.factor(pcod$year) fit_re <- sdmTMB( density ~ s(depth, k = 5) + (1 | fyear), data = pcod, mesh = mesh, family = tweedie() ) tidy(fit_re, effects = "ran_vals") # For delta models, specify which component fit_delta <- sdmTMB( density ~ s(depth), data = pcod, mesh = mesh, family = delta_gamma() ) tidy(fit_delta, model = 1) # binomial component tidy(fit_delta, model = 2) # Gamma component tidy(fit_delta, model = 2, effects = "ran_pars", conf.int = TRUE) ``` ## sdmTMB_cv - Cross-Validation for Model Comparison Performs k-fold or leave-future-out cross-validation to evaluate predictive performance. Returns log-likelihood of held-out data for model comparison (higher values = better prediction). ```r library(sdmTMB) library(future) # Enable parallel processing plan(multisession, workers = 2) # Basic k-fold cross validation mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 25) pcod$fyear <- as.factor(pcod$year) cv_result <- sdmTMB_cv( density ~ 0 + s(depth_scaled) + fyear, data = pcod, mesh = mesh, family = tweedie(link = "log"), k_folds = 4 ) # Compare models using sum of log-likelihoods (higher = better) cv_result$sum_loglik cv_result$fold_loglik # per-fold values # Calculate RMSE and MAE from cross-validation predictions sqrt(mean((cv_result$data$density - cv_result$data$cv_predicted)^2)) # RMSE mean(abs(cv_result$data$density - cv_result$data$cv_predicted)) # MAE # Spatial cross-validation using k-means clusters clust <- kmeans(pcod[, c("X", "Y")], 10)$cluster cv_spatial <- sdmTMB_cv( density ~ 0 + s(depth_scaled) + fyear, data = pcod, mesh = mesh, fold_ids = clust, family = tweedie(link = "log") ) # Compare two models m1 <- sdmTMB_cv(density ~ 0 + fyear, data = pcod, mesh = mesh, fold_ids = clust, family = tweedie()) m2 <- sdmTMB_cv(density ~ 0 + fyear + s(depth_scaled), data = pcod, mesh = mesh, fold_ids = clust, family = tweedie()) m1$sum_loglik # compare these values m2$sum_loglik # higher is better # Model stacking weights weights <- sdmTMB_stacking(list(m1, m2)) ``` ## residuals.sdmTMB - Model Diagnostics with Residuals Generates randomized quantile residuals for model diagnostics. Multiple types available including simulation-based DHARMa residuals for more thorough checking. ```r library(sdmTMB) library(ggplot2) # Fit model mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10) fit <- sdmTMB( density ~ s(depth, k = 5), data = pcod, mesh = mesh, time = "year", family = tweedie(link = "log"), spatial = "on", spatiotemporal = "iid" ) # Extract randomized quantile residuals (MLE-MVN type recommended) pcod$resids <- residuals(fit, type = "mle-mvn") # QQ plot - should follow 1:1 line qqnorm(pcod$resids) abline(0, 1) # Spatial residual patterns - look for no clustering ggplot(pcod, aes(X, Y, col = resids)) + geom_point() + scale_colour_gradient2() + facet_wrap(~year) + coord_fixed() # Histogram - should look normal hist(pcod$resids, breaks = 30) # Simulation-based DHARMa residuals (more thorough) set.seed(123) s <- simulate(fit, nsim = 300, type = "mle-mvn") dharma_residuals(s, fit) # Deviance residuals resid_dev <- residuals(fit, type = "deviance") # Run sanity checks on model fit sanity(fit) #> Non-linear minimizer suggests successful convergence #> Hessian matrix is positive definite #> No extreme or very small eigenvalues detected #> No gradients with respect to fixed effects are >= 0.001 #> No fixed-effect standard errors are NA ``` ## families - Distribution Families for Response Variables sdmTMB supports many distribution families beyond standard GLM families, including Tweedie for zero-inflated continuous data, negative binomial variants, lognormal, Student-t, and delta/hurdle models for separate zero vs. positive components. ```r library(sdmTMB) mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10) # Tweedie - zero-inflated continuous data (biomass/density) fit_tw <- sdmTMB(density ~ s(depth), data = pcod, mesh = mesh, family = tweedie(link = "log")) # Negative binomial for count data fit_nb2 <- sdmTMB(round(density) ~ s(depth), data = pcod, mesh = mesh, family = nbinom2(link = "log")) # Lognormal for positive continuous data pcod_pos <- subset(pcod, density > 0) fit_ln <- sdmTMB(density ~ s(depth), data = pcod_pos, mesh = make_mesh(pcod_pos, c("X", "Y"), cutoff = 10), family = lognormal(link = "log")) # Student-t for heavy-tailed data (df estimated by default) fit_t <- sdmTMB(density ~ s(depth), data = pcod_pos, mesh = make_mesh(pcod_pos, c("X", "Y"), cutoff = 10), family = student(link = "log")) # Delta-gamma (hurdle model): binomial + Gamma fit_dg <- sdmTMB(density ~ s(depth), data = pcod, mesh = mesh, time = "year", family = delta_gamma()) # Delta-lognormal fit_dln <- sdmTMB(density ~ s(depth), data = pcod, mesh = mesh, family = delta_lognormal()) # Poisson-link delta model (ensures predictions sum correctly) fit_pl <- sdmTMB(density ~ s(depth), data = pcod, mesh = mesh, family = delta_gamma(type = "poisson-link")) # Binomial for presence/absence fit_bin <- sdmTMB(present ~ s(depth), data = pcod, mesh = mesh, family = binomial(link = "logit")) # Beta for proportions (0-1) # Beta(link = "logit") # Gamma mixture for bimodal positive data # gamma_mix(link = "log") ``` ## sdmTMBpriors - Set Priors/Penalties on Parameters Allows specifying prior distributions on model parameters for regularization or Bayesian inference via tmbstan. Includes penalized complexity (PC) priors for Matérn random field parameters. ```r library(sdmTMB) mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10) # PC prior on spatial random field (recommended starting point) # range_gt: P(range < range_gt) = 0.05 # sigma_lt: P(sigma > sigma_lt) = 0.05 fit_pc <- sdmTMB( density ~ depth_scaled, data = pcod, mesh = mesh, family = tweedie(), priors = sdmTMBpriors( matern_s = pc_matern(range_gt = 10, sigma_lt = 5) ) ) # Visualize PC Matern prior plot_pc_matern(range_gt = 10, sigma_lt = 5) # Multiple priors fit_priors <- sdmTMB( density ~ depth_scaled + depth_scaled2, data = pcod, mesh = mesh, family = tweedie(), priors = sdmTMBpriors( matern_s = pc_matern(range_gt = 10, sigma_lt = 5), b = normal(c(0, 0, 0), c(10, 1, 1)), # priors on intercept and depth coefficients phi = halfnormal(0, 15), # prior on dispersion tweedie_p = normal(1.5, 0.1) # prior on Tweedie power ) ) # For Bayesian inference with tmbstan, set bayesian = TRUE fit_bayes <- sdmTMB( density ~ depth_scaled, data = pcod, mesh = mesh, family = tweedie(), bayesian = TRUE, # applies Jacobian adjustments priors = sdmTMBpriors( matern_s = pc_matern(range_gt = 10, sigma_lt = 5), b = normal(c(0, 0), c(2, 2)) ) ) # Then pass to tmbstan::tmbstan(fit_bayes$tmb_obj) ``` ## simulate.sdmTMB - Simulate Data from Fitted Models Generates simulated response data from fitted models for model checking, power analysis, or creating simulation-based residuals. Can simulate at original or new data locations. ```r library(sdmTMB) # Fit model mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10) fit <- sdmTMB( density ~ s(depth, k = 5), data = pcod, mesh = mesh, time = "year", family = tweedie(link = "log") ) # Simulate from fitted model (multiple replicates) set.seed(123) sims <- simulate(fit, nsim = 500) dim(sims) # rows = observations, columns = simulations # Use for DHARMa residual diagnostics dharma_residuals(sims, fit) # Simulate without observation error (just process model) sims_no_obs <- simulate(fit, nsim = 100, observation_error = FALSE) # Simulate on new data (e.g., prediction grid) grid_yrs <- replicate_df(qcs_grid, "year", unique(pcod$year)) sims_grid <- simulate(fit, nsim = 10, newdata = grid_yrs) # Simulate from scratch with sdmTMB_simulate predictor_dat <- expand.grid( X = seq(0, 1, length.out = 100), Y = seq(0, 1, length.out = 100) ) sim_mesh <- make_mesh(predictor_dat, c("X", "Y"), cutoff = 0.05) sim_dat <- sdmTMB_simulate( formula = ~ 1, data = predictor_dat, mesh = sim_mesh, family = poisson(link = "log"), range = 0.3, # spatial range sigma_O = 0.4, # spatial SD seed = 1, B = 1 # intercept value ) head(sim_dat) ``` ## Summary sdmTMB is designed for researchers and practitioners working with spatially and spatiotemporally correlated data, particularly in ecology, fisheries science, and environmental monitoring. The primary use cases include: species distribution modeling where spatial correlation must be accounted for, fisheries survey index standardization with proper uncertainty quantification, detecting distributional shifts through center of gravity calculations, understanding habitat relationships while controlling for spatial autocorrelation, and forecasting species distributions under changing conditions. The typical workflow involves: (1) creating a mesh with `make_mesh()` appropriate for your spatial domain, (2) fitting models with `sdmTMB()` iterating on covariates, families, and random field structures, (3) checking model diagnostics with `residuals()`, `sanity()`, and `dharma_residuals()`, (4) comparing models using `sdmTMB_cv()` or AIC, (5) generating predictions with `predict()` on appropriate grids, and (6) calculating derived quantities like indices (`get_index()`), centers of gravity (`get_cog()`), or effective area occupied (`get_eao()`). The package integrates well with tidyverse workflows through `tidy()`, supports visualization with ggplot2 and visreg/ggeffects, and can pass models to tmbstan for fully Bayesian inference.