### Configure Parallel Processing for KernelSHAP

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/examples-and-troubleshooting.md

Ensure that the `doFuture` package is installed and that a parallel backend is registered. If the workers need model packages (typically on Windows), list them in `parallel_args`.

```r
# 1. Install doFuture
install.packages("doFuture")

# 2. Load and register a backend
library(doFuture)
plan(multicore, workers = 4)  # or plan(multisession); multicore is not supported on Windows

# 3. Verify that it works
s <- kernelshap(fit, X, parallel = TRUE, verbose = TRUE)

# 4. On Windows, load the required packages on each worker
#    (parallel_args is passed to foreach::foreach(), hence `.packages`)
s <- kernelshap(
  fit, X,
  parallel = TRUE,
  parallel_args = list(.packages = c("mgcv", "ranger"))
)

# Reset to serial processing
plan(sequential)
```

--------------------------------

### Install kernelshap Package

Source: https://github.com/modeloriented/kernelshap/blob/main/README.md

Install the kernelshap package from CRAN, or the development version from GitHub.

```r
# From CRAN
install.packages("kernelshap")

# Or the development version:
devtools::install_github("ModelOriented/kernelshap")
```

--------------------------------

### GAM Model Example

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/api-reference/additive-shap.md

Shows how to apply additive_shap() to a generalized additive model (GAM) fitted with the mgcv package. The model uses smooth functions (s()) for the numeric features.

```r
library(mgcv)

fit <- gam(
  Sepal.Length ~ s(Sepal.Width) + s(Petal.Length) + Species,
  data = iris
)
s <- additive_shap(fit, iris[1:10, ])
s
```

--------------------------------

### Input Data (X) Example

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/types.md

Displays the input data used for the SHAP calculation. The `X` component is the matrix or data frame of observations and features that were explained.
```r
s$X
#   Sepal.Width Petal.Length Petal.Width Species
# 1         3.5          1.4         0.2  setosa
# 2         3.0          1.4         0.2  setosa
```

--------------------------------

### Interactive Program Startup Notice

Source: https://github.com/modeloriented/kernelshap/blob/main/LICENSE.md

Example of a short notice displayed by an interactive program upon startup. It includes version, copyright, warranty information, and details on redistribution rights.

```text
Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details
type `show w'.  This is free software, and you are welcome
to redistribute it under certain conditions; type `show c'
for details.
```

--------------------------------

### Linear Regression Example

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/api-reference/additive-shap.md

Demonstrates how to use additive_shap with a linear regression model fitted using lm(). Shows how to access the SHAP values matrix (S) and the baseline value.

```r
library(kernelshap)

fit <- lm(Sepal.Length ~ ., data = iris)
s <- additive_shap(fit, iris[1:10, ])
s

# Access SHAP values
head(s$S)

# Access baseline
s$baseline
```

--------------------------------

### KernelSHAP Exact Weight Proportion Example

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/types.md

Demonstrates how to access the proportion of Kernel SHAP weight distribution covered by exact calculations. This value is specific to kernelshap and absent for permshap.

```r
s$prop_exact
# [1] 0.125
# 12.5% of weight from exact, 87.5% from sampling
```

--------------------------------

### Ranger Regression Usage Example

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/ranger-integration.md

Complete example demonstrating training a ranger regression model on diamonds data and calculating SHAP values for a subset.
```r
library(ggplot2)  # provides the diamonds data
library(ranger)
library(kernelshap)

# Train model
set.seed(123)
diamonds_small <- diamonds[sample(nrow(diamonds), 1000), ]
fit <- ranger(
  log(price) ~ log(carat) + clarity + color + cut,
  data = diamonds_small,
  num.trees = 100,
  seed = 123
)

# Explain 50 predictions
X_explain <- diamonds_small[1:50, c("carat", "clarity", "color", "cut")]
s <- kernelshap(fit, X_explain)

# View results
summary(s)
s$S[1, ]          # SHAP values for first observation
s$predictions[1]  # Prediction for first observation
s$baseline        # Average prediction on the background data
```

--------------------------------

### KernelSHAP Configuration for Small Datasets

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/configuration.md

Example of calling kernelshap() on a problem with few features, where `exact = TRUE` is the default. Background data is provided explicitly.

```r
kernelshap(fit, X, bg_X = full_training_data, exact = TRUE)
# Default: exact = TRUE (since p <= 8)
# Default: bg_n = 200 (ignored, because bg_X is provided)
```

--------------------------------

### KernelSHAP Sampling Size Example

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/types.md

Shows how to retrieve the number of on-off vectors sampled per iteration in sampling-based SHAP calculations. It is not used when `exact = TRUE`.

```r
s$m
# [1] 8
# 4 features, 2 * 4 = 8 samples per iteration
```

--------------------------------

### Parallel Computing Setup for SHAP Functions

Source: https://github.com/modeloriented/kernelshap/blob/main/README.md

Configures parallel computing for permshap() and kernelshap() using doFuture with a multisession backend. Note potential issues with passing packages and global objects to the workers on Windows.

```r
library(doFuture)
library(mgcv)
plan(multisession, workers = 4)  # Windows
```

--------------------------------

### Exact Evaluations Count Example

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/types.md

Example of the number of exact on-off vectors evaluated.
The `m_exact` component indicates the computational effort of exact or hybrid SHAP calculations.

```r
s$m_exact
# [1] 8
# 4 features, degree 1: 2 * 4 = 8
```

--------------------------------

### KernelSHAP Convergence Status Example

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/types.md

Illustrates retrieving the logical vector that indicates whether each observation converged within the maximum number of iterations. `FALSE` means that `max_iter` was reached without satisfying the stopping criterion. The component is absent if `exact = TRUE`.

```r
s$converged
# [1] TRUE TRUE FALSE TRUE TRUE ...
# Observation 3 did not converge
```

--------------------------------

### Subsample Explanation Set for Large Datasets

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/ranger-integration.md

When many observations need explaining, explain a random subsample to reduce run time. This example explains 100 observations instead of 10,000.

```r
# Explain a random subsample of 100 observations instead of all 10,000
X_sample <- X_explain[sample(nrow(X_explain), 100), ]
s <- kernelshap(fit, X_sample)
```

--------------------------------

### Ranger Regression Model Training and SHAP Calculation

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/ranger-integration.md

Example of training a ranger regression model and then calculating SHAP values for test data.

```r
library(ranger)

# Train random forest regression
fit <- ranger(y ~ ., data = training_data)

# SHAP values
s <- kernelshap(fit, X = test_data[, -response_col])
s$predictions  # Model predictions on X
s$S            # (n × p) matrix of SHAP values
```

--------------------------------

### Ranger Survival Model Training and SHAP Calculation (Probabilities)

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/ranger-integration.md

Example of training a ranger survival model and calculating SHAP values for survival probabilities.
```r
# SHAP values for survival probabilities
# (explain all veteran columns except the response columns time and status)
X_vet <- veteran[setdiff(names(veteran), c("time", "status"))]
s_prob <- kernelshap(fit, X_vet, survival = "prob")
s_prob$predictions  # (n × n_times) matrix
```

--------------------------------

### Model Predictions Example

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/types.md

Shows the model's predictions on the input data. The `predictions` component is a matrix in which each row holds the predicted output of one observation.

```r
s$predictions
#       [,1]
# [1,] 5.006
# [2,] 4.906
# [3,] 4.706
```

--------------------------------

### KernelSHAP Iteration Count Example

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/types.md

Shows how to access the vector with the number of iterations performed for each observation. The value varies per row and is absent if `exact = TRUE`.

```r
s$n_iter
# [1] 3 2 4 3 5 2 4 3 ...
# Observation 1 required 3 iterations, observation 2 required 2, etc.
```

--------------------------------

### KernelSHAP Configuration for Large Datasets

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/configuration.md

Example of calling kernelshap() on a problem with many features, where `exact = FALSE` is the default. Parameters such as `hybrid_degree`, `max_iter`, and `tol` are set explicitly to balance speed and accuracy.

```r
kernelshap(
  fit, X,
  bg_X = subset_training_data,
  exact = FALSE,
  hybrid_degree = 1,  # or 2
  max_iter = 200,
  tol = 0.01
)
# Default: exact = FALSE (since p > 8)
# More iterations may be needed with many features
```

--------------------------------

### KernelSHAP Standard Errors Example

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/types.md

Illustrates accessing the standard error estimates of the SHAP values, which indicate the precision of the approximation. They are calculated across iterations and are absent if `exact = TRUE`.
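The standard errors feed the stopping rule given in the algorithm notes: an observation counts as converged once max(SE) / (max(SHAP) - min(SHAP)) < tol. The base-R sketch below applies that rule row by row. `is_converged()` is a hypothetical helper, and the matrices and `tol = 0.005` are illustrative values, not package output.

```r
# Hypothetical helper: apply the stopping rule row by row.
# A row "converges" when its largest standard error is small relative
# to the spread (max - min) of its SHAP values.
is_converged <- function(S, SE, tol = 0.005) {
  S <- as.matrix(S)
  SE <- as.matrix(SE)
  spread <- apply(S, 1, max) - apply(S, 1, min)  # 0 if all SHAP values are equal
  apply(SE, 1, max) / spread < tol
}

# Illustrative SHAP values and standard errors for two observations
S <- rbind(c(0.1, 0.5, -0.1), c(0.2, 0.4, 0.0))
SE <- rbind(c(0.001, 0.002, 0.001), c(0.01, 0.02, 0.01))
is_converged(S, SE)
# [1]  TRUE FALSE
```

Row 1 has ratio 0.002 / 0.6 (below 0.005), while row 2 has 0.02 / 0.4 = 0.05 and would need more iterations.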
```r
s$SE
#      Sepal.Width Petal.Length Petal.Width Species
# [1,]      0.0089       0.0123      0.0045  0.0078
# [2,]      0.0091       0.0125      0.0047  0.0079
```

--------------------------------

### PermutationSHAP for Memory-Constrained Environments

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/configuration.md

Example of using permshap() with `low_memory = TRUE` to reduce peak memory usage, for example in Shiny apps. `max_iter` is adjusted as well.

```r
permshap(fit, X, low_memory = TRUE, max_iter = 50)
# Processes one iteration chunk at a time
# Reduces peak memory usage
```

--------------------------------

### Generalized Linear Model (Logistic Regression) Example

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/api-reference/additive-shap.md

Demonstrates applying additive_shap() to a logistic regression model fitted with glm(). The model predicts a binary outcome from several features.

```r
fit <- glm(
  I(Sepal.Width > 3.0) ~ Sepal.Length + Petal.Length + Petal.Width,
  data = iris,
  family = binomial()
)
s <- additive_shap(fit, iris[1:10, ])
s
```

--------------------------------

### Cox Proportional Hazards Model Example

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/api-reference/additive-shap.md

Illustrates how to use additive_shap() with a Cox proportional hazards model fitted with the survival package.

```r
library(survival)

# ph.ecog and wt.loss are columns of the lung data (not of veteran)
fit <- coxph(
  Surv(time, status) ~ age + ph.ecog + wt.loss,
  data = lung
)
s <- additive_shap(fit, lung[1:10, ])
s
```

--------------------------------

### Polynomial and Log Terms Example

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/api-reference/additive-shap.md

Illustrates additive_shap() on a linear model with polynomial (poly()) and logarithmic (log()) terms. The function attributes the contribution of each term, including the two-column poly() term, to the single feature it uses; here both poly(Sepal.Width, 2) and log(Sepal.Width) are credited to Sepal.Width.
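Conceptually, this per-feature attribution can be reproduced in base R from the centered term contributions returned by `predict(..., type = "terms")`: each term is assigned to the one variable it uses, and terms sharing a variable are summed. This is a sketch of the idea under that assumption, not necessarily how additive_shap() is implemented; the baseline here is the `"constant"` attribute that predict.lm() attaches.

```r
# Same model as in the example: two terms (poly and log) depend on Sepal.Width
fit <- lm(
  Sepal.Length ~ poly(Sepal.Width, 2) + log(Petal.Length) + log(Sepal.Width),
  data = iris
)
X <- iris[1:10, ]

# Centered contribution of each model term; "constant" acts as the baseline
tt <- predict(fit, newdata = X, type = "terms")
baseline <- attr(tt, "constant")

# Variable used by each term; vapply() errors if a term uses more than one variable
term_var <- vapply(colnames(tt), function(lab) all.vars(str2lang(lab)), character(1))

# Sum the term columns that share a variable: one column per feature
S <- sapply(unique(term_var), function(v) rowSums(tt[, term_var == v, drop = FALSE]))
S
```

Each row of `S` plus `baseline` reproduces the model prediction for that row.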
```r
fit <- lm(
  Sepal.Length ~ poly(Sepal.Width, 2) + log(Petal.Length) + log(Sepal.Width),
  data = iris
)
s_add <- additive_shap(fit, iris[1:10, ])
s_add
```

--------------------------------

### Custom Prediction Function for Logarithmic Response Scale

Source: https://github.com/modeloriented/kernelshap/blob/main/NEWS.md

When the default prediction function is insufficient (e.g., for a model fitted on the log scale), pass a custom `pred_fun` to kernelshap(). This example uses `exp(predict(m, X))` to obtain predictions on the original scale.

```R
kernelshap(fit, X, bg_X, pred_fun = function(m, X) exp(predict(m, X)))
```

--------------------------------

### Basic KernelSHAP and Shapviz Integration

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/README.md

Demonstrates how to perform KernelSHAP calculations and then pass the results to the shapviz package for visualization.

```r
library(shapviz)

s <- kernelshap(fit, iris[-1])
sv <- shapviz(s)
sv_importance(sv)                  # Variable importance
sv_dependence(sv, "Petal.Length")  # Dependence plot
```

--------------------------------

### Ranger Classification Usage Example

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/ranger-integration.md

Example demonstrating training a ranger classification model on iris data and calculating SHAP values for all observations.

```r
# Train model
fit <- ranger(Species ~ ., data = iris, num.trees = 100, probability = TRUE)

# Explain all iris observations
s <- kernelshap(fit, iris[-5])

# SHAP values per class
length(s$S)     # 3 (three classes)
head(s$S[[1]])  # First class (setosa)
```

--------------------------------

### Typical Ranger and KernelSHAP Workflow

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/ranger-integration.md

A standard workflow for training a Ranger model, selecting data for explanation, computing SHAP values, and visualizing them with shapviz.

```r
library(ggplot2)  # provides the diamonds data
library(ranger)
library(kernelshap)
library(shapviz)

# 1. Train ranger model
fit <- ranger(log(price) ~ ., data = diamonds, num.trees = 500)

# 2. Select a small random subset of observations to explain
#    (features only: drop the response column price)
X_explain <- diamonds[sample(nrow(diamonds), 100), setdiff(names(diamonds), "price")]

# 3. Compute SHAP values
s <- kernelshap(fit, X_explain)

# 4. Visualize with shapviz
sv <- shapviz(s)
sv_importance(sv)
sv_dependence(sv, "carat")
```

--------------------------------

### Permutation SHAP Exact Algorithm Steps

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/algorithms.md

Outlines the steps of the exact Permutation SHAP algorithm: generate all on-off vectors, then combine the marginal contributions of each feature.

```Mathematical
Z = {0, 1}^p (all 2^p combinations)
```

```Mathematical
For feature j, pair each vector with z_j = 0 (off) with the vector that differs only by z_j = 1 (on)
contributions[j] = sum over pairs of w(S) * (v[on] - v[off])
where S = other features switched on, w(S) = |S|! (p - |S| - 1)! / p!
```

--------------------------------

### KernelSHAP Generic Function

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/api-reference/kernelshap.md

The generic function for KernelSHAP. Use it as a starting point for understanding the interface.

```r
kernelshap(object, ...)
```

--------------------------------

### Access Baseline Prediction

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/api-reference/kernelshap-class.md

Get the baseline (average prediction on the background data) from a kernelshap object via its `$baseline` component.

```r
s$baseline
```

--------------------------------

### Basic Kernel SHAP Usage

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/README.md

Demonstrates the basic workflow of training a model and calculating SHAP values with the kernelshap package. Requires the kernelshap library to be loaded.
```r
library(kernelshap)

# Train any model
fit <- lm(Sepal.Length ~ ., data = iris)

# Calculate SHAP values
s <- kernelshap(fit, iris[-1])
print(s)
summary(s)

# Access results
s$S            # SHAP values (n × p matrix)
s$baseline     # Average prediction
s$predictions  # Model predictions on X
```

--------------------------------

### Standard Software License Notice

Source: https://github.com/modeloriented/kernelshap/blob/main/LICENSE.md

This is a template for a standard free software license notice to be included in source files. It specifies redistribution rights and warranty disclaimers under the GNU General Public License.

```text
Copyright (C)

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
```

--------------------------------

### Permutation SHAP for Ranger Classification Model

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/api-reference/permshap.md

Calculates SHAP values for a classification model trained with the 'ranger' package. Ensure 'ranger' is installed and loaded.
```r
library(ranger)

fit <- ranger(Species ~ ., data = iris, probability = TRUE)
s <- permshap(fit, iris[-5])
s
```

--------------------------------

### Document Algorithm Choice and Convergence

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/examples-and-troubleshooting.md

Record the SHAP algorithm used and report convergence status for sampling-based methods. This aids in understanding and reproducing the analysis.

```r
# Record what was used
method <- s$algorithm
if (!s$exact) {
  cat("Convergence achieved for", sum(s$converged), "rows\n")
}
```

--------------------------------

### Set Parallel Processing Plan

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/configuration.md

Configure the parallel processing backend using doFuture. Choose multicore for Linux/macOS or multisession for Windows. Specify the number of worker processes.

```r
library(doFuture)

# Linux, macOS:
plan(multicore, workers = 4)

# Windows:
plan(multisession, workers = 4)
```

--------------------------------

### Baseline Value (Single Output)

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/types.md

Example of the baseline prediction value for a single-output model. The 'baseline' component represents the average prediction on the background dataset.

```r
s$baseline
# [1] 5.843333
```

--------------------------------

### Ranger Classification Model Training and SHAP Calculation

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/ranger-integration.md

Example of training a ranger classification model with probabilities and calculating SHAP values per class.
```r
# Train random forest classifier with probabilities
fit <- ranger(Species ~ ., data = iris, probability = TRUE)

# SHAP values (one matrix per class)
s <- kernelshap(fit, iris[-5])
s$S         # List of 3 matrices (one per class)
s$S[[1]]    # SHAP for class 1 (setosa)
s$S[[2]]    # SHAP for class 2 (versicolor)
s$S[[3]]    # SHAP for class 3 (virginica)
s$baseline  # Named vector: average predicted probability per class on the background data
```

--------------------------------

### Custom Prediction Function for Ranger SHAP Calculation

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/ranger-integration.md

Example of defining and using a custom prediction function to override the automatic one for ranger SHAP calculations.

```r
custom_pred <- function(model, newdata, ...) {
  # Custom logic (e.g., probability scaling)
  pred <- predict(model, newdata, ...)$predictions
  return(pred / 100)  # Rescale probabilities
}

s <- kernelshap(fit, X, pred_fun = custom_pred)
```

--------------------------------

### Basic Usage: Random Forest Model for Diamond Prices

Source: https://github.com/modeloriented/kernelshap/blob/main/README.md

Demonstrates modeling diamond prices with a random forest and calculating SHAP values with permshap(). Requires loading the necessary libraries and preparing the data.

```r
library(kernelshap)
library(ggplot2)
library(ranger)
library(shapviz)

options(ranger.num.threads = 8)

diamonds <- transform(
  diamonds,
  log_price = log(price),
  log_carat = log(carat)
)

xvars <- c("log_carat", "clarity", "color", "cut")

fit <- ranger(
  log_price ~ log_carat + clarity + color + cut,
  data = diamonds,
  num.trees = 100,
  seed = 20
)
fit  # OOB R-squared 0.989

# 1) Sample rows to be explained
set.seed(10)
X <- diamonds[sample(nrow(diamonds), 1000), xvars]

# 2) Optional: Select background data. If unspecified, 200 rows from X are used
bg_X <- diamonds[sample(nrow(diamonds), 200), ]

# 3) Crunch SHAP values (22 seconds)
# Since the number of features is small, we use permshap()
system.time(
  ps <- permshap(fit, X, bg_X = bg_X)
)
ps

# SHAP values of first observations:
#       log_carat     clarity       color         cut
# [1,]  1.1913247  0.09005467 -0.13430720 0.000682593
# [2,] -0.4931989 -0.11724773  0.09868921 0.028563613

# Indeed, Kernel SHAP gives the same:
ks <- kernelshap(fit, X, bg_X = bg_X)
ks

#       log_carat     clarity       color         cut
# [1,]  1.1913247  0.09005467 -0.13430720 0.000682593
# [2,] -0.4931989 -0.11724773  0.09868921 0.028563613

# 4) Analyze with {shapviz}
ps <- shapviz(ps)
sv_importance(ps)
sv_dependence(ps, xvars)
```

--------------------------------

### Baseline Value (Multi Output)

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/types.md

Example of baseline values for a multi-output model, such as probabilistic classification. Here `baseline` is a vector with the average predicted probability of each class on the background data.

```r
s$baseline
#     setosa versicolor  virginica
#      0.333      0.333      0.333
```

--------------------------------

### Parallel Processing with Package Loading (Windows)

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/configuration.md

When using parallel processing on Windows, make sure the required model packages are loaded in each worker session via the `.packages` element of `parallel_args` (which is passed on to `foreach::foreach()`).

```r
s <- kernelshap(fit, X, parallel = TRUE, parallel_args = list(.packages = "mgcv"))
```

--------------------------------

### Ranger Survival Model Training and SHAP Calculation (CHF)

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/ranger-integration.md

Example of training a ranger survival model and calculating SHAP values for cumulative hazards (default).
```r
library(survival)  # Surv() and the veteran data

# Train survival random forest (a Surv() response makes ranger grow a survival forest)
fit <- ranger(
  Surv(time, status) ~ .,
  data = veteran
)

# SHAP values for cumulative hazards (default),
# explaining all veteran columns except time and status
X_vet <- veteran[setdiff(names(veteran), c("time", "status"))]
s_chf <- kernelshap(fit, X_vet, survival = "chf")
s_chf$predictions  # (n × n_times) matrix
```

--------------------------------

### summary.kernelshap()

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/api-reference/kernelshap-class.md

Provides a detailed summary of the SHAP calculation results, including algorithm details and convergence information.

```APIDOC
## summary.kernelshap()

### Description
Provides a detailed summary of the SHAP calculation results, including algorithm details and convergence information.

### Method
`summary(object, compact = FALSE, n = 2L, ...)`

### Parameters
#### Parameters
- **object** (kernelshap) - Required - A kernelshap object.
- **compact** (logical) - Optional - If TRUE, omit SHAP values and standard errors from the output (summary only). Defaults to FALSE.
- **n** (integer) - Optional - Maximum number of rows of SHAP values and standard errors to display. Defaults to 2L.
- **...** (any) - Optional - Further arguments passed from other methods.

### Return Value
Invisibly returns the input object.

### Output Details
The summary displays:
- Algorithm text (exact vs. sampling, hybrid degree)
- SHAP matrix dimensions
- Baseline value(s)
- (If sampling) Average number of iterations, non-convergence count, proportion covered by exact calculations, samples per iteration
- (If not compact) SHAP values and standard errors of the first n observations

### Example
```r
s <- kernelshap(fit, iris[-1])
summary(s)
# Exact Kernel SHAP values
# - SHAP matrix of dim 150 x 4
# - baseline: 5.843333
#
# SHAP values of first observations:
#   Sepal.Width Petal.Length Petal.Width Species
# 1       0.123        0.456      -0.089   0.012
# 2       0.145        0.489      -0.067   0.034

# Compact summary (no SHAP values)
summary(s, compact = TRUE)
# Exact Kernel SHAP values
# - SHAP matrix of dim 150 x 4
# - baseline: 5.843333

# For sampling-based results
s_sample <- kernelshap(fit, iris[-1], exact = FALSE)
summary(s_sample)
# Kernel SHAP values by the hybrid strategy of degree 1
# - SHAP matrix of dim 150 x 4
# - baseline: 5.843333
# - average number of iterations: 2.5
# - rows not converged: 1
# - proportion exact: 0.125
# - m/iter: 8
# - m_exact: 8
#
# Corresponding standard errors:
```
```

--------------------------------

### Kernel SHAP Pure Sampling Algorithm Steps

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/algorithms.md

Illustrates the core steps of the Kernel SHAP pure sampling algorithm: sampling on-off vectors, generating masked data, predicting on the masked data, and performing a weighted regression.
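The first step, sampling on-off vectors, can be sketched in base R. Summing the weight w(z) from the algorithm steps over all choose(p, k) vectors with k features switched on gives (p - 1) / (k (p - k)), so one can draw a size k with probability proportional to that total and then switch on k random features. `sample_on_off()` is a hypothetical helper for illustration; the package's own sampler may differ (for example, by pairing each vector with its complement).

```r
# Hypothetical helper: draw m on-off vectors (rows) for p features.
# Vectors with k "on" features carry total Kernel SHAP weight
# (p - 1) / (k * (p - k)), so k is drawn proportionally to that value
# and then k positions are switched on uniformly at random.
sample_on_off <- function(p, m) {
  stopifnot(p >= 2, m >= 1)
  k_vals <- seq_len(p - 1)  # all-0 and all-1 vectors are excluded
  size_weight <- (p - 1) / (k_vals * (p - k_vals))
  sizes <- k_vals[sample.int(length(k_vals), m, replace = TRUE, prob = size_weight)]
  Z <- matrix(0L, nrow = m, ncol = p)
  for (i in seq_len(m)) {
    Z[i, sample.int(p, sizes[i])] <- 1L
  }
  Z
}

set.seed(1)
Z <- sample_on_off(p = 5, m = 8)
Z
rowSums(Z)  # number of features switched on per vector
```

Sizes near 1 and p - 1 are drawn most often, mirroring the large Kernel SHAP weights of nearly empty and nearly full coalitions.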
```Mathematical
w(z) = (p - 1) / (C(p, S(z)) * S(z) * (p - S(z)))
where S(z) = sum of z, C(p, k) = binomial coefficient
```

```Mathematical
X_z: Replace columns j where z_j = 1 with x_j; keep other columns from background data
```

```Mathematical
v_z = f(X_z), averaged over background data
```

```Mathematical
min_β || √w .* (v - v_0 - Z β) ||^2  such that  sum(β) = v_1 - v_0
where v_0 = average prediction on background data (all features off), v_1 = f(x) (all features on)
```

```Mathematical
max(SE) / (max(β) - min(β)) < tol
```

--------------------------------

### Basic shapviz Plots

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/README.md

Demonstrates basic plotting functions from the shapviz package applied to a KernelSHAP object.

```R
sv_importance(sv)
sv_dependence(sv, "feature_name")

# kernelshap() provides no SHAP interaction values, so sv_interaction() does not
# apply here; color a dependence plot by a second feature instead
sv_dependence(sv, "feat1", color_var = "feat2")
```

--------------------------------

### GAM with Interactions using Permutation SHAP

Source: https://github.com/modeloriented/kernelshap/blob/main/README.md

Demonstrates calculating SHAP values with permshap() for a GAM with an interaction term. For parallel processing, mgcv must be loaded on each worker, hence the `.packages` entry in `parallel_args`.

```r
fit <- gam(log_price ~ s(log_carat) + clarity * color + cut, data = diamonds)

system.time(  # 4 seconds in parallel
  ps <- permshap(
    fit, X,
    bg_X = bg_X,
    parallel = TRUE,
    parallel_args = list(.packages = "mgcv")
  )
)
ps
```

--------------------------------

### SHAP Values (S) - Single Output

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/types.md

Example of SHAP values for a single-output model. The `S` component is a matrix in which each row represents an observation and each column a feature's contribution.
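For a linear model with numeric features, SHAP values computed with the training data as background reduce to beta_j * (x_j - mean of x_j in the background), and each row of `S` sums to the prediction minus the baseline. The base-R sketch below shows that special case; it does not call kernelshap, and the feature choice is illustrative.

```r
# Linear model with numeric features only
feats <- c("Sepal.Width", "Petal.Length", "Petal.Width")
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)
bg <- iris[, feats]  # background data: the training features
X <- bg[1:5, ]       # observations to explain

# SHAP value of feature j: beta_j * (x_j - background mean of x_j)
beta <- coef(fit)[feats]
S <- sweep(sweep(as.matrix(X), 2, colMeans(bg)), 2, beta, "*")
baseline <- mean(predict(fit, bg))  # average prediction on the background

# Efficiency: SHAP values add up to prediction - baseline
all.equal(unname(rowSums(S) + baseline), unname(predict(fit, X)))
# [1] TRUE
S
```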
```r
s$S
#      Sepal.Width Petal.Length Petal.Width Species
# [1,]       0.123        0.456      -0.089   0.012
# [2,]       0.145        0.489      -0.067   0.034
```

--------------------------------

### Integrating KernelSHAP with Shapviz for Visualization

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/api-reference/kernelshap-class.md

This snippet shows the basic workflow for using the `shapviz` package to visualize KernelSHAP results, including creating a `shapviz` object and generating importance and dependence plots.

```r
library(shapviz)

s <- kernelshap(fit, iris[-1])
sv <- shapviz(s)

# Importance plot
sv_importance(sv)

# Dependence plot
sv_dependence(sv, "Sepal.Width")
```

--------------------------------

### Comparison with kernelshap()

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/api-reference/additive-shap.md

Compares the speed and results of additive_shap() with the model-agnostic kernelshap() function. additive_shap() is faster for compatible models, and both give identical SHAP values when kernelshap() uses the full training data as background.

```r
library(kernelshap)

fit <- lm(Sepal.Length ~ ., data = iris)
X <- iris[1:20, -1]

# Fast additive approach
system.time(s_add <- additive_shap(fit, X))

# Slower model-agnostic Kernel SHAP (with full training data as background);
# X keeps all columns so the model can predict, and feature_names selects
# the features to explain
system.time(
  s_kernel <- kernelshap(
    fit, X,
    bg_X = iris[, -1],
    feature_names = c("Sepal.Width", "Petal.Length")
  )
)

# Values are identical
all.equal(s_add$S[, c("Sepal.Width", "Petal.Length")], s_kernel$S)
# [1] TRUE
```

--------------------------------

### Kernel SHAP Implementation Details

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/algorithms.md

Provides R code snippets for the different Kernel SHAP strategies: exact, hybrid, and pure sampling. It outlines the main loop and the conditional logic of each approach.
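The exact branch can be reproduced on a toy problem: enumerate all on-off vectors except all-0 and all-1, weight them with the Kernel SHAP weights, and solve the constrained weighted least-squares problem in closed form. The base-R sketch below uses a hypothetical three-feature value function v(z) = z1 + 2 z2 + 3 z1 z3 in place of the masked predictions v_z. Its Shapley values are 2.5, 2 and 1.5 (the interaction term is split evenly between features 1 and 3), and the regression recovers them. The helper names are illustrative, not the package internals.

```r
# Kernel SHAP weight of an on-off vector with k of p features switched on
kernel_weight <- function(p, k) (p - 1) / (choose(p, k) * k * (p - k))

# Hypothetical toy value function standing in for the masked predictions v_z
v <- function(z) 1 * z[1] + 2 * z[2] + 3 * z[1] * z[3]
p <- 3

# All on-off vectors except all-0 and all-1: a (2^p - 2) x p matrix
Z <- unname(as.matrix(expand.grid(rep(list(0:1), p))))
Z <- Z[rowSums(Z) > 0 & rowSums(Z) < p, , drop = FALSE]

w <- kernel_weight(p, rowSums(Z))  # one weight per row of Z
vz <- apply(Z, 1, v)
v0 <- v(rep(0, p))                 # all features off
v1 <- v(rep(1, p))                 # all features on

# Weighted least squares with the constraint sum(beta) = v1 - v0,
# solved with a Lagrange multiplier
A <- crossprod(Z, w * Z)                # Z' W Z
b <- drop(crossprod(Z, w * (vz - v0)))  # Z' W (v - v0)
Ainv_b <- solve(A, b)
Ainv_1 <- solve(A, rep(1, p))
lambda <- (sum(Ainv_b) - (v1 - v0)) / sum(Ainv_1)
beta <- Ainv_b - lambda * Ainv_1
beta
# [1] 2.5 2.0 1.5
```

With only 2^p - 2 rows this enumeration is feasible for small p, which is why exact mode is reserved for few features.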
```R
# Main loop
if (exact) {
  # All on-off vectors z ∈ {0,1}^p except all-0 and all-1
  Z <- exact_Z(p)     # (2^p - 2) x p matrix
  vz <- get_vz(...)   # Masked predictions
  # Solve regression with constraint
  beta <- solver(A, b, constraint = v1 - v0)
} else if (hybrid_degree >= 1) {
  # Exact part
  precalc <- input_partly_exact(p, hybrid_degree)
  # Sampling part
  for (iter in 1:max_iter) {
    input <- input_sampling(p, m, deg = hybrid_degree)
    vz <- get_vz(...)
    # Combine and solve
  }
} else {
  # Pure sampling (not recommended)
  for (iter in 1:max_iter) {
    Z <- sample_on_off_vectors(p, m)
    vz <- get_vz(...)
  }
}
```

--------------------------------

### Troubleshooting: Small Background Data Warning

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/examples-and-troubleshooting.md

Addresses the 'X is quite small to act as background data' warning by providing larger background data (`bg_X`) when the explanation data (`X`) is used as background and is between 20-50 rows.

```r
s <- kernelshap(
  fit,
  X = medium_test_data,
  bg_X = training_data[sample(nrow(training_data), 300), ]
)
```

--------------------------------

### Using Additive SHAP for Additive Models

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/README.md

Illustrates how to use the additive_shap function when the model is additive and does not involve interactions, offering the fastest computation.

```r
s <- additive_shap(fit, iris[-1])
```

--------------------------------

### Background Data Sampling Logic

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/configuration.md

Illustrates the logic for selecting background data (bg_X). If bg_X is not provided, it's sampled from X based on nrow(X) and bg_n.
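The selection rule described above can be wrapped in a small runnable R function. `select_bg()` is a hypothetical name used only for illustration, and `bg_n = 200` mirrors the default mentioned in the configuration examples.

```r
# Hypothetical helper mirroring the background-data rule
select_bg <- function(X, bg_X = NULL, bg_n = 200L) {
  if (!is.null(bg_X)) return(bg_X)          # a user-supplied background wins
  if (nrow(X) <= bg_n) return(X)            # small X: use all of its rows
  X[sample(nrow(X), bg_n), , drop = FALSE]  # otherwise: bg_n random rows, no replacement
}

set.seed(1)
nrow(select_bg(iris))             # 150: all rows, since 150 <= 200
nrow(select_bg(iris, bg_n = 50))  # 50 sampled rows
```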
```r
if (is.null(bg_X)) {
  if (nrow(X) <= bg_n) {
    bg_X <- X  # All of X as background
  } else {
    bg_X <- X[sample(nrow(X), bg_n), , drop = FALSE]  # Random sample of bg_n rows from X
  }
}
# Final size: nrow(bg_X) is typically min(nrow(X), bg_n),
# unless the user provides an explicit bg_X of a different size
```

--------------------------------

### Permutation SHAP Sampling Algorithm Steps

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/algorithms.md

Details the sampling algorithm for Permutation SHAP, including cycling through random permutations, forward/backward passes, and convergence criteria.

```Mathematical
Start with all features "on" (from observation x)
Turn off sequentially: σ₁, σ₂, ..., σₚ (forward pass)
Turn back on: σₚ, ..., σ₂, σ₁ (backward pass)
This gives 2p disjoint evaluations per permutation.
```

```Mathematical
max(SE) / (max(SHAP) - min(SHAP)) < tol
```

--------------------------------

### Sample Copyright Disclaimer

Source: https://github.com/modeloriented/kernelshap/blob/main/LICENSE.md

A sample copyright disclaimer for a company to sign, relinquishing copyright interest in a specific program written by an individual.

```text
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.

, 1 April 1989
Ty Coon, President of Vice
```

--------------------------------

### Custom Prediction Function for Keras Models

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/types.md

Provides an example of defining a custom prediction function for models like Keras that do not have a standard predict() interface compatible with KernelSHAP. This function is then passed to the kernelshap function.
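Any model can be explained this way as long as `pred_fun(object, X)` returns numeric predictions. As a base-R illustration that needs no Keras, the function below returns probabilities from the logistic regression used in the additive_shap() examples instead of the log-odds that predict.glm() returns by default. Passing it to kernelshap() is shown only as a comment.

```r
# Logistic regression (same model as in the additive_shap() examples)
fit <- glm(
  I(Sepal.Width > 3.0) ~ Sepal.Length + Petal.Length + Petal.Width,
  data = iris, family = binomial()
)

# pred_fun(object, X): predictions on the probability scale
pred_fun <- function(object, X) predict(object, X, type = "response")

p_hat <- pred_fun(fit, iris[1:5, ])
p_hat

# With kernelshap loaded, for example:
# kernelshap(fit, iris[c("Sepal.Length", "Petal.Length", "Petal.Width")], pred_fun = pred_fun)
```

SHAP values computed this way add up to differences in probabilities rather than in log-odds.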
```r
library(keras)

# nn: a fitted keras model; X: data frame of features to explain
pred_fun <- function(model, X) {
  predict(model, data.matrix(X), batch_size = 1000, verbose = FALSE)
}

s <- kernelshap(nn, X, pred_fun = pred_fun)
```

--------------------------------

### KernelSHAP for Many Rows with Parallel Processing

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/configuration.md

Demonstrates setting up parallel processing for KernelSHAP when explaining a large number of rows. The progress bar is automatically disabled.

```r
library(doFuture)
plan(multicore, workers = 8)

kernelshap(fit, X, parallel = TRUE)
# Rows processed in parallel
# Progress bar disabled automatically
```

--------------------------------

### KernelSHAP Algorithm Selection Decision Tree

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/INDEX.md

Use this decision tree to select the appropriate SHAP algorithm based on model characteristics and feature count. It guides you from additive models to sampling-based approaches for larger feature sets.

```text
Is the model additive (no interactions)?
├─ YES → Use additive_shap() — FASTEST
└─ NO → Continue

How many features (p)?
├─ p ≤ 8 → kernelshap(exact=TRUE) — EXACT, FAST
├─ 8 < p ≤ 20 → kernelshap(hybrid) — HYBRID, MEDIUM
├─ 20 < p ≤ 40 → kernelshap() or permshap() — SAMPLING, SLOWER
└─ p > 40 → permshap(low_memory=TRUE) — SLOW, MEMORY-EFFICIENT
```

--------------------------------

### KernelSHAP for Large Explanation Datasets

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/README.md

This snippet shows how to use KernelSHAP with a large explanation dataset, specifying background data and enabling parallel computation for efficiency.
```r
s <- kernelshap(fit, large_X, bg_X = training_sample, parallel = TRUE)
```

--------------------------------

### Integrate Ranger with KernelSHAP using caret

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/ranger-integration.md

Shows how to train a Ranger model with the caret package and then extract the final Ranger model for SHAP analysis.

```r
library(caret)
library(kernelshap)

fit <- train(
  y ~ .,
  data = training_data,
  method = "ranger",
  num.trees = 500
)

# Extract the underlying ranger model
fit_ranger <- fit$finalModel

# Explain (response_col: position of the response column in test_data)
s <- kernelshap(fit_ranger, test_data[, -response_col])
```

--------------------------------

### Integrate Ranger with KernelSHAP using tidymodels

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/ranger-integration.md

Demonstrates how to train a Ranger model within a tidymodels workflow and then extract the Ranger model for use with KernelSHAP.

```r
library(tidymodels)
library(kernelshap)

# Workflow with a ranger random forest; recipe(...) stands for your preprocessing recipe
wf <- workflow() %>%
  add_recipe(recipe(...)) %>%
  add_model(
    rand_forest(trees = 500) %>%
      set_engine("ranger") %>%
      set_mode("regression")  # parsnip needs a mode; use "classification" if appropriate
  )
wf_fit <- fit(wf, training_data)

# Extract the underlying ranger object and the trained recipe
fit_ranger <- extract_fit_engine(wf_fit)
rec <- extract_recipe(wf_fit)

# Explain on the preprocessed predictors
s <- kernelshap(fit_ranger, bake(rec, new_data = test_data, all_predictors()))
```

--------------------------------

### Integrate Ranger with KernelSHAP using mlr3

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/ranger-integration.md

Illustrates training a Ranger model with mlr3 and extracting the model object for use with KernelSHAP.
```r
library(mlr3)
library(mlr3learners)
library(kernelshap)

task <- TaskClassif$new(id = "iris", backend = iris, target = "Species")
# predict_type = "prob" makes ranger return numeric class probabilities;
# hard class labels are not numeric and cannot be explained
learner <- lrn("classif.ranger", num.trees = 500, predict_type = "prob")
learner$train(task)

# Extract the underlying ranger model
fit_ranger <- learner$model

# Explain (one set of SHAP values per class)
s <- kernelshap(fit_ranger, iris[-5])
```

--------------------------------

### SHAP Calculation for Survival Probabilities with Ranger

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/ranger-integration.md

Demonstrates calculating SHAP values for survival probabilities by setting `survival = "prob"`.

```r
s <- kernelshap(fit, X, survival = "prob")
```

--------------------------------

### Use Full Training Data as Background for KernelSHAP

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/examples-and-troubleshooting.md

Use the complete training dataset as background data for kernelshap to potentially improve the stability and interpretability of the SHAP values. Note that run time grows with the number of background rows.

```r
s <- kernelshap(fit, X, bg_X = full_training_data)
```

--------------------------------

### Provide Background Data for KernelSHAP

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/examples-and-troubleshooting.md

Always specify background data (`bg_X`) for kernelshap rather than relying on a sample drawn from the explanation data (`X`). This improves the stability of the results.

```r
s <- kernelshap(fit, X, bg_X = training_data[, feature_cols])
```

--------------------------------

### Visualize SHAP Values with Shapviz

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/examples-and-troubleshooting.md

Use the `shapviz` package to visualize SHAP results, including waterfall plots for individual observations and plots of the whole result.
```r
library(shapviz)

sv <- shapviz(s)

# Waterfall plot for a single observation
sv_waterfall(sv, row_id = 1)

# Multiple plots
plot(sv)
```

--------------------------------

### Probabilistic Classification SHAP with Ranger

Source: https://github.com/modeloriented/kernelshap/blob/main/README.md

Demonstrates calculating SHAP values for probabilistic classification with the ranger package and permshap(). A seed is set for reproducibility.

```r
library(kernelshap)
library(ranger)
library(shapviz)

set.seed(1)

# Probabilistic classification
fit_prob <- ranger(Species ~ ., data = iris, probability = TRUE)
ps_prob <- permshap(fit_prob, X = iris[-5]) |> shapviz()
sv_importance(ps_prob)
sv_dependence(ps_prob, "Petal.Length")
```

--------------------------------

### Troubleshooting: Insufficient Background Data Error

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/examples-and-troubleshooting.md

Addresses the error "X is too small to act as background data" by passing explicit, larger background data (`bg_X`) when the explanation data (`X`) is small.

```r
s <- kernelshap(
  fit,
  X = small_test_data,
  bg_X = larger_training_data[, feature_cols]
)
```

--------------------------------

### S3 Methods for kernelshap Class

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/MANIFEST.txt

Provides methods for working with kernelshap objects: printing and summarizing results, and checking the object type.

````APIDOC
## S3 Methods for kernelshap Class

### Description
These methods allow users to interact with and inspect `kernelshap` objects.

### Methods
- **`print(x, ...)`**: Prints a concise summary of the `kernelshap` object.
- **`summary(object, ...)`**: Provides a more detailed summary of the `kernelshap` object, including statistics and key components.
- **`is.kernelshap(x)`**: Checks whether an object is of class `kernelshap`.

### Parameters
- **x** - For `print()`, the kernelshap object to print; for `is.kernelshap()`, any object to check.
- **object** (kernelshap object) - The object to summarize.
- **...** - Additional arguments passed to the methods.

### Request Example
```r
# Assuming 'shap_result' is a kernelshap object
print(shap_result)
summary(shap_result)
is.kernelshap(shap_result)
```

### Response
- **`print`**: Prints to the console and returns the object invisibly.
- **`summary`**: Returns a summary object.
- **`is.kernelshap`**: Returns a logical value (TRUE or FALSE).
````

--------------------------------

### Using Kernel SHAP for Exact Calculation

Source: https://github.com/modeloriented/kernelshap/blob/main/_autodocs/README.md

Shows how to use kernelshap for exact SHAP value calculation when the model has 8 or fewer features. For such models, exact calculation is kernelshap's default.

```r
s <- kernelshap(fit, iris[-1])  # Exact, fast
```
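
--------------------------------

### Sketch: Default Background Selection as a Function

The default background-selection pseudocode (`if (is.null(bg_X)) ...`) can be wrapped in a small, testable function. This is a minimal sketch, not the package's internal code: `select_background()` is a hypothetical name, and the default `bg_n = 200L` is an illustrative assumption. It works for both data frames and matrices because `drop = FALSE` keeps the result two-dimensional.

```r
# Hypothetical helper mirroring the default background selection rule.
# bg_n = 200L is an illustrative default, not taken from the package.
select_background <- function(X, bg_X = NULL, bg_n = 200L) {
  # A user-supplied background is used as-is, whatever its size
  if (!is.null(bg_X)) {
    return(bg_X)
  }
  # Small X: all rows act as background
  if (nrow(X) <= bg_n) {
    return(X)
  }
  # Large X: random sample of bg_n rows without replacement;
  # drop = FALSE keeps a matrix or data frame two-dimensional
  X[sample(nrow(X), bg_n), , drop = FALSE]
}

set.seed(1)
bg <- select_background(iris[-5], bg_n = 100L)
nrow(bg)  # 100, i.e. min(nrow(iris), 100)
```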
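
--------------------------------

### Sketch: One Antithetic Permutation and the Stopping Rule

To make the forward/backward scheme and the convergence criterion from "Permutation SHAP Sampling Algorithm Steps" concrete, here is a minimal base-R sketch for a single observation. It assumes numeric features in a matrix, a prediction function `f(Z)` returning one number per row, and an interventional value function that fills switched-off features from the background rows. All function names, as well as `tol = 0.01` and `max_iter = 100L`, are hypothetical choices; the sketch is not meant to mirror the internals of the package's `permshap()`. Features are turned back on in the order σ₁, ..., σₚ, so the backward coalitions are the complements of the forward ones, and each pass yields one full set of marginal contributions. For a linear model, every permutation gives the exact values βⱼ(xⱼ − mean of background column j), so the loop stops after two iterations.

```r
# Hypothetical sketch of sampling-based permutation SHAP for one row x.
# v(S): mean prediction when features in S come from x and the rest from bg.
value_fun <- function(f, x, bg, on) {
  Z <- bg
  if (any(on)) {
    # Column-major fill: each "on" column becomes the constant x[j]
    Z[, on] <- rep(x[on], each = nrow(bg))
  }
  mean(f(Z))
}

# One antithetic iteration for the permutation perm
perm_shap_one <- function(f, x, bg, perm) {
  p <- length(x)
  phi <- numeric(p)
  on <- rep(TRUE, p)
  v_prev <- value_fun(f, x, bg, on)
  # Forward pass: switch off perm[1], ..., perm[p]
  for (j in perm) {
    on[j] <- FALSE
    v_new <- value_fun(f, x, bg, on)
    phi[j] <- phi[j] + (v_prev - v_new)
    v_prev <- v_new
  }
  # Backward pass: switch on again in the same order perm[1], ..., perm[p]
  for (j in perm) {
    on[j] <- TRUE
    v_new <- value_fun(f, x, bg, on)
    phi[j] <- phi[j] + (v_new - v_prev)
    v_prev <- v_new
  }
  phi / 2  # average of the two passes
}

# Draw fresh random permutations until
# max(SE) / (max(SHAP) - min(SHAP)) < tol, or max_iter is reached
perm_shap_sampling <- function(f, x, bg, tol = 0.01, max_iter = 100L) {
  draws <- NULL
  for (i in seq_len(max_iter)) {
    draws <- rbind(draws, perm_shap_one(f, x, bg, sample(length(x))))
    if (i >= 2L) {
      shap <- colMeans(draws)
      se <- apply(draws, 2, sd) / sqrt(i)
      spread <- diff(range(shap))
      if (spread > 0 && max(se) / spread < tol) break
    }
  }
  list(shap = colMeans(draws), n_iter = nrow(draws))
}

# Linear model: exact SHAP values are beta_j * (x_j - mean(bg[, j]))
set.seed(1)
bg <- matrix(rnorm(60), ncol = 3)
beta <- c(1, -2, 0.5)
f <- function(Z) drop(Z %*% beta)
x <- c(1, 2, 3)
res <- perm_shap_sampling(f, x, bg)
res$shap
```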
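
--------------------------------

### Sketch: The Decision Tree as a Helper Function

The "KernelSHAP Algorithm Selection Decision Tree" can also be encoded as a small function, which makes its thresholds explicit and reusable in scripts. `suggest_shap_method()` is hypothetical: it only returns the recommendation text from the tree and does not call any package function.

```r
# Hypothetical helper encoding the algorithm-selection decision tree
suggest_shap_method <- function(p, additive = FALSE) {
  stopifnot(is.numeric(p), length(p) == 1L, p >= 1)
  # Additive models: no interactions, so additive_shap() suffices
  if (additive) {
    return("additive_shap()")
  }
  # Otherwise choose by the number of features p
  if (p <= 8) {
    "kernelshap(exact = TRUE)"
  } else if (p <= 20) {
    "kernelshap() (hybrid)"
  } else if (p <= 40) {
    "kernelshap() or permshap() (sampling)"
  } else {
    "permshap(low_memory = TRUE)"
  }
}

suggest_shap_method(4)
suggest_shap_method(15)
suggest_shap_method(50, additive = TRUE)
```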
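
--------------------------------

### Sketch: A pred_fun for a Model Without a Suitable predict()

The Keras example follows a general pattern: `pred_fun` takes the model object as its first argument and the feature data as its second, and returns numeric predictions, one per row of X. Below is a base-R sketch using a hypothetical model stored as a plain list of coefficients (so there is no predict() method). Like the Keras version, the wrapper converts the data frame with `data.matrix()`. `coef_model` and `pred_fun_list` are illustrative names; such a wrapper would then be passed as `kernelshap(coef_model, X, pred_fun = pred_fun_list)` (not run here).

```r
# Hypothetical model without a predict() method: a plain list of coefficients
coef_model <- list(
  intercept = 1,
  slopes = c(Sepal.Width = 0.5, Petal.Length = 0.8)
)

# pred_fun(object, X): model first, data second, one numeric prediction per row
pred_fun_list <- function(object, X) {
  # Select the model's features and convert them to a numeric matrix
  M <- data.matrix(X[, names(object$slopes), drop = FALSE])
  drop(object$intercept + M %*% object$slopes)
}

X <- iris[1:3, c("Sepal.Width", "Petal.Length")]
pred_fun_list(coef_model, X)
```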