### Install dalex Package using pip

Source: https://github.com/modeloriented/dalex/blob/master/python/dalex/dalex/documentation.md

This snippet shows how to install the dalex package using pip, including the command for installing the optional dependencies needed for all additional features.

```console
pip install dalex -U
pip install dalex[full]
```

--------------------------------

### Install DALEX R Package

Source: https://github.com/modeloriented/dalex/blob/master/README.md

Installs the DALEX package from CRAN, which is used for model-agnostic exploration and explanation in R.

```r
install.packages("DALEX")
```

--------------------------------

### Install dalex Python Package via Pip

Source: https://github.com/modeloriented/dalex/blob/master/README.md

Installs the dalex Python package using pip, enabling model-agnostic exploration and explanation for Python machine learning models. The -U flag upgrades an existing installation to the latest version.

```console
pip install dalex -U
```

--------------------------------

### Install dalex Package using conda

Source: https://github.com/modeloriented/dalex/blob/master/python/dalex/dalex/documentation.md

This snippet demonstrates how to install the dalex package using conda from the conda-forge channel.

```console
conda install -c conda-forge dalex
```

--------------------------------

### Create Interactive Dashboards with Arena in Python

Source: https://context7.com/modeloriented/dalex/llms.txt

The Arena class generates an interactive dashboard for exploring multiple models, observations, and datasets. It supports live server mode for local exploration and static JSON export for sharing. Requires DALEX Explainer objects and optionally observations to analyze.
```python import dalex as dx from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier from sklearn.linear_model import LogisticRegression # Load data and create models titanic = dx.datasets.load_titanic() X = titanic.drop(columns='survived') y = titanic['survived'] rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y) gb = GradientBoostingClassifier(n_estimators=100, random_state=42).fit(X, y) # Create explainers exp_rf = dx.Explainer(rf, X, y, label="Random Forest") exp_gb = dx.Explainer(gb, X, y, label="Gradient Boosting") # Create Arena arena = dx.Arena( precalculate=True, # Precalculate plots enable_attributes=True, # Enable observation attributes verbose=True ) # Push models to Arena arena.push_model(exp_rf) arena.push_model(exp_gb) # Push observations to explain arena.push_observations(X.iloc[:10]) # Run live server (opens browser) arena.run_server( host='127.0.0.1', port=8181 ) # Output: https://arena.drwhy.ai/?data=http://127.0.0.1:8181/ # Stop server when done # arena.stop_server() # Alternative: Generate static JSON for sharing # arena.save('arena_export.json') # Or upload to Arena cloud # arena.upload() ``` -------------------------------- ### Decompose Individual Predictions with Break Down and SHAP in R Source: https://context7.com/modeloriented/dalex/llms.txt Illustrates using `predict_parts` for instance-level variable attribution with methods like Break Down, SHAP values, and oscillations. It explains how to prepare data, create models/explainers, and visualize results. 
```r library(DALEX) # Prepare data new_dragon <- data.frame( year_of_birth = 200, height = 80, weight = 12.5, scars = 0, number_of_lost_teeth = 5 ) # Create model and explainer dragon_lm <- lm(life_length ~ ., data = dragons) explainer_lm <- explain(dragon_lm, data = dragons, y = dragons$life_length, label = "LM") # Break Down attribution (default) bd <- predict_parts(explainer_lm, new_observation = new_dragon, type = "break_down") print(bd) plot(bd) # Break Down with interactions bd_int <- predict_parts(explainer_lm, new_observation = new_dragon, type = "break_down_interactions") plot(bd_int) # SHAP values (average over multiple orderings) shap <- predict_parts(explainer_lm, new_observation = new_dragon, type = "shap", B = 25) plot(shap) # Oscillations (sensitivity analysis) osc <- predict_parts(explainer_lm, new_observation = new_dragon, type = "oscillations") print(osc) # Kernel SHAP ks <- predict_parts(explainer_lm, new_observation = new_dragon, type = "kernel_shap") plot(ks) ``` -------------------------------- ### Create Surrogate Models with DALEX Source: https://context7.com/modeloriented/dalex/llms.txt Generates interpretable surrogate models (decision trees or linear models) to approximate complex black-box models. It uses the `model_surrogate` method and allows customization of variables and tree depth. Outputs include performance metrics and feature information. 
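
Independent of dalex, the surrogate idea can be sketched in a few lines of plain Python: fit a simple model to the *predictions* of the black box rather than to the labels, then measure how faithfully it reproduces them. This is only an illustration under stated assumptions, not dalex's implementation; `black_box`, `fit_linear_surrogate`, and `r_squared` are hypothetical names introduced here.

```python
def black_box(x):
    """Hypothetical stand-in for an opaque model: nonlinear in one feature."""
    return 3.0 * x + 0.5 * x ** 2


def fit_linear_surrogate(xs, preds):
    """Least-squares line fitted to the black box's predictions."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_p = sum(preds) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxp = sum((x - mean_x) * (p - mean_p) for x, p in zip(xs, preds))
    slope = sxp / sxx
    intercept = mean_p - slope * mean_x
    return slope, intercept


def r_squared(targets, approximations):
    """Share of the variance in `targets` reproduced by `approximations`."""
    mean_t = sum(targets) / len(targets)
    ss_res = sum((t - a) ** 2 for t, a in zip(targets, approximations))
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot


xs = [i / 10 for i in range(-20, 21)]        # grid from -2.0 to 2.0
preds = [black_box(x) for x in xs]           # what the black box says
slope, intercept = fit_linear_surrogate(xs, preds)
surrogate_preds = [slope * x + intercept for x in xs]
fidelity = r_squared(preds, surrogate_preds)  # surrogate vs. black box
print(f"slope={slope:.3f}, intercept={intercept:.3f}, fidelity R2={fidelity:.3f}")
```

The R2 here measures fidelity to the black box, not accuracy against the true target, which is the quantity a surrogate's performance score is meant to convey.
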
```python import dalex as dx from sklearn.ensemble import RandomForestClassifier # Create complex model titanic = dx.datasets.load_titanic() X = titanic.drop(columns='survived') y = titanic['survived'] rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y) explainer = dx.Explainer(rf, X, y, label="Random Forest", model_type='classification') # Create decision tree surrogate tree_surrogate = explainer.model_surrogate( type='tree', max_vars=5, # Use top 5 most important variables max_depth=3 # Limit tree depth for interpretability ) # Access surrogate model attributes print(f"Surrogate R2 score: {tree_surrogate.performance:.3f}") print(f"Feature names: {tree_surrogate.feature_names}") print(f"Class names: {tree_surrogate.class_names}") # Plot decision tree tree_surrogate.plot() # Create linear surrogate linear_surrogate = explainer.model_surrogate( type='linear', max_vars=5 ) print(f"Linear surrogate R2: {linear_surrogate.performance:.3f}") linear_surrogate.plot() ``` -------------------------------- ### Generate LIME Explanations with DALEX Source: https://context7.com/modeloriented/dalex/llms.txt Creates local interpretable explanations for individual predictions using LIME. It takes a new observation and returns an explanation object that includes feature weights and local predictions. The explanation can be visualized and accessed programmatically. 
```python import dalex as dx from sklearn.ensemble import RandomForestClassifier # Create model and explainer titanic = dx.datasets.load_titanic() X = titanic.drop(columns='survived') y = titanic['survived'] rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y) explainer = dx.Explainer(rf, X, y, label="Random Forest") # Select observation observation = X.iloc[0] # LIME explanation lime_exp = explainer.predict_surrogate( new_observation=observation, type='lime' ) # Show explanation lime_exp.show_in_notebook() # Access feature weights print(lime_exp.as_list()) # Access local prediction print(lime_exp.local_pred) ``` -------------------------------- ### Generate Partial Dependence and ALE Profiles in R Source: https://context7.com/modeloriented/dalex/llms.txt Shows how to use `model_profile` to generate Partial Dependence Profiles (PDP) and Accumulated Local Effects (ALE) for single or multiple variables. It also covers clustered, grouped, and multi-model comparisons. ```r library(DALEX) library(ranger) # Create explainer titanic_rf <- ranger(survived ~ ., data = titanic_imputed, num.trees = 50, probability = TRUE) explainer_rf <- explain(titanic_rf, data = titanic_imputed[, -8], y = titanic_imputed$survived, label = "Random Forest") # Partial Dependence Profile for single variable pdp_age <- model_profile(explainer_rf, variables = "age", type = "partial") plot(pdp_age) # Multiple variables pdp_multi <- model_profile(explainer_rf, variables = c("age", "fare"), type = "partial", N = 100) plot(pdp_multi, variables = c("age", "fare")) # Accumulated Local Effects (better for correlated variables) ale_profile <- model_profile(explainer_rf, type = "accumulated") plot(ale_profile, geom = "profiles") # Clustered profiles (k clusters) clustered <- model_profile(explainer_rf, type = "partial", k = 3, center = TRUE) plot(clustered, geom = "profiles") # Grouped by categorical variable grouped <- model_profile(explainer_rf, type = "partial", groups = "gender") 
plot(grouped, geom = "profiles") # Compare multiple models titanic_glm <- glm(survived ~ ., data = titanic_imputed, family = "binomial") explainer_glm <- explain(titanic_glm, data = titanic_imputed[, -8], y = titanic_imputed$survived, label = "GLM") pdp_glm <- model_profile(explainer_glm, variables = "fare") pdp_rf_fare <- model_profile(explainer_rf, variables = "fare") plot(pdp_rf_fare, pdp_glm) ``` -------------------------------- ### Generate Ceteris Paribus Profiles in R Source: https://context7.com/modeloriented/dalex/llms.txt Demonstrates how to use `predict_profile` to generate instance-level Ceteris Paribus profiles. This shows how a model's prediction for a specific observation changes as individual variable values are modified. ```r library(DALEX) library(ranger) # Prepare observation new_dragon <- data.frame( year_of_birth = 200, height = 80, weight = 12.5, scars = 0, number_of_lost_teeth = 5 ) # Create model and explainer dragon_rf <- ranger(life_length ~ ., data = dragons, num.trees = 50) explainer_rf <- explain(dragon_rf, data = dragons, y = dragons$life_length, label = "RF") # Ceteris Paribus for selected variables cp <- predict_profile(explainer_rf, new_observation = new_dragon, variables = c("year_of_birth", "height", "weight")) head(cp) plot(cp, variables = c("year_of_birth", "height", "weight")) # All variables cp_all <- predict_profile(explainer_rf, new_observation = new_dragon) plot(cp_all) # Multiple observations multiple_dragons <- dragons[1:3, ] cp_multi <- predict_profile(explainer_rf, new_observation = multiple_dragons, variables = c("year_of_birth", "height")) plot(cp_multi, variables = c("year_of_birth", "height")) ``` -------------------------------- ### Create Model Explainer with DALEX Python Source: https://context7.com/modeloriented/dalex/llms.txt The `Explainer` class in DALEX Python acts as a unified wrapper for predictive models. 
It facilitates local and global explanations and is the primary entry point for DALEX functionalities. It supports various model types, including classification and regression, and allows for custom prediction functions. ```python import dalex as dx import pandas as pd import numpy as np from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor from sklearn.linear_model import LogisticRegression, LinearRegression from sklearn.model_selection import train_test_split # Load sample data titanic = dx.datasets.load_titanic() X = titanic.drop(columns='survived') y = titanic['survived'] X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) # Create classification model rf_model = RandomForestClassifier(n_estimators=100, random_state=42) rf_model.fit(X_train, y_train) # Create explainer explainer = dx.Explainer( model=rf_model, data=X_test, y=y_test, label="Random Forest", model_type="classification" ) # Regression example apartments = dx.datasets.load_apartments() X_apt = apartments.drop(columns='m2_price') y_apt = apartments['m2_price'] lr_model = LinearRegression() lr_model.fit(X_apt, y_apt) explainer_reg = dx.Explainer( model=lr_model, data=X_apt, y=y_apt, label="Linear Regression", model_type="regression" ) # Custom predict function for non-standard models def custom_predict(model, data): return model.predict_proba(data)[:, 1] explainer_custom = dx.Explainer( model=rf_model, data=X_test, y=y_test, predict_function=custom_predict, label="RF Custom" ) # Make predictions using explainer predictions = explainer.predict(X_test[:5]) print(predictions) ``` -------------------------------- ### model_profile() - Partial Dependence and ALE Profiles Source: https://context7.com/modeloriented/dalex/llms.txt Calculates model-level variable profiles including Partial Dependence Profiles (PDP) and Accumulated Local Effects (ALE). 
This method shows the average model behavior across variable ranges and is useful for understanding global model behavior.

```APIDOC
## model_profile() - Partial Dependence and ALE Profiles

### Description
The `model_profile()` method calculates model-level variable profiles including Partial Dependence Profiles (PDP) and Accumulated Local Effects (ALE), showing average model behavior across variable ranges.

### Method
`explainer.model_profile()`

### Parameters
#### Path Parameters
None

#### Query Parameters
- **type** (str) - Optional (defaults to 'partial') - Type of profile to calculate. Options: 'partial', 'accumulated', 'conditional'.
- **variables** (list of str) - Optional - List of variable names to analyze.
- **N** (int) - Optional - Number of sampled observations to use for calculation.
- **grid_points** (int) - Optional - Number of grid points to use for the profile.
- **groups** (str) - Optional - Categorical variable to group the profiles by.
- **variable_type** (str) - Optional - Specifies if the variable is 'categorical' or 'numerical'.
### Request Example ```python import dalex as dx from sklearn.ensemble import RandomForestClassifier # Create model and explainer titanic = dx.datasets.load_titanic() X = titanic.drop(columns='survived') y = titanic['survived'] rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y) explainer = dx.Explainer(rf, X, y, label="Random Forest") # Partial Dependence Profile pdp = explainer.model_profile( type='partial', # 'partial', 'accumulated', 'conditional' variables=['age', 'fare'], # Variables to analyze N=300, # Number of sampled observations grid_points=101 # Number of grid points ) print(pdp.result) pdp.plot(variables=['age', 'fare']) # Accumulated Local Effects (handles correlated variables better) ale = explainer.model_profile(type='accumulated', variables=['age']) ale.plot() # Grouped by categorical variable pdp_grouped = explainer.model_profile( type='partial', variables=['age'], groups='gender' ) pdp_grouped.plot() # Categorical variable profiles pdp_cat = explainer.model_profile( type='partial', variables=['class'], variable_type='categorical' ) pdp_cat.plot() # Compare multiple models from sklearn.linear_model import LogisticRegression lr = LogisticRegression(max_iter=1000).fit(X, y) exp_lr = dx.Explainer(lr, X, y, label="Logistic Regression") pdp_rf = explainer.model_profile(variables=['age']) pdp_lr = exp_lr.model_profile(variables=['age']) pdp_rf.plot(pdp_lr) ``` ### Response #### Success Response (200) - **result** (object) - The calculated profile data. - **plot()** (method) - Method to visualize the profile. 
#### Response Example (Output depends on the specific calculation and visualization, typically a pandas DataFrame for `result` and a plot object for `plot()`) ``` -------------------------------- ### Analyze Model Performance with model_performance() in DALEX Python Source: https://context7.com/modeloriented/dalex/llms.txt The `model_performance()` method computes various performance metrics tailored for classification (e.g., recall, precision, AUC) and regression (e.g., MSE, R2). It allows for direct comparison and visualization of performance across different models. ```python import dalex as dx from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier from sklearn.linear_model import LogisticRegression # Load data and create models titanic = dx.datasets.load_titanic() X = titanic.drop(columns='survived') y = titanic['survived'] rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y) gb = GradientBoostingClassifier(n_estimators=100, random_state=42).fit(X, y) lr = LogisticRegression(max_iter=1000).fit(X, y) # Create explainers exp_rf = dx.Explainer(rf, X, y, label="Random Forest") exp_gb = dx.Explainer(gb, X, y, label="Gradient Boosting") exp_lr = dx.Explainer(lr, X, y, label="Logistic Regression") # Calculate performance perf_rf = exp_rf.model_performance() perf_gb = exp_gb.model_performance() perf_lr = exp_lr.model_performance() # Access results print(perf_rf.result) # Output: DataFrame with recall, precision, f1, accuracy, auc # Visualize performance comparison perf_rf.plot(perf_gb, perf_lr) # ROC curve comparison perf_rf.plot(perf_gb, perf_lr, geom='roc') # LIFT curve perf_rf.plot(geom='lift') # Regression performance apartments = dx.datasets.load_apartments() X_apt = apartments.drop(columns='m2_price') y_apt = apartments['m2_price'] from sklearn.ensemble import RandomForestRegressor rf_reg = RandomForestRegressor(n_estimators=50).fit(X_apt, y_apt) exp_reg = dx.Explainer(rf_reg, X_apt, y_apt, label="RF Regression") 
perf_reg = exp_reg.model_performance() print(perf_reg.result) # mse, rmse, r2, mae, mad ``` -------------------------------- ### Calculate Variable Importance with Custom Loss Function in R Source: https://context7.com/modeloriented/dalex/llms.txt Demonstrates how to calculate variable importance using a custom loss function (Mean Absolute Error) with the `model_parts` function in DALEX. It also shows how to sample fewer observations for faster computation. ```r custom_loss <- function(observed, predicted) { mean(abs(observed - predicted)) # MAE } attr(custom_loss, "loss_name") <- "Mean Absolute Error" vi_custom <- model_parts(explainer_rf, loss_function = custom_loss, type = "raw") plot(vi_custom) # Sample fewer observations for faster computation vi_sampled <- model_parts(explainer_rf, N = 500, type = "raw") ``` -------------------------------- ### predict_profile() - Ceteris Paribus Profiles Source: https://context7.com/modeloriented/dalex/llms.txt Calculates instance-level profiles showing how individual predictions change when single variable values are modified. This is useful for 'What-If' analysis and understanding local model behavior. ```APIDOC ## predict_profile() - Ceteris Paribus Profiles ### Description The `predict_profile()` method calculates instance-level profiles showing how individual predictions change when single variable values are modified (What-If analysis). It helps understand the local behavior of the model around a specific observation. ### Method `explainer.predict_profile()` ### Parameters #### Path Parameters None #### Query Parameters - **new_observation** (object or DataFrame) - Required - The observation(s) for which to calculate the profile. - **variables** (list of str) - Optional - List of variables for which to generate profiles. - **grid_points** (int) - Optional - Number of points to use in the variable's grid. - **variable_splits_type** (str) - Optional - Method for splitting the variable range. 
Options: 'uniform', 'quantiles'. - **variable_splits** (dict) - Optional - Custom splits for specific variables. ### Request Example ```python import dalex as dx from sklearn.ensemble import RandomForestClassifier # Create model and explainer titanic = dx.datasets.load_titanic() X = titanic.drop(columns='survived') y = titanic['survived'] rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y) explainer = dx.Explainer(rf, X, y, label="Random Forest") # Select observation observation = X.iloc[[0]] # Ceteris Paribus for specific variables cp = explainer.predict_profile( new_observation=observation, variables=['age', 'fare'], grid_points=101, variable_splits_type='uniform' ) print(cp.result.head()) cp.plot(variables=['age', 'fare']) # All numerical variables cp_all = explainer.predict_profile(new_observation=observation) cp_all.plot() # Multiple observations comparison observations = X.iloc[:3] cp_multi = explainer.predict_profile( new_observation=observations, variables=['age'] ) cp_multi.plot() # Quantile-based splits (useful for skewed distributions) cp_quantile = explainer.predict_profile( new_observation=observation, variables=['fare'], variable_splits_type='quantiles' ) cp_quantile.plot() # Custom variable splits cp_custom = explainer.predict_profile( new_observation=observation, variable_splits={'age': [0, 18, 30, 45, 60, 80]} ) cp_custom.plot(variables=['age']) ``` ### Response #### Success Response (200) - **result** (object) - The calculated profile data, showing prediction changes for modified variable values. - **plot()** (method) - Method to visualize the profile. 
#### Response Example (Output depends on the specific calculation and visualization, typically a pandas DataFrame for `result` and a plot object for `plot()`) ``` -------------------------------- ### Calculate Model Profiles (PDP and ALE) with DALEX Source: https://context7.com/modeloriented/dalex/llms.txt The model_profile() method calculates model-level variable profiles such as Partial Dependence Profiles (PDP) and Accumulated Local Effects (ALE). It shows the average model behavior across variable ranges and can handle continuous and categorical variables, as well as compare profiles across multiple models. ```python import dalex as dx from sklearn.ensemble import RandomForestClassifier # Create model and explainer titanic = dx.datasets.load_titanic() X = titanic.drop(columns='survived') y = titanic['survived'] rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y) explainer = dx.Explainer(rf, X, y, label="Random Forest") # Partial Dependence Profile pdp = explainer.model_profile( type='partial', # 'partial', 'accumulated', 'conditional' variables=['age', 'fare'], # Variables to analyze N=300, # Number of sampled observations grid_points=101 # Number of grid points ) print(pdp.result) pdp.plot(variables=['age', 'fare']) # Accumulated Local Effects (handles correlated variables better) ale = explainer.model_profile(type='accumulated', variables=['age']) ale.plot() # Grouped by categorical variable pdp_grouped = explainer.model_profile( type='partial', variables=['age'], groups='gender' ) pdp_grouped.plot() # Categorical variable profiles pdp_cat = explainer.model_profile( type='partial', variables=['class'], variable_type='categorical' ) pdp_cat.plot() # Compare multiple models from sklearn.linear_model import LogisticRegression lr = LogisticRegression(max_iter=1000).fit(X, y) exp_lr = dx.Explainer(lr, X, y, label="Logistic Regression") pdp_rf = explainer.model_profile(variables=['age']) pdp_lr = exp_lr.model_profile(variables=['age']) 
pdp_rf.plot(pdp_lr)
```

--------------------------------

### Create Model Explainer with explain() in R

Source: https://context7.com/modeloriented/dalex/llms.txt

The `explain()` function in DALEX (R) creates a unified explainer object from any predictive model. It wraps the model with data, target variable, and prediction functions for consistent analysis. This function supports regression and classification tasks and can handle various model types, including those from scikit-learn, XGBoost, and TensorFlow.

```r
library(DALEX)
library(ranger)

# Load sample data
data(apartments)
data(titanic_imputed)

# Regression example with linear model
lm_model <- lm(m2.price ~ ., data = apartments)
lm_explainer <- explain(
  model = lm_model,
  data = apartments[, -1],  # Exclude target column
  y = apartments$m2.price,
  label = "Linear Regression"
)

# Regression example with random forest
rf_model <- ranger(m2.price ~ ., data = apartments, num.trees = 50)
rf_explainer <- explain(
  model = rf_model,
  data = apartments[, -1],
  y = apartments$m2.price,
  label = "Random Forest",
  predict_function = function(model, newdata) predict(model, newdata)$predictions
)

# Binary classification example
# (R is case-sensitive: the model object created here must match the name passed to explain())
glm_model <- glm(survived ~ ., data = titanic_imputed, family = "binomial")
glm_explainer <- explain(
  model = glm_model,
  data = titanic_imputed[, -8],  # Exclude 'survived' column
  y = titanic_imputed$survived,
  label = "Logistic Regression",
  type = "classification"
)

# Output: explainer object with model, data, y, predict_function, residuals, label, model_info
print(lm_explainer)
```

--------------------------------

### Calculate Ceteris Paribus Profiles with DALEX

Source: https://context7.com/modeloriented/dalex/llms.txt

The predict_profile() method calculates instance-level profiles, performing a 'What-If' analysis by showing how individual predictions change when single variable values are modified.
It supports custom variable splits, quantile-based splits, and can be applied to multiple observations simultaneously. ```python import dalex as dx from sklearn.ensemble import RandomForestClassifier # Create model and explainer titanic = dx.datasets.load_titanic() X = titanic.drop(columns='survived') y = titanic['survived'] rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y) explainer = dx.Explainer(rf, X, y, label="Random Forest") # Select observation observation = X.iloc[[0]] # Ceteris Paribus for specific variables cp = explainer.predict_profile( new_observation=observation, variables=['age', 'fare'], grid_points=101, variable_splits_type='uniform' ) print(cp.result.head()) cp.plot(variables=['age', 'fare']) # All numerical variables cp_all = explainer.predict_profile(new_observation=observation) cp_all.plot() # Multiple observations comparison observations = X.iloc[:3] cp_multi = explainer.predict_profile( new_observation=observations, variables=['age'] ) cp_multi.plot() # Quantile-based splits (useful for skewed distributions) cp_quantile = explainer.predict_profile( new_observation=observation, variables=['fare'], variable_splits_type='quantiles' ) cp_quantile.plot() # Custom variable splits cp_custom = explainer.predict_profile( new_observation=observation, variable_splits={'age': [0, 18, 30, 45, 60, 80]} ) cp_custom.plot(variables=['age']) ``` -------------------------------- ### Perform Fairness Analysis with model_fairness() in Python Source: https://context7.com/modeloriented/dalex/llms.txt The model_fairness() method analyzes bias in machine learning models across protected subgroups. It requires a DALEX Explainer object, a protected attribute array, and optionally privileged subgroup, cutoff, and epsilon values. It outputs fairness metrics and visualizations. 
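
The role of `epsilon` can be illustrated without dalex. The sketch below computes one fairness metric (the rate of positive predictions per subgroup), divides it by the privileged group's rate, and applies the "80% rule": a ratio is treated as acceptable when it lies between `epsilon` and `1 / epsilon`. The data and the helper names `positive_rate`, `parity_ratios`, and `passes` are hypothetical, and this is an assumption-laden illustration rather than the library's code.

```python
def positive_rate(predictions, groups, group):
    """Share of observations in `group` that receive a positive prediction."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    return sum(selected) / len(selected)


def parity_ratios(predictions, groups, privileged):
    """Ratio of each unprivileged group's positive rate to the privileged one."""
    base = positive_rate(predictions, groups, privileged)
    return {
        g: positive_rate(predictions, groups, g) / base
        for g in sorted(set(groups))
        if g != privileged
    }


def passes(ratio, epsilon=0.8):
    """Four-fifths style check: the ratio must stay within (epsilon, 1/epsilon)."""
    return epsilon < ratio < 1 / epsilon


# Made-up binary predictions (after applying a cutoff) and a protected attribute
predictions = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
groups = ["male"] * 5 + ["female"] * 5

ratios = parity_ratios(predictions, groups, privileged="male")
for group, ratio in ratios.items():
    print(f"{group}: ratio={ratio:.2f}, within bounds={passes(ratio)}")
```

In this toy data the privileged group is selected four times as often, so the ratio falls well below 0.8 and the check fails.
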
```python import dalex as dx import numpy as np from sklearn.ensemble import RandomForestClassifier # Create model and explainer titanic = dx.datasets.load_titanic() X = titanic.drop(columns='survived') y = titanic['survived'] rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y) explainer = dx.Explainer(rf, X, y, label="Random Forest", model_type='classification') # Create protected attribute (must be same length as data) protected = np.array(X['gender']) # Fairness analysis fairness = explainer.model_fairness( protected=protected, privileged='male', # Privileged subgroup cutoff=0.5, # Classification threshold epsilon=0.8 # Fairness threshold (80% rule) ) # Fairness check (prints summary) fairness.fairness_check() # Access metrics print(fairness.result) # Metric ratios print(fairness.metric_scores) # Raw metric scores print(fairness.parity_loss) # Parity loss summary # Visualize fairness fairness.plot(type='fairness_check') fairness.plot(type='metric_scores') fairness.plot(type='stacked') fairness.plot(type='radar') fairness.plot(type='heatmap') # Performance vs fairness tradeoff fairness.plot(type='performance_and_fairness', fairness_metric='TPR', performance_metric='accuracy') # Cutoff analysis for specific subgroup fairness.plot(type='ceteris_paribus_cutoff', subgroup='female') # Compare multiple models from sklearn.linear_model import LogisticRegression lr = LogisticRegression(max_iter=1000).fit(X, y) exp_lr = dx.Explainer(lr, X, y, label="Logistic Regression", model_type='classification') fair_lr = exp_lr.model_fairness(protected=protected, privileged='male') fairness.plot(fair_lr, type='fairness_check') ``` -------------------------------- ### predict_parts() - Instance Level Attribution Source: https://context7.com/modeloriented/dalex/llms.txt Decomposes individual predictions into variable contributions using methods like Break Down or SHAP. This is useful for explaining why a specific prediction was made for a given instance. 
```APIDOC
## predict_parts() - Instance Level Attribution

### Description
The `predict_parts()` method decomposes individual predictions into variable contributions using Break Down, SHAP, or other attribution methods. It helps in understanding the contribution of each feature to a specific prediction.

### Method
`explainer.predict_parts()`

### Parameters
#### Path Parameters
None

#### Query Parameters
- **new_observation** (object or DataFrame) - Required - The observation(s) for which to calculate attribution.
- **type** (str) - Optional (defaults to 'break_down_interactions') - The attribution method to use. Options: 'break_down', 'break_down_interactions', 'shap', 'shap_wrapper'.
- **B** (int) - Optional - Number of random paths to use for SHAP calculation.
- **processes** (int) - Optional - Number of parallel processes to use.
- **random_state** (int) - Optional - Seed for random number generation.

### Request Example
```python
import dalex as dx
from sklearn.ensemble import RandomForestClassifier

# Create model and explainer
titanic = dx.datasets.load_titanic()
X = titanic.drop(columns='survived')
y = titanic['survived']
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)
explainer = dx.Explainer(rf, X, y, label="Random Forest")

# Select observation to explain
observation = X.iloc[[0]]
print(f"Prediction: {explainer.predict(observation)[0]:.3f}")

# Break Down (with interactions)
bd = explainer.predict_parts(
    new_observation=observation,
    type='break_down_interactions'
)
print(bd.result)
bd.plot()

# Break Down (without interactions - faster)
bd_simple = explainer.predict_parts(
    new_observation=observation,
    type='break_down'
)
bd_simple.plot()

# SHAP values (average over permutations)
shap = explainer.predict_parts(
    new_observation=observation,
    type='shap',
    B=25,            # Number of random paths
    processes=1,     # Parallel processes
    random_state=42
)
shap.plot()

# SHAP wrapper (uses shap library)
shap_wrapper = explainer.predict_parts(
    new_observation=observation,
    type='shap_wrapper'
)
shap_wrapper.plot()

# Multiple observations: select each row by position so the loop
# does not depend on the DataFrame's index labels
observations = X.iloc[:3]
for i in range(len(observations)):
    obs = observations.iloc[[i]]
    bd = explainer.predict_parts(new_observation=obs, type='break_down')
    print(f"Observation {i}: Prediction = {explainer.predict(obs)[0]:.3f}")
```

### Response
#### Success Response (200)
- **result** (object) - The calculated attribution data (e.g., contributions for each variable).
- **plot()** (method) - Method to visualize the attribution.

#### Response Example
(Output depends on the specific calculation and visualization, typically a pandas DataFrame for `result` and a plot object for `plot()`)
```

--------------------------------

### Perform Residual Diagnostics with model_diagnostics() in Python

Source: https://context7.com/modeloriented/dalex/llms.txt

The model_diagnostics() method analyzes model residuals for regression tasks to detect issues like poor model fit, outliers, and systematic errors. It requires a DALEX Explainer object for a regression model and outputs diagnostic metrics and plots.
```python import dalex as dx from sklearn.ensemble import RandomForestRegressor # Regression example apartments = dx.datasets.load_apartments() X = apartments.drop(columns='m2_price') y = apartments['m2_price'] rf = RandomForestRegressor(n_estimators=100, random_state=42).fit(X, y) explainer = dx.Explainer(rf, X, y, label="Random Forest") # Calculate diagnostics diagnostics = explainer.model_diagnostics() # Access results print(diagnostics.result.head()) # Output: DataFrame with y, y_hat, residuals, abs_residuals, and variable values # Visualize residuals diagnostics.plot() # Default: residuals vs predicted diagnostics.plot(variable='y_hat') # Residuals vs predicted values diagnostics.plot(variable='y') # Residuals vs actual values diagnostics.plot(variable='surface') # Residuals vs specific variable # Compare multiple models from sklearn.linear_model import LinearRegression lr = LinearRegression().fit(X, y) exp_lr = dx.Explainer(lr, X, y, label="Linear Regression") diag_lr = exp_lr.model_diagnostics() diagnostics.plot(diag_lr) ``` -------------------------------- ### Decompose Individual Predictions with DALEX Source: https://context7.com/modeloriented/dalex/llms.txt The predict_parts() method decomposes individual predictions into variable contributions using methods like Break Down and SHAP. It allows for analysis with or without interactions and can handle multiple observations, providing insights into how each feature affects a specific prediction. 
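
The sequential idea behind Break Down can be sketched in plain Python: start from the mean prediction over a background dataset, then fix one variable at a time to the value from the explained observation and record how the mean prediction shifts. The toy `model`, the data, and the helper names below are hypothetical, and the variable order is fixed by hand; treat this as a simplified illustration rather than dalex's algorithm.

```python
def model(row):
    """Hypothetical fitted model with an age-fare interaction."""
    return 2.0 * row["age"] + row["fare"] + 0.5 * row["age"] * row["fare"]


def mean_prediction(rows):
    return sum(model(r) for r in rows) / len(rows)


def break_down(background, observation, order):
    """Attribute the prediction to variables along one fixed ordering."""
    rows = [dict(r) for r in background]   # work on copies
    baseline = mean_prediction(rows)       # average prediction (intercept)
    contributions = {}
    previous = baseline
    for name in order:
        for r in rows:
            r[name] = observation[name]    # fix this variable everywhere
        current = mean_prediction(rows)
        contributions[name] = current - previous
        previous = current
    return baseline, contributions


background = [
    {"age": 1.0, "fare": 2.0},
    {"age": 3.0, "fare": 4.0},
    {"age": 5.0, "fare": 0.0},
]
observation = {"age": 2.0, "fare": 10.0}

baseline, contributions = break_down(background, observation, ["age", "fare"])
_, reversed_contributions = break_down(background, observation, ["fare", "age"])

print("baseline:", round(baseline, 3))
print("age then fare:", contributions)
print("fare then age:", reversed_contributions)
```

The baseline plus the contributions always equals the model's prediction for the observation, but with interactions the split depends on the ordering. Averaging over many orderings is what the SHAP-style variant does to remove that dependence.
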
```python
import dalex as dx
from sklearn.ensemble import RandomForestClassifier

# Create model and explainer
titanic = dx.datasets.load_titanic()
# Keep numeric columns only so the bare scikit-learn model can be fit without an encoder
X = titanic.drop(columns='survived').select_dtypes('number')
y = titanic['survived']

rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)
explainer = dx.Explainer(rf, X, y, label="Random Forest")

# Select observation to explain
observation = X.iloc[[0]]
print(f"Prediction: {explainer.predict(observation)[0]:.3f}")

# Break Down (with interactions)
bd = explainer.predict_parts(
    new_observation=observation,
    type='break_down_interactions'
)
print(bd.result)
bd.plot()

# Break Down (without interactions - faster)
bd_simple = explainer.predict_parts(
    new_observation=observation,
    type='break_down'
)
bd_simple.plot()

# SHAP values (average over permutations)
shap = explainer.predict_parts(
    new_observation=observation,
    type='shap',
    B=25,          # Number of random paths
    processes=1,   # Parallel processes
    random_state=42
)
shap.plot()

# SHAP wrapper (uses the shap library, which must be installed)
shap_wrapper = explainer.predict_parts(
    new_observation=observation,
    type='shap_wrapper'
)
shap_wrapper.plot()

# Multiple observations
observations = X.iloc[:3]
for i in range(len(observations)):
    obs = observations.iloc[[i]]  # positional slice keeps a one-row DataFrame
    bd = explainer.predict_parts(new_observation=obs, type='break_down')
    print(f"Observation {i}: Prediction = {explainer.predict(obs)[0]:.3f}")
```

--------------------------------

### Calculate Variable Importance with model_parts() in DALEX Python

Source: https://context7.com/modeloriented/dalex/llms.txt

The `model_parts()` method calculates permutation-based variable importance by measuring how much model performance degrades when a variable's values are shuffled. It supports several loss functions, can report importance as a ratio or difference relative to the full model, can treat groups of variables as one unit, and can run in parallel.

```python
import dalex as dx
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Create model and explainer
titanic = dx.datasets.load_titanic()
X = titanic.drop(columns='survived')
y = titanic['survived']

# One-hot encode the non-numeric columns inside a pipeline,
# so the explainer still works on the original variables
preprocess = make_column_transformer(
    (OneHotEncoder(handle_unknown='ignore'), make_column_selector(dtype_exclude='number')),
    remainder='passthrough'
)
rf = make_pipeline(
    preprocess,
    RandomForestClassifier(n_estimators=100, random_state=42)
).fit(X, y)
explainer = dx.Explainer(rf, X, y, label="Random Forest")

# Calculate variable importance
vi = explainer.model_parts(
    loss_function='1-auc',       # Loss function: 'rmse', '1-auc', 'mse', 'mae'
    type='variable_importance',  # Type: 'variable_importance', 'ratio', 'difference'
    N=1000,                      # Number of sampled observations
    B=10,                        # Number of permutation rounds
    random_state=42
)

# Access results
print(vi.result)
# Output: DataFrame with variable, dropout_loss, label columns

# Visualize
vi.plot()

# Variable importance as ratio
vi_ratio = explainer.model_parts(type='ratio')
vi_ratio.plot()

# Group variables
vi_grouped = explainer.model_parts(
    variable_groups={'demographics': ['gender', 'age'],
                     'ticket': ['fare', 'class']}
)
vi_grouped.plot()

# Parallel computation
vi_parallel = explainer.model_parts(processes=4, B=20)
```

--------------------------------

### Calculate Model Performance with model_performance() in R

Source: https://context7.com/modeloriented/dalex/llms.txt

The `model_performance()` function in DALEX (R) computes performance measures for regression and classification models. For regression, it provides MSE, RMSE, R-squared, and MAD. For classification, it calculates F1, accuracy, recall, precision, and AUC. The results can be plotted and compared across models.

```r
library(DALEX)
library(ranger)

# Regression performance
apartments_rf <- ranger(m2.price ~ ., data = apartments, num.trees = 50)
explainer_rf <- explain(apartments_rf,
                        data = apartments[, -1],
                        y = apartments$m2.price,
                        label = "RF Apartments")
perf_rf <- model_performance(explainer_rf)
print(perf_rf)
# $measures: mse, rmse, r2, mad

# Compare with linear model
apartments_lm <- lm(m2.price ~ ., data = apartments)
explainer_lm <- explain(apartments_lm,
                        data = apartments[, -1],
                        y = apartments$m2.price,
                        label = "LM Apartments")
perf_lm <- model_performance(explainer_lm)

# Visualize comparison
plot(perf_rf, perf_lm)                      # Default ECDF plot
plot(perf_rf, perf_lm, geom = "boxplot")    # Boxplot of residuals
plot(perf_rf, perf_lm, geom = "histogram")  # Histogram of residuals

# Classification performance
titanic_glm <- glm(survived ~ ., data = titanic_imputed, family = "binomial")
explainer_glm <- explain(titanic_glm,
                         data = titanic_imputed[, -8],
                         y = titanic_imputed$survived)
perf_glm <- model_performance(explainer_glm, cutoff = 0.5)
print(perf_glm)
# $measures: recall, precision, f1, accuracy, auc
```

--------------------------------

### DALEX JMLR Paper Citation

Source: https://github.com/modeloriented/dalex/blob/master/python/dalex/dalex/documentation.md

This snippet provides the BibTeX entry for the Journal of Machine Learning Research (JMLR) paper about the dalex package, including authors, title, journal, year, volume, number, pages, and URL.

```bibtex
@article{JMLR:v22:20-1473,
  author  = {Hubert Baniecki and
             Wojciech Kretowicz and
             Piotr Piatyszek and 
             Jakub Wisniewski and 
             Przemyslaw Biecek},
  title   = {dalex: Responsible Machine Learning 
             with Interactive Explainability and Fairness in Python},
  journal = {Journal of Machine Learning Research},
  year    = {2021},
  volume  = {22},
  number  = {214},
  pages   = {1-7},
  url     = {http://jmlr.org/papers/v22/20-1473.html}
}
```

--------------------------------

### Calculate Variable Importance with model_parts() in R

Source: https://context7.com/modeloriented/dalex/llms.txt

The `model_parts()` function in DALEX (R) calculates dataset-level variable importance using permutation-based methods. It quantifies a variable's contribution to predictive performance by measuring the change in model loss when the variable's values are shuffled. This function supports different calculation types: raw loss, ratio to full model loss, and difference from full model loss.

```r
library(DALEX)
library(ranger)

# Create explainer
apartments_rf <- ranger(m2.price ~ ., data = apartments, num.trees = 50)
explainer_rf <- explain(apartments_rf,
                        data = apartments[, -1],
                        y = apartments$m2.price,
                        label = "Random Forest")

# Calculate variable importance (raw dropout loss)
vi_rf <- model_parts(explainer_rf, type = "raw")
head(vi_rf, 10)
plot(vi_rf)

# Calculate as ratio to full model loss
vi_ratio <- model_parts(explainer_rf, type = "ratio")
plot(vi_ratio)

# Calculate as difference from full model
vi_diff <- model_parts(explainer_rf, type = "difference")
plot(vi_diff)
```