### Install Interpret Library

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/python/examples/interpretable-classification.ipynb

Installs the interpret library and its dependencies if not already present. This is a prerequisite for running the examples.

```python
# install interpret if not already installed
try:
    import interpret
except ModuleNotFoundError:
    !pip install --quiet interpret pandas scikit-learn
```

--------------------------------

### Install Interpret Library

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/python/examples/interpretable-regression.ipynb

Installs the interpret library and scikit-learn if they are not already present. This is a prerequisite for running the examples.

```python
try:
    import interpret
except ModuleNotFoundError:
    !pip install --quiet interpret scikit-learn
```

--------------------------------

### Install Interpret from PyPI and Dependencies

Source: https://github.com/interpretml/interpret/blob/main/scripts/release_process.txt

Installs the interpret package from PyPI along with Jupyter and other necessary libraries for running example notebooks. Remove 'lime' if not used in examples.

```bash
pip install jupyter interpret lime
```

--------------------------------

### Install interpret-core with specific dependencies (pip)

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/deployment-guide.ipynb

Install interpret-core with a selection of extra dependencies using pip. This allows for fine-tuning the installed components.

```bash
pip install interpret-core[debug,notebook,plotly,lime,sensitivity,shap,linear,skoperules,treeinterpreter,aplr,dash,testing]
```

--------------------------------

### Install InterpretML from source

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/index.ipynb

Clone the InterpretML repository and install from source. Ensure you have Python 3.10+.
```bash
git clone https://github.com/interpretml/interpret.git && cd interpret/scripts && make install
```

--------------------------------

### Install interpret from source with all dependencies

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/deployment-guide.ipynb

Follow these steps to clone the repository and install the interpret package from source with all dependencies.

```bash
git clone https://github.com/interpretml/interpret.git && cd interpret/scripts && make install
```

--------------------------------

### Install Interpret Core with EBM Explainer

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/faq.ipynb

Install only the EBM explainer and its required dependencies from interpret-core using pip. The 'required' tag is generally recommended for all installs.

```sh
pip install interpret-core[required,ebm]
```

--------------------------------

### Install interpret-core from source with minimal dependencies

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/deployment-guide.ipynb

Follow these steps to clone the repository and install the interpret-core package from source with minimal dependencies.

```bash
git clone https://github.com/interpretml/interpret.git && cd interpret/scripts && make install-core
```

--------------------------------

### Install Node.js Version 22 using nvm

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/installation-guide.ipynb

Install and use Node.js version 22, which is required for building the visualization bundle. This ensures compatibility with CI processes. Verify the installation by checking the Node and npm versions.
```shell
nvm install 22
nvm use 22
node --version  # should report v22.x
npm --version
```

--------------------------------

### Install Powerlift Library

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Installs the 'powerlift' Python package with optional dataset and postgres support if it is not already found.

```python
# install powerlift if not already installed
try:
    import powerlift
except ModuleNotFoundError:
    !pip install -U --quiet powerlift[datasets,postgres]
```

--------------------------------

### Install Interpret Library

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/python/examples/interpretable-regression-synthetic.ipynb

Installs the interpret library and scikit-learn if they are not already present. This is a prerequisite for running the rest of the notebook.

```python
# install interpret if not already installed
try:
    import interpret
except ModuleNotFoundError:
    !pip install --quiet interpret scikit-learn
```

--------------------------------

### Install Interpret Library

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Installs the 'interpret-core' Python package if it is not already found in the current environment.

```python
# install interpret if not already installed
try:
    import interpret
except ModuleNotFoundError:
    !pip install -U --quiet interpret-core
```

--------------------------------

### Install interpret Package

Source: https://github.com/interpretml/interpret/blob/main/docs/excel_exporter/excel_exporter.ipynb

Installs the interpret package from the local source. Ensure you are in the correct directory before running.
```python
# import sys
# !{sys.executable} -m pip install -e ./interpret/python/interpret-core/
```

--------------------------------

### Start JS Development Server

Source: https://github.com/interpretml/interpret/blob/main/CLAUDE.md

Starts a development server for the JavaScript visualization bundle using webpack-dev-server. This provides live reloading and is useful during development.

```bash
cd shared/vis && npm start
```

--------------------------------

### Install interpret-core with minimal dependencies (pip)

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/deployment-guide.ipynb

Use this command to install the interpret-core package with only the essential dependencies for fitting, predicting, and explaining.

```bash
pip install interpret-core
```

--------------------------------

### Install interpret with all dependencies (pip)

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/deployment-guide.ipynb

Use this command to install the interpret package with all its dependencies, suitable for general use.

```bash
pip install interpret
```

--------------------------------

### Install Interpret Package

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/python/examples/explain-blackbox-classifiers.ipynb

Installs the interpret package and its dependencies if not already present. This is a prerequisite for running the rest of the notebook.

```python
try:
    import interpret
except ModuleNotFoundError:
    !pip install --quiet interpret pandas scikit-learn lime
```

--------------------------------

### Install ebm2onnx package

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/faq.ipynb

Install the `ebm2onnx` package from PyPI to enable high-speed inference on EBM objects through ONNX-compatible runtimes.
```bash
pip install ebm2onnx
```

--------------------------------

### Load and Train Model for Explanation

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/framework.ipynb

Loads the adult dataset, preprocesses it, and trains an Explainable Boosting Classifier. This setup is required before generating explanations.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from interpret.glassbox import ExplainableBoostingClassifier
from interpret import show

from sklearn.datasets import fetch_openml

data = fetch_openml("adult", version=2, as_frame=True)
X = data.data
X = X.dropna()
X.columns = [
    "Age",
    "WorkClass",
    "fnlwgt",
    "Education",
    "EducationNum",
    "MaritalStatus",
    "Occupation",
    "Relationship",
    "Race",
    "Gender",
    "CapitalGain",
    "CapitalLoss",
    "HoursPerWeek",
    "NativeCountry",
]
y = (data.target == ">50K").astype(int)
y = y[X.index]

seed = 42
np.random.seed(seed)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=seed
)

ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train)
```

--------------------------------

### Install InterpretML Package

Source: https://github.com/interpretml/interpret/blob/main/README.md

Install the InterpretML package using pip or conda. Ensure you have Python 3.10+ installed.

```sh
pip install interpret
```

```sh
conda install -c conda-forge interpret
```

--------------------------------

### Install InterpretML using pip

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/index.ipynb

Use this command to install the InterpretML library via pip. Ensure you have Python 3.10+.

```bash
pip install interpret
```

--------------------------------

### Install interpret with all dependencies (conda)

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/deployment-guide.ipynb

Use this command to install the interpret package with all its dependencies using conda.
```bash
conda install -c conda-forge interpret
```

--------------------------------

### Install interpret and dependencies

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/python/examples/prototype-selection.ipynb

Installs the interpret library and necessary dependencies like numpy, scikit-learn, and matplotlib if they are not already present.

```python
# install interpret if not already installed
try:
    import interpret
except ModuleNotFoundError:
    !pip install --quiet interpret numpy scikit-learn matplotlib
```

--------------------------------

### Install interpret-core with minimal dependencies (conda)

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/deployment-guide.ipynb

Use this command to install the interpret-core package with only the essential dependencies using conda.

```bash
conda install -c conda-forge interpret-core
```

--------------------------------

### Install Benchmark Dependencies

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Installs specific versions of required Python packages for reproducibility, including numpy, pandas, scikit-learn, optuna, xgboost, lightgbm, catboost, aplr, and tabpfn.

```python
# use exact versions for reproducibility of the RANK ordering
requirements = "numpy==1.26.4 pandas==2.2.2 scikit-learn==1.5.1 optuna==4.0.0 optuna-integration==4.0.0 xgboost==2.1.0 lightgbm==4.5.0 catboost==1.2.5 aplr==10.6.1 tabpfn==2.0.1 autogluon.tabular==1.5"
!pip install -U --quiet {requirements}
```

--------------------------------

### Build JS Visualization Bundle for Production

Source: https://github.com/interpretml/interpret/blob/main/CLAUDE.md

Builds the JavaScript visualization bundle for production use. This command should be run from the `shared/vis/` directory and installs npm dependencies before building.
```bash
cd shared/vis && npm install && npm run build-prod
```

--------------------------------

### Build interpret-inline.js (Production)

Source: https://github.com/interpretml/interpret/blob/main/shared/vis/CONTRIBUTING.md

Installs dependencies and builds the production version of interpret-inline.js. The output is a minified file located at dist/interpret-inline.js.

```bash
cd shared/vis
npm install
npm run build-prod
```

--------------------------------

### Install Interpret-Core with Debug and Visualization Dependencies

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/installation-guide.ipynb

Install the interpret-core Python package in editable mode, including dependencies for debugging, notebooks, Plotly, and various explainers like LIME, SHAP, and others.

```shell
cd ../../python/interpret-core
pip install -e .[debug,notebook,plotly,lime,sensitivity,shap,linear,skoperules,treeinterpreter,aplr,dash,testing]
```

--------------------------------

### Setup Inline Visualization Provider

Source: https://github.com/interpretml/interpret/blob/main/scripts/release_process.txt

Configures the Interpret library to use the InlineProvider for visualizations within Jupyter notebooks. This is crucial for testing the NPM inline JS package.

```python
from interpret import set_visualize_provider
from interpret.provider import InlineProvider

set_visualize_provider(InlineProvider())
```

--------------------------------

### Install Python Development Dependencies

Source: https://github.com/interpretml/interpret/blob/main/CLAUDE.md

Installs the InterpretML Python package in editable mode with development dependencies. Includes optional extras for debugging, notebooks, and various visualization/analysis libraries.
```bash
pip install -e ".[debug,notebook,plotly,lime,sensitivity,shap,linear,treeinterpreter,aplr,dash,skoperules,excel,testing]"
```

--------------------------------

### Run Local Benchmark Experiment

Source: https://github.com/interpretml/interpret/blob/main/python/powerlift/README.md

Initializes the Powerlift store, populates it with datasets, and runs a benchmark experiment locally. Ensure the database connection string is correctly set.

```python
import os
from powerlift.bench import Benchmark, Store
from powerlift.bench import populate_with_datasets

# Initialize database (if needed).
conn_str = f"sqlite:///{os.getcwd()}/powerlift.db"
store = Store(conn_str, force_recreate=False)

# This downloads datasets once and feeds into the database.
populate_with_datasets(store, cache_dir="~/.powerlift", exist_ok=True)

# Run experiment
benchmark = Benchmark(f"sqlite:///{os.getcwd()}/powerlift.db", name="SVM vs RF")
benchmark.run(trial_runner, trial_filter)
benchmark.wait_until_complete()
```

--------------------------------

### Build Interpret and Visualization Bundle

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/installation-guide.ipynb

Navigate to the Interpret directory, build the core library using the provided script (build.sh or build.bat), and then build the production-ready visualization bundle.

```shell
cd interpret
./build.sh  # OR use build.bat on Windows
cd shared/vis
npm run clean && npm install && npm run build-prod
```

--------------------------------

### Configure Data Store and Benchmark

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Sets up the connection string for the data store (either Azure or SQLite) and initializes the Powerlift Benchmark object. Requires Azure credentials and environment variables to be set if `is_azure` is True.
```python
import os

if is_azure:
    import requests
    import json
    import subprocess
    from azure.identity import AzureCliCredential

    credential = AzureCliCredential()
    access_token = credential.get_token("https://graph.microsoft.com/.default").token
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    azure_client_id = (
        requests.get("https://graph.microsoft.com/v1.0/me", headers=headers)
        .json()
        .get("id")
    )
    azure_tenant_id = (
        requests.get("https://graph.microsoft.com/v1.0/organization", headers=headers)
        .json()["value"][0]
        .get("id")
    )
    subscription_id = json.loads(
        subprocess.run(
            "az account show", capture_output=True, text=True, shell=True
        ).stdout
    ).get("id")

    from dotenv import load_dotenv

    load_dotenv()
    conn_str = os.getenv("DOCKER_DB_URL")
    resource_group = os.getenv("AZURE_RESOURCE_GROUP")
else:
    conn_str = f"sqlite:///{os.getcwd()}/powerlift.db"

from powerlift.bench import Store, Benchmark

store = Store(conn_str, force_recreate=force_recreate)
benchmark = Benchmark(store, name=experiment_name)
```

--------------------------------

### Run Unit Tests

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/installation-guide.ipynb

Navigate to the scripts directory and execute the 'make test' command to run all unit tests for the Interpret project.

```shell
cd interpret/scripts
make test
```

--------------------------------

### Install InterpretML using conda

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/index.ipynb

Use this command to install the InterpretML library via conda. Ensure you have Python 3.10+.

```bash
conda install -c conda-forge interpret
```

--------------------------------

### Configure LGBM, CatBoost, and RF Parameters

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Configures parameters for LightGBM (verbosity), CatBoost (verbose), and Random Forest variants (enable_categorical, feature_types, random_state, n_jobs).
```python
lgbm_params["verbosity"] = -1
catboost_params["verbose"] = False
rf_xgb_params["enable_categorical"] = True
rf_xgb_params["feature_types"] = ["c" if cat else "q" for cat in cat_bools]
rf_sk_params["random_state"] = seed
rf_sk_params["n_jobs"] = -1
```

--------------------------------

### Define Trial Runner Configuration

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Sets up the runner for trials, including seed initialization and importing necessary libraries for various machine learning models.

```python
def trial_runner(trial):
    seed = 42
    seed += trial.replicate_num
    max_samples = None
    n_calibration_folds = 4  # 4 uses all cores on the containers

    from interpret.glassbox import (
        ExplainableBoostingClassifier,
        ExplainableBoostingRegressor,
    )
    from interpret.develop import set_option
    from interpret.utils._native import Native
    from xgboost import XGBClassifier, XGBRegressor, XGBRFClassifier, XGBRFRegressor
    from lightgbm import LGBMClassifier, LGBMRegressor
    from catboost import CatBoostClassifier, CatBoostRegressor
    from autogluon.tabular import TabularPredictor
    from sklearn.ensemble import (
        RandomForestClassifier,
        RandomForestRegressor,
        ExtraTreesClassifier,
        ExtraTreesRegressor,
    )
    from sklearn.linear_model import (
        LogisticRegression,
        LinearRegression,
        ElasticNet,
        SGDClassifier,
        SGDRegressor,
    )
    from sklearn.svm import LinearSVC, LinearSVR, SVC, SVR
    from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
    from sklearn.neural_network import MLPClassifier, MLPRegressor
    from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
    from aplr import APLRClassifier, APLRRegressor
```

--------------------------------

### Run Benchmark Experiment on Azure Container Instances

Source: https://github.com/interpretml/interpret/blob/main/python/powerlift/README.md

Configures and runs a Powerlift benchmark experiment using Azure Container Instances as the executor.
Requires Azure environment variables to be set.

```python
# Run experiment (but in ACI).
from powerlift.executors import AzureContainerInstance

store = Store(os.getenv("AZURE_DB_URL"))
azure_tenant_id = os.getenv("AZURE_TENANT_ID")
subscription_id = os.getenv("AZURE_SUBSCRIPTION_ID")
azure_client_id = os.getenv("AZURE_CLIENT_ID")
azure_client_secret = os.getenv("AZURE_CLIENT_SECRET")
resource_group = os.getenv("AZURE_RESOURCE_GROUP")
executor = AzureContainerInstance(
    store,
    azure_tenant_id,
    subscription_id,
    azure_client_id,
    azure_client_secret=azure_client_secret,
    resource_group=resource_group,
    n_running_containers=5
)
benchmark = Benchmark(store, name="SVM vs RF")
benchmark.run(trial_runner, trial_filter, timeout=10, executor=executor)
benchmark.wait_until_complete()
```

--------------------------------

### Install Interpret Conda Package and Dependencies

Source: https://github.com/interpretml/interpret/blob/main/scripts/release_process.txt

Installs the interpret-core and interpret packages from conda-forge, along with necessary Python libraries for visualization and notebook execution.

```bash
conda install --yes -c conda-forge interpret-core psutil ipykernel ipython plotly lime SALib shap dill dash dash-core-components dash-html-components dash-table dash_cytoscape gevent requests
```

--------------------------------

### Generate Local Explanations with LIME

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/index.ipynb

This snippet demonstrates how to initialize the LimeTabular explainer and generate local explanations for the first five instances in the test set. Ensure that 'blackbox_model', 'X_train', 'X_test', 'y_test', and 'seed' are defined prior to execution.
```python
from interpret.blackbox import LimeTabular
from interpret import show

lime = LimeTabular(blackbox_model, X_train, random_state=seed)
show(lime.explain_local(X_test[:5], y_test[:5]), 0)
```

--------------------------------

### Define Benchmark Parameters

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Sets up experiment parameters including Azure status, replication counts, instance counts, and file paths for wheels. It also defines a unique experiment name based on the current timestamp.

```python
# Within this notebook we adopt the convention that for metrics lower is better.
# For metrics where higher is better like AUC we flip the sign to negative.

is_azure = False  # if this is set to True, login with 'az login' before
n_replicates = 25  # 25 is almost costless since with 30 we get to saturation of 650 runners in 2 hours
n_instances = 810
force_recreate = False
exist_ok = True
TIMEOUT_SEC = 60 * 60 * 24 * 180  # 180 days
wheel_filepaths = [
    "interpret_core-0.7.1-py3-none-any.whl",
    "powerlift-0.1.12-py3-none-any.whl",
]

import datetime

experiment_name = datetime.datetime.now().strftime("%Y_%m_%d_%H%M__") + "myexperiment"
# experiment_name = 'yyyy_mm_dd_hhmm__myexperiment'
print("Experiment name: " + experiment_name)
```

--------------------------------

### Configure SVC with Pipeline

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Sets up a pipeline with SVC. Includes dataset-specific subsampling for large datasets like Fashion-MNIST and CIFAR-10 to prevent crashes or long fit times.

```python
est = Pipeline(
    [
        ("p", p),
        ("est", SVC(**svm_params)),
    ]
)
```

--------------------------------

### Import and Check Interpret Package

Source: https://github.com/interpretml/interpret/blob/main/docs/excel_exporter/excel_exporter.ipynb

Imports the interpret package and verifies its installation path. This is a basic check to ensure the library is accessible.
```python
import interpret

interpret.__file__
```

--------------------------------

### Initialize Pipeline with KNeighborsClassifier

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Creates a pipeline with a preprocessing step 'p' and a K-Nearest Neighbors Classifier (KNeighborsClassifier). Use for classification tasks where instance-based learning is appropriate.

```python
est = Pipeline([("p", p), ("est", KNeighborsClassifier(**knn_params))])
```

--------------------------------

### Get current process ID

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/debugging-guide.ipynb

Use `os.getpid()` in your Python script to print the current process ID, which can be helpful when attaching the C++ debugger.

```python
import os

print('Current PID = {}'.format(os.getpid()))
```

--------------------------------

### Initialize Pipeline with Calibrated SVC

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Creates a pipeline with a preprocessing step 'p' and a CalibratedClassifierCV using SVC. Use for classification tasks requiring probability calibration.

```python
est = Pipeline([
    ("p", p),
    (
        "est",
        CalibratedClassifierCV(
            SVC(**svm_params), n_jobs=-1, cv=n_calibration_folds
        ),
    ),
])
```

--------------------------------

### Understand Global Model Behavior with EBM

Source: https://github.com/interpretml/interpret/blob/main/README.md

Use this snippet to generate and display a global explanation for an EBM model. Ensure the 'interpret' library is installed.

```python
from interpret import show

ebm_global = ebm.explain_global()
show(ebm_global)
```

--------------------------------

### Initialize Pipeline with SVC

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Creates a pipeline with a preprocessing step 'p' and a Support Vector Classifier (SVC) with specified SVM parameters.
Use for classification tasks where SVC is appropriate.

```python
est = Pipeline([("p", p), ("est", SVC(**svm_params))])
```

--------------------------------

### Monotonize EBM Feature

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/python/examples/interpretable-classification.ipynb

Applies a monotonic constraint to the 'Age' feature of the trained EBM, so that the Age term's contribution to the prediction does not decrease as age increases. This is an example of post-processing EBMs.

```python
# post-process monotonize the Age feature
ebm.monotonize("Age", increasing=True)
```

--------------------------------

### Configure EBM and XGBoost Parameters

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Sets up parameters for EBM and XGBoost, including feature types, number of jobs, and random state. XGBoost-specific parameters like enable_categorical are also configured.

```python
ebm_params["feature_types"] = [
    "nominal" if cat else "continuous" for cat in cat_bools
]
ebm_params["n_jobs"] = -1
ebm_params["random_state"] = seed
xgb_params["enable_categorical"] = True
xgb_params["feature_types"] = ["c" if cat else "q" for cat in cat_bools]
```

--------------------------------

### Create and Activate PyPI Test Conda Environment

Source: https://github.com/interpretml/interpret/blob/main/scripts/release_process.txt

Sets up a new Conda environment specifically for testing the PyPI release of the interpret package. This ensures isolation from other installed packages.

```bash
conda env remove --name interpret_pypi && conda create --yes --name interpret_pypi python=3.10 && conda activate interpret_pypi
```

--------------------------------

### Generate Synthetic Dataset and Split Data

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/python/examples/interpretable-regression-synthetic.ipynb

Generates a synthetic dataset with specified parameters and splits it into training and testing sets.
This setup is crucial for evaluating the EBM's performance.

```python
# boilerplate - generate the synthetic dataset and split into test/train

import numpy as np
from sklearn.model_selection import train_test_split
from interpret.utils import make_synthetic
from interpret import show

seed = 42

X, y, names, types = make_synthetic(
    classes=None, n_samples=50000, missing=False, seed=seed
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=seed
)
```

--------------------------------

### Get and Display Group Importances

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/python/examples/group-importances.ipynb

Calculates the importance of specified feature groups for a given model and dataset. It then iterates through the resulting dictionary to print each group's term and its calculated importance.

```python
my_dict = get_group_and_individual_importances(
    [social_feature_group, education_feature_group], adult_ebm, X
)
for key in my_dict:
    print(f"Term: {key} - Importance: {my_dict[key]}")
```

--------------------------------

### Load and Prepare Adult Dataset

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/python/examples/custom-interactions.ipynb

Loads the Adult dataset from OpenML, preprocesses it by dropping rows with missing values, and splits it into training and testing sets. Ensures reproducibility by setting a random seed.
```python
import numpy as np
from sklearn.model_selection import train_test_split
from interpret.glassbox import ExplainableBoostingClassifier

from sklearn.datasets import fetch_openml

data = fetch_openml("adult", version=2, as_frame=True)
X = data.data
X = X.dropna()
X.columns = [
    "Age",
    "WorkClass",
    "fnlwgt",
    "Education",
    "EducationNum",
    "MaritalStatus",
    "Occupation",
    "Relationship",
    "Race",
    "Gender",
    "CapitalGain",
    "CapitalLoss",
    "HoursPerWeek",
    "NativeCountry",
]
y = data.target
y = y[X.index]

seed = 42
np.random.seed(seed)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=seed
)
```

--------------------------------

### Interpret Build Scripts

Source: https://github.com/interpretml/interpret/blob/main/shared/vis/CONTRIBUTING.md

Provides a reference for various npm scripts available for building and managing the interpret-inline.js project. Use `build-dev` for development builds with source maps, `build-prod` for minified production builds, `clean` to remove build artifacts, and `start` to run the development server.

```bash
npm run build-dev
```

```bash
npm run build-prod
```

```bash
npm run clean
```

```bash
npm start
```

--------------------------------

### Manual Prediction Calculation

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/ebm-internals-regression.ipynb

A sample function to manually calculate predictions for simplified scenarios. It iterates through samples, starting with the intercept and adding scores from each feature's lookup table based on the feature's value and binning. This code does not handle interactions, missing, or unseen values.
```python
sample_scores = []
for sample in X:
    # start from the intercept for each sample
    score = ebm.intercept_
    print("intercept: " + str(score))

    # we have 2 features, so add their score contributions
    for feature_idx, feature_val in enumerate(sample):
        bins = ebm.bins_[feature_idx][0]
        if isinstance(bins, dict):
            # categorical feature
            bin_idx = bins[feature_val]
        else:
            # continuous feature. bins is an array of cut points
            # add 1 because the 0th bin is reserved for 'missing'
            bin_idx = np.digitize(feature_val, bins) + 1

        local_score = ebm.term_scores_[feature_idx][bin_idx]

        # local_score is also the local feature importance (see plot below)
        print(ebm.feature_names_in_[feature_idx] + ": " + str(local_score))

        score += local_score
    sample_scores.append(score)
    print()

print("PREDICTIONS:")
print(ebm.predict(X))
print(np.array(sample_scores))
```

--------------------------------

### Build Native Library (libebm) - Windows

Source: https://github.com/interpretml/interpret/blob/main/CLAUDE.md

Builds the native libebm library for Windows. Supports debug, release, and different architectures. Requires Visual Studio 2022. Use -analysis flag for clang-tidy checks.

```batch
./build.bat -release_64
./build.bat -debug_64
./build.bat -release_32
./build.bat -debug_32
./build.bat -analysis
```

--------------------------------

### SPOTgreedy with uniform target distribution

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/python/examples/prototype-selection.ipynb

Applies the SPOTgreedy algorithm to select 20 prototypes from the source data that best match a uniform distribution over the target data. Visualizes the selected prototypes.

```python
# Define a targetmarginal on the target set
# We define the uniform marginal
targetmarginal = np.ones(C.shape[1]) / C.shape[1]

# The number of prototypes to be computed
numprototypes = 20

# Run SPOTgreedy
# prototypeIndices represent the indices corresponding to the chosen prototypes.
# prototypeWeights represent the weights associated with each of the chosen prototypes. The weights sum to 1.
[prototypeIndices, prototypeWeights] = SPOT_GreedySubsetSelection(
    C, targetmarginal, numprototypes
)

# Plot the chosen prototypes
fig, axs = plt.subplots(nrows=5, ncols=4, figsize=(2, 2))
for idx, ax in enumerate(axs.ravel()):
    ax.imshow(data[prototypeIndices[idx]].reshape((8, 8)), cmap=plt.cm.binary)
    ax.axis("off")
_ = fig.suptitle(
    "Top prototypes selected from the 64-dimensional digit dataset with uniform target distribution",
    fontsize=16,
)
```

--------------------------------

### Compare Multiple Model Explanations

Source: https://github.com/interpretml/interpret/blob/main/README.md

Display a dashboard to compare explanations from multiple models. Pass a list of explanation objects to the 'show' function.

```python
show([logistic_regression_global, decision_tree_global])
```

--------------------------------

### CatBoost Hyperparameter Tuning with Optuna

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Configuration for hyperparameter tuning of CatBoost using OptunaSearchCV. Includes task-specific sample limits to prevent OOM errors or long fit times.
```python
if trial.task.name in {"Allstate_Claims_Severity"}:
    # TODO: tweak
    max_samples = 8000  # crashes or fit time too long without subsampling
if trial.task.name in {"Airlines_DepDelay_10M"}:
    # TODO: tweak
    max_samples = 100000  # crashes or fit time too long without subsampling
if trial.task.name in {"nyc-taxi-green-dec-2016"}:
    # TODO: tweak
    max_samples = 50000  # crashes or fit time too long without subsampling
if trial.task.name in {"Buzzinsocialmedia_Twitter"}:
    # TODO: tweak
    max_samples = 5000  # crashes or fit time too long without subsampling
if trial.task.name in {"Yolanda"}:
    # TODO: tweak
    max_samples = 5000  # crashes or fit time too long without subsampling

# from https://forecastegy.com/posts/catboost-hyperparameter-tuning-guide-with-optuna/
```

--------------------------------

### Train EBM Classifier and Evaluate

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/ebm.ipynb

Loads the adult dataset, preprocesses it, trains an ExplainableBoostingClassifier, and evaluates its performance using AUC. Ensure all necessary libraries are imported.
```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from interpret.glassbox import ExplainableBoostingClassifier
from interpret import show
from sklearn.datasets import fetch_openml

data = fetch_openml("adult", version=2, as_frame=True)
X = data.data
X = X.dropna()
X.columns = [
    "Age", "WorkClass", "fnlwgt", "Education", "EducationNum",
    "MaritalStatus", "Occupation", "Relationship", "Race", "Gender",
    "CapitalGain", "CapitalLoss", "HoursPerWeek", "NativeCountry",
]
y = data.target
y = y[X.index]

seed = 42
np.random.seed(seed)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=seed
)

ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train)

auc = roc_auc_score(y_test, ebm.predict_proba(X_test)[:, 1])
print("AUC: {:.3f}".format(auc))
```

--------------------------------

### Load and Prepare Adult Dataset

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/python/examples/merge-ebms.ipynb

Loads the Adult dataset from OpenML, drops rows with missing values, renames columns, and splits the data into training and testing sets. Ensure data is cleaned before proceeding.
```python
import numpy as np
from sklearn.model_selection import train_test_split
from interpret import show
from sklearn.datasets import fetch_openml

data = fetch_openml("adult", version=2, as_frame=True)
X = data.data
X = X.dropna()
X.columns = [
    "Age", "WorkClass", "fnlwgt", "Education", "EducationNum",
    "MaritalStatus", "Occupation", "Relationship", "Race", "Gender",
    "CapitalGain", "CapitalLoss", "HoursPerWeek", "NativeCountry",
]
y = data.target
y = y[X.index]

seed = 42
np.random.seed(seed)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=seed
)
```

--------------------------------

### Load and Prepare UCI Adult Dataset

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/index.ipynb

Loads the UCI adult dataset using fetch_openml, preprocesses it by dropping NA values and renaming columns, and splits it into training and testing sets. Requires pandas, numpy, and scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_openml

data = fetch_openml("adult", version=2, as_frame=True)
X = data.data
X = X.dropna()
X.columns = [
    "Age", "WorkClass", "fnlwgt", "Education", "EducationNum",
    "MaritalStatus", "Occupation", "Relationship", "Race", "Gender",
    "CapitalGain", "CapitalLoss", "HoursPerWeek", "NativeCountry",
]
y = (data.target == ">50K").astype(int)
y = y[X.index]

seed = 42
np.random.seed(seed)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=seed
)
```

--------------------------------

### Initialize Pipeline with MLPClassifier

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Creates a pipeline with a preprocessing step 'p' and a Multi-layer Perceptron Classifier (MLPClassifier). Use for complex classification tasks where neural networks are suitable.
```python
est = Pipeline([("p", p), ("est", MLPClassifier(**nn_params))])
```

--------------------------------

### Configure CatBoostClassifier with OptunaSearchCV

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Wraps a CatBoostClassifier in OptunaSearchCV, which searches the `param_grid` distributions for 50 trials and scores each candidate by negative log loss. Use for optimizing CatBoost models.

```python
est = OptunaSearchCV(
    estimator=CatBoostClassifier(**catboost_params),
    param_distributions=param_grid,
    cv=n_calibration_folds,
    n_trials=50,
    scoring="neg_log_loss",
    verbose=0,
    random_state=seed,
    n_jobs=1,  # catboost uses the cores efficiently
)
```

--------------------------------

### Initialize Results and Splits

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-classification-comparison.ipynb

Initializes an empty list to store benchmark results and sets the number of splits for cross-validation.

```python
results = []
n_splits = 3
```

--------------------------------

### LinearSVR Initialization

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Initializes a LinearSVR (Support Vector Regression with a linear kernel) within a Pipeline. Use this for SVR tasks with a linear kernel.

```python
est = Pipeline([("p", p), ("est", LinearSVR(**lsvm_params))])
```

--------------------------------

### Fit Regression EBM and Show Global Explanation

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/ebm-internals-regression.ipynb

Creates a sample dataset, fits an EBM regression model without interactions, and displays its global explanation. The validation set is disabled (`validation_size=0`) because the dataset has only three samples.
```python
from interpret import show
from interpret.glassbox import ExplainableBoostingRegressor

# make a dataset composed of a nominal categorical, and a continuous feature
X = [["Peru", 7.0], ["Fiji", 8.0], ["Peru", 9.0]]
y = [450.0, 550.0, 350.0]

# Fit a regression EBM without interactions
# Eliminate the validation set to handle the small dataset
ebm = ExplainableBoostingRegressor(
    interactions=0,
    validation_size=0,
    outer_bags=1,
    min_samples_leaf=1,
    min_hessian=1e-9,
)
ebm.fit(X, y)

show(ebm.explain_global())
```

--------------------------------

### Train Blackbox Pipeline and Explain with LIME

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/lime.ipynb

Trains a RandomForestClassifier pipeline with PCA on the breast cancer dataset and then uses LimeTabular to generate local explanations for the first 5 test samples.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

from interpret import show
from interpret.blackbox import LimeTabular

seed = 42
np.random.seed(seed)
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=seed
)

pca = PCA()
rf = RandomForestClassifier(random_state=seed)

blackbox_model = Pipeline([("pca", pca), ("rf", rf)])
blackbox_model.fit(X_train, y_train)

lime = LimeTabular(blackbox_model, X_train)

show(lime.explain_local(X_test[:5], y_test[:5]), 0)
```
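--------------------------------

### Sketch: Bin Lookup and Additive Scoring Without the Library

A minimal, library-free sketch of the idea behind computing an EBM regression prediction by hand: start from an intercept, map each feature value to a bin, and add that bin's score. It is an illustration under stated assumptions, not the library's implementation. The intercept, bin definitions, and scores below are made-up values rather than output from a fitted model, and the names `bin_index` and `score_sample` are hypothetical. `bisect_right` stands in for `np.digitize` here, since both return the same index for ascending cut points.

```python
from bisect import bisect_right

# Hypothetical values for illustration only (not taken from a fitted EBM).
INTERCEPT = 450.0

# One entry per feature: a dict for a categorical feature (category -> bin index),
# or an ascending list of cut points for a continuous feature.
FEATURE_BINS = [
    {"Fiji": 1, "Peru": 2},
    [7.5, 8.5],
]

# One score list per feature. Index 0 is reserved for 'missing' in both lists.
FEATURE_SCORES = [
    [0.0, 100.0, -50.0],       # missing, Fiji, Peru
    [0.0, 10.0, 20.0, -30.0],  # missing, x < 7.5, 7.5 <= x < 8.5, x >= 8.5
]


def bin_index(value, bins):
    """Map a raw feature value to the index of its score."""
    if isinstance(bins, dict):
        # categorical feature: direct lookup
        return bins[value]
    # continuous feature: count the cut points <= value,
    # then add 1 because index 0 is reserved for 'missing'
    return bisect_right(bins, value) + 1


def score_sample(sample):
    """Add each feature's score contribution to the intercept."""
    total = INTERCEPT
    for value, bins, scores in zip(sample, FEATURE_BINS, FEATURE_SCORES):
        total += scores[bin_index(value, bins)]
    return total


print(score_sample(["Peru", 7.0]))  # 410.0
print(score_sample(["Fiji", 8.0]))  # 570.0
```

A value that falls exactly on a cut point, such as 7.5, lands in the bin to the right of that cut, which matches the default behavior of `np.digitize`.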