### Install Interpret Library

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/python/examples/interpretable-classification.ipynb

Installs the interpret library and its dependencies if not already present. This is a prerequisite for running the examples.

```python
# install interpret if not already installed
try:
    import interpret
except ModuleNotFoundError:
    !pip install --quiet interpret pandas scikit-learn
```
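
The `!pip` line above is IPython syntax, so it only runs inside a notebook. As a rough sketch of the same check in a plain Python script, the standard library can test whether a module is importable and build the matching pip command. The helper names `missing_packages` and `pip_install_command` are illustrative and not part of interpret.

```python
import importlib.util
import sys


def missing_packages(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def pip_install_command(packages):
    """Build (but do not run) a pip command bound to the current interpreter."""
    return [sys.executable, "-m", "pip", "install", "--quiet", *packages]


# "json" ships with Python, so only the made-up module name is reported missing.
print(missing_packages(["json", "surely_not_an_installed_module_xyz"]))
print(pip_install_command(["interpret"]))
```

Note that import names and distribution names can differ (for example, `sklearn` is installed as `scikit-learn`), so the two lists may need to be kept separately.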

--------------------------------

### Install Interpret Library

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/python/examples/interpretable-regression.ipynb

Installs the interpret library and scikit-learn if they are not already present. This is a prerequisite for running the examples.

```python
try:
    import interpret
except ModuleNotFoundError:
    !pip install --quiet interpret scikit-learn
```

--------------------------------

### Install Interpret from PyPI and Dependencies

Source: https://github.com/interpretml/interpret/blob/main/scripts/release_process.txt

Installs the interpret package from PyPI along with Jupyter and other necessary libraries for running example notebooks. Remove 'lime' if not used in examples.

```bash
pip install jupyter interpret lime
```

--------------------------------

### Install interpret-core with specific dependencies (pip)

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/deployment-guide.ipynb

Install interpret-core with a selection of extra dependencies using pip. This allows for fine-tuning the installed components.

```bash
pip install interpret-core[debug,notebook,plotly,lime,sensitivity,shap,linear,skoperules,treeinterpreter,aplr,dash,testing]
```

--------------------------------

### Install InterpretML from source

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/index.ipynb

Clone the InterpretML repository and install from source. Ensure you have Python 3.10+.

```bash
git clone https://github.com/interpretml/interpret.git && cd interpret/scripts && make install
```

--------------------------------

### Install interpret from source with all dependencies

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/deployment-guide.ipynb

Follow these steps to clone the repository and install the interpret package from source with all dependencies.

```bash
git clone https://github.com/interpretml/interpret.git && cd interpret/scripts && make install
```

--------------------------------

### Install Interpret Core with EBM Explainer

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/faq.ipynb

Install only the EBM explainer and its required dependencies from interpret-core using pip. The 'required' tag is generally recommended for all installs.

```sh
pip install interpret-core[required,ebm]
```
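
Some shells (zsh, for instance) treat square brackets as glob patterns, so an unquoted extras specifier may fail there. The sketch below composes the specifier and quotes it for a POSIX shell; `requirement_with_extras` is a hypothetical helper, not a pip or interpret API.

```python
import shlex


def requirement_with_extras(package, extras):
    """Join a package name and its extras into a pip requirement specifier."""
    if not extras:
        return package
    return f"{package}[{','.join(extras)}]"


spec = requirement_with_extras("interpret-core", ["required", "ebm"])
# shlex.quote wraps the specifier in single quotes because it contains brackets.
command = "pip install " + shlex.quote(spec)
print(command)
```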

--------------------------------

### Install interpret-core from source with minimal dependencies

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/deployment-guide.ipynb

Follow these steps to clone the repository and install the interpret-core package from source with minimal dependencies.

```bash
git clone https://github.com/interpretml/interpret.git && cd interpret/scripts && make install-core
```

--------------------------------

### Install Node.js Version 22 using nvm

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/installation-guide.ipynb

Install and use Node.js version 22, which is required for building the visualization bundle. This ensures compatibility with CI processes. Verify the installation by checking the Node and npm versions.

```shell
nvm install 22
nvm use 22
node --version  # should report v22.x
npm --version
```
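
If the version check needs to be automated, for example in a setup script, one option is to parse the text printed by `node --version`. This is only a sketch: the function names are hypothetical and the version strings below are sample inputs, not captured output.

```python
import re

REQUIRED_NODE_MAJOR = 22


def node_major_version(version_output):
    """Extract the major version from text such as 'v22.3.0'."""
    match = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)", version_output.strip())
    if match is None:
        raise ValueError(f"unrecognized node version string: {version_output!r}")
    return int(match.group(1))


def is_supported_node(version_output):
    """True when the reported major version equals the required one."""
    return node_major_version(version_output) == REQUIRED_NODE_MAJOR


print(is_supported_node("v22.3.0\n"))
print(is_supported_node("v18.19.1"))
```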

--------------------------------

### Install Powerlift Library

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Installs the 'powerlift' Python package with optional dataset and postgres support if it is not already found.

```python
# install powerlift if not already installed
try:
    import powerlift
except ModuleNotFoundError:
    !pip install -U --quiet powerlift[datasets,postgres]
```

--------------------------------

### Install Interpret Library

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/python/examples/interpretable-regression-synthetic.ipynb

Installs the interpret library and scikit-learn if they are not already present. This is a prerequisite for running the rest of the notebook.

```python
# install interpret if not already installed
try:
    import interpret
except ModuleNotFoundError:
    !pip install --quiet interpret scikit-learn
```

--------------------------------

### Install Interpret Library

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Installs the 'interpret-core' Python package if it is not already found in the current environment.

```python
# install interpret if not already installed
try:
    import interpret
except ModuleNotFoundError:
    !pip install -U --quiet interpret-core
```

--------------------------------

### Install interpret Package

Source: https://github.com/interpretml/interpret/blob/main/docs/excel_exporter/excel_exporter.ipynb

Installs the interpret package from the local source in editable mode. The commands are commented out; uncomment them to run the install, and ensure you are in the correct directory first.

```python
# import sys
# !{sys.executable} -m pip install -e ./interpret/python/interpret-core/
```

--------------------------------

### Start JS Development Server

Source: https://github.com/interpretml/interpret/blob/main/CLAUDE.md

Starts a development server for the JavaScript visualization bundle using webpack-dev-server. This provides live reloading and is useful during development.

```bash
cd shared/vis && npm start
```

--------------------------------

### Install interpret-core with minimal dependencies (pip)

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/deployment-guide.ipynb

Use this command to install the interpret-core package with only the essential dependencies for fitting, predicting, and explaining.

```bash
pip install interpret-core
```

--------------------------------

### Install interpret with all dependencies (pip)

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/deployment-guide.ipynb

Use this command to install the interpret package with all its dependencies, suitable for general use.

```bash
pip install interpret
```

--------------------------------

### Install Interpret Package

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/python/examples/explain-blackbox-classifiers.ipynb

Installs the interpret package and its dependencies if not already present. This is a prerequisite for running the rest of the notebook.

```python
try:
    import interpret
except ModuleNotFoundError:
    !pip install --quiet interpret pandas scikit-learn lime
```

--------------------------------

### Install ebm2onnx package

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/faq.ipynb

Install the `ebm2onnx` package from PyPI to enable high-speed inference on EBM objects through ONNX-compatible runtimes.

```bash
pip install ebm2onnx
```

--------------------------------

### Load and Train Model for Explanation

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/framework.ipynb

Loads the adult dataset, preprocesses it, and trains an Explainable Boosting Classifier. This setup is required before generating explanations.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

from interpret.glassbox import ExplainableBoostingClassifier
from interpret import show

from sklearn.datasets import fetch_openml

data = fetch_openml("adult", version=2, as_frame=True)
X = data.data
X = X.dropna()
X.columns = [
    "Age",
    "WorkClass",
    "fnlwgt",
    "Education",
    "EducationNum",
    "MaritalStatus",
    "Occupation",
    "Relationship",
    "Race",
    "Gender",
    "CapitalGain",
    "CapitalLoss",
    "HoursPerWeek",
    "NativeCountry",
]
y = (data.target == ">50K").astype(int)
y = y[X.index]

seed = 42
np.random.seed(seed)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=seed
)

ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train)
```

--------------------------------

### Install InterpretML Package

Source: https://github.com/interpretml/interpret/blob/main/README.md

Install the InterpretML package using pip or conda. Ensure you have Python 3.10+ installed.

```sh
pip install interpret
```

```sh
conda install -c conda-forge interpret
```

--------------------------------

### Install InterpretML using pip

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/index.ipynb

Use this command to install the InterpretML library via pip. Ensure you have Python 3.10+.

```bash
pip install interpret
```

--------------------------------

### Install interpret with all dependencies (conda)

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/deployment-guide.ipynb

Use this command to install the interpret package with all its dependencies using conda.

```bash
conda install -c conda-forge interpret
```

--------------------------------

### Install interpret and dependencies

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/python/examples/prototype-selection.ipynb

Installs the interpret library and necessary dependencies like numpy, scikit-learn, and matplotlib if they are not already present.

```python
# install interpret if not already installed
try:
    import interpret
except ModuleNotFoundError:
    !pip install --quiet interpret numpy scikit-learn matplotlib
```

--------------------------------

### Install interpret-core with minimal dependencies (conda)

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/deployment-guide.ipynb

Use this command to install the interpret-core package with only the essential dependencies using conda.

```bash
conda install -c conda-forge interpret-core
```

--------------------------------

### Install Benchmark Dependencies

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Installs specific versions of required Python packages for reproducibility, including numpy, pandas, scikit-learn, optuna, xgboost, lightgbm, catboost, aplr, and tabpfn.

```python
# use exact versions for reproducibility of the RANK ordering
requirements = "numpy==1.26.4 pandas==2.2.2 scikit-learn==1.5.1 optuna==4.0.0 optuna-integration==4.0.0 xgboost==2.1.0 lightgbm==4.5.0 catboost==1.2.5 aplr==10.6.1 tabpfn==2.0.1 autogluon.tabular==1.5"
!pip install -U --quiet {requirements}
```
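
To inspect or validate a pin string like the one above before installing, it can be split into a name-to-version mapping. A minimal sketch, assuming every entry uses `==`; `parse_pins` is a hypothetical helper and the sample string is a subset of the pins shown above.

```python
def parse_pins(requirements):
    """Split a space-separated 'name==version' string into a dict."""
    pins = {}
    for item in requirements.split():
        name, sep, version = item.partition("==")
        if not sep:
            raise ValueError(f"requirement is not pinned: {item!r}")
        pins[name] = version
    return pins


sample = "numpy==1.26.4 pandas==2.2.2 scikit-learn==1.5.1"
pins = parse_pins(sample)
for name, version in pins.items():
    print(f"{name}: {version}")
```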

--------------------------------

### Build JS Visualization Bundle for Production

Source: https://github.com/interpretml/interpret/blob/main/CLAUDE.md

Builds the JavaScript visualization bundle for production use. This command should be run from the `shared/vis/` directory and installs npm dependencies before building.

```bash
cd shared/vis && npm install && npm run build-prod
```

--------------------------------

### Build interpret-inline.js (Production)

Source: https://github.com/interpretml/interpret/blob/main/shared/vis/CONTRIBUTING.md

Installs dependencies and builds the production version of interpret-inline.js. The output is a minified file located at dist/interpret-inline.js.

```bash
cd shared/vis
npm install
npm run build-prod
```

--------------------------------

### Install Interpret-Core with Debug and Visualization Dependencies

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/installation-guide.ipynb

Install the interpret-core Python package in editable mode, including dependencies for debugging, notebooks, Plotly, and various explainers like LIME, SHAP, and others.

```shell
cd ../../python/interpret-core

pip install -e .[debug,notebook,plotly,lime,sensitivity,shap,linear,skoperules,treeinterpreter,aplr,dash,testing]
```

--------------------------------

### Setup Inline Visualization Provider

Source: https://github.com/interpretml/interpret/blob/main/scripts/release_process.txt

Configures the Interpret library to use the InlineProvider for visualizations within Jupyter notebooks. This is crucial for testing the NPM inline JS package.

```python
from interpret import set_visualize_provider
from interpret.provider import InlineProvider
set_visualize_provider(InlineProvider())
```

--------------------------------

### Install Python Development Dependencies

Source: https://github.com/interpretml/interpret/blob/main/CLAUDE.md

Installs the InterpretML Python package in editable mode with development dependencies. Includes optional extras for debugging, notebooks, and various visualization/analysis libraries.

```bash
pip install -e ".[debug,notebook,plotly,lime,sensitivity,shap,linear,treeinterpreter,aplr,dash,skoperules,excel,testing]"
```

--------------------------------

### Run Local Benchmark Experiment

Source: https://github.com/interpretml/interpret/blob/main/python/powerlift/README.md

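Initializes the Powerlift store, populates it with datasets, and runs a benchmark experiment locally. Ensure the database connection string is set correctly, and define `trial_runner` and `trial_filter` before calling `benchmark.run`; they are not shown in this excerpt.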

```python
import os
from powerlift.bench import Benchmark, Store
from powerlift.bench import populate_with_datasets

# Initialize database (if needed).
conn_str = f"sqlite:///{os.getcwd()}/powerlift.db"
store = Store(conn_str, force_recreate=False)

# This downloads datasets once and feeds into the database.
populate_with_datasets(store, cache_dir="~/.powerlift", exist_ok=True)

# Run experiment
benchmark = Benchmark(f"sqlite:///{os.getcwd()}/powerlift.db", name="SVM vs RF")
benchmark.run(trial_runner, trial_filter)
benchmark.wait_until_complete()
```
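
The SQLite connection string is assembled twice in the example above. One way to keep the two in sync is to build it once in a small helper, sketched here with the standard library only. `sqlite_conn_str` is a hypothetical name and not part of powerlift.

```python
from pathlib import Path


def sqlite_conn_str(directory, filename="powerlift.db"):
    """Build a SQLite URL for a database file inside the given directory."""
    db_path = Path(directory).resolve() / filename
    return f"sqlite:///{db_path.as_posix()}"


conn_str = sqlite_conn_str(Path.cwd())
print(conn_str)
```

The resulting `conn_str` can then be reused wherever a connection string is needed.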

--------------------------------

### Build Interpret and Visualization Bundle

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/installation-guide.ipynb

Navigate to the Interpret directory, build the core library using the provided script (build.sh or build.bat), and then build the production-ready visualization bundle.

```shell
cd interpret

./build.sh  # OR use build.bat on Windows

cd shared/vis

npm run clean && npm install && npm run build-prod
```

--------------------------------

### Configure Data Store and Benchmark

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Sets up the connection string for the data store (either Azure or SQLite) and initializes the Powerlift Benchmark object. Requires Azure credentials and environment variables to be set if `is_azure` is True.

```python
import os

if is_azure:
    import requests
    import json
    import subprocess
    from azure.identity import AzureCliCredential

    credential = AzureCliCredential()
    access_token = credential.get_token("https://graph.microsoft.com/.default").token
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    azure_client_id = (
        requests.get("https://graph.microsoft.com/v1.0/me", headers=headers)
        .json()
        .get("id")
    )
    azure_tenant_id = (
        requests.get("https://graph.microsoft.com/v1.0/organization", headers=headers)
        .json()["value"][0]
        .get("id")
    )
    subscription_id = json.loads(
        subprocess.run(
            "az account show", capture_output=True, text=True, shell=True
        )
        .stdout
    ).get("id")

    from dotenv import load_dotenv

    load_dotenv()
    conn_str = os.getenv("DOCKER_DB_URL")
    resource_group = os.getenv("AZURE_RESOURCE_GROUP")
else:
    conn_str = f"sqlite:///{os.getcwd()}/powerlift.db"

from powerlift.bench import Store, Benchmark

store = Store(conn_str, force_recreate=force_recreate)
benchmark = Benchmark(store, name=experiment_name)
```

--------------------------------

### Run Unit Tests

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/installation-guide.ipynb

Navigate to the scripts directory and execute the 'make test' command to run all unit tests for the Interpret project.

```shell
cd interpret/scripts

make test
```

--------------------------------

### Install InterpretML using conda

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/index.ipynb

Use this command to install the InterpretML library via conda. Ensure you have Python 3.10+.

```bash
conda install -c conda-forge interpret
```

--------------------------------

### Configure LGBM, CatBoost, and RF Parameters

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Configures parameters for LightGBM (verbosity), CatBoost (verbose), and Random Forest variants (enable_categorical, feature_types, random_state, n_jobs).

```python
lgbm_params["verbosity"] = -1
catboost_params["verbose"] = False
rf_xgb_params["enable_categorical"] = True
rf_xgb_params["feature_types"] = ["c" if cat else "q" for cat in cat_bools]
rf_sk_params["random_state"] = seed
rf_sk_params["n_jobs"] = -1
```

--------------------------------

### Define Trial Runner Configuration

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Sets up the runner for trials, including seed initialization and importing necessary libraries for various machine learning models.

```python
def trial_runner(trial):
    seed = 42
    seed += trial.replicate_num
    max_samples = None
    n_calibration_folds = 4  # 4 uses all cores on the containers

    from interpret.glassbox import (
        ExplainableBoostingClassifier,
        ExplainableBoostingRegressor,
    )
    from interpret.develop import set_option
    from interpret.utils._native import Native
    from xgboost import XGBClassifier, XGBRegressor, XGBRFClassifier, XGBRFRegressor
    from lightgbm import LGBMClassifier, LGBMRegressor
    from catboost import CatBoostClassifier, CatBoostRegressor
    from autogluon.tabular import TabularPredictor
    from sklearn.ensemble import (
        RandomForestClassifier,
        RandomForestRegressor,
        ExtraTreesClassifier,
        ExtraTreesRegressor,
    )
    from sklearn.linear_model import (
        LogisticRegression,
        LinearRegression,
        ElasticNet,
        SGDClassifier,
        SGDRegressor,
    )
    from sklearn.svm import LinearSVC, LinearSVR, SVC, SVR
    from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
    from sklearn.neural_network import MLPClassifier, MLPRegressor
    from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
    from aplr import APLRClassifier, APLRRegressor
```
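
The runner above offsets a base seed by the replicate number, so each replicate gets a distinct but reproducible seed. A small sketch of that idea follows; `SimpleNamespace` stands in for the trial object, and only the `replicate_num` attribute used above is assumed.

```python
from types import SimpleNamespace

BASE_SEED = 42


def seed_for(trial):
    """Derive a per-replicate seed from the base seed."""
    return BASE_SEED + trial.replicate_num


# Stand-in trial objects; the real ones are supplied by the benchmark framework.
trials = [SimpleNamespace(replicate_num=i) for i in range(3)]
seeds = [seed_for(t) for t in trials]
print(seeds)
```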

--------------------------------

### Run Benchmark Experiment on Azure Container Instances

Source: https://github.com/interpretml/interpret/blob/main/python/powerlift/README.md

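Configures and runs a Powerlift benchmark experiment using Azure Container Instances as the executor. Requires the Azure environment variables to be set, and assumes `os`, `Store`, `Benchmark`, `trial_runner`, and `trial_filter` are already in scope.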

```python
# Run experiment (but in ACI).
from powerlift.executors import AzureContainerInstance
store = Store(os.getenv("AZURE_DB_URL"))
azure_tenant_id = os.getenv("AZURE_TENANT_ID")
subscription_id = os.getenv("AZURE_SUBSCRIPTION_ID")
azure_client_id = os.getenv("AZURE_CLIENT_ID")
azure_client_secret = os.getenv("AZURE_CLIENT_SECRET")
resource_group = os.getenv("AZURE_RESOURCE_GROUP")

executor = AzureContainerInstance(
    store,
    azure_tenant_id,
    subscription_id,
    azure_client_id,
    azure_client_secret=azure_client_secret,
    resource_group=resource_group,
    n_running_containers=5
)
benchmark = Benchmark(store, name="SVM vs RF")
benchmark.run(trial_runner, trial_filter, timeout=10, executor=executor)
benchmark.wait_until_complete()
```
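
`os.getenv` returns `None` for an unset variable, so a missing setting in the example above would only surface later, inside the executor. A hedged sketch of failing early instead is shown below. `require_env` is a hypothetical helper, and the demo uses a fake mapping with placeholder values rather than the real environment.

```python
import os

AZURE_VARS = [
    "AZURE_DB_URL",
    "AZURE_TENANT_ID",
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_CLIENT_ID",
    "AZURE_CLIENT_SECRET",
    "AZURE_RESOURCE_GROUP",
]


def require_env(names, environ=None):
    """Return the requested variables, raising if any is unset or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


# Demo with a fake mapping so the sketch does not depend on the real environment.
fake_env = {name: "placeholder" for name in AZURE_VARS}
settings = require_env(AZURE_VARS, environ=fake_env)
print(sorted(settings))
```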

--------------------------------

### Install Interpret Conda Package and Dependencies

Source: https://github.com/interpretml/interpret/blob/main/scripts/release_process.txt

Installs the interpret-core package from conda-forge, along with the Python libraries needed for visualization and notebook execution.

```bash
conda install --yes -c conda-forge interpret-core psutil ipykernel ipython plotly lime SALib shap dill dash dash-core-components dash-html-components dash-table dash_cytoscape gevent requests
```

--------------------------------

### Generate Local Explanations with LIME

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/index.ipynb

This snippet demonstrates how to initialize the LimeTabular explainer and generate local explanations for the first five instances in the test set. Ensure that 'blackbox_model', 'X_train', 'X_test', 'y_test', and 'seed' are defined prior to execution.

```python
from interpret.blackbox import LimeTabular
from interpret import show

lime = LimeTabular(blackbox_model, X_train, random_state=seed)
show(lime.explain_local(X_test[:5], y_test[:5]), 0)
```

--------------------------------

### Define Benchmark Parameters

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Sets up experiment parameters including Azure status, replication counts, instance counts, and file paths for wheels. It also defines a unique experiment name based on the current timestamp.

```python
# Within this notebook we adopt the convention that for metrics lower is better.
# For metrics where higher is better like AUC we flip the sign to negative.

is_azure = False  # if this is set to True, login with 'az login' before
n_replicates = 25  # 25 is almost costless since with 30 we get to saturation of 650 runners in 2 hours
n_instances = 810

force_recreate = False
exist_ok = True
TIMEOUT_SEC = 60 * 60 * 24 * 180  # 180 days
wheel_filepaths = [
    "interpret_core-0.7.1-py3-none-any.whl",
    "powerlift-0.1.12-py3-none-any.whl",
]

import datetime

experiment_name = datetime.datetime.now().strftime("%Y_%m_%d_%H%M__") + "myexperiment"
# experiment_name = 'yyyy_mm_dd_hhmm__myexperiment'

print("Experiment name: " + experiment_name)
```
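
To make the naming convention and the timeout arithmetic above concrete, here is a small sketch with a fixed timestamp. `make_experiment_name` is a hypothetical wrapper around the same `strftime` pattern, and the date is an arbitrary example.

```python
import datetime


def make_experiment_name(now, suffix="myexperiment"):
    """Prefix the suffix with a sortable yyyy_mm_dd_hhmm timestamp."""
    return now.strftime("%Y_%m_%d_%H%M__") + suffix


fixed = datetime.datetime(2024, 1, 5, 9, 30)
name = make_experiment_name(fixed)
print(name)  # 2024_01_05_0930__myexperiment

# 60 s * 60 min * 24 h * 180 days
TIMEOUT_SEC = 60 * 60 * 24 * 180
print(TIMEOUT_SEC)  # 15552000
```

Because the timestamp leads the name and is zero-padded, sorting experiment names alphabetically also sorts them chronologically.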

--------------------------------

### Configure SVC with Pipeline

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

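Sets up a pipeline with SVC. In the source notebook this step is accompanied by dataset-specific subsampling for large datasets such as Fashion-MNIST and CIFAR-10, to prevent crashes or long fit times; the excerpt below shows only the pipeline construction.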

```python
est = Pipeline(
    [
        ("p", p),
        ("est", SVC(**svm_params)),
    ]
)
```

--------------------------------

### Import and Check Interpret Package

Source: https://github.com/interpretml/interpret/blob/main/docs/excel_exporter/excel_exporter.ipynb

Imports the interpret package and verifies its installation path. This is a basic check to ensure the library is accessible.

```python
import interpret

interpret.__file__
```

--------------------------------

### Initialize Pipeline with KNeighborsClassifier

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Creates a pipeline with a preprocessing step 'p' and a K-Nearest Neighbors Classifier (KNeighborsClassifier). Use for classification tasks where instance-based learning is appropriate.

```python
est = Pipeline([("p", p), ("est", KNeighborsClassifier(**knn_params))])
```

--------------------------------

### Get current process ID

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/debugging-guide.ipynb

Use `os.getpid()` in your Python script to print the current process ID, which can be helpful when attaching the C++ debugger.

```python
import os

print('Current PID = {}'.format(os.getpid()))
```

--------------------------------

### Initialize Pipeline with Calibrated SVC

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Creates a pipeline with a preprocessing step 'p' and a CalibratedClassifierCV using SVC. Use for classification tasks requiring probability calibration.

```python
est = Pipeline(
    [
        ("p", p),
        (
            "est",
            CalibratedClassifierCV(
                SVC(**svm_params), n_jobs=-1, cv=n_calibration_folds
            ),
        ),
    ]
)
```

--------------------------------

### Understand Global Model Behavior with EBM

Source: https://github.com/interpretml/interpret/blob/main/README.md

Use this snippet to generate and display a global explanation for an EBM model. Ensure the 'interpret' library is installed.

```python
from interpret import show

ebm_global = ebm.explain_global()
show(ebm_global)
```

--------------------------------

### Initialize Pipeline with SVC

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Creates a pipeline with a preprocessing step 'p' and a Support Vector Classifier (SVC) with specified SVM parameters. Use for classification tasks where SVC is appropriate.

```python
est = Pipeline([("p", p), ("est", SVC(**svm_params))])
```

--------------------------------

### Monotonize EBM Feature

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/python/examples/interpretable-classification.ipynb

Applies a monotonic constraint to the 'Age' feature of the trained EBM, so that the contribution of the Age term to the prediction never decreases as age increases. This is an example of post-processing EBMs.

```python
# post-process monotonize the Age feature
ebm.monotonize("Age", increasing=True)
```
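
To illustrate what a monotone post-processing step does to a term's per-bin scores, the sketch below applies pool-adjacent-violators (unweighted isotonic regression) to a short list. This is a stand-in technique chosen for illustration, not code taken from interpret, and the score values are made up.

```python
def pool_adjacent_violators(values):
    """Return the closest non-decreasing sequence in the least-squares sense."""
    blocks = []  # each block holds [sum of values, number of values]
    for value in values:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    result = []
    for total, count in blocks:
        result.extend([total / count] * count)
    return result


scores = [1.0, 3.0, 2.0, 4.0]  # hypothetical per-bin scores with one dip
print(pool_adjacent_violators(scores))  # [1.0, 2.5, 2.5, 4.0]
```

The dip between the second and third bins is replaced by their average, which is the smallest change that removes the decrease.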

--------------------------------

### Configure EBM and XGBoost Parameters

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Sets up parameters for EBM and XGBoost, including feature types, number of jobs, and random state. XGBoost specific parameters like enable_categorical are also configured.

```python
ebm_params["feature_types"] = ["nominal" if cat else "continuous" for cat in cat_bools]
ebm_params["n_jobs"] = -1
ebm_params["random_state"] = seed
xgb_params["enable_categorical"] = True
xgb_params["feature_types"] = ["c" if cat else "q" for cat in cat_bools]
```
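
The two list comprehensions above map one boolean mask onto each library's own type labels. A small sketch with a made-up three-feature mask shows the result; the `cat_bools` values here are illustrative only.

```python
# Hypothetical mask: the first feature is categorical, the other two are numeric.
cat_bools = [True, False, False]

ebm_feature_types = ["nominal" if cat else "continuous" for cat in cat_bools]
xgb_feature_types = ["c" if cat else "q" for cat in cat_bools]

print(ebm_feature_types)  # ['nominal', 'continuous', 'continuous']
print(xgb_feature_types)  # ['c', 'q', 'q']
```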

--------------------------------

### Create and Activate PyPI Test Conda Environment

Source: https://github.com/interpretml/interpret/blob/main/scripts/release_process.txt

Sets up a new Conda environment specifically for testing the PyPI release of the interpret package. This ensures isolation from other installed packages.

```bash
conda env remove --name interpret_pypi && conda create --yes --name interpret_pypi python=3.10 && conda activate interpret_pypi
```

--------------------------------

### Generate Synthetic Dataset and Split Data

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/python/examples/interpretable-regression-synthetic.ipynb

Generates a synthetic dataset with specified parameters and splits it into training and testing sets. This setup is crucial for evaluating the EBM's performance.

```python
# boilerplate - generate the synthetic dataset and split into test/train

import numpy as np
from sklearn.model_selection import train_test_split
from interpret.utils import make_synthetic
from interpret import show

seed = 42

X, y, names, types = make_synthetic(
    classes=None, n_samples=50000, missing=False, seed=seed
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=seed
)
```

--------------------------------

### Get and Display Group Importances

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/python/examples/group-importances.ipynb

Calculates the importance of specified feature groups for a given model and dataset. It then iterates through the resulting dictionary to print each group's term and its calculated importance.

```python
my_dict = get_group_and_individual_importances(
    [social_feature_group, education_feature_group], adult_ebm, X
)
for key in my_dict:
    print(f"Term: {key} - Importance: {my_dict[key]}")
```

--------------------------------

### Load and Prepare Adult Dataset

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/python/examples/custom-interactions.ipynb

Loads the Adult dataset from OpenML, preprocesses it by dropping rows with missing values, and splits it into training and testing sets. Ensures reproducibility by setting a random seed.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from interpret.glassbox import ExplainableBoostingClassifier

from sklearn.datasets import fetch_openml

data = fetch_openml("adult", version=2, as_frame=True)
X = data.data
X = X.dropna()
X.columns = [
    "Age",
    "WorkClass",
    "fnlwgt",
    "Education",
    "EducationNum",
    "MaritalStatus",
    "Occupation",
    "Relationship",
    "Race",
    "Gender",
    "CapitalGain",
    "CapitalLoss",
    "HoursPerWeek",
    "NativeCountry",
]
y = data.target
y = y[X.index]

seed = 42
np.random.seed(seed)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=seed
)
```

--------------------------------

### Interpret Build Scripts

Source: https://github.com/interpretml/interpret/blob/main/shared/vis/CONTRIBUTING.md

Provides a reference for various npm scripts available for building and managing the interpret-inline.js project. Use `build-dev` for development builds with source maps, `build-prod` for minified production builds, `clean` to remove build artifacts, and `start` to run the development server.

```bash
npm run build-dev
```

```bash
npm run build-prod
```

```bash
npm run clean
```

```bash
npm start
```

--------------------------------

### Manual Prediction Calculation

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/ebm-internals-regression.ipynb

A sample function to manually calculate predictions for simplified scenarios. It iterates through samples, starting with the intercept and adding scores from each feature's lookup table based on the feature's value and binning. This code does not handle interactions, missing, or unseen values.

```python
sample_scores = []
for sample in X:
    # start from the intercept for each sample
    score = ebm.intercept_
    print("intercept: " + str(score))

    # we have 2 features, so add their score contributions
    for feature_idx, feature_val in enumerate(sample):
        bins = ebm.bins_[feature_idx][0]
        if isinstance(bins, dict):
            # categorical feature
            bin_idx = bins[feature_val]
        else:
            # continuous feature. bins is an array of cut points
            # add 1 because the 0th bin is reserved for 'missing'
            bin_idx = np.digitize(feature_val, bins) + 1

        local_score = ebm.term_scores_[feature_idx][bin_idx]

        # local_score is also the local feature importance (see plot below)
        print(ebm.feature_names_in_[feature_idx] + ": " + str(local_score))

        score += local_score
    sample_scores.append(score)
    print()

print("PREDICTIONS:")
print(ebm.predict(X))
print(np.array(sample_scores))
```
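
The same lookup logic can be traced by hand on a toy additive model that needs no fitted EBM. In this sketch all numbers are made up. Index 0 of each score list is reserved for missing values, mirroring the comment in the code above, and the trailing slot for unseen values is an assumption of this toy. `bisect_right` gives the same index as `np.digitize` with its default arguments for increasing cut points.

```python
from bisect import bisect_right

intercept = 10.0

# Hypothetical bin definitions: cut points for a continuous feature,
# a category-to-index map for a categorical one.
bins = [
    [1.5, 3.0],
    {"red": 1, "blue": 2},
]

# Hypothetical per-bin scores; first slot is 'missing', last slot is 'unseen'.
term_scores = [
    [0.0, -1.0, 0.5, 2.0, 0.0],
    [0.0, 0.25, -0.25, 0.0],
]


def predict_one(sample):
    """Sum the intercept and each feature's looked-up score."""
    score = intercept
    for feature_idx, value in enumerate(sample):
        feature_bins = bins[feature_idx]
        if isinstance(feature_bins, dict):
            bin_idx = feature_bins[value]
        else:
            # add 1 because the 0th bin is reserved for 'missing'
            bin_idx = bisect_right(feature_bins, value) + 1
        score += term_scores[feature_idx][bin_idx]
    return score


print(predict_one((2.0, "blue")))  # 10.0 + 0.5 - 0.25 = 10.25
```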

--------------------------------

### Build Native Library (libebm) - Windows

Source: https://github.com/interpretml/interpret/blob/main/CLAUDE.md

Builds the native libebm library for Windows. Supports debug, release, and different architectures. Requires Visual Studio 2022. Use -analysis flag for clang-tidy checks.

```batch
./build.bat -release_64
./build.bat -debug_64
./build.bat -release_32
./build.bat -debug_32
./build.bat -analysis
```

--------------------------------

### SPOTgreedy with uniform target distribution

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/python/examples/prototype-selection.ipynb

Applies the SPOTgreedy algorithm to select 20 prototypes from the source data that best match a uniform distribution over the target data. Visualizes the selected prototypes.

```python
# Define a targetmarginal on the target set
# We define the uniform marginal
targetmarginal = np.ones(C.shape[1]) / C.shape[1]
# The number of prototypes to be computed
numprototypes = 20
# Run SPOTgreedy
# prototypeIndices represent the indices corresponding to the chosen prototypes.
# prototypeWeights represent the weights associated with each of the chosen prototypes. The weights sum to 1.
[prototypeIndices, prototypeWeights] = SPOT_GreedySubsetSelection(
    C, targetmarginal, numprototypes
)
# Plot the chosen prototypes
fig, axs = plt.subplots(nrows=5, ncols=4, figsize=(2, 2))
for idx, ax in enumerate(axs.ravel()):
    ax.imshow(data[prototypeIndices[idx]].reshape((8, 8)), cmap=plt.cm.binary)
    ax.axis("off")
_ = fig.suptitle(
    "Top prototypes selected from the 64-dimensional digit dataset with uniform target distribution",
    fontsize=16,
)
```

--------------------------------

### Compare Multiple Model Explanations

Source: https://github.com/interpretml/interpret/blob/main/README.md

Display a dashboard to compare explanations from multiple models. Pass a list of explanation objects to the 'show' function.

```python
show([logistic_regression_global, decision_tree_global])
```

--------------------------------

### CatBoost Hyperparameter Tuning with Optuna

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Excerpt from the configuration for tuning CatBoost with OptunaSearchCV. This part sets task-specific 'max_samples' limits so that large datasets are subsampled, preventing OOM errors or long fit times.

```python
if trial.task.name in {"Allstate_Claims_Severity"}:
    # TODO: tweak
    max_samples = 8000  # crashes or fit time too long without subsampling
if trial.task.name in {"Airlines_DepDelay_10M"}:
    # TODO: tweak
    max_samples = 100000  # crashes or fit time too long without subsampling
if trial.task.name in {"nyc-taxi-green-dec-2016"}:
    # TODO: tweak
    max_samples = 50000  # crashes or fit time too long without subsampling
if trial.task.name in {"Buzzinsocialmedia_Twitter"}:
    # TODO: tweak
    max_samples = 5000  # crashes or fit time too long without subsampling
if trial.task.name in {"Yolanda"}:
    # TODO: tweak
    max_samples = 5000  # crashes or fit time too long without subsampling

# from https://forecastegy.com/posts/catboost-hyperparameter-tuning-guide-with-optuna/
```
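
The repeated `if` checks above could also be written as a lookup table. This is only a sketch of an alternative layout, not the benchmark's code: `TASK_MAX_SAMPLES` and `resolve_max_samples` are hypothetical names, and the limits are copied from the snippet above.

```python
# Hypothetical refactor: per-task subsampling limits copied from the snippet above
TASK_MAX_SAMPLES = {
    "Allstate_Claims_Severity": 8000,
    "Airlines_DepDelay_10M": 100000,
    "nyc-taxi-green-dec-2016": 50000,
    "Buzzinsocialmedia_Twitter": 5000,
    "Yolanda": 5000,
}


def resolve_max_samples(task_name, default=None):
    """Return the subsampling limit for a task, or `default` if none is set."""
    return TASK_MAX_SAMPLES.get(task_name, default)


print(resolve_max_samples("Yolanda"))
print(resolve_max_samples("some_other_task"))
```

A table keeps all limits in one place, and a task without an entry falls back to the default instead of leaving `max_samples` unchanged.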

--------------------------------

### Train EBM Classifier and Evaluate

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/ebm.ipynb

Loads the Adult dataset from OpenML, drops rows with missing values, renames the columns, trains an ExplainableBoostingClassifier on an 80/20 train/test split, and reports the test-set AUC.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

from interpret.glassbox import ExplainableBoostingClassifier
from interpret import show

from sklearn.datasets import fetch_openml

data = fetch_openml("adult", version=2, as_frame=True)
X = data.data
X = X.dropna()
X.columns = [
    "Age",
    "WorkClass",
    "fnlwgt",
    "Education",
    "EducationNum",
    "MaritalStatus",
    "Occupation",
    "Relationship",
    "Race",
    "Gender",
    "CapitalGain",
    "CapitalLoss",
    "HoursPerWeek",
    "NativeCountry",
]
y = data.target
y = y[X.index]

seed = 42
np.random.seed(seed)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=seed
)

ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train)

auc = roc_auc_score(y_test, ebm.predict_proba(X_test)[:, 1])
print("AUC: {:.3f}".format(auc))
```
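
The AUC printed above can be read as the probability that a randomly chosen positive sample gets a higher score than a randomly chosen negative one. The sketch below illustrates that interpretation in plain Python. `pairwise_auc` is a hypothetical helper, not part of scikit-learn or interpret, and the labels and scores are made up rather than taken from the model above.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Runs in O(n_pos * n_neg), so it is
    meant for illustration only.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities
y_demo = [0, 0, 1, 1]
p_demo = [0.1, 0.4, 0.35, 0.8]
print("AUC: {:.3f}".format(pairwise_auc(y_demo, p_demo)))  # AUC: 0.750
```

Three of the four positive/negative pairs are ordered correctly here, which gives 0.75.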

--------------------------------

### Load and Prepare Adult Dataset

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/python/examples/merge-ebms.ipynb

Loads the Adult dataset from OpenML, drops rows with missing values, renames the columns, and splits the data 80/20 into training and testing sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from interpret import show

from sklearn.datasets import fetch_openml

data = fetch_openml("adult", version=2, as_frame=True)
X = data.data
X = X.dropna()
X.columns = [
    "Age",
    "WorkClass",
    "fnlwgt",
    "Education",
    "EducationNum",
    "MaritalStatus",
    "Occupation",
    "Relationship",
    "Race",
    "Gender",
    "CapitalGain",
    "CapitalLoss",
    "HoursPerWeek",
    "NativeCountry",
]
y = data.target
y = y[X.index]

seed = 42
np.random.seed(seed)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=seed
)
```

--------------------------------

### Load and Prepare UCI Adult Dataset

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/index.ipynb

Loads the UCI Adult dataset using fetch_openml, drops rows with missing values, renames the columns, encodes the target as 1 for '>50K' and 0 otherwise, and splits the data into training and testing sets. Requires pandas, numpy, and scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

from sklearn.datasets import fetch_openml

data = fetch_openml("adult", version=2, as_frame=True)
X = data.data
X = X.dropna()
X.columns = [
    "Age",
    "WorkClass",
    "fnlwgt",
    "Education",
    "EducationNum",
    "MaritalStatus",
    "Occupation",
    "Relationship",
    "Race",
    "Gender",
    "CapitalGain",
    "CapitalLoss",
    "HoursPerWeek",
    "NativeCountry",
]
y = (data.target == ">50K").astype(int)
y = y[X.index]

seed = 42
np.random.seed(seed)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=seed
)
```

--------------------------------

### Initialize Pipeline with MLPClassifier

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Creates a pipeline with a preprocessing step 'p' and a Multi-layer Perceptron Classifier (MLPClassifier). Use for complex classification tasks where neural networks are suitable.

```python
est = Pipeline([("p", p), ("est", MLPClassifier(**nn_params))])
```

--------------------------------

### Configure CatBoostClassifier with OptunaSearchCV

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

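Wraps a CatBoostClassifier in OptunaSearchCV to tune it over the distributions in 'param_grid'. The search runs 50 trials, scores them by negative log loss, and uses a single job because CatBoost already parallelizes across cores.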

```python
est = OptunaSearchCV(
    estimator=CatBoostClassifier(**catboost_params),
    param_distributions=param_grid,
    cv=n_calibration_folds,
    n_trials=50,
    scoring="neg_log_loss",
    verbose=0,
    random_state=seed,
    n_jobs=1,  # catboost uses the cores efficiently
)
```

--------------------------------

### Initialize Results and Splits

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-classification-comparison.ipynb

Initializes an empty list to store benchmark results and sets the number of splits for cross-validation.

```python
results = []
n_splits = 3
```
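
As a rough illustration of how these two variables might be used, the sketch below appends one record per split to `results`. The interleaved split, the dataset size, and the record fields are assumptions made for this example. The notebook's actual splitter and recorded metrics are not shown in the snippet above.

```python
results = []
n_splits = 3

n_samples = 10  # hypothetical dataset size
indices = list(range(n_samples))

for split in range(n_splits):
    # hold out every n_splits-th sample, offset by the split number
    test_idx = indices[split::n_splits]
    held_out = set(test_idx)
    train_idx = [i for i in indices if i not in held_out]

    # a real benchmark would fit a model here and record its metrics
    results.append(
        {"split": split, "n_train": len(train_idx), "n_test": len(test_idx)}
    )

for record in results:
    print(record)
```

Each sample is held out exactly once across the three splits, so the test-set sizes add up to the dataset size.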

--------------------------------

### LinearSVR Initialization

Source: https://github.com/interpretml/interpret/blob/main/docs/benchmarks/ebm-benchmark.ipynb

Creates a pipeline with a preprocessing step 'p' followed by a LinearSVR (support vector regression with a linear kernel).

```python
est = Pipeline([("p", p), ("est", LinearSVR(**lsvm_params))])
```

--------------------------------

### Fit Regression EBM and Show Global Explanation

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/ebm-internals-regression.ipynb

Creates a three-sample dataset with one categorical and one continuous feature, fits an EBM regression model without interactions, and displays its global explanation. The validation set is disabled (validation_size=0) because the dataset is too small to split.

```python
from interpret import show
from interpret.glassbox import ExplainableBoostingRegressor

# make a dataset composed of a nominal categorical, and a continuous feature
X = [["Peru", 7.0], ["Fiji", 8.0], ["Peru", 9.0]]
y = [450.0, 550.0, 350.0]

# Fit a regression EBM without interactions
# Eliminate the validation set to handle the small dataset
ebm = ExplainableBoostingRegressor(
    interactions=0,
    validation_size=0,
    outer_bags=1,
    min_samples_leaf=1,
    min_hessian=1e-9,
)
ebm.fit(X, y)
show(ebm.explain_global())
```

--------------------------------

### Train Blackbox Pipeline and Explain with LIME

Source: https://github.com/interpretml/interpret/blob/main/docs/interpret/lime.ipynb

Trains a RandomForestClassifier pipeline with PCA on the breast cancer dataset and then uses LimeTabular to generate local explanations for the first 5 test samples.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

from interpret import show
from interpret.blackbox import LimeTabular

seed = 42
np.random.seed(seed)
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=seed
)

pca = PCA()
rf = RandomForestClassifier(random_state=seed)

blackbox_model = Pipeline([("pca", pca), ("rf", rf)])
blackbox_model.fit(X_train, y_train)

lime = LimeTabular(blackbox_model, X_train)

show(lime.explain_local(X_test[:5], y_test[:5]), 0)
```