### Setup for Bias-Variance Decomposition Example

Source: https://scikit-learn.org/dev/auto_examples/ensemble/plot_bias_variance.html

Imports necessary libraries and sets up parameters for simulating regression problems to analyze bias-variance decomposition.

```python
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

import matplotlib.pyplot as plt
import numpy as np

from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Settings
n_repeat = 50  # Number of iterations for computing expectations
n_train = 50  # Size of the training set
n_test = 1000  # Size of the test set
noise = 0.1  # Standard deviation of the noise
np.random.seed(0)

# Change this for exploring the bias-variance decomposition of other
# estimators. This should work well for estimators with high variance (e.g.,
# decision trees or KNN), but poorly for estimators with low variance (e.g.,
# linear models).
estimators = [
    ("Tree", DecisionTreeRegressor()),
    ("Bagging(Tree)", BaggingRegressor(DecisionTreeRegressor())),
]

n_estimators = len(estimators)


```
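
As a rough sketch of how these settings feed the decomposition: the model is refit on `n_repeat` independently drawn training sets, and its predictions at fixed test points are split into squared bias and variance. The target function `f` (a sine) and the reduced sizes below are assumptions made for illustration, not the values used by the source example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
n_repeat, n_train, n_test, noise = 20, 50, 200, 0.1


def f(x):
    # Hypothetical noise-free target, chosen only for illustration
    return np.sin(x)


X_test = np.linspace(-5, 5, n_test).reshape(-1, 1)
f_test = f(X_test.ravel())

# One column of predictions per independently drawn training set
y_pred = np.zeros((n_test, n_repeat))
for i in range(n_repeat):
    X_train = rng.uniform(-5, 5, size=(n_train, 1))
    y_train = f(X_train.ravel()) + rng.normal(0.0, noise, size=n_train)
    y_pred[:, i] = DecisionTreeRegressor().fit(X_train, y_train).predict(X_test)

bias_sq = (f_test - y_pred.mean(axis=1)) ** 2  # squared bias per test point
variance = y_pred.var(axis=1)  # spread of predictions across training sets
mse_vs_target = ((f_test[:, None] - y_pred) ** 2).mean(axis=1)

print(f"bias^2 = {bias_sq.mean():.4f}, variance = {variance.mean():.4f}")
```

Per test point, the mean squared error against the noise-free target equals `bias_sq + variance`; adding the noise variance (`noise ** 2`) gives the expected error against noisy targets.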

--------------------------------

### Setup and Helper Functions for Discretization Example

Source: https://scikit-learn.org/dev/_downloads/aa8e07ce1b796a15ada1d9f0edce48b5/plot_discretization_classification.ipynb

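Imports the necessary libraries, defines the `get_name` helper for labeling classifiers, and builds the classifier pipelines with their grid-search parameter grids, the three toy datasets, and the figure. `h` sets the step size of the plotting mesh.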

```python
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap

from sklearn.datasets import make_circles, make_classification, make_moons
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer, StandardScaler
from sklearn.svm import SVC, LinearSVC
from sklearn.utils._testing import ignore_warnings

h = 0.02  # step size in the mesh


def get_name(estimator):
    name = estimator.__class__.__name__
    if name == "Pipeline":
        name = [get_name(est[1]) for est in estimator.steps]
        name = " + ".join(name)
    return name


# list of (estimator, param_grid), where param_grid is used in GridSearchCV
# The parameter spaces in this example are limited to a narrow band to reduce
# its runtime. In a real use case, a broader search space for the algorithms
# should be used.
classifiers = [
    (
        make_pipeline(StandardScaler(), LogisticRegression(random_state=0)),
        {"logisticregression__C": np.logspace(-1, 1, 3)},
    ),
    (
        make_pipeline(StandardScaler(), LinearSVC(random_state=0)),
        {"linearsvc__C": np.logspace(-1, 1, 3)},
    ),
    (
        make_pipeline(
            StandardScaler(),
            KBinsDiscretizer(
                encode="onehot", quantile_method="averaged_inverted_cdf", random_state=0
            ),
            LogisticRegression(random_state=0),
        ),
        {
            "kbinsdiscretizer__n_bins": np.arange(5, 8),
            "logisticregression__C": np.logspace(-1, 1, 3),
        },
    ),
    (
        make_pipeline(
            StandardScaler(),
            KBinsDiscretizer(
                encode="onehot", quantile_method="averaged_inverted_cdf", random_state=0
            ),
            LinearSVC(random_state=0),
        ),
        {
            "kbinsdiscretizer__n_bins": np.arange(5, 8),
            "linearsvc__C": np.logspace(-1, 1, 3),
        },
    ),
    (
        make_pipeline(
            StandardScaler(), GradientBoostingClassifier(n_estimators=5, random_state=0)
        ),
        {"gradientboostingclassifier__learning_rate": np.logspace(-2, 0, 5)},
    ),
    (
        make_pipeline(StandardScaler(), SVC(random_state=0)),
        {"svc__C": np.logspace(-1, 1, 3)},
    ),
]

names = [get_name(e).replace("StandardScaler + ", "") for e, _ in classifiers]

n_samples = 100
datasets = [
    make_moons(n_samples=n_samples, noise=0.2, random_state=0),
    make_circles(n_samples=n_samples, noise=0.2, factor=0.5, random_state=1),
    make_classification(
        n_samples=n_samples,
        n_features=2,
        n_redundant=0,
        n_informative=2,
        random_state=2,
        n_clusters_per_class=1,
    ),
]

fig, axes = plt.subplots(
    nrows=len(datasets), ncols=len(classifiers) + 1, figsize=(21, 9)
)

cm_piyg = plt.cm.PiYG
cm_bright = ListedColormap(["#b30065", "#178000"])

# iter
```
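
A small sketch of what `get_name` produces for a pipeline, and how the `names` list strips the scaler prefix. The pipeline below is a simplified stand-in with default arguments, not one of the tuned pipelines above.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer, StandardScaler


def get_name(estimator):
    # Join the class names of all pipeline steps; plain estimators keep their own name
    name = estimator.__class__.__name__
    if name == "Pipeline":
        name = " + ".join(get_name(est) for _, est in estimator.steps)
    return name


pipe = make_pipeline(StandardScaler(), KBinsDiscretizer(), LogisticRegression())
full_name = get_name(pipe)
short_name = full_name.replace("StandardScaler + ", "")

print(full_name)   # StandardScaler + KBinsDiscretizer + LogisticRegression
print(short_name)  # KBinsDiscretizer + LogisticRegression
```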

--------------------------------

### Setup for Theil-Sen Regression Example

Source: https://scikit-learn.org/dev/auto_examples/linear_model/plot_theilsen.html

Imports necessary libraries and defines estimators for OLS, Theil-Sen, and RANSAC regression, along with their colors and line widths for plotting.

```python
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

import time

import matplotlib.pyplot as plt
import numpy as np

from sklearn.linear_model import LinearRegression, RANSACRegressor, TheilSenRegressor

estimators = [
    ("OLS", LinearRegression()),
    ("Theil-Sen", TheilSenRegressor(random_state=42)),
    ("RANSAC", RANSACRegressor(random_state=42)),
]
colors = {"OLS": "turquoise", "Theil-Sen": "gold", "RANSAC": "lightgreen"}
lw = 2

```
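
A minimal sketch of how such an estimator list might be used, assuming synthetic one-dimensional data with a few corrupted targets. The data-generating choices here are illustrative assumptions, not the source example's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor, TheilSenRegressor

rng = np.random.RandomState(0)
x = rng.randn(200)
y = 3.0 * x + 2.0 + 0.1 * rng.randn(200)
# Hypothetical corruption: distort the targets of the last 20 points
y[-20:] += -20 * x[-20:]
X = x[:, np.newaxis]

estimators = [
    ("OLS", LinearRegression()),
    ("Theil-Sen", TheilSenRegressor(random_state=42)),
    ("RANSAC", RANSACRegressor(random_state=42)),
]

slopes = {}
for name, estimator in estimators:
    estimator.fit(X, y)
    # RANSAC stores the model fitted on the inliers in `estimator_`
    model = estimator.estimator_ if name == "RANSAC" else estimator
    slopes[name] = float(model.coef_[0])
    print(f"{name}: estimated slope = {slopes[name]:.2f} (true slope = 3.00)")
```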

--------------------------------

### Fitting and Calibration Example

Source: https://scikit-learn.org/dev/auto_examples/calibration/plot_calibration_multiclass.html

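This snippet shows the setup for fitting and calibrating a classifier on synthetic binary data. It defines the base estimator and compares calibration strategies: isotonic and sigmoid calibration of an already fitted model, and isotonic calibration with internal cross-validation.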

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.frozen import FrozenEstimator
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=1000, n_features=4, n_informative=2, n_redundant=0, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Hold out part of the training data to calibrate the already fitted model
X_fit, X_calib, y_fit, y_calib = train_test_split(
    X_train, y_train, test_size=0.25, random_state=42
)

# Base estimator, fitted before calibration
base_estimator = RandomForestClassifier(n_estimators=25, random_state=42)
base_estimator.fit(X_fit, y_fit)

# Calibrate the fitted model without refitting it. FrozenEstimator replaces the
# cv='prefit' option, which is deprecated since scikit-learn 1.6.
calibrated_isotonic = CalibratedClassifierCV(FrozenEstimator(base_estimator), method='isotonic')
calibrated_isotonic.fit(X_calib, y_calib)

# Same fitted model with sigmoid calibration
calibrated_sigmoid = CalibratedClassifierCV(FrozenEstimator(base_estimator), method='sigmoid')
calibrated_sigmoid.fit(X_calib, y_calib)

# Using cross-validation for calibration (the base estimator is cloned and refit per fold)
calibrated_cv = CalibratedClassifierCV(base_estimator, method='isotonic', cv=5)
calibrated_cv.fit(X_train, y_train)

# Accessing calibrated classifiers
print(f"Isotonic calibrated classifiers (frozen): {calibrated_isotonic.calibrated_classifiers_}")
print(f"Sigmoid calibrated classifiers (frozen): {calibrated_sigmoid.calibrated_classifiers_}")
print(f"Cross-validated calibrated classifiers: {calibrated_cv.calibrated_classifiers_}")

# Making predictions
prob_isotonic = calibrated_isotonic.predict_proba(X_test)
prob_sigmoid = calibrated_sigmoid.predict_proba(X_test)
prob_cv = calibrated_cv.predict_proba(X_test)

print(f"\nSample predicted probabilities (Isotonic, frozen): {prob_isotonic[:5]}")
print(f"Sample predicted probabilities (Sigmoid, frozen): {prob_sigmoid[:5]}")
print(f"Sample predicted probabilities (Cross-validated): {prob_cv[:5]}")
```
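
One way to check whether calibration helped is to compare a proper scoring rule on held-out data. The sketch below uses the Brier score (lower is better) on the same kind of synthetic data; the choice of metric and of sigmoid calibration with `cv=5` are assumptions made for illustration.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=1000, n_features=4, n_informative=2, n_redundant=0, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Uncalibrated reference model
raw = RandomForestClassifier(n_estimators=25, random_state=42).fit(X_train, y_train)

# Calibrated model: the forest is cloned and refit inside each CV fold
calibrated = CalibratedClassifierCV(
    RandomForestClassifier(n_estimators=25, random_state=42), method="sigmoid", cv=5
).fit(X_train, y_train)

scores = {
    "uncalibrated": brier_score_loss(y_test, raw.predict_proba(X_test)[:, 1]),
    "sigmoid, cv=5": brier_score_loss(y_test, calibrated.predict_proba(X_test)[:, 1]),
}
for name, score in scores.items():
    print(f"Brier score ({name}): {score:.4f}")
```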

--------------------------------

### Minimal Custom Callback Implementation

Source: https://scikit-learn.org/dev/developers/developing_callbacks.html

This example demonstrates a basic custom callback class with methods for setup, teardown, and hooks for the beginning and end of a fit task. It prints messages indicating the stage and provides information about the task and training data.

```python
class MyCallback:

    def setup(self, estimator, context):
        print(f"Setup hook is being called in the {context.task_name} task.")

    def teardown(self, estimator, context):
        print(f"Teardown hook is being called in the {context.task_name} task.")

    def on_fit_task_begin(self, estimator, context, *, X=None):
        msg = f"{context.task_name} task is starting."
        if X is not None:
            msg += f" With training data of shape {X.shape}."
        print(msg)

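    def on_fit_task_end(
        self, estimator, context, *, X=None, y=None, fitted_estimator=None
    ):
        msg = f"{context.task_name} task is ending."
        # X, y and fitted_estimator are optional, so only score when all are given
        if X is not None and y is not None and fitted_estimator is not None:
            mean_squared_error = ((y - fitted_estimator.predict(X)) ** 2).mean()
            msg += f" With a mean squared error of {mean_squared_error}."
        print(msg)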

```

--------------------------------

### Build Documentation with Example Gallery

Source: https://scikit-learn.org/dev/developers/contributing.html

Generates the full documentation, including the example gallery by running all examples. This process can take a significant amount of time.

```bash
make html
```

--------------------------------

### Basic RidgeClassifier Usage

Source: https://scikit-learn.org/dev/modules/generated/sklearn.linear_model.RidgeClassifier.html

Demonstrates how to instantiate, fit, and score a RidgeClassifier using sample data. This is a fundamental example for getting started with the classifier.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import RidgeClassifier
X, y = load_breast_cancer(return_X_y=True)
clf = RidgeClassifier().fit(X, y)
clf.score(X, y)
```
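
The score above is computed on the training data. A sketch of the same workflow with a held-out test set follows; the split parameters are assumptions chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
# Keep 25% of the samples aside so accuracy is measured on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

clf = RidgeClassifier().fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
print(f"Held-out accuracy: {test_accuracy:.3f}")
```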

--------------------------------

### FastICA with Default Parameters

Source: https://scikit-learn.org/dev/modules/generated/fastica-function.html

Runs Fast Independent Component Analysis through the `FastICA` estimator, keeping two components and fixing the random state. This is a basic example to get started.

```python
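import numpy as np

from sklearn.decomposition import FastICA

# Toy data: two non-Gaussian (uniform) sources mixed linearly, so X has full rank
rng = np.random.RandomState(0)
S = rng.uniform(-1, 1, size=(200, 2))
X = S @ np.array([[1.0, 0.5], [0.5, 1.0]])

# Perform FastICA
fastica = FastICA(n_components=2, random_state=0)
s = fastica.fit_transform(X)  # Estimated independent sources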

```

--------------------------------

### Example warning message for new metadata consumption

Source: https://scikit-learn.org/dev/auto_examples/miscellaneous/plot_metadata_routing.html

This is the expected warning message output when an estimator like `WeightedMetaRegressor` starts consuming metadata (`sample_weight`) that it did not consume before. It guides the user on how to explicitly manage the request.

```text
Received sample_weight of length = 100 in WeightedMetaRegressor.
Support for sample_weight has recently been added to WeightedMetaRegressor(estimator=LinearRegression()) class. To maintain backward compatibility, it is ignored now. Using `set_fit_request(sample_weight={True, False})` on this method of the class, you can set the request value to False to silence this warning, or to True to consume and use the metadata.

```

--------------------------------

### Build Documentation (Basic)

Source: https://scikit-learn.org/dev/developers/contributing.html

Generates the main web documentation without the example gallery. The output is placed in the '_build/html/stable' directory.

```bash
make
```

--------------------------------

### Prepare ARM64 Development Environment

Source: https://scikit-learn.org/dev/developers/tips.html

Creates a dedicated `arm64` folder, downloads the Miniforge installer for Linux aarch64 into it, and clones the scikit-learn repository there.

```bash
mkdir arm64
pushd arm64
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-aarch64.sh
git clone https://github.com/scikit-learn/scikit-learn.git

```

--------------------------------

### GaussianMixture.get_metadata_routing

Source: https://scikit-learn.org/dev/modules/generated/sklearn.mixture.GaussianMixture.html

Get metadata routing of this object. Please check User Guide on how the routing mechanism works.

```APIDOC
## get_metadata_routing()

### Description
Get metadata routing of this object.

### Returns
- **routing** MetadataRequest
    A `MetadataRequest` encapsulating routing information.

```

--------------------------------

### Perceptron.get_metadata_routing

Source: https://scikit-learn.org/dev/modules/generated/sklearn.linear_model.Perceptron.html

Get metadata routing of this object. Please check User Guide on how the routing mechanism works.

```APIDOC
## Perceptron.get_metadata_routing

### Description
Get metadata routing of this object. Please check User Guide on how the routing mechanism works.

### Parameters
None

### Returns
* **routing** (MetadataRequest) - A `MetadataRequest` encapsulating routing information.
```

--------------------------------

### Creating and using a Product kernel

Source: https://scikit-learn.org/dev/modules/generated/sklearn.gaussian_process.kernels.Product.html

Demonstrates how to create a Product kernel by combining ConstantKernel and RBF, then use it with GaussianProcessRegressor for fitting and scoring. The example shows the resulting kernel representation.

```python
>>> from sklearn.datasets import make_friedman2
>>> from sklearn.gaussian_process import GaussianProcessRegressor
>>> from sklearn.gaussian_process.kernels import (
...     RBF, Product, ConstantKernel)
>>> X, y = make_friedman2(n_samples=500, noise=0, random_state=0)
>>> kernel = Product(ConstantKernel(2), RBF())
>>> gpr = GaussianProcessRegressor(kernel=kernel,
...         random_state=0).fit(X, y)
>>> gpr.score(X, y)
1.0
>>> kernel
1.41**2 * RBF(length_scale=1)
```
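
The `*` operator on kernels builds the same kind of object. The sketch below, on a small hand-picked input array, checks that the shorthand is a `Product` and that the resulting kernel matrix is the RBF matrix scaled by the constant.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, Product

X = np.array([[0.0], [1.0], [2.5]])  # illustrative inputs

explicit = Product(ConstantKernel(2), RBF())
shorthand = ConstantKernel(2) * RBF()  # __mul__ returns a Product kernel

K_explicit = explicit(X)
K_shorthand = shorthand(X)
K_rbf = RBF()(X)

print(type(shorthand).__name__)
print(np.allclose(K_explicit, 2 * K_rbf))
```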

--------------------------------

### get_metadata_routing

Source: https://scikit-learn.org/dev/modules/generated/sklearn.base.BaseEstimator.html

Get metadata routing of this object. Please check User Guide on how the routing mechanism works.

```APIDOC
## get_metadata_routing

### Description
Get metadata routing of this object. Please check User Guide on how the routing mechanism works.

### Returns
- **routing** (MetadataRequest) - A `MetadataRequest` encapsulating routing information.
```

--------------------------------

### Pipeline Initialization with Metadata Requests

Source: https://scikit-learn.org/dev/auto_examples/miscellaneous/plot_metadata_routing.html

Demonstrates how to instantiate a `SimplePipeline` with an `ExampleTransformer` and a `RouterConsumerClassifier`, setting specific metadata requests for each component. This example shows how to enable metadata like `sample_weight` and `groups` for different methods.

```python
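from sklearn import set_config
from sklearn.base import BaseEstimator, clone
from sklearn.utils.metadata_routing import (
    MetadataRouter,
    MethodMapping,
    process_routing,
)

# set_*_request methods are only available when metadata routing is enabled
set_config(enable_metadata_routing=True)


# check_metadata is a helper from the source example, not a scikit-learn
# function; it reports which metadata an object received.
def check_metadata(obj, **kwargs):
    for key, value in kwargs.items():
        if value is not None:
            print(f"Received {key} of length = {len(value)} in {obj.__class__.__name__}.")
        else:
            print(f"{key} is None in {obj.__class__.__name__}.")


# Minimal stand-ins for classes defined in the source example. SimplePipeline
# and ExampleTransformer are also defined there and are not reproduced here.
class ExampleClassifier(BaseEstimator):
    def fit(self, X, y, sample_weight=None):
        check_metadata(self, sample_weight=sample_weight)
        return self

    def predict(self, X, groups=None):
        check_metadata(self, groups=groups)
        return X


class RouterConsumerClassifier(BaseEstimator):
    def __init__(self, estimator):
        self.estimator = estimator

    def get_metadata_routing(self):
        router = (
            MetadataRouter(owner=self)
            # the meta-estimator consumes sample_weight itself ...
            .add_self_request(self)
            # ... and also routes metadata to its sub-estimator
            .add(
                estimator=self.estimator,
                method_mapping=MethodMapping()
                .add(caller="fit", callee="fit")
                .add(caller="predict", callee="predict"),
            )
        )
        return router

    # The explicit sample_weight argument is what makes set_fit_request available
    def fit(self, X, y, sample_weight=None, **fit_params):
        check_metadata(self, sample_weight=sample_weight)
        if sample_weight is not None:
            fit_params["sample_weight"] = sample_weight
        routed_params = process_routing(self, "fit", **fit_params)
        self.estimator_ = clone(self.estimator).fit(X, y, **routed_params["estimator"]["fit"])
        return self

    def predict(self, X, **params):
        routed_params = process_routing(self, "predict", **params)
        return self.estimator_.predict(X, **routed_params["estimator"]["predict"])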

# The actual pipeline instantiation from the source:
pipe = SimplePipeline(
    transformer=ExampleTransformer()
    # we set transformer's fit to receive sample_weight
    .set_fit_request(sample_weight=True)
    # we set transformer's transform to receive groups
    .set_transform_request(groups=True),
    classifier=RouterConsumerClassifier(
        estimator=ExampleClassifier()
        # we want this sub-estimator to receive sample_weight in fit
        .set_fit_request(sample_weight=True)
        # but not groups in predict
        .set_predict_request(groups=False),
    )
    # and we want the meta-estimator to receive sample_weight as well
    .set_fit_request(sample_weight=True),
)

```

--------------------------------

### Gradient Boosting Classifier Initialization

Source: https://scikit-learn.org/dev/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html

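Initializes a GradientBoostingClassifier with the default values of `n_estimators`, `learning_rate`, and `max_depth` written out explicitly, plus a fixed `random_state` for reproducibility. This is a basic setup for starting with the algorithm.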

```python
from sklearn.ensemble import GradientBoostingClassifier

clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0)
```
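
A minimal sketch of what typically follows initialization: fitting on training data and scoring on a held-out split. The synthetic dataset and split are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"Boosting stages fitted: {len(clf.estimators_)}")
print(f"Held-out accuracy: {accuracy:.3f}")
```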

--------------------------------

### Navigate to Documentation Directory

Source: https://scikit-learn.org/dev/developers/contributing.html

Change the current directory to the 'doc' folder to begin building the documentation.

```bash
cd doc
```

--------------------------------

### Filter estimators by type

Source: https://scikit-learn.org/dev/modules/generated/sklearn.utils.discovery.all_estimators.html

Retrieves a list of estimators filtered by a specific type. This example shows how to get only classifiers.

```python
from sklearn.utils.discovery import all_estimators
classifiers = all_estimators(type_filter="classifier")
classifiers[:2]
```

--------------------------------

### PLSSVD Example

Source: https://scikit-learn.org/dev/modules/generated/sklearn.cross_decomposition.PLSSVD.html

Demonstrates how to initialize PLSSVD, fit it to sample data, and transform the data. It also shows how to check the shapes of the transformed data.

```python
import numpy as np
from sklearn.cross_decomposition import PLSSVD
X = np.array([[0., 0., 1.],
              [1., 0., 0.],
              [2., 2., 2.],
              [2., 5., 4.]])
y = np.array([[0.1, -0.2],
              [0.9, 1.1],
              [6.2, 5.9],
              [11.9, 12.3]])
pls = PLSSVD(n_components=2).fit(X, y)
X_c, y_c = pls.transform(X, y)
X_c.shape, y_c.shape
# ((4, 2), (4, 2))
```

--------------------------------

### OPTICS get_metadata_routing method

Source: https://scikit-learn.org/dev/modules/generated/sklearn.cluster.OPTICS.html

Get metadata routing of this object. Please check User Guide on how the routing mechanism works.

```APIDOC
## OPTICS.get_metadata_routing

### Description
Get metadata routing of this object. Please check User Guide on how the routing mechanism works.

### Method
`get_metadata_routing()`

### Returns
- **routing** (MetadataRequest) - Metadata routing object.
```

--------------------------------

### set_output

Source: https://scikit-learn.org/dev/modules/generated/sklearn.cluster.Birch.html

Set output container. Refer to the user guide for more details, and to the "Introducing the set_output API" example for how to use the API.

```APIDOC
## set_output

### Description
Set output container. Refer to the user guide for more details, and to the "Introducing the set_output API" example for how to use the API.

### Parameters
- **transform** ({"default", "pandas", "polars"}, default=None) - Configure output of `transform` and `fit_transform`.
  - "default": Default output format of a transformer
  - "pandas": DataFrame output
  - "polars": Polars output
  - `None`: Transform configuration is unchanged

### Returns
- **self** (estimator instance) - Estimator instance.

### Added in version
1.4: "polars" option was added.
```

--------------------------------

### Data Preparation and Algorithm Initialization

Source: https://scikit-learn.org/dev/auto_examples/cluster/plot_cluster_comparison.html

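This snippet shows the main loop for comparing clustering algorithms. The toy datasets (`noisy_circles`, `noisy_moons`, `varied`, `aniso`, `blobs`, `no_structure`) are generated earlier in the source example. For each dataset, the loop normalizes the data, estimates the bandwidth for MeanShift, builds the connectivity matrix for Ward and average linkage, initializes the clustering algorithms with dataset-specific parameters, then fits, times, and plots each one.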

```python
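import time
import warnings

import matplotlib.pyplot as plt

from sklearn import cluster, mixture
from sklearn.neighbors import kneighbors_graph
from sklearn.preprocessing import StandardScaler

# noisy_circles, noisy_moons, varied, aniso, blobs and no_structure are the
# (X, y) datasets generated earlier in the source example.
plt.figure(figsize=(9 * 2 + 3, 13))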
plt.subplots_adjust(
    left=0.02, right=0.98, bottom=0.001, top=0.95, wspace=0.05, hspace=0.01
)

plot_num = 1

default_base = {
    "quantile": 0.3,
    "eps": 0.3,
    "damping": 0.9,
    "preference": -200,
    "n_neighbors": 3,
    "n_clusters": 3,
    "min_samples": 7,
    "xi": 0.05,
    "min_cluster_size": 0.1,
    "allow_single_cluster": True,
    "hdbscan_min_cluster_size": 15,
    "hdbscan_min_samples": 3,
    "random_state": 42,
}

datasets = [
    (
        noisy_circles,
        {
            "damping": 0.77,
            "preference": -240,
            "quantile": 0.2,
            "n_clusters": 2,
            "min_samples": 7,
            "xi": 0.08,
        },
    ),
    (
        noisy_moons,
        {
            "damping": 0.75,
            "preference": -220,
            "n_clusters": 2,
            "min_samples": 7,
            "xi": 0.1,
        },
    ),
    (
        varied,
        {
            "eps": 0.18,
            "n_neighbors": 2,
            "min_samples": 7,
            "xi": 0.01,
            "min_cluster_size": 0.2,
        },
    ),
    (
        aniso,
        {
            "eps": 0.15,
            "n_neighbors": 2,
            "min_samples": 7,
            "xi": 0.1,
            "min_cluster_size": 0.2,
        },
    ),
    (blobs, {"min_samples": 7, "xi": 0.1, "min_cluster_size": 0.2}),
    (no_structure, {}),
]

for i_dataset, (dataset, algo_params) in enumerate(datasets):
    # update parameters with dataset-specific values
    params = default_base.copy()
    params.update(algo_params)

    X, y = dataset

    # normalize dataset for easier parameter selection
    X = StandardScaler().fit_transform(X)

    # estimate bandwidth for mean shift
    bandwidth = cluster.estimate_bandwidth(X, quantile=params["quantile"])

    # connectivity matrix for structured Ward
    connectivity = kneighbors_graph(
        X, n_neighbors=params["n_neighbors"], include_self=False
    )
    # make connectivity symmetric
    connectivity = 0.5 * (connectivity + connectivity.T)

    # Create cluster objects
    ms = cluster.MeanShift(bandwidth=bandwidth, bin_seeding=True)
    two_means = cluster.MiniBatchKMeans(
        n_clusters=params["n_clusters"],
        random_state=params["random_state"],
    )
    ward = cluster.AgglomerativeClustering(
        n_clusters=params["n_clusters"], linkage="ward", connectivity=connectivity
    )
    spectral = cluster.SpectralClustering(
        n_clusters=params["n_clusters"],
        eigen_solver="arpack",
        affinity="nearest_neighbors",
        random_state=params["random_state"],
    )
    dbscan = cluster.DBSCAN(eps=params["eps"])
    hdbscan = cluster.HDBSCAN(
        min_samples=params["hdbscan_min_samples"],
        min_cluster_size=params["hdbscan_min_cluster_size"],
        allow_single_cluster=params["allow_single_cluster"],
        copy=True,
    )
    optics = cluster.OPTICS(
        min_samples=params["min_samples"],
        xi=params["xi"],
        min_cluster_size=params["min_cluster_size"],
    )
    affinity_propagation = cluster.AffinityPropagation(
        damping=params["damping"],
        preference=params["preference"],
        random_state=params["random_state"],
    )
    average_linkage = cluster.AgglomerativeClustering(
        linkage="average",
        metric="cityblock",
        n_clusters=params["n_clusters"],
        connectivity=connectivity,
    )
    birch = cluster.Birch(n_clusters=params["n_clusters"])
    gmm = mixture.GaussianMixture(
        n_components=params["n_clusters"],
        covariance_type="full",
        random_state=params["random_state"],
    )

    clustering_algorithms = (
        ("MiniBatch\nKMeans", two_means),
        ("Affinity\nPropagation", affinity_propagation),
        ("MeanShift", ms),
        ("Spectral\nClustering", spectral),
        ("Ward", ward),
        ("Agglomerative\nClustering", average_linkage),
        ("DBSCAN", dbscan),
        ("HDBSCAN", hdbscan),
        ("OPTICS", optics),
        ("BIRCH", birch),
        ("Gaussian\nMixture", gmm),
    )

    for name, algorithm in clustering_algorithms:
        t0 = time.time()

        # catch warnings related to kneighbors_graph
        with warnings.catch_warnings():
            warnings.filterwarnings(
                "ignore",
                message="the number of connected components of the "
                "connectivity matrix is [0-9]{1,2}" 
                " > 1. Completing it to avoid stopping the tree early.",
                category=UserWarning,
            )
            warnings.filterwarnings(
                "ignore",
                message="Graph is not fully connected, spectral embedding"
                " may not work as expected.",
                category=UserWarning,
            )
            algorithm.fit(X)

        t1 = time.time()
        if hasattr(algorithm, "labels_"):
            labels = algorithm.labels_
        else:
            labels = algorithm.predict(X)

        # Number of clusters in labels, ignoring noise if the algorithm reports it
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        n_noise = list(labels).count(-1)

        print(f"Algorithm: {name}\n  Estimated number of clusters: {n_clusters}\n  Estimated number of noise points: {n_noise}")
        if hasattr(algorithm, "cluster_centers_"):
            centers = algorithm.cluster_centers_

            # Plot result
            plt.subplot(len(datasets), len(clustering_algorithms), plot_num)
            if i_dataset == 0:
                plt.title(name, size=17)
            plt.scatter(X[:, 0], X[:, 1], c=labels, s=10, cmap="viridis")

            # Plot the cluster centers
            plt.scatter(centers[:, 0], centers[:, 1], c="black", s=50, alpha=0.7)

            plt.xlim(-2, 2)
            plt.ylim(-2, 2)
            plt.xticks(())
            plt.yticks(())
            # Elapsed time in the lower-right corner, in axes coordinates
            plt.text(
                0.99,
                0.02,
                ("%.2f" % (t1 - t0)).lstrip("0"),
                transform=plt.gca().transAxes,
                size=11,
                horizontalalignment='right',
            )
            plot_num += 1

        elif hasattr(algorithm, "cluster_centers_indices_"):
            centers = X[algorithm.cluster_centers_indices_]
            plt.subplot(len(datasets), len(clustering_algorithms), plot_num)
            if i_dataset == 0:
                plt.title(name, size=17)
            plt.scatter(X[:, 0], X[:, 1], c=labels, s=10, cmap="viridis")
            plt.scatter(centers[:, 0], centers[:, 1], c="black", s=50, alpha=0.7)
            plt.xlim(-2, 2)
            plt.ylim(-2, 2)
            plt.xticks(())
            plt.yticks(())
            # Elapsed time in the lower-right corner, in axes coordinates
            plt.text(
                0.99,
                0.02,
                ("%.2f" % (t1 - t0)).lstrip("0"),
                transform=plt.gca().transAxes,
                size=11,
                horizontalalignment='right',
            )
            plot_num += 1
        else:
            # DBSCAN does not have a cluster_centers_ attribute.
            # For this algorithm, we will plot the points and the noise points.
            # The noise points are marked with a label of -1.
            plt.subplot(len(datasets), len(clustering_algorithms), plot_num)
            if i_dataset == 0:
                plt.title(name, size=17)
            plt.scatter(X[:, 0], X[:, 1], c=labels, s=10, cmap="viridis")
            plt.xlim(-2, 2)
            plt.ylim(-2, 2)
            plt.xticks(())
            plt.yticks(())
            # Elapsed time in the lower-right corner, in axes coordinates
            plt.text(
                0.99,
                0.02,
                ("%.2f" % (t1 - t0)).lstrip("0"),
                transform=plt.gca().transAxes,
                size=11,
                horizontalalignment='right',
            )
            plot_num += 1

plt.show()

```
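
The cluster and noise counts in the loop rely on the convention that noise points get the label `-1`. A small sketch of that bookkeeping with DBSCAN on a six-point toy array (the data and parameters are illustrative):

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]])
labels = DBSCAN(eps=3, min_samples=2).fit(X).labels_

# Noise is labeled -1, so it must not be counted as a cluster
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = list(labels).count(-1)

print(labels)      # [ 0  0  0  1  1 -1]
print(n_clusters)  # 2
print(n_noise)     # 1
```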

--------------------------------

### Setup and Data Generation

Source: https://scikit-learn.org/dev/auto_examples/linear_model/plot_sgdocsvm_vs_ocsvm.html

Imports the necessary libraries, generates synthetic training, test, and outlier data, and fits a kernelized One-Class SVM whose prediction errors serve as the baseline for the comparison. Also sets the plotting font and a random state for reproducibility.

```python
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

import matplotlib
import matplotlib.lines as mlines
import matplotlib.pyplot as plt
import numpy as np

from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDOneClassSVM
from sklearn.pipeline import make_pipeline
from sklearn.svm import OneClassSVM

font = {"weight": "normal", "size": 15}

matplotlib.rc("font", **font)

random_state = 42
rng = np.random.RandomState(random_state)

# Generate train data
X = 0.3 * rng.randn(500, 2)
X_train = np.r_[X + 2, X - 2]
# Generate some regular novel observations
X = 0.3 * rng.randn(20, 2)
X_test = np.r_[X + 2, X - 2]
# Generate some abnormal novel observations
X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))

# OCSVM hyperparameters
nu = 0.05
gamma = 2.0

# Fit the One-Class SVM
clf = OneClassSVM(gamma=gamma, kernel="rbf", nu=nu)
clf.fit(X_train)
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)
n_error_train = y_pred_train[y_pred_train == -1].size
n_error_test = y_pred_test[y_pred_test == -1].size
n_error_outliers = y_pred_outliers[y_pred_outliers == 1].size

```

--------------------------------

### PartialDependenceDisplay

Source: https://scikit-learn.org/dev/modules/generated/sklearn.inspection.PartialDependenceDisplay.html

Partial Dependence Plot (PDP) and Individual Conditional Expectation (ICE).
It is recommended to use `from_estimator` to create a `PartialDependenceDisplay`. All parameters are stored as attributes.
For general information regarding `scikit-learn` visualization tools, see the Visualization Guide. For guidance on interpreting these plots, refer to the Inspection Guide.
For an example on how to use this class, see the following example: Advanced Plotting With Partial Dependence.
Added in version 0.22.

```APIDOC
class sklearn.inspection.PartialDependenceDisplay(pd_results, *, features, feature_names, target_idx, deciles, kind='average', subsample=1000, random_state=None, is_categorical=None)

Parameters:

**pd_results** : list of Bunch
    Results of `partial_dependence` for `features`.

**features** : list of (int,) or list of (int, int)
    Indices of features for a given plot. A tuple of one integer will plot a partial dependence curve of one feature. A tuple of two integers will plot a two-way partial dependence curve as a contour plot.

**feature_names** : list of str
    Feature names corresponding to the indices in `features`.

**target_idx** : int
    * In a multiclass setting, specifies the class for which the PDPs should be computed. Note that for binary classification, the positive class (index 1) is always used.
    * In a multioutput setting, specifies the task for which the PDPs should be computed.
    Ignored in binary classification or classical regression settings.

**deciles** : dict
    Deciles for feature indices in `features`.

**kind** : {'average', 'individual', 'both'} or list of such str, default='average'
    Whether to plot the partial dependence averaged across all the samples in the dataset or one line per sample or both.
    * `kind='average'` results in the traditional PD plot;
    * `kind='individual'` results in the ICE plot;
    * `kind='both'` results in plotting both the ICE and PD on the same plot.
    A list of such strings can be provided to specify `kind` on a per-plot basis. The length of the list should be the same as the number of interactions requested in `features`.
    Note
    ICE ('individual' or 'both') is not a valid option for two-way interaction plots and will raise an error. Two-way interaction plots should always be configured to use the 'average' kind instead.
    Note
    The fast `method='recursion'` option is only available for `kind='average'` and `sample_weight=None`. Computing individual dependencies and doing weighted averages requires using the slower `method='brute'`.
    Added in version 0.24: Add `kind` parameter with `'average'`, `'individual'`, and `'both'` options.
    Added in version 1.1: Add the possibility to pass a list of string specifying `kind` for each plot.

**subsample** : float, int or None, default=1000
    Sampling for ICE curves when `kind` is 'individual' or 'both'. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to be used to plot ICE curves. If int, represents the maximum absolute number of samples to use.
    Note that the full dataset is still used to calculate partial dependence when `kind='both'`.
    Added in version 0.24.

**random_state** : int, RandomState instance or None, default=None
    Controls the randomness of the selected samples when `subsample` is not `None`. See Glossary for details.
    Added in version 0.24.

**is_categorical** : list of (bool,) or list of (bool, bool), default=None
    Whether each target feature in `features` is categorical or not. The list should be same size as `features`. If `None`, all features are assumed to be continuous.
    Added in version 1.2.

Attributes:

**bounding_ax_** : matplotlib Axes or None
    If `ax` is an axes or None, the `bounding_ax_` is the axes where the grid of partial dependence plots are drawn. If `ax` is a list of axes or a numpy array of axes, `bounding_ax_` is None.

**axes_** : ndarray of matplotlib Axes
    If `ax` is an axes or None, `axes_[i, j]` is the axes on the i-th row and j-th column. If `ax` is a list of axes, `axes_[i]` is the i-th item in `ax`. Elements that are None correspond to a nonexisting axes in that position.

**lines_** : ndarray of matplotlib Artists
    If `ax` is an axes or None, `lines_[i, j]` is the partial dependence curve on the i-th row and j-th column. If `ax` is a list of axes, `lines_[i]` is the partial dependence curve corresponding to the i-th item in `ax`. Elements that are None correspond to a nonexisting axes or an axes that does not include a line plot.

**deciles_vlines_** : ndarray of matplotlib LineCollection
    If `ax` is an axes or None, `deciles_vlines_[i, j]` is the line collection representing the x axis deciles of the i-th row and j-th column. If `ax` is a list of axes, `deciles_vlines_[i]` corresponds to the i-th item in `ax`. Elements that are None correspond to a nonexisting axes or an axes that does not include a PDP plot.
```

--------------------------------

### Build Documentation with Filtered Examples

Source: https://scikit-learn.org/dev/developers/contributing.html

Builds the documentation and runs only examples whose filenames contain 'plot_calibration'. This is useful for testing specific example changes.

```bash
EXAMPLES_PATTERN="plot_calibration" make html
```

--------------------------------

### RandomizedSearchCV Example

Source: https://scikit-learn.org/dev/modules/generated/sklearn.model_selection.RandomizedSearchCV.html

Demonstrates how to use RandomizedSearchCV with Logistic Regression on the Iris dataset. It shows parameter distribution setup and fitting the model.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform
iris = load_iris()
logistic = LogisticRegression(solver='saga', tol=1e-2, max_iter=200,
                              random_state=0)
distributions = dict(C=uniform(loc=0, scale=4),
                     l1_ratio=[0, 1])
clf = RandomizedSearchCV(logistic, distributions, random_state=0)
search = clf.fit(iris.data, iris.target)
print(search.best_params_)
```
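
To see what kind of candidates such a search space produces, the following sketch draws a few settings with `ParameterSampler` from the same `distributions` dictionary. The choice of `n_iter=3` is an arbitrary assumption for illustration.

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

# Same search space as above: a continuous distribution for C,
# and a discrete list of choices for l1_ratio
distributions = dict(C=uniform(loc=0, scale=4), l1_ratio=[0, 1])

# Draw three candidate parameter settings (n_iter=3 is an arbitrary choice)
candidates = list(ParameterSampler(distributions, n_iter=3, random_state=0))
for params in candidates:
    print(params)
```

Each candidate is a plain dict, with `C` sampled from the uniform distribution on [0, 4] and `l1_ratio` picked from the list.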

--------------------------------

### Create a Pipeline with make_pipeline

Source: https://scikit-learn.org/dev/auto_examples/miscellaneous/plot_pipeline_display.html

This example demonstrates creating a pipeline using the `make_pipeline` utility function, which automatically names the steps. It's a convenient way to build pipelines.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Create a pipeline using make_pipeline
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver='liblinear')
)

# Print the pipeline steps to see the auto-generated names
print(pipe.steps)

```
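
The auto-generated names matter mostly because they serve as prefixes for nested parameters. A minimal sketch, assuming the default `LogisticRegression` settings and an arbitrary value of `C`:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipe = make_pipeline(StandardScaler(), LogisticRegression())

# Step names are the lowercased class names
step_names = [name for name, _ in pipe.steps]
print(step_names)

# The generated names act as prefixes in the <step>__<parameter> syntax
pipe.set_params(logisticregression__C=0.5)
print(pipe.named_steps["logisticregression"].C)
```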

--------------------------------

### MiniBatchNMF Example

Source: https://scikit-learn.org/dev/modules/generated/sklearn.decomposition.MiniBatchNMF.html

This example demonstrates how to initialize and use the MiniBatchNMF model. It shows the basic steps of creating an instance, fitting it to data, and obtaining the transformed data and components.

```python
import numpy as np
X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
from sklearn.decomposition import MiniBatchNMF
model = MiniBatchNMF(n_components=2, init='random', random_state=0)
W = model.fit_transform(X)
H = model.components_
```

--------------------------------

### Import necessary libraries

Source: https://scikit-learn.org/dev/_downloads/19e9c0cb24a132133cef3b311caaf199/plot_nca_illustration.ipynb

Imports libraries for plotting, numerical operations, and Neighborhood Components Analysis. This setup is required for the subsequent examples.

```python
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cm
from scipy.special import logsumexp

from sklearn.datasets import make_classification
from sklearn.neighbors import NeighborhoodComponentsAnalysis
```

--------------------------------

### LeavePGroupsOut Example

Source: https://scikit-learn.org/dev/modules/generated/sklearn.model_selection.LeavePGroupsOut.html

Demonstrates how to use LeavePGroupsOut to split data into training and testing sets, leaving out a specified number of groups. This example shows how to get the number of splits and iterate through them, printing the train and test indices along with their corresponding group labels.

```python
import numpy as np
from sklearn.model_selection import LeavePGroupsOut
X = np.array([[1, 2], [3, 4], [5, 6]])
y = np.array([1, 2, 1])
groups = np.array([1, 2, 3])
lpgo = LeavePGroupsOut(n_groups=2)
lpgo.get_n_splits(groups=groups)
print(lpgo)
for i, (train_index, test_index) in enumerate(lpgo.split(X, y, groups)):
    print(f"Fold {i}:")
    print(f"  Train: index={train_index}, group={groups[train_index]}")
    print(f"  Test:  index={test_index}, group={groups[test_index]}")
```
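
The number of splits equals the number of ways to choose `n_groups` groups out of the distinct group labels. A short sketch with made-up data (four groups of two samples each) checks this against `math.comb`:

```python
from math import comb

import numpy as np
from sklearn.model_selection import LeavePGroupsOut

X = np.arange(16).reshape(8, 2)
groups = np.array([1, 1, 2, 2, 3, 3, 4, 4])  # four distinct groups (made-up labels)

lpgo = LeavePGroupsOut(n_groups=2)
n_splits = lpgo.get_n_splits(groups=groups)
print(n_splits, comb(4, 2))

# Every test fold holds exactly two distinct groups
test_group_counts = [
    len(np.unique(groups[test_index]))
    for _, test_index in lpgo.split(X, groups=groups)
]
print(test_group_counts)
```

Since the split count grows combinatorially with the number of groups, this splitter can become expensive on data with many groups.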

--------------------------------

### Anomaly Detection Algorithms Comparison Setup

Source: https://scikit-learn.org/dev/auto_examples/miscellaneous/plot_anomaly_comparison.html

Sets up parameters and defines a list of anomaly detection algorithms to be compared. Includes imports for necessary libraries and data generation functions.

```python
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

import time

import matplotlib
import matplotlib.pyplot as plt
import numpy as np

from sklearn import svm
from sklearn.covariance import EllipticEnvelope
from sklearn.datasets import make_blobs, make_moons
from sklearn.ensemble import IsolationForest
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDOneClassSVM
from sklearn.neighbors import LocalOutlierFactor
from sklearn.pipeline import make_pipeline

matplotlib.rcParams["contour.negative_linestyle"] = "solid"

# Example settings
n_samples = 300
outliers_fraction = 0.15
n_outliers = int(outliers_fraction * n_samples)
n_inliers = n_samples - n_outliers

# define outlier/anomaly detection methods to be compared.
# the SGDOneClassSVM must be used in a pipeline with a kernel approximation
# to give similar results to the OneClassSVM
anomaly_algorithms = [
    (
        "Robust covariance",
        EllipticEnvelope(contamination=outliers_fraction, random_state=42),
    ),
    ("One-Class SVM", svm.OneClassSVM(nu=outliers_fraction, kernel="rbf", gamma=0.1)),
    (
        "One-Class SVM (SGD)",
        make_pipeline(
            Nystroem(gamma=0.1, random_state=42, n_components=150),
            SGDOneClassSVM(
                nu=outliers_fraction,
                shuffle=True,
                fit_intercept=True,
                random_state=42,
                tol=1e-6,
            ),
        ),
    ),
    (
        "Isolation Forest",
        IsolationForest(contamination=outliers_fraction, random_state=42),
    ),
    (
        "Local Outlier Factor",
        LocalOutlierFactor(n_neighbors=35, contamination=outliers_fraction),
    ),
]

```
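
The detectors in this list are not all called the same way: with its default settings, `LocalOutlierFactor` labels the training data through `fit_predict`, while the others can be fitted and then asked to `predict`. The sketch below shows both patterns on a toy dataset; the blob and the uniform noise are assumptions made for illustration, not the datasets of the original example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

outliers_fraction = 0.15
n_inliers, n_outliers = 255, 45

# Toy data: one Gaussian blob plus uniformly scattered points (illustrative only)
rng = np.random.RandomState(42)
inliers, _ = make_blobs(
    n_samples=n_inliers, centers=[[0, 0]], cluster_std=0.5, random_state=0
)
outliers = rng.uniform(low=-6, high=6, size=(n_outliers, 2))
X = np.concatenate([inliers, outliers], axis=0)

# Pattern 1: fit, then predict (+1 for inliers, -1 for outliers)
iso = IsolationForest(contamination=outliers_fraction, random_state=42)
iso_pred = iso.fit(X).predict(X)

# Pattern 2: LOF with default settings labels the training data via fit_predict
lof = LocalOutlierFactor(n_neighbors=35, contamination=outliers_fraction)
lof_pred = lof.fit_predict(X)

print(np.unique(iso_pred), np.unique(lof_pred))
```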

--------------------------------

### Basic QDA Example

Source: https://scikit-learn.org/dev/modules/generated/sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis.html

Demonstrates the basic usage of QuadraticDiscriminantAnalysis with sample data. This snippet shows how to import the class, create sample data, and fit the model.

```python
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])
clf = QuadraticDiscriminantAnalysis()
clf.fit(X, y)
```
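
Once fitted, the model can classify new points. A minimal sketch continuing from the same toy data; the query point `[-0.8, -1]` is chosen here for illustration because it lies among the samples labelled 1:

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])
clf = QuadraticDiscriminantAnalysis().fit(X, y)

# Predict the class of a new point and inspect the class probabilities
query = np.array([[-0.8, -1]])
pred = clf.predict(query)
proba = clf.predict_proba(query)
print(pred)
print(proba.shape)  # one row per query point, one column per class
```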

--------------------------------

### Get Accuracy Scorer

Source: https://scikit-learn.org/dev/modules/generated/sklearn.metrics.get_scorer.html

Demonstrates how to obtain the 'accuracy' scorer and use it to evaluate a fitted classifier. This example requires numpy and DummyClassifier from scikit-learn.

```python
>>> import numpy as np
>>> from sklearn.dummy import DummyClassifier
>>> from sklearn.metrics import get_scorer
>>> X = np.reshape([0, 1, -1, -0.5, 2], (-1, 1))
>>> y = np.array([0, 1, 1, 0, 1])
>>> classifier = DummyClassifier(strategy="constant", constant=0).fit(X, y)
>>> accuracy = get_scorer("accuracy")
>>> accuracy(classifier, X, y)
0.4
```

--------------------------------

### Data Generation and Preprocessing Setup

Source: https://scikit-learn.org/dev/auto_examples/preprocessing/plot_discretization_classification.html

Imports necessary libraries and defines helper functions for plotting and estimator naming. Sets up the mesh grid size for plotting decision boundaries.

```python
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap

from sklearn.datasets import make_circles, make_classification, make_moons
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer, StandardScaler
from sklearn.svm import SVC, LinearSVC
from sklearn.utils._testing import ignore_warnings

h = 0.02  # step size in the mesh


def get_name(estimator):
    name = estimator.__class__.__name__
    if name == "Pipeline":
        name = [get_name(est[1]) for est in estimator.steps]
        name = " + ".join(name)
    return name


# list of (estimator, param_grid), where param_grid is used in GridSearchCV
# The parameter spaces in this example are limited to a narrow band to reduce
# its runtime. In a real use case, a broader search space for the algorithms
```
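
To make the behaviour of the `get_name` helper concrete, the sketch below copies it and applies it to a single estimator and to a pipeline. The particular estimators are arbitrary picks from the imports above.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer, StandardScaler


def get_name(estimator):
    # Copied from the snippet above: class name, or joined step names for a Pipeline
    name = estimator.__class__.__name__
    if name == "Pipeline":
        name = [get_name(est[1]) for est in estimator.steps]
        name = " + ".join(name)
    return name


single_name = get_name(LogisticRegression())
pipeline_name = get_name(
    make_pipeline(StandardScaler(), KBinsDiscretizer(), LogisticRegression())
)
print(single_name)
print(pipeline_name)
```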

--------------------------------

### Load Diabetes Dataset

Source: https://scikit-learn.org/dev/modules/generated/sklearn.datasets.load_diabetes.html

Loads the diabetes dataset and accesses its target values and data shape. Use this to get started with the dataset for regression tasks.

```python
>>> from sklearn.datasets import load_diabetes
>>> diabetes = load_diabetes()
>>> diabetes.target[:3]
array([151.,  75., 141.])
>>> diabetes.data.shape
(442, 10)
```
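
For regression work it is often more convenient to get the arrays directly. A short sketch using the `return_X_y` option and the `feature_names` attribute of the returned bunch:

```python
from sklearn.datasets import load_diabetes

# Return the feature matrix and target vector directly
X, y = load_diabetes(return_X_y=True)
print(X.shape, y.shape)

# The Bunch form also carries the column names
diabetes = load_diabetes()
feature_names = list(diabetes.feature_names)
print(feature_names)
```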

--------------------------------

### ParameterGrid Initialization and Usage

Source: https://scikit-learn.org/dev/modules/generated/sklearn.model_selection.ParameterGrid.html

Demonstrates how to initialize ParameterGrid with a dictionary of parameters and iterate over the generated combinations. It also shows how to handle a sequence of dictionaries for more complex grid exploration.

```APIDOC
## ParameterGrid

class sklearn.model_selection.ParameterGrid(_param_grid_)

### Description

Grid of parameters with a discrete number of values for each. Can be used to iterate over parameter value combinations with the Python built-in function iter. The order of the generated parameter combinations is deterministic.

### Parameters

#### param_grid
- **dict of str to sequence, or sequence of such** - The parameter grid to explore, as a dictionary mapping estimator parameters to sequences of allowed values. An empty dict signifies default parameters. A sequence of dicts signifies a sequence of grids to search, and is useful to avoid exploring parameter combinations that make no sense or have no effect.

### Examples

>>> from sklearn.model_selection import ParameterGrid
>>> param_grid = {'a': [1, 2], 'b': [True, False]}
>>> list(ParameterGrid(param_grid)) == (
...    [{'a': 1, 'b': True}, {'a': 1, 'b': False},
...     {'a': 2, 'b': True}, {'a': 2, 'b': False}])
True

>>> grid = [{'kernel': ['linear']}, {'kernel': ['rbf'], 'gamma': [1, 10]}]
>>> list(ParameterGrid(grid)) == [{'kernel': 'linear'},
...                               {'kernel': 'rbf', 'gamma': 1},
...                               {'kernel': 'rbf', 'gamma': 10}]
True
>>> ParameterGrid(grid)[1] == {'kernel': 'rbf', 'gamma': 1}
True
```
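
In practice the grid is often used to configure estimators in a manual loop. A minimal sketch, assuming `SVC` as the estimator since the second grid uses SVM-style parameter names:

```python
from sklearn.model_selection import ParameterGrid
from sklearn.svm import SVC

# A sequence of dicts avoids pairing `gamma` with the linear kernel
grid = [{"kernel": ["linear"]}, {"kernel": ["rbf"], "gamma": [1, 10]}]
param_grid = ParameterGrid(grid)

n_candidates = len(param_grid)  # total number of parameter combinations
estimators = [SVC(**params) for params in param_grid]

print(n_candidates)
for est in estimators:
    print(est)
```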

--------------------------------

### StratifiedKFold Example

Source: https://scikit-learn.org/dev/modules/generated/sklearn.model_selection.StratifiedKFold.html

Demonstrates how to use StratifiedKFold to split data into stratified train and test sets. It shows how to get the number of splits and iterate through the generated folds.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([0, 0, 1, 1])
skf = StratifiedKFold(n_splits=2)
skf.get_n_splits()
print(skf)
for i, (train_index, test_index) in enumerate(skf.split(X, y)):
    print(f"Fold {i}:")
    print(f"  Train: index={train_index}")
    print(f"  Test:  index={test_index}")
```
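
The point of stratification is that each test fold keeps roughly the class proportions of the full dataset. A small sketch with made-up, imbalanced labels (eight samples of class 0, four of class 1) counts the classes in each test fold:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Made-up imbalanced labels: 8 samples of class 0 and 4 of class 1
X = np.zeros((12, 1))
y = np.array([0] * 8 + [1] * 4)

skf = StratifiedKFold(n_splits=4)

# Count how many samples of each class land in every test fold
fold_counts = [
    np.bincount(y[test_index]).tolist() for _, test_index in skf.split(X, y)
]
print(fold_counts)
```

With these labels every test fold receives two samples of class 0 and one of class 1, matching the overall 2:1 ratio.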