### Run integration tests

Source: https://github.com/kno10/python-kmedoids/blob/main/docs/index.md

Validate the installation by running the integration tests using Python's unittest module.

```sh
python -m unittest discover tests
```

--------------------------------

### Install kmedoids with pip or conda

Source: https://github.com/kno10/python-kmedoids/blob/main/docs/index.md

Use pip or conda to install pre-built packages for various systems. For uncommon architectures, Rust may need to be installed first.

```sh
pip install kmedoids
```

```sh
conda install -c conda-forge kmedoids
```

--------------------------------

### Install kmedoids with pip

Source: https://github.com/kno10/python-kmedoids/blob/main/README.md

Install the kmedoids package using pip. This is the standard method for most users.

```sh
pip install kmedoids
```

--------------------------------

### Validate kmedoids installation

Source: https://github.com/kno10/python-kmedoids/blob/main/README.md

Run integration tests to validate the kmedoids installation. This requires numpy to be installed.

```sh
pip install numpy
python -m unittest discover tests
```

--------------------------------

### Install kmedoids with conda

Source: https://github.com/kno10/python-kmedoids/blob/main/README.md

Install the kmedoids package from the conda-forge channel. This is an alternative installation method.

```sh
conda install -c conda-forge kmedoids
```

--------------------------------

### Build kmedoids from source

Source: https://github.com/kno10/python-kmedoids/blob/main/README.md

Build and install the kmedoids package from source using maturin. This method is useful for development or when pre-built packages are unavailable. Ensure you have Rust/Cargo installed.
```sh
pip install maturin
git clone https://github.com/kno10/python-kmedoids.git
cd python-kmedoids
maturin develop --release
```

--------------------------------

### Compile kmedoids from source

Source: https://github.com/kno10/python-kmedoids/blob/main/docs/index.md

Compile the package from source using maturin, which requires Rust and Python 3. Activate a virtual environment before installing.

```sh
# activate your desired virtual environment first
pip install maturin
git clone https://github.com/kno10/python-kmedoids.git
cd python-kmedoids
# build and install the package:
maturin develop --release
```

--------------------------------

### Choose optimal number of clusters with DynMSC

Source: https://github.com/kno10/python-kmedoids/blob/main/docs/index.md

Use the DynMSC algorithm to find the optimal number of clusters (k) by optimizing the Medoid Silhouette score within a specified range. This example uses a subset of the MNIST dataset.

```python
import kmedoids, numpy
from sklearn.datasets import fetch_openml
from sklearn.metrics.pairwise import euclidean_distances
X, _ = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
X = X[:10000]
diss = euclidean_distances(X)
kmin = 10
kmax = 20
dm = kmedoids.dynmsc(diss, kmax, kmin)
print("Optimal number of clusters according to the Medoid Silhouette:", dm.bestk)
print("Medoid Silhouette over range of k:", dm.losses)
print("Range of k:", dm.rangek)
```

--------------------------------

### Compare FastPAM1 and PAM with BUILD init

Source: https://context7.com/kno10/python-kmedoids/llms.txt

Compares the results of FastPAM1 and standard PAM when both use the 'build' initialization strategy. Asserts that their losses are identical and prints the resulting medoids and labels.
```python
import numpy as np
import kmedoids
dist = np.array([  # same 5-point matrix as the other examples
    [0, 2, 3, 4, 5],
    [2, 0, 6, 7, 8],
    [3, 6, 0, 9, 10],
    [4, 7, 9, 0, 11],
    [5, 8, 10, 11, 0]
], dtype=np.float64)
fp1 = kmedoids.fastpam1(dist, medoids=2, init="build")
pam = kmedoids.pam(dist, medoids=2, init="build")
print("FastPAM1 loss:", fp1.loss)  # 9.0
print("PAM loss: ", pam.loss)  # 9.0 (identical)
assert fp1.loss == pam.loss, "Results should match"
print("Medoids:", fp1.medoids)
print("Labels:", fp1.labels)
```

--------------------------------

### kmedoids.pam_build - PAM BUILD Phase Only

Source: https://context7.com/kno10/python-kmedoids/llms.txt

Runs only the greedy PAM BUILD initialization. This is useful for obtaining initial medoids that can be used as a warm-start for iterative algorithms like FasterPAM.

```python
import numpy as np
import kmedoids

dist = np.array([
    [0, 2, 3, 4, 5],
    [2, 0, 6, 7, 8],
    [3, 6, 0, 9, 10],
    [4, 7, 9, 0, 11],
    [5, 8, 10, 11, 0]
], dtype=np.float32)

# Get deterministic initial medoids from BUILD
build_result = kmedoids.pam_build(dist, k=2)
print("BUILD medoids:", build_result.medoids)  # points 0 and 4 for this matrix
print("BUILD loss: ", build_result.loss)

# Use the BUILD medoids as a fixed start for FasterPAM
result = kmedoids.fasterpam(dist, medoids=build_result.medoids)
print("FasterPAM loss after BUILD init:", result.loss)

# Equivalent to using init="build" directly
result2 = kmedoids.fasterpam(dist, medoids=2, init="build")
assert result.loss == result2.loss
```

--------------------------------

### kmedoids.pam_build

Source: https://context7.com/kno10/python-kmedoids/llms.txt

Runs only the greedy PAM BUILD initialization, which constructs k initial medoids by repeatedly adding the point that most reduces the total distance from every point to its nearest selected medoid. Returns a KMedoidsResult that can be used as a warm-start for iterative algorithms.

```APIDOC
## `kmedoids.pam_build` - PAM BUILD Phase Only

Runs only the greedy PAM BUILD initialization, which constructs k initial medoids by repeatedly adding the point that most reduces the total distance from every point to its nearest selected medoid.
Returns a `KMedoidsResult` that can be used as a warm-start for iterative algorithms.

```python
import numpy as np
import kmedoids

dist = np.array([
    [0, 2, 3, 4, 5],
    [2, 0, 6, 7, 8],
    [3, 6, 0, 9, 10],
    [4, 7, 9, 0, 11],
    [5, 8, 10, 11, 0]
], dtype=np.float32)

# Get deterministic initial medoids from BUILD
build_result = kmedoids.pam_build(dist, k=2)
print("BUILD medoids:", build_result.medoids)  # points 0 and 4 for this matrix
print("BUILD loss: ", build_result.loss)

# Use the BUILD medoids as a fixed start for FasterPAM
result = kmedoids.fasterpam(dist, medoids=build_result.medoids)
print("FasterPAM loss after BUILD init:", result.loss)

# Equivalent to using init="build" directly
result2 = kmedoids.fasterpam(dist, medoids=2, init="build")
assert result.loss == result2.loss
```
```

--------------------------------

### KMedoids with Precomputed Distances and Raw Features

Source: https://context7.com/kno10/python-kmedoids/llms.txt

Demonstrates initializing KMedoids with a precomputed distance matrix and with raw feature arrays using the 'euclidean' metric. Shows fitting the model and accessing labels, medoid indices, inertia, and cluster centers.
```python
import numpy as np
import kmedoids
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import euclidean_distances

# --- With precomputed distance matrix ---
dist = np.array([
    [0, 2, 3, 4, 5],
    [2, 0, 6, 7, 8],
    [3, 6, 0, 9, 10],
    [4, 7, 9, 0, 11],
    [5, 8, 10, 11, 0]
], dtype=np.int32)

km = kmedoids.KMedoids(n_clusters=2, method='fasterpam', init='build', random_state=0)
km.fit(dist)
print("Labels: ", km.labels_)
print("Medoid indices: ", km.medoid_indices_)
print("Inertia (loss): ", km.inertia_)

# Transform: returns distances to each medoid
dist_to_medoids = km.transform(dist)
print("Shape:", dist_to_medoids.shape)  # (5, 2)

# --- With raw features and euclidean metric (requires sklearn) ---
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4]], dtype=np.float64)
km_euc = kmedoids.KMedoids(n_clusters=2, metric='euclidean', method='fasterpam', random_state=0)
km_euc.fit(X)
print("Euclidean labels:", km_euc.labels_)
print("Cluster centers: ", km_euc.cluster_centers_)  # actual data points

# --- DynMSC with automatic k selection ---
# n_clusters is the largest k tried; it cannot exceed the number of points (5 here)
km_dyn = kmedoids.KMedoids(n_clusters=3, method='dynmsc', random_state=42)
km_dyn.fit(dist.astype(np.float32))
print("DynMSC labels:", km_dyn.labels_)

# --- fit_predict convenience method ---
labels = kmedoids.KMedoids(2, method='pam', init='build').fit_predict(dist)
print("fit_predict labels:", labels)
```

--------------------------------

### Classic PAM Clustering with kmedoids

Source: https://context7.com/kno10/python-kmedoids/llms.txt

Use the classic PAM algorithm for correctness baselines or legacy reproducibility. It uses BUILD initialization followed by SWAP optimization.
```python
import numpy as np
import kmedoids

dist = np.array([
    [0, 2, 3, 4, 5],
    [2, 0, 6, 7, 8],
    [3, 6, 0, 9, 10],
    [4, 7, 9, 0, 11],
    [5, 8, 10, 11, 0]
], dtype=np.int32)

# Default init="build" uses PAM BUILD phase before SWAP
result = kmedoids.pam(dist, medoids=2, max_iter=100, init="build")
print("Loss:", result.loss)  # 9
print("Medoids:", result.medoids)  # points 0 and 4 for this matrix
print("Labels:", result.labels)

# Compare speed and quality vs FasterPAM on larger data
import time
from sklearn.datasets import fetch_openml
from sklearn.metrics.pairwise import euclidean_distances

X, _ = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
X = X[:5000].astype(np.float32)
diss = euclidean_distances(X).astype(np.float32)

t0 = time.time()
fp = kmedoids.fasterpam(diss, 10, random_state=0)
print(f"FasterPAM: {(time.time()-t0)*1000:.1f} ms, loss={fp.loss:.2f}")

t0 = time.time()
pam = kmedoids.pam(diss, 10, init="build")
print(f"PAM: {(time.time()-t0)*1000:.1f} ms, loss={pam.loss:.2f}")
```

--------------------------------

### kmedoids.pam

Source: https://context7.com/kno10/python-kmedoids/llms.txt

Implements the classic Partitioning Around Medoids (PAM) algorithm. It includes the BUILD initialization phase followed by iterative SWAP optimization, serving as a baseline for correctness.

```APIDOC
## pam(dist, medoids, *, max_iter=300, init='build')

### Description
Performs k-medoids clustering using the classic PAM algorithm.

### Parameters
- **dist** (numpy.ndarray) - A square, symmetric distance/dissimilarity matrix.
- **medoids** (int) - The desired number of clusters (k).
- **max_iter** (int, optional) - Maximum number of iterations for the SWAP phase. Defaults to 300.
- **init** (str or list, optional) - Initialization method. Must be 'build' or a list of initial medoid indices. Defaults to 'build'.

### Returns
- **result** (object) - An object containing clustering results:
  - **loss** (float) - The total sum of distances from each point to its assigned medoid.
  - **labels** (numpy.ndarray) - An array where each element is the index of the assigned medoid for the corresponding data point.
  - **medoids** (numpy.ndarray) - An array containing the indices of the selected medoids.

### Request Example
```python
import numpy as np
import kmedoids

dist = np.array([
    [0, 2, 3, 4, 5],
    [2, 0, 6, 7, 8],
    [3, 6, 0, 9, 10],
    [4, 7, 9, 0, 11],
    [5, 8, 10, 11, 0]
], dtype=np.int32)

result = kmedoids.pam(dist, medoids=2, max_iter=100, init="build")
print("Loss:", result.loss)
print("Medoids:", result.medoids)
print("Labels:", result.labels)
```
```

--------------------------------

### Compare FasterPAM and PAM on MNIST dataset

Source: https://github.com/kno10/python-kmedoids/blob/main/README.md

Compares the performance and loss of FasterPAM and standard PAM algorithms on a subset of the MNIST dataset. Requires pre-computed distance matrix.

```python
import kmedoids, numpy, time
from sklearn.datasets import fetch_openml
from sklearn.metrics.pairwise import euclidean_distances
X, _ = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
X = X[:10000]
diss = euclidean_distances(X)
start = time.time()
fp = kmedoids.fasterpam(diss, 100)
print("FasterPAM took: %.2f ms" % ((time.time() - start)*1000))
print("Loss with FasterPAM:", fp.loss)
start = time.time()
pam = kmedoids.pam(diss, 100)
print("PAM took: %.2f ms" % ((time.time() - start)*1000))
print("Loss with PAM:", pam.loss)
```

--------------------------------

### Compare FasterPAM and PAM on MNIST dataset

Source: https://github.com/kno10/python-kmedoids/blob/main/docs/index.md

Compare the performance and loss of FasterPAM and PAM algorithms on a subset of the MNIST dataset. Calculates Euclidean distances and measures execution time.
```python
import time  # needed for the timing calls below
import kmedoids
import numpy
from sklearn.datasets import fetch_openml
from sklearn.metrics.pairwise import euclidean_distances
X, _ = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
X = X[:10000]
diss = euclidean_distances(X)
start = time.time()
fp = kmedoids.fasterpam(diss, 100)
print("FasterPAM took: %.2f ms" % ((time.time() - start)*1000))
print("Loss with FasterPAM:", fp.loss)
start = time.time()
pam = kmedoids.pam(diss, 100)
print("PAM took: %.2f ms" % ((time.time() - start)*1000))
print("Loss with PAM:", pam.loss)
```

--------------------------------

### FastPAM1 Clustering with kmedoids

Source: https://context7.com/kno10/python-kmedoids/llms.txt

Use FastPAM1 as a drop-in replacement for classic PAM when exact PAM-equivalent behavior is needed at reduced computational cost. It finds each best swap O(k) times faster.

```python
import numpy as np
import kmedoids

dist = np.array([
    [0, 2, 3, 4, 5],
    [2, 0, 6, 7, 8],
    [3, 6, 0, 9, 10],
    [4, 7, 9, 0, 11],
    [5, 8, 10, 11, 0]
], dtype=np.float64)
```

--------------------------------

### Basic kmedoids clustering

Source: https://github.com/kno10/python-kmedoids/blob/main/README.md

Perform k-medoids clustering using the fasterpam algorithm with a precomputed distance matrix. The loss of the clustering is printed.

```python
import kmedoids
# distmatrix: a precomputed square dissimilarity matrix (not defined in this snippet)
c = kmedoids.fasterpam(distmatrix, 5)
print("Loss is:", c.loss)
```

--------------------------------

### kmedoids.fastpam1

Source: https://context7.com/kno10/python-kmedoids/llms.txt

Implements the FastPAM1 algorithm, which performs the same sequence of swaps as classic PAM but computes each best swap more efficiently. It offers a performance improvement over PAM while maintaining equivalent swap behavior.

```APIDOC
## fastpam1(dist, medoids, *, max_iter=300, init='build')

### Description
Performs k-medoids clustering using the FastPAM1 algorithm.

### Parameters
- **dist** (numpy.ndarray) - A square, symmetric distance/dissimilarity matrix.
- **medoids** (int) - The desired number of clusters (k).
- **max_iter** (int, optional) - Maximum number of iterations for the SWAP phase. Defaults to 300.
- **init** (str or list, optional) - Initialization method. Can be 'random', 'build', or a list of initial medoid indices. Defaults to 'build'.

### Returns
- **result** (object) - An object containing clustering results:
  - **loss** (float) - The total sum of distances from each point to its assigned medoid.
  - **labels** (numpy.ndarray) - An array where each element is the index of the assigned medoid for the corresponding data point.
  - **medoids** (numpy.ndarray) - An array containing the indices of the selected medoids.

### Request Example
```python
import numpy as np
import kmedoids

dist = np.array([
    [0, 2, 3, 4, 5],
    [2, 0, 6, 7, 8],
    [3, 6, 0, 9, 10],
    [4, 7, 9, 0, 11],
    [5, 8, 10, 11, 0]
], dtype=np.float64)

result = kmedoids.fastpam1(dist, medoids=2)
print("Loss:", result.loss)
print("Medoids:", result.medoids)
print("Labels:", result.labels)
```
```

--------------------------------

### Scikit-learn compatible kmedoids clustering

Source: https://github.com/kno10/python-kmedoids/blob/main/README.md

Use the scikit-learn compatible API for k-medoids clustering with the fasterpam method. The inertia of the clustering is printed.

```python
import kmedoids
# distmatrix: a precomputed square dissimilarity matrix (not defined in this snippet)
km = kmedoids.KMedoids(5, method='fasterpam')
c = km.fit(distmatrix)
print("Loss is:", c.inertia_)
```

--------------------------------

### kmedoids.fasterpam

Source: https://context7.com/kno10/python-kmedoids/llms.txt

Implements the FasterPAM clustering algorithm, an accelerated variant of PAM that optimizes swap selection for improved performance. It supports multi-threading, various initialization strategies, and different data types.

```APIDOC
## fasterpam(dist, medoids, *, max_iter=300, init='random', random_state=None, n_cpu=None)

### Description
Performs k-medoids clustering using the FasterPAM algorithm.

### Parameters
- **dist** (numpy.ndarray) - A square, symmetric distance/dissimilarity matrix.
- **medoids** (int) - The desired number of clusters (k).
- **max_iter** (int, optional) - Maximum number of iterations for the SWAP phase. Defaults to 300.
- **init** (str or list, optional) - Initialization method. Can be 'random', 'build', or a list of initial medoid indices. Defaults to 'random'.
- **random_state** (int, optional) - Seed for random number generation for reproducible results. Defaults to None.
- **n_cpu** (int, optional) - Number of CPU cores to use for parallel execution. Auto-detected if None.

### Returns
- **result** (object) - An object containing clustering results:
  - **loss** (float) - The total sum of distances from each point to its assigned medoid.
  - **labels** (numpy.ndarray) - An array where each element is the index of the assigned medoid for the corresponding data point.
  - **medoids** (numpy.ndarray) - An array containing the indices of the selected medoids.
  - **n_iter** (int) - The number of iterations performed.
  - **n_swap** (int) - The number of swaps performed during the optimization.

### Request Example
```python
import numpy as np
import kmedoids

dist = np.array([
    [0, 2, 3, 4, 5],
    [2, 0, 6, 7, 8],
    [3, 6, 0, 9, 10],
    [4, 7, 9, 0, 11],
    [5, 8, 10, 11, 0]
], dtype=np.float64)

result = kmedoids.fasterpam(dist, medoids=2, max_iter=100, init="random", random_state=42)
print("Loss:", result.loss)
print("Labels:", result.labels)
print("Medoids:", result.medoids)
```
```

--------------------------------

### FasterPAM Clustering with kmedoids

Source: https://context7.com/kno10/python-kmedoids/llms.txt

Use FasterPAM for accelerated k-medoids clustering. It supports multi-threading, several initialization strategies, and multiple numpy dtypes. Set `random_state` for reproducible results.
```python
import numpy as np
import kmedoids

# Build a symmetric distance matrix (5 points)
dist = np.array([
    [0, 2, 3, 4, 5],
    [2, 0, 6, 7, 8],
    [3, 6, 0, 9, 10],
    [4, 7, 9, 0, 11],
    [5, 8, 10, 11, 0]
], dtype=np.float64)

# Cluster into k=2 groups, reproducible with a seed
result = kmedoids.fasterpam(dist, medoids=2, max_iter=100, init="random", random_state=42)
print("Loss (sum of distances to medoids):", result.loss)
# Loss (sum of distances to medoids): 9.0
print("Cluster labels:", result.labels)
# e.g. Cluster labels: [0 0 0 0 1] (position of the assigned medoid in result.medoids)
print("Medoid indices:", result.medoids)
# e.g. Medoid indices: [0 4] (points 0 and 4; their order depends on the random initialization)
print("Iterations:", result.n_iter)
print("Swaps performed:", result.n_swap)

# Use PAM BUILD initialization for a deterministic, higher-quality starting point
result_build = kmedoids.fasterpam(dist, medoids=2, init="build")
print("Loss with BUILD init:", result_build.loss)

# Parallel execution on large matrices (auto-detected for n >= 1000)
large_dist = np.random.rand(2000, 2000).astype(np.float32)
large_dist = (large_dist + large_dist.T) / 2
np.fill_diagonal(large_dist, 0)
result_par = kmedoids.fasterpam(large_dist, medoids=10, n_cpu=4, random_state=0)
print("Parallel loss:", result_par.loss)
```

--------------------------------

### PAMSIL Clustering with kmedoids

Source: https://context7.com/kno10/python-kmedoids/llms.txt

Employ PAMSIL to optimize the full (non-medoid) Silhouette criterion using the PAM SWAP framework. Note that this is generally slower than Medoid Silhouette variants.

```python
import numpy as np
import kmedoids

dist = np.array([
    [0, 2, 3, 4, 5],
    [2, 0, 6, 7, 8],
    [3, 6, 0, 9, 10],
    [4, 7, 9, 0, 11],
    [5, 8, 10, 11, 0]
], dtype=np.float32)

result = kmedoids.pamsil(dist, medoids=2, init="build")
print("Silhouette criterion loss:", result.loss)  # e.g. 0.3138
print("Medoids:", result.medoids)
print("Labels:", result.labels)
```

--------------------------------

### KMedoids Class - Scikit-learn Compatible API

Source: https://context7.com/kno10/python-kmedoids/llms.txt

Demonstrates the usage of the KMedoids class for clustering with both precomputed distance matrices and raw feature arrays, including fit, transform, and fit_predict methods. Also shows DynMSC for automatic k selection.

```APIDOC
## `kmedoids.KMedoids` - sklearn-Compatible API

A scikit-learn `BaseEstimator`/`ClusterMixin` wrapper supporting all clustering methods via a standard `fit`/`predict`/`transform`/`fit_predict` interface. Accepts precomputed distance matrices (`metric="precomputed"`, default) or raw feature arrays with any sklearn-supported metric. The `n_clusters` parameter doubles as the maximum k for `method="dynmsc"`.

```python
import numpy as np
import kmedoids
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import euclidean_distances

# --- With precomputed distance matrix ---
dist = np.array([
    [0, 2, 3, 4, 5],
    [2, 0, 6, 7, 8],
    [3, 6, 0, 9, 10],
    [4, 7, 9, 0, 11],
    [5, 8, 10, 11, 0]
], dtype=np.int32)

km = kmedoids.KMedoids(n_clusters=2, method='fasterpam', init='build', random_state=0)
km.fit(dist)
print("Labels: ", km.labels_)
print("Medoid indices: ", km.medoid_indices_)
print("Inertia (loss): ", km.inertia_)

# Transform: returns distances to each medoid
dist_to_medoids = km.transform(dist)
print("Shape:", dist_to_medoids.shape)  # (5, 2)

# --- With raw features and euclidean metric (requires sklearn) ---
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4]], dtype=np.float64)
km_euc = kmedoids.KMedoids(n_clusters=2, metric='euclidean', method='fasterpam', random_state=0)
km_euc.fit(X)
print("Euclidean labels:", km_euc.labels_)
print("Cluster centers: ", km_euc.cluster_centers_)  # actual data points

# --- DynMSC with automatic k selection ---
# n_clusters is the largest k tried; it cannot exceed the number of points (5 here)
km_dyn = kmedoids.KMedoids(n_clusters=3, method='dynmsc', random_state=42)
km_dyn.fit(dist.astype(np.float32))
print("DynMSC labels:", km_dyn.labels_)

# --- fit_predict convenience method ---
labels = kmedoids.KMedoids(2, method='pam', init='build').fit_predict(dist)
print("fit_predict labels:", labels)
```
```

--------------------------------

### kmedoids.alternating - Alternating k-Medoids

Source: https://context7.com/kno10/python-kmedoids/llms.txt

Implements a k-means-style k-medoids algorithm. It alternates between assigning points to nearest medoids and updating medoids. This is typically faster per iteration than PAM but may yield worse cluster quality.

```python
import numpy as np
import kmedoids

dist = np.array([
    [0, 2, 3, 4, 5],
    [2, 0, 6, 7, 8],
    [3, 6, 0, 9, 10],
    [4, 7, 9, 0, 11],
    [5, 8, 10, 11, 0]
], dtype=np.int32)

alt = kmedoids.alternating(dist, medoids=2, init="build")
fp = kmedoids.fasterpam(dist, medoids=2, init="build")

print("Alternating loss:", alt.loss)  # May be higher than PAM
print("FasterPAM loss: ", fp.loss)  # Usually lower
print("Alternating medoids:", alt.medoids)
```

--------------------------------

### kmedoids.fastermsc

Source: https://context7.com/kno10/python-kmedoids/llms.txt

FasterMSC directly optimizes the Average Medoid Silhouette by eagerly accepting any improving swap, using an O(k²) speedup over PAMMEDSIL. It finds clusterings with higher silhouette scores than PAM-family loss minimization, at the cost of needing float-typed dissimilarity matrices.

```APIDOC
## `kmedoids.fastermsc` - FasterMSC: Fast Medoid Silhouette Clustering

FasterMSC directly optimizes the Average Medoid Silhouette by eagerly accepting any improving swap, using an O(k²) speedup over PAMMEDSIL. It finds clusterings with higher silhouette scores than PAM-family loss minimization, at the cost of needing float-typed dissimilarity matrices.

```python
import numpy as np
import kmedoids

dist = np.array([
    [0, 2, 3, 4, 5],
    [2, 0, 6, 7, 8],
    [3, 6, 0, 9, 10],
    [4, 7, 9, 0, 11],
    [5, 8, 10, 11, 0]
], dtype=np.float32)

result = kmedoids.fastermsc(dist, medoids=2, init="build")
print("Avg Medoid Silhouette (loss):", result.loss)  # e.g. 0.8172
print("Medoids:", result.medoids)
print("Labels:", result.labels)
print("Iterations:", result.n_iter)
print("Swaps:", result.n_swap)
```
```

--------------------------------

### FastMSC Clustering with kmedoids

Source: https://context7.com/kno10/python-kmedoids/llms.txt

Use FastMSC for efficient clustering that provides a balance between PAMMEDSIL's accuracy and FasterMSC's speed. It uses the same swaps as PAMMEDSIL but is significantly faster.

```python
import numpy as np
import kmedoids

dist = np.array([
    [0, 2, 3, 4, 5],
    [2, 0, 6, 7, 8],
    [3, 6, 0, 9, 10],
    [4, 7, 9, 0, 11],
    [5, 8, 10, 11, 0]
], dtype=np.float32)

fmsc = kmedoids.fastmsc(dist, medoids=2, init="build")
pammsil = kmedoids.pammedsil(dist, medoids=2, init="build")

print("FastMSC loss: ", fmsc.loss)  # same result as PAMMEDSIL
print("PAMMEDSIL loss: ", pammsil.loss)
assert fmsc.loss == pammsil.loss
print("Medoids:", fmsc.medoids)
```

--------------------------------

### Find optimal number of clusters using DynMSC

Source: https://github.com/kno10/python-kmedoids/blob/main/README.md

Uses the DynMSC algorithm to find the optimal number of clusters based on the Medoid Silhouette index within a specified range. Requires a pre-computed distance matrix.
```python
import kmedoids, numpy
from sklearn.datasets import fetch_openml
from sklearn.metrics.pairwise import euclidean_distances
X, _ = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
X = X[:10000]
diss = euclidean_distances(X)
kmin, kmax = 10, 20
dm = kmedoids.dynmsc(diss, kmax, kmin)
print("Optimal number of clusters according to the Medoid Silhouette:", dm.bestk)
print("Medoid Silhouette over range of k:", dm.losses)
print("Range of k:", dm.rangek)
```

--------------------------------

### kmedoids.fastermsc - Faster Medoid Silhouette Clustering

Source: https://context7.com/kno10/python-kmedoids/llms.txt

Optimizes the Average Medoid Silhouette score directly using an O(k²) speedup. This algorithm typically finds clusterings with higher silhouette scores than PAM-family loss minimization but requires float-typed dissimilarity matrices.

```python
import numpy as np
import kmedoids

dist = np.array([
    [0, 2, 3, 4, 5],
    [2, 0, 6, 7, 8],
    [3, 6, 0, 9, 10],
    [4, 7, 9, 0, 11],
    [5, 8, 10, 11, 0]
], dtype=np.float32)

result = kmedoids.fastermsc(dist, medoids=2, init="build")
print("Avg Medoid Silhouette (loss):", result.loss)  # e.g. 0.8172
print("Medoids:", result.medoids)
print("Labels:", result.labels)
print("Iterations:", result.n_iter)
print("Swaps:", result.n_swap)
```

--------------------------------

### PAMMEDSIL Clustering with kmedoids

Source: https://context7.com/kno10/python-kmedoids/llms.txt

Utilize PAMMEDSIL for clustering that directly optimizes the Medoid Silhouette criterion. This algorithm is preserved for reproducibility and comparison with faster methods.

```python
import numpy as np
import kmedoids

dist = np.array([
    [0, 2, 3, 4, 5],
    [2, 0, 6, 7, 8],
    [3, 6, 0, 9, 10],
    [4, 7, 9, 0, 11],
    [5, 8, 10, 11, 0]
], dtype=np.float32)

result = kmedoids.pammedsil(dist, medoids=2, init="build")
print("Medoid Silhouette:", result.loss)  # e.g. 0.8172
print("Medoids:", result.medoids)
print("Labels:", result.labels)
```

--------------------------------

### Silhouette Index Evaluation with kmedoids

Source: https://context7.com/kno10/python-kmedoids/llms.txt

Compute the average Silhouette score for a given clustering using a dissimilarity matrix. Supports parallel computation and retrieval of per-sample silhouette values.

```python
import numpy as np
import kmedoids

dist = np.array([
    [0, 2, 3, 4, 5],
    [2, 0, 6, 7, 8],
    [3, 6, 0, 9, 10],
    [4, 7, 9, 0, 11],
    [5, 8, 10, 11, 0]
], dtype=np.int32)

# Get clustering labels
result = kmedoids.pam(dist, medoids=2)
print("Labels:", result.labels)

# Compute average silhouette (scalar)
avg_sil, _ = kmedoids.silhouette(dist, result.labels)
print("Average Silhouette:", avg_sil)

# Get per-sample silhouette values
avg_sil, sample_sils = kmedoids.silhouette(dist, result.labels, samples=True, n_cpu=1)
print("Per-sample Silhouette:", sample_sils)  # e.g. [0.75, 0.60, 0.50, 0.45, 0.30]

# Parallel computation (samples=True requires n_cpu=1)
avg_par, _ = kmedoids.silhouette(dist.astype(np.float32), result.labels, n_cpu=2)
print("Parallel Silhouette:", avg_par)
```

--------------------------------

### kmedoids.alternating

Source: https://context7.com/kno10/python-kmedoids/llms.txt

A k-means-style k-medoids algorithm that alternates between assigning each point to its nearest medoid and updating each medoid to the point in its cluster that minimizes the cluster's total intra-distance. Significantly faster per iteration than PAM-family algorithms but typically yields substantially worse cluster quality.

```APIDOC
## `kmedoids.alternating` - Alternating k-Medoids (k-Means Style)

A k-means-style k-medoids algorithm that alternates between assigning each point to its nearest medoid and updating each medoid to the point in its cluster that minimizes the cluster's total intra-distance.
Significantly faster per iteration than PAM-family algorithms but typically yields substantially worse cluster quality.

```python
import numpy as np
import kmedoids

dist = np.array([
    [0, 2, 3, 4, 5],
    [2, 0, 6, 7, 8],
    [3, 6, 0, 9, 10],
    [4, 7, 9, 0, 11],
    [5, 8, 10, 11, 0]
], dtype=np.int32)

alt = kmedoids.alternating(dist, medoids=2, init="build")
fp = kmedoids.fasterpam(dist, medoids=2, init="build")

print("Alternating loss:", alt.loss)  # May be higher than PAM
print("FasterPAM loss: ", fp.loss)  # Usually lower
print("Alternating medoids:", alt.medoids)
```
```

--------------------------------

### kmedoids.dynmsc - Automatic Cluster Count Selection

Source: https://context7.com/kno10/python-kmedoids/llms.txt

Automatically selects the optimal number of clusters (k) by running FasterMSC for a range of k values and choosing the one with the highest Average Medoid Silhouette score. Requires a float-typed dissimilarity matrix.

```python
import numpy as np
import kmedoids
from sklearn.datasets import fetch_openml
from sklearn.metrics.pairwise import euclidean_distances

X, _ = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
X = X[:5000].astype(np.float32)
diss = euclidean_distances(X).astype(np.float32)

# Search for the best k in the range [5, 20]
# kmax should be 2-3x the number of clusters you expect
kmin, kmax = 5, 20
dm = kmedoids.dynmsc(diss, medoids=kmax, minimum_k=kmin, random_state=42)
print("Best k (auto-selected):", dm.bestk)
print("Best loss (Avg Medoid Silhouette):", dm.loss)
print("Labels for best k:", dm.labels[:10], " ...")
print("Medoids for best k:", dm.medoids)
print("Silhouette scores over range:", dict(zip(dm.rangek, dm.losses)))
# e.g. {5: 0.71, 6: 0.74, 7: 0.72, ..., 20: 0.51}

# Use via sklearn-compatible API
from kmedoids import KMedoids
km = KMedoids(n_clusters=kmax, method='dynmsc')
km.fit(diss)
print("sklearn dynmsc medoid indices:", km.medoid_indices_)
```

--------------------------------

### Medoid Silhouette Index Evaluation with kmedoids

Source: https://context7.com/kno10/python-kmedoids/llms.txt

Calculate the Average Medoid Silhouette, an efficient approximation of the full Silhouette score. This metric uses distances to medoid points for faster computation.

```python
import numpy as np
import kmedoids

dist = np.array([
    [0, 2, 3, 4, 5],
    [2, 0, 6, 7, 8],
    [3, 6, 0, 9, 10],
    [4, 7, 9, 0, 11],
    [5, 8, 10, 11, 0]
], dtype=np.float32)

# Cluster and evaluate
result = kmedoids.fasterpam(dist, medoids=2)
avg_msil, _ = kmedoids.medoid_silhouette(dist, result.medoids)
print("Avg Medoid Silhouette:", avg_msil)

# Get per-sample values
avg_msil, sample_msils = kmedoids.medoid_silhouette(dist, result.medoids, samples=True)
print("Per-sample Medoid Silhouette:", sample_msils)

# Compare full Silhouette vs Medoid Silhouette
avg_sil, _ = kmedoids.silhouette(dist.astype(np.int32), result.labels, n_cpu=1)
print(f"Full Silhouette: {avg_sil:.4f} | Medoid Silhouette: {avg_msil:.4f}")
```

--------------------------------

### kmedoids.pamsil

Source: https://context7.com/kno10/python-kmedoids/llms.txt

PAMSIL clustering algorithm. This algorithm optimizes the full (non-medoid) Silhouette criterion using the PAM SWAP framework and is generally slower than Medoid Silhouette variants.

```APIDOC
## `kmedoids.pamsil` - PAMSIL Clustering

PAMSIL (Van der Laan, Pollard & Bryan, 2003) optimizes the full (non-medoid) Silhouette criterion using the PAM SWAP framework. It is generally slower than the Medoid Silhouette variants; use `fastermsc`/`fastmsc` or the standard `fasterpam` for typical clustering workloads.

```python
import numpy as np
import kmedoids

dist = np.array([
    [0, 2, 3, 4, 5],
    [2, 0, 6, 7, 8],
    [3, 6, 0, 9, 10],
    [4, 7, 9, 0, 11],
    [5, 8, 10, 11, 0]
], dtype=np.float32)

result = kmedoids.pamsil(dist, medoids=2, init="build")
print("Silhouette criterion loss:", result.loss)  # e.g. 0.3138
print("Medoids:", result.medoids)
print("Labels:", result.labels)
```
```

--------------------------------

### Result Objects: KMedoidsResult and DynkResult

Source: https://context7.com/kno10/python-kmedoids/llms.txt

Details the fields available in KMedoidsResult (returned by fixed-k clustering) and DynkResult (returned by dynmsc, including automatic k-selection metadata).

```APIDOC
## Result Objects: `KMedoidsResult` and `DynkResult`

`KMedoidsResult` is returned by all fixed-k clustering functions; `DynkResult` is returned by `dynmsc` and extends it with automatic k-selection metadata (`bestk`, `losses`, `rangek`).

```python
import numpy as np
import kmedoids

dist = np.array([
    [0, 2, 3, 4, 5],
    [2, 0, 6, 7, 8],
    [3, 6, 0, 9, 10],
    [4, 7, 9, 0, 11],
    [5, 8, 10, 11, 0]
], dtype=np.float32)

# KMedoidsResult fields
r = kmedoids.fasterpam(dist, 2, random_state=0)
print(type(r))    # a KMedoidsResult
print(r.loss)     # float: total sum of distances to medoids
print(r.labels)   # ndarray[int]: cluster index for each point
print(r.medoids)  # ndarray[int]: indices of medoid points
print(r.n_iter)   # int: number of SWAP iterations performed
print(r.n_swap)   # int: total number of swaps accepted

# DynkResult fields (superset of KMedoidsResult)
dm = kmedoids.dynmsc(dist, 3, minimum_k=2, random_state=0)
print(type(dm))    # a DynkResult
print(dm.bestk)    # int: optimal k by Avg Medoid Silhouette
print(dm.losses)   # ndarray: Avg Medoid Silhouette for each k in rangek
print(dm.rangek)   # range object: range(minimum_k, kmax+1)
print(dm.loss)     # float: Avg Medoid Silhouette for the best k
print(dm.labels)   # ndarray: cluster labels for the best k clustering
print(dm.medoids)  # ndarray: medoid indices for the best k clustering
```
```
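
--------------------------------

### Cross-check loss and labels by hand (illustrative sketch)

The `loss`, `labels`, and `medoids` fields described above can be sanity-checked without the library. What follows is a minimal pure-Python sketch, not part of kmedoids: the helper names `assign_to_medoids` and `brute_force` are hypothetical, and the code assumes that `loss` is the sum of each point's distance to its nearest medoid and that each label is the position of that medoid within `medoids`, as the field descriptions state.

```python
from itertools import combinations

# The 5-point dissimilarity matrix used throughout the examples.
DIST = [
    [0, 2, 3, 4, 5],
    [2, 0, 6, 7, 8],
    [3, 6, 0, 9, 10],
    [4, 7, 9, 0, 11],
    [5, 8, 10, 11, 0],
]


def assign_to_medoids(dist, medoids):
    """Hypothetical helper: return (loss, labels) for a fixed medoid set."""
    labels = []
    loss = 0
    for row in dist:
        # Position (within `medoids`) of the medoid closest to this point.
        best = min(range(len(medoids)), key=lambda j: row[medoids[j]])
        labels.append(best)
        loss += row[medoids[best]]
    return loss, labels


def brute_force(dist, k):
    """Exhaustively try every k-subset of points; only feasible for tiny inputs."""
    candidates = combinations(range(len(dist)), k)
    return min(candidates, key=lambda m: assign_to_medoids(dist, m)[0])


best = brute_force(DIST, 2)
print(best, assign_to_medoids(DIST, best))
```

For this matrix the exhaustive search selects points 0 and 4 with a loss of 9, which agrees with the loss of 9 quoted in the PAM and FastPAM1 examples. Exhaustive search grows combinatorially with n and k, so it is only a checking aid for toy inputs, not a substitute for the library's algorithms.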