### Poisson Learning Fit Method Setup

Source: https://jwcalder.github.io/GraphLearning/ssl.html

Prepares the graph and source term for Poisson learning. It zeroes the diagonal of the weight matrix and builds the source term from the one-hot encoded training labels minus their mean.

```python
n = self.graph.num_nodes
unique_labels = np.unique(train_labels)
k = len(unique_labels)

#Zero out diagonal for faster convergence
W = self.graph.weight_matrix
W = W - sparse.spdiags(W.diagonal(),0,n,n)
G = graph.graph(W)

#Poisson source term
onehot = utils.labels_to_onehot(train_labels,k)
source = np.zeros((n, onehot.shape[1]))
source[train_ind] = onehot - np.mean(onehot, axis=0)
```

--------------------------------

### Two Moons Dataset SSL Example

Source: https://jwcalder.github.io/GraphLearning/ssl.html

Demonstrates semi-supervised learning on the two-moons dataset using the Laplace algorithm. Requires training data and a weight matrix.

```python
import numpy as np
import graphlearning as gl
import matplotlib.pyplot as plt
import sklearn.datasets as datasets

X,labels = datasets.make_moons(n_samples=500,noise=0.1)
W = gl.weightmatrix.knn(X,10)

train_ind = gl.trainsets.generate(labels, rate=5)
train_labels = labels[train_ind]

model = gl.ssl.laplace(W)
pred_labels = model.fit_predict(train_ind, train_labels)

accuracy = gl.ssl.ssl_accuracy(pred_labels, labels, train_ind)
print("Accuracy: %.2f%%"%accuracy)

plt.scatter(X[:,0],X[:,1], c=pred_labels)
plt.scatter(X[train_ind,0],X[train_ind,1], c='r')
plt.show()
```

--------------------------------

### PoissonMBO Initialization and Example

Source: https://jwcalder.github.io/GraphLearning/ssl.html

Initializes the PoissonMBO model and demonstrates its usage on the MNIST dataset with one label per class. It shows how to load data, construct the graph, train the model, and evaluate accuracy.

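
To make "one label per class" and "class priors" concrete, here is a minimal NumPy sketch on a toy label vector. It is an illustration under stated assumptions, not the library's code: `sample_per_class` is a hypothetical helper standing in for the role of `gl.trainsets.generate`, and the priors are computed as the fraction of points in each class, which is how the `class_priors` parameter is described elsewhere in this document.

```python
import numpy as np

def sample_per_class(labels, per_class=1, seed=0):
    """Hypothetical helper: pick `per_class` random indices from every class."""
    rng = np.random.default_rng(seed)
    chosen = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)  # all nodes belonging to class c
        chosen.append(rng.choice(idx, size=per_class, replace=False))
    return np.sort(np.concatenate(chosen))

# Toy labels: three classes with 3, 3, and 4 members
labels = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 2])

train_ind = sample_per_class(labels, per_class=1)
train_labels = labels[train_ind]

# Assumed definition of class priors: fraction of data in each class
priors = np.bincount(labels) / len(labels)

print(train_ind, train_labels, priors)
```

With one label per class, the training set has exactly as many entries as there are classes, which is the very low label rate regime the MNIST example is run in.
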
```python
import graphlearning as gl

labels = gl.datasets.load('mnist', labels_only=True)
W = gl.weightmatrix.knn('mnist', 10, metric='vae')

num_train_per_class = 1
train_ind = gl.trainsets.generate(labels, rate=num_train_per_class)
train_labels = labels[train_ind]

class_priors = gl.utils.class_priors(labels)
model = gl.ssl.poisson_mbo(W, class_priors)
pred_labels = model.fit_predict(train_ind,train_labels,all_labels=labels)

accuracy = gl.ssl.ssl_accuracy(pred_labels,labels,train_ind)
print(model.name + ': %.2f%%'%accuracy)
```

--------------------------------

### Gradient Descent Solver Setup for Poisson Learning

Source: https://jwcalder.github.io/GraphLearning/ssl.html

Sets up the matrices for the gradient descent solver in Poisson learning: the degree matrix `D` (requested with `p=-1`), the matrix `P = D*W.transpose()`, and the scaled source term `Db = D*source`. This is a branch taken from inside a larger fit method.

```python
elif self.solver == "gradient_descent":

    #Setup matrices
    D = G.degree_matrix(p=-1)
    P = D*W.transpose()
    Db = D*source
```

--------------------------------

### Model Change Active Learning Example

Source: https://jwcalder.github.io/GraphLearning/active_learning.html

Demonstrates the Model Change active learning algorithm. It initializes an active learner, then repeatedly selects query points, simulates human labeling, and updates the learner. Useful when the goal is to query the points that would change the model the most.

```python
import graphlearning.active_learning as al
import graphlearning as gl
import numpy as np
import matplotlib.pyplot as plt
import sklearn.datasets as datasets

X,labels = datasets.make_moons(n_samples=500,noise=0.1)
W = gl.weightmatrix.knn(X,10)
train_ind = gl.trainsets.generate(labels, rate=5)
plt.scatter(X[:,0],X[:,1], c=labels)
plt.scatter(X[train_ind,0],X[train_ind,1], c='r')
plt.show()

# ssl model (assumed here to be Laplace learning, as in the uncertainty sampling example)
model = gl.ssl.laplace(W)

# compute initial, low-rank (spectral truncation) covariance matrix
evals, evecs = model.graph.eigen_decomp(normalization='normalized', k=50)
C = np.diag(1. / (evals + 1e-11))

AL = gl.active_learning.active_learner(model, gl.active_learning.model_change, train_ind, labels[train_ind], C=C.copy(), V=evecs.copy())

for i in range(10):
    query_points = AL.select_queries() # return this iteration's newly chosen points
    query_labels = labels[query_points] # simulate the human in the loop process
    AL.update(query_points, query_labels) # update the active_learning object's labeled set

    # plot
    plt.scatter(X[:,0],X[:,1], c=labels)
    plt.scatter(X[AL.labeled_ind,0],X[AL.labeled_ind,1], c='r')
    plt.scatter(X[query_points,0],X[query_points,1], c='r', marker='*', s=200, edgecolors='k', linewidths=1)
    plt.show()
    print(AL.labeled_ind)
    print(AL.labels)
```

--------------------------------

### Uncertainty Sampling Active Learning Example

Source: https://jwcalder.github.io/GraphLearning/active_learning.html

Demonstrates the uncertainty sampling active learning algorithm. It initializes the active learner, then repeatedly selects queries, simulates human labeling, and updates the labeled set. Useful when model uncertainty is a good indicator of which samples are informative.

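
As a rough illustration of what "model uncertainty" can mean, the sketch below scores nodes by the smallest-margin rule: the closer the two largest class scores of a node, the more uncertain the prediction. This is an assumption made for illustration. The helper `smallest_margin` and the array `probs` are hypothetical, and `gl.active_learning.unc_sampling` may measure uncertainty differently.

```python
import numpy as np

def smallest_margin(probs):
    """Hypothetical score: 1 - (gap between the two largest class scores in each row)."""
    ordered = np.sort(probs, axis=1)
    return 1.0 - (ordered[:, -1] - ordered[:, -2])

# Toy class scores for three nodes and two classes
probs = np.array([[0.95, 0.05],
                  [0.55, 0.45],
                  [0.70, 0.30]])
labeled = np.array([2])  # node 2 is already labeled

scores = smallest_margin(probs)
scores[labeled] = -np.inf       # never re-query a labeled node
query = int(np.argmax(scores))  # take the single most uncertain node

print(query)
```

Node 1 is selected because its two class scores are nearly tied, while node 0 is confidently classified and node 2 is excluded as already labeled.
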
```python
import graphlearning.active_learning as al
import graphlearning as gl
import numpy as np
import matplotlib.pyplot as plt
import sklearn.datasets as datasets

X,labels = datasets.make_moons(n_samples=500,noise=0.1)
W = gl.weightmatrix.knn(X,10)
train_ind = gl.trainsets.generate(labels, rate=5)
plt.scatter(X[:,0],X[:,1], c=labels)
plt.scatter(X[train_ind,0],X[train_ind,1], c='r')
plt.show()

model = gl.ssl.laplace(W)
AL = gl.active_learning.active_learner(model, gl.active_learning.unc_sampling, train_ind, labels[train_ind])

for i in range(10):
    query_points = AL.select_queries() # return this iteration's newly chosen points
    query_labels = labels[query_points] # simulate the human in the loop process
    AL.update(query_points, query_labels) # update the active_learning object's labeled set

    # plot
    plt.scatter(X[:,0],X[:,1], c=labels)
    plt.scatter(X[AL.labeled_ind,0],X[AL.labeled_ind,1], c='r')
    plt.scatter(X[query_points,0],X[query_points,1], c='r', marker='*', s=200, edgecolors='k', linewidths=1)
    plt.show()
    print(AL.labeled_ind)
    print(AL.labels)
```

--------------------------------

### Get Node Neighbors

Source: https://jwcalder.github.io/GraphLearning/graph.html

Retrieves the neighbors of a specified node. Optionally returns the weights of the edges to these neighbors. Only the signature and docstring are shown.

```python
def neighbors(self, i, return_weights=False):
    """Neighbors
    =======

    Returns neighbors of node i.

    ## Parameters

    **`i`**:`int`
    Index of vertex to return neighbors of.

    **`return_weights`**:`bool (optional)`, default=`False`
    Whether to return the weights of neighbors as well.
    """
```

--------------------------------

### VAE Initialization and Training Execution

Source: https://jwcalder.github.io/GraphLearning/weightmatrix.html

Sets up the VAE model, optimizer, and data loader, then runs the training loop for a specified number of epochs. Includes GPU configuration and data preprocessing (rescaling the data to the range [0, 1]). Requires `data`, `layer_widths`, `no_cuda`, `batch_size`, `learning_rate`, `epochs`, `MyDataset`, `VAE`, and `train` to be defined elsewhere.

```python
from torch import nn, optim
from torch.nn import functional as F
from torchvision import datasets, transforms
from torchvision.utils import save_image
from torch.utils.data import DataLoader
import numpy as np
import torch

# Assuming layer_widths, no_cuda, batch_size, learning_rate, epochs are defined elsewhere
# Assuming data is a numpy array; MyDataset, VAE, and train are defined elsewhere

layer_widths = [data.shape[1]] + layer_widths
log_interval = 10 #how many batches to wait before logging training status

#GPU settings
cuda = not no_cuda and torch.cuda.is_available()
device = torch.device("cuda" if cuda else "cpu")
kwargs = {'num_workers': 1, 'pin_memory': True} if cuda else {}

#Convert to torch dataloaders
data = data - data.min()
data = data/data.max()
data = torch.from_numpy(data).float()
target = np.zeros((data.shape[0],)).astype(int)
target = torch.from_numpy(target).long()
dataset = MyDataset(data, target)
data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True, **kwargs)

#Put model on GPU and set up optimizer
model = VAE(layer_widths).to(device)
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

#Training epochs
for epoch in range(1, epochs + 1):
    train(epoch)
```

--------------------------------

### Fokker-Planck Clustering Example

Source: https://jwcalder.github.io/GraphLearning/clustering.html

Demonstrates Fokker-Planck clustering on the two-skies dataset. Requires graphlearning, numpy, and matplotlib.

```python
import numpy as np
import graphlearning as gl
import matplotlib.pyplot as plt

X,L = gl.datasets.two_skies(1000)
W = gl.weightmatrix.knn(X,10)
knn_ind,knn_dist = gl.weightmatrix.knnsearch(X,50)
rho = 1/np.max(knn_dist,axis=1)
model = gl.clustering.fokker_planck(W,num_clusters=2,t=1000,beta=0.5,rho=rho)
labels = model.fit_predict()

plt.scatter(X[:,0],X[:,1], c=labels)
plt.show()
```

--------------------------------

### Initialize SSL Class Priors

Source: https://jwcalder.github.io/GraphLearning/ssl.html

Constructor of the Laplace learning model (the code sets the model name to 'Laplace Learning'). It takes the weight matrix and class priors together with parameters controlling reweighting, normalization, mean shift, and solver tolerance, then builds the accuracy filename and model name from the selected options.

```python
def __init__(self, W, class_priors, reweighting='none', normalization='combinatorial', mean_shift=False, tol=1e-5, order=1, X=None, tau=0):
    """Class Priors
        Whether to shift output to mean zero.
    tol : float (optional), default=1e-5
        Tolerance for conjugate gradient solver.
    alpha : float (optional), default=2
        Parameter for `properly` reweighting.
    zeta : float (optional), default=1e7
        Parameter for `properly` reweighting.
    r : float (optional), default=0.1
        Radius for `properly` reweighting.

    References
    ---------
    [1] X. Zhu, Z. Ghahramani, and J. D. Lafferty. [Semi-supervised learning using gaussian fields and harmonic functions.](https://www.aaai.org/Papers/ICML/2003/ICML03-118.pdf) Proceedings of the 20th International Conference on Machine Learning (ICML-03), 2003.

    [2] J. Calder, B. Cook, M. Thorpe, D. Slepcev. [Poisson Learning: Graph Based Semi-Supervised Learning at Very Low Label Rates.](http://proceedings.mlr.press/v119/calder20a.html), Proceedings of the 37th International Conference on Machine Learning, PMLR 119:1306-1316, 2020.

    [3] Z. Shi, S. Osher, and W. Zhu.
    [Weighted nonlocal laplacian on interpolation from sparse data.](https://idp.springer.com/authorize/casa?redirect_uri=https://link.springer.com/article/10.1007/s10915-017-0421-z&casa_token=33Z7gqJy3mMAAAAA:iMO0pGmpn_qf5PioVIGocSRq_p4CDm-KNOQhgIC1uvqG9pWlZ6t7I-IZtSJfocFDEHCdMpK8j7Fx1XbzDQ) Journal of Scientific Computing 73.2 (2017): 1164-1177.

    [4] J. Calder, D. Slepčev. [Properly-weighted graph Laplacian for semi-supervised learning.](https://link.springer.com/article/10.1007/s00245-019-09637-3) Applied mathematics & optimization (2019): 1-49.
    """
    super().__init__(W, class_priors)

    self.reweighting = reweighting
    self.normalization = normalization
    self.mean_shift = mean_shift
    self.tol = tol
    self.order = order
    self.X = X

    #Set up tau
    if type(tau) in [float,int]:
        self.tau = np.ones(self.graph.num_nodes)*tau
    elif type(tau) is np.ndarray:
        self.tau = tau

    #Setup accuracy filename
    fname = '_laplace'
    self.name = 'Laplace Learning'
    if self.reweighting != 'none':
        fname += '_' + self.reweighting
        self.name += ': ' + self.reweighting + ' reweighted'
    if self.normalization != 'combinatorial':
        fname += '_' + self.normalization
        self.name += ' ' + self.normalization
    if self.mean_shift:
        fname += '_meanshift'
        self.name += ' with meanshift'
    if self.order > 1:
        fname += '_order%d'%int(self.order)
        self.name += ' order %d'%int(self.order)
    if np.max(self.tau) > 0:
        fname += '_tau_%.3f'%np.max(self.tau)
        self.name += ' tau=%.3f'%np.max(self.tau)

    self.accuracy_filename = fname
```

--------------------------------

### Get Neighbors of a Node

Source: https://jwcalder.github.io/GraphLearning/graph.html

Retrieves the indices of a node's neighbors. If `return_weights` is True, it also returns the weights of the edges to these neighbors. The node itself is excluded, so self-loops are not reported.

```python
def neighbors(self, i, return_weights=False):
    """Neighbors
    ======

    Returns neighbors of node i.

    Parameters
    ----------
    i : int
        Index of vertex to return neighbors of.
    return_weights : bool (optional), default=False
        Whether to return the weights of neighbors as well.
    Returns
    -------
    N : numpy array, int
        Array of nearest neighbor indices.
    W : numpy array, float
        Weights of edges to neighbors.
    """

    N = self.weight_matrix[i,:].nonzero()[1]
    N = N[N != i]

    if return_weights:
        return N, self.weight_matrix[i,N].toarray().flatten()
    else:
        return N
```

--------------------------------

### Initialize and Run Active Learner

Source: https://jwcalder.github.io/GraphLearning/active_learning.html

Demonstrates initializing an active learner with the `model_change_var_opt` acquisition function and a pre-computed covariance matrix, then repeatedly selecting query points and updating the learner with their labels.

```python
import graphlearning.active_learning as al
import graphlearning as gl
import numpy as np
import matplotlib.pyplot as plt
import sklearn.datasets as datasets

X,labels = datasets.make_moons(n_samples=500,noise=0.1)
W = gl.weightmatrix.knn(X,10)
train_ind = gl.trainsets.generate(labels, rate=5)
plt.scatter(X[:,0],X[:,1], c=labels)
plt.scatter(X[train_ind,0],X[train_ind,1], c='r')
plt.show()

# ssl model (assumed here to be Laplace learning, as in the uncertainty sampling example)
model = gl.ssl.laplace(W)

# compute initial, low-rank (spectral truncation) covariance matrix
evals, evecs = model.graph.eigen_decomp(normalization='normalized', k=50)
C = np.diag(1. / (evals + 1e-11))

AL = gl.active_learning.active_learner(model, gl.active_learning.model_change_var_opt, train_ind, labels[train_ind], C=C.copy(), V=evecs.copy())

for i in range(10):
    query_points = AL.select_queries() # return this iteration's newly chosen points
    query_labels = labels[query_points] # simulate the human in the loop process
    AL.update(query_points, query_labels) # update the active_learning object's labeled set

    # plot
    plt.scatter(X[:,0],X[:,1], c=labels)
    plt.scatter(X[AL.labeled_ind,0],X[AL.labeled_ind,1], c='r')
    plt.scatter(X[query_points,0],X[query_points,1], c='r', marker='*', s=200, edgecolors='k', linewidths=1)
    plt.show()
    print(AL.labeled_ind)
    print(AL.labels)
```

--------------------------------

### Get SSL Accuracy Filename

Source: https://jwcalder.github.io/GraphLearning/ssl.html

Constructs and returns the filename for storing SSL accuracy results, appending '_classpriors' if class priors are used and always ending in '_accuracy.csv'.

```python
def get_accuracy_filename(self):
    """Get accuracy filename
    ========

    Returns name of the file that will store the accuracy results for `ssl_trials.py`.

    Returns
    -------
    fname : str
        Accuracy filename.
    """

    fname = self.accuracy_filename
    if self.class_priors is not None:
        fname += '_classpriors'
    fname += '_accuracy.csv'
    return fname
```

--------------------------------

### poisson.__init__

Source: https://jwcalder.github.io/GraphLearning/ssl.html

Initializes the Poisson learning model, which solves the Poisson equation for semi-supervised learning.

```APIDOC
## poisson.__init__

### Description
Initializes the Poisson learning model, which solves the Poisson equation for semi-supervised learning.

### Parameters
- **W** (optional) - The graph weight matrix.
- **class_priors** (optional) - Prior probabilities for each class.
- **solver** (str, optional) - The solver to use (default: 'conjugate_gradient').
- **p** (int, optional) - The power parameter for the graph Laplacian (default: 1).
- **use_cuda** (bool, optional) - Whether to use CUDA for computation (default: False).
- **min_iter** (int, optional) - Minimum number of iterations (default: 50).
- **max_iter** (int, optional) - Maximum number of iterations (default: 1000).
- **tol** (float, optional) - Tolerance for convergence (default: 1e-3).
- **spectral_cutoff** (int, optional) - Spectral cutoff for the graph Laplacian (default: 10).
```

--------------------------------

### Initialize Centered Kernel SSL Algorithm

Source: https://jwcalder.github.io/GraphLearning/ssl.html

Initializes the Centered Kernel SSL algorithm. Parameters include the graph weight matrix, class priors, solver tolerance, the number of power iterations used to find the largest eigenvalue, and `alpha`.

```python
class centered_kernel (W=None, class_priors=None, tol=1e-10, power_it=100, alpha=1.05)
"""# Centered Kernel Method

Semi-supervised learning via the centered kernel method of [1].

## Parameters

**`W`**:`numpy array, scipy sparse matrix,` or `graphlearning graph object (optional)`, default=`None`
Weight matrix representing the graph.

**`class_priors`**:`numpy array (optional)`, default=`None`
Class priors (fraction of data belonging to each class). If provided, the predict function will attempt to automatically balance the label predictions to predict the correct number of nodes in each class.

**`tol`**:`float (optional)`, default=`1e-10`
Tolerance to solve equation.

**`power_it`**:`int (optional)`, default=`100`
Number of power iterations to find largest eigenvalue.

**`alpha`**:`float (optional)`, default=`1.05`
Value of α as a fraction of largest eigenvalue.
"""
```

--------------------------------

### INCRES Clustering

Source: https://jwcalder.github.io/GraphLearning/clustering.html

Applies the INCRES clustering algorithm to the MNIST dataset. Requires graphlearning. The k-nearest neighbor graph is built with the 'vae' metric.

```python
import graphlearning as gl

W = gl.weightmatrix.knn('mnist', 10, metric='vae')
labels = gl.datasets.load('mnist', labels_only=True)

model = gl.clustering.incres(W, num_clusters=10)
pred_labels = model.fit_predict(all_labels=labels)

accuracy = gl.clustering.clustering_accuracy(pred_labels,labels)
print('Clustering Accuracy: %.2f%%'%accuracy)
```

--------------------------------

### Convert Scipy Sparse to Torch Sparse

Source: https://jwcalder.github.io/GraphLearning/utils.html

Converts a SciPy sparse matrix to a PyTorch sparse tensor. Requires the PyTorch library to be installed.

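
The conversion rests on the COO (coordinate) layout: a stacked `(2, nnz)` index array plus a value array. The sketch below inspects that layout with SciPy alone, so it runs without PyTorch, and checks that the triplets reproduce the matrix. The toy matrix and the name `rebuilt` are illustrative assumptions. In current PyTorch the same triplets could presumably be passed to `torch.sparse_coo_tensor(indices, values, size)`, but that call is not part of the library code in this snippet.

```python
import numpy as np
from scipy import sparse

# Small sparse matrix in CSR form with three nonzero entries
A = sparse.csr_matrix(np.array([[0., 2., 0.],
                                [1., 0., 3.],
                                [0., 0., 0.]]))

coo = A.tocoo()                          # coordinate format: row, col, data
indices = np.vstack((coo.row, coo.col))  # shape (2, nnz), row indices on top
values = coo.data

# Rebuild the matrix from the triplets to confirm nothing was lost
rebuilt = sparse.coo_matrix((values, (indices[0], indices[1])), shape=coo.shape)

print(indices.shape, values)
```
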
```python
import numpy as np # needed for np.vstack below

def torch_sparse(A):
    """Torch sparse matrix, from scipy sparse
    ======

    Converts a scipy sparse matrix into a torch sparse matrix.

    Parameters
    ----------
    A : (n,n) scipy sparse matrix
        Matrix to convert to torch sparse

    Returns
    -------
    A_torch : (n,n) torch.sparse.FloatTensor
        Sparse matrix in torch form.
    """

    import torch

    A = A.tocoo()
    values = A.data
    indices = np.vstack((A.row, A.col))

    i = torch.LongTensor(indices)
    v = torch.FloatTensor(values)
    shape = A.shape
    A_torch = torch.sparse.FloatTensor(i, v, torch.Size(shape))

    return A_torch
```

--------------------------------

### Initialize and Run Sigma Optimization Active Learner

Source: https://jwcalder.github.io/GraphLearning/active_learning.html

Demonstrates initializing the Sigma Optimization active learner with a pre-computed covariance matrix and eigenvectors, then repeatedly selecting query points and updating the labeled data.

```python
import graphlearning.active_learning as al
import graphlearning as gl
import numpy as np
import matplotlib.pyplot as plt
import sklearn.datasets as datasets

X,labels = datasets.make_moons(n_samples=500,noise=0.1)
W = gl.weightmatrix.knn(X,10)
train_ind = gl.trainsets.generate(labels, rate=5)
plt.scatter(X[:,0],X[:,1], c=labels)
plt.scatter(X[train_ind,0],X[train_ind,1], c='r')
plt.show()

# ssl model (assumed here to be Laplace learning, as in the uncertainty sampling example)
model = gl.ssl.laplace(W)

# compute initial, low-rank (spectral truncation) covariance matrix
evals, evecs = model.graph.eigen_decomp(normalization='normalized', k=50)
C = np.diag(1. / (evals + 1e-11))

AL = gl.active_learning.active_learner(model, gl.active_learning.sigma_opt, train_ind, labels[train_ind], C=C.copy(), V=evecs.copy())

for i in range(10):
    query_points = AL.select_queries() # return this iteration's newly chosen points
    query_labels = labels[query_points] # simulate the human in the loop process
    AL.update(query_points, query_labels) # update the active_learning object's labeled set

    # plot
    plt.scatter(X[:,0],X[:,1], c=labels)
    plt.scatter(X[AL.labeled_ind,0],X[AL.labeled_ind,1], c='r')
    plt.scatter(X[query_points,0],X[query_points,1], c='r', marker='*', s=200, edgecolors='k', linewidths=1)
    plt.show()
    print(AL.labeled_ind)
    print(AL.labels)
```

--------------------------------

### Initialize Tau and Filename

Source: https://jwcalder.github.io/GraphLearning/ssl.html

Sets up the `tau` parameter (a scalar is broadcast to one value per node) and constructs the accuracy filename and model name from the reweighting, normalization, mean shift, order, and tau settings.

```python
#Set up tau
if type(tau) in [float,int]:
    self.tau = np.ones(self.graph.num_nodes)*tau
elif type(tau) is np.ndarray:
    self.tau = tau

#Setup accuracy filename
fname = '_laplace'
self.name = 'Laplace Learning'
if self.reweighting != 'none':
    fname += '_' + self.reweighting
    self.name += ': ' + self.reweighting + ' reweighted'
if self.normalization != 'combinatorial':
    fname += '_' + self.normalization
    self.name += ' ' + self.normalization
if self.mean_shift:
    fname += '_meanshift'
    self.name += ' with meanshift'
if self.order > 1:
    fname += '_order%d'%int(self.order)
    self.name += ' order %d'%int(self.order)
if np.max(self.tau) > 0:
    fname += '_tau_%.3f'%np.max(self.tau)
    self.name += ' tau=%.3f'%np.max(self.tau)

self.accuracy_filename = fname
```

--------------------------------

### Multiclass MBO Initialization

Source: https://jwcalder.github.io/GraphLearning/ssl.html

Initializes the Multiclass MBO algorithm with the graph, class priors, iteration counts, time step, fidelity penalty, and number of eigenvectors.

```python
class multiclass_mbo(ssl):
    def __init__(self, W=None, class_priors=None, Ns=6, T=10, dt=0.15, mu=50, num_eig=50):
        """Multiclass MBO
        ===================

        Semi-supervised learning via the Multiclass MBO method [1].

        Parameters
        ----------
        W : numpy array, scipy sparse matrix, or graphlearning graph object (optional), default=None
            Weight matrix representing the graph.
        class_priors : numpy array (optional), default=None
            Class priors (fraction of data belonging to each class). If provided, the predict function
            will attempt to automatically balance the label predictions to predict the correct number of
            nodes in each class.
        Ns : int (optional), default=6
            Number of inner iterations.
        T : int (optional), default=10
            Number of outer iterations.
        dt : float (optional), default=0.15
            Time step.
        mu : float (optional), default=50
            Fidelity penalty.
        num_eig : int (optional), default=50
            Number of eigenvectors.

        References
        ---------
        [1] C. Garcia-Cardona, E. Merkurjev, A.L. Bertozzi, A. Flenner, and A.G. Percus.
        [Multiclass data segmentation using diffuse interface methods on graphs.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.743.9516&rep=rep1&type=pdf)
        IEEE transactions on pattern analysis and machine intelligence, 36(8), 1600-1613, 2014.
        """
        super().__init__(W, class_priors)

        self.Ns = Ns
        self.T = T
        self.dt = dt
        self.mu = mu
        self.num_eig = num_eig
        self.requires_eig = True

        #Setup accuracy filename
        self.accuracy_filename = '_multiclass_mbo_Ns_%d_T_%d_dt_%.3f_mu_%.2f'%(Ns,T,dt,mu)
        self.name = 'Multiclass MBO'
```

--------------------------------

### Poisson Learning with Gradient Descent

Source: https://jwcalder.github.io/GraphLearning/ssl.html

Demonstrates semi-supervised learning using the Poisson model with the gradient descent solver on a non-symmetrized k-nearest neighbor graph. Requires graphlearning, numpy, matplotlib, and scikit-learn. The example prints the accuracy and plots the predicted labels with the training points highlighted.

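
For orientation, the quantities named in the Poisson snippets of this document (the source term and the matrices `D`, `P`, `Db`) can be reproduced on a toy graph with NumPy and SciPy. This is a sketch under assumptions, not the library's solver: it assumes `degree_matrix(p=-1)` means the diagonal matrix of inverse degrees, it uses a hand-built 4-node path graph, and it stops at the setup step because the iteration itself is not shown in the source.

```python
import numpy as np
from scipy import sparse

# Toy symmetric graph on 4 nodes (path 0-1-2-3) with zero diagonal
W = sparse.csr_matrix(np.array([[0, 1, 0, 0],
                                [1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0]], dtype=float))
n = W.shape[0]

# Poisson source term: one-hot training labels minus their mean, zero elsewhere
train_ind = np.array([0, 3])
train_labels = np.array([0, 1])
onehot = np.eye(2)[train_labels]
source = np.zeros((n, 2))
source[train_ind] = onehot - onehot.mean(axis=0)

# Assumed meaning of degree_matrix(p=-1): diag(1/degree)
deg = np.asarray(W.sum(axis=1)).ravel()
D = sparse.diags(1.0 / deg)
P = D @ W.T       # for a symmetric W, every row of P sums to one
Db = D @ source   # source term scaled by the inverse degrees

print(source.sum(axis=0), np.asarray(P.sum(axis=1)).ravel())
```

Subtracting the mean makes each column of the source term sum to zero. When the graph is not symmetrized, as in the example that follows, the rows of `P` no longer need to sum to one.
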
```python
import numpy as np
import graphlearning as gl
import matplotlib.pyplot as plt
import sklearn.datasets as datasets

X,labels = datasets.make_moons(n_samples=500,noise=0.1)
W = gl.weightmatrix.knn(X,10,symmetrize=False)

train_ind = gl.trainsets.generate(labels, rate=5)
train_labels = labels[train_ind]

model = gl.ssl.poisson(W, solver='gradient_descent')
pred_labels = model.fit_predict(train_ind, train_labels)

accuracy = gl.ssl.ssl_accuracy(pred_labels, labels, train_ind)
print("Accuracy: %.2f%%"%accuracy)

plt.scatter(X[:,0],X[:,1], c=pred_labels)
plt.scatter(X[train_ind,0],X[train_ind,1], c='r')
plt.show()
```

--------------------------------

### Initialize Volume MBO

Source: https://jwcalder.github.io/GraphLearning/ssl.html

Initializes the Volume MBO model. Class priors must be provided; the constructor exits otherwise. It stores the per-class node counts derived from the priors, the temperature, and the volume constraint.

```python
class volume_mbo(ssl):
    def __init__(self, W=None, class_priors=None, temperature=0.1, volume_constraint=0.5):
        """Volume MBO
        ===================

        Semi-supervised learning with the VolumeMBO method [1]. class_priors must be provided.

        Parameters
        ----------
        W : numpy array, scipy sparse matrix, or graphlearning graph object (optional), default=None
            Weight matrix representing the graph.
        class_priors : numpy array
            Class priors (fraction of data belonging to each class).
        temperature : float (optional), default=0.1
            Temperature for volume constrained MBO.
        volume_constraint : float (optional), default=0.5
            The number of points in each class is constrained to be a multiple \(\lambda\) of the true
            class size, where
            \[ \text{volume_constraint} \leq \lambda \leq 2-\text{volume_constraint}.\]
            Setting `volume_constraint=1` yields the tightest constraint.

        References
        ----------
        [1] M. Jacobs, E. Merkurjev, and S. Esedoḡlu.
        [Auction dynamics: A volume constrained MBO scheme.](https://www.sciencedirect.com/science/article/pii/S0021999117308033?casa_token=kNahPd2vu50AAAAA:uJQYQVnmMBV_oL0CG1UcOIulY4vhclMGTztm-jjAzy9Lns7rtoOnKs4iyvLOjKXaHU-D6qrQJT4)
        Journal of Computational Physics 354:288-310, 2018.
        """
        super().__init__(W, None)

        if class_priors is None:
            sys.exit("Class priors must be provided for Volume MBO.")
        self.class_counts = (self.graph.num_nodes*class_priors).astype(int)
        self.temperature = temperature
        self.volume_constraint = volume_constraint

        #Setup accuracy filename and model name
        self.accuracy_filename = '_volume_mbo_temp_%.2f_vol_%.2f'%(temperature,volume_constraint)
        self.name = 'Volume MBO (T=%.2f, V=%.2f)'%(temperature,volume_constraint)
```

--------------------------------

### Initialize Modularity MBO Class

Source: https://jwcalder.github.io/GraphLearning/ssl.html

Initializes the Modularity MBO class with graph parameters and learning settings. Sets up the accuracy filename and model name.

```python
class modularity_mbo(ssl):
    def __init__(self, W=None, class_priors=None, gamma=0.5, epsilon=1, lamb=1, T=20, Ns=5):
        """Modularity MBO
        ===================

        Semi-supervised learning via the Modularity MBO method [1].

        Parameters
        ----------
        W : numpy array, scipy sparse matrix, or graphlearning graph object (optional), default=None
            Weight matrix representing the graph.
        class_priors : numpy array (optional), default=None
            Class priors (fraction of data belonging to each class). If provided, the predict function
            will attempt to automatic balance the label predictions to predict the correct number of
            nodes in each class.
        gamma : float (optional), default=0.5
            Parameter in algorithm.
        epsilon : float (optional), default=1
            Parameter in algorithm.
        lamb : float (optional), default=1
            Parameter in algorithm.
        T : int (optional), default=20
            Number of outer iterations.
        Ns : int (optional), default=5
            Number of inner iterations.

        References
        ---------
        [1] Z.M. Boyd, E. Bae, X.C. Tai, and A.L. Bertozzi.
        [Simplified energy landscape for modularity using total variation.](https://publications.ffi.no/nb/item/asset/dspace:4288/1619750.pdf)
        SIAM Journal on Applied Mathematics, 78(5), 2439-2464, 2018.
        """
        super().__init__(W, class_priors)

        self.gamma = gamma
        self.epsilon = epsilon
        self.lamb = lamb
        self.requires_eig = True
        self.T = T
        self.Ns = Ns

        #Setup accuracy filename
        self.accuracy_filename = '_modularity_mbo_gamma_%.2f_epsilon_%.2f_lamb_%.2f'%(gamma,epsilon,lamb)
        self.name = 'Modularity MBO'
```

--------------------------------

### Model Change Variance Optimization Initialization

Source: https://jwcalder.github.io/GraphLearning/active_learning.html

Setup for the `model_change_var_opt` acquisition function, which combines Model Change and Variance Optimization for active learning. It suits cases where both changing the model and reducing uncertainty are desired. The snippet shown only prepares the two-moons data, the k-nearest neighbor graph, and the initial training set.

```python
import graphlearning.active_learning as al
import graphlearning as gl
import numpy as np
import matplotlib.pyplot as plt
import sklearn.datasets as datasets

X,labels = datasets.make_moons(n_samples=500,noise=0.1)
W = gl.weightmatrix.knn(X,10)
train_ind = gl.trainsets.generate(labels, rate=5)
plt.scatter(X[:,0],X[:,1], c=labels)
plt.scatter(X[train_ind,0],X[train_ind,1], c='r')
plt.show()
```

--------------------------------

### Poisson Learning Initialization

Source: https://jwcalder.github.io/GraphLearning/ssl.html

Initializes the Poisson learning model with the specified parameters. The solver must be one of 'conjugate_gradient', 'spectral', or 'gradient_descent', and a `use_cuda` flag is stored for GPU acceleration. Note that if `p` is not 1, the solver is forced to 'spectral' regardless of the requested solver.

```python super().__init__(W, class_priors) if solver not in ['conjugate_gradient', 'spectral', 'gradient_descent']: sys.exit("Invalid Poisson solver") self.solver = solver self.p = p if p != 1: self.solver = 'spectral' self.use_cuda = use_cuda self.min_iter = min_iter self.max_iter = max_iter self.tol = tol self.spectral_cutoff = spectral_cutoff #Setup accuracy filename fname = '_poisson' if self.p != 1: fname += '_p%.2f'%p if self.solver == 'spectral': fname += '_N%d'%self.spectral_cutoff self.requries_eig = True self.accuracy_filename = fname #Setup Algorithm name self.name = 'Poisson Learning' ``` -------------------------------- ### Active Learning Workflow with Graph Learning Source: https://jwcalder.github.io/GraphLearning/active_learning.html Demonstrates the common workflow for active learning using the graphlearning library. This includes defining a model and acquisition function, instantiating the active learner, selecting query points, acquiring labels, and updating the learner. ```python import graphlearning as gl # define ssl model and acquisition function model = gl.ssl.laplace(W) # graph-based ssl classifier with a given graph acq_func = gl.active_learning.unc_sampling # acquisition function for prioritizing which points to query # instantiate active learner object AL = gl.active_learner( model=model, acq_function=acq_func, labeled_ind=.., # indices of initially labeled nodes labels=, # (integer) labels for initially labeled nodes policy='max', # active learning policy (i.e., 'max', or 'prop') **kwargs=... 
# other keyword arguments for the specified acq_function ) # select next query points query_points = AL.select_queries( batch_size=1 # number of query points to select at this iteration ) # acquire label for query points query_labels = y[query_points] # update the labeled data of active_learner object (including the graph-based ssl ``model`` outputs) AL.update(query_points, query_labels) ``` -------------------------------- ### ARS Initialization and Execution Source: https://jwcalder.github.io/GraphLearning/graph.html Handles initialization for the ARS algorithm, including PCA-based initialization or random initialization, and then calls the C extension for the main ARS computation. This snippet is part of a larger function and assumes prior setup. ```python #Import c extensions from . import cextensions if use_pca and (X.shape[1] > init_dim): X = X - np.mean(X, axis=0) vals, Q = sparse.linalg.eigsh(X.T@X, k=init_dim, which='LM') X = X@Q #Type casting and memory blocking X = np.ascontiguousarray(X,dtype=np.float64) #Y = np.zeros((X.shape[0],dim),dtype=float) #Y = X[:,-dim:].copy() #Y -= np.mean(Y,axis=0) #sig = np.sqrt(np.sum(Y*Y,axis=0)) #Y = Y/sig #print(np.mean(Y,axis=0)) #print(np.sum(Y*Y,axis=0)) #Y = Y*0.001 if isinstance(init, np.ndarray): skip_random_init = True Y = init.copy() Y = np.ascontiguousarray(Y,dtype=np.float64) elif init == 'pca': skip_random_init = True pca = PCA(n_components=dim,svd_solver="randomized") pca.set_output(transform="default") Y = pca.fit_transform(X).astype(np.float32, copy=False) Y = Y / np.std(Y[:, 0]) * 1e-4 Y = np.ascontiguousarray(Y,dtype=np.float64) else: skip_random_init = False Y = np.zeros((X.shape[0],dim)) Y = np.ascontiguousarray(Y,dtype=np.float64) cextensions.ars(X,Y,dim,perplexity,kappa,iters,time_step,theta1,theta2,alpha,num_early,int(prog),int(skip_random_init),int(dump)) return Y ``` -------------------------------- ### Modularity MBO Initialization Source: https://jwcalder.github.io/GraphLearning/ssl.html Initializes 
the Modularity MBO algorithm with graph, class priors, and specific algorithm parameters. ```python class modularity_mbo(ssl): def __init__(self, W=None, class_priors=None, gamma=0.5, epsilon=1, lamb=1, T=20, Ns=5): """Modularity MBO =================== Semi-supervised learning via the Modularity MBO method [1]. Parameters ---------- W : numpy array, scipy sparse matrix, or graphlearning graph object (optional), default=None Weight matrix representing the graph. class_priors : numpy array (optional), default=None Class priors (fraction of data belonging to each class). If provided, the predict function will attempt to automatically balance the label predictions to predict the correct number of nodes in each class. gamma : float (optional), default=0.5 Parameter in algorithm. epsilon : float (optional), default=1 Parameter in algorithm. """ super().__init__(W, class_priors) self.gamma = gamma self.epsilon = epsilon self.lamb = lamb self.T = T self.Ns = Ns self.requires_eig = True #Setup accuracy filename self.accuracy_filename = '_modularity_mbo_gamma_%.2f_epsilon_%.2f_lamb_%.2f_T_%d_Ns_%d'%(gamma,epsilon,lamb,T,Ns) self.name = 'Modularity MBO' ``` -------------------------------- ### Load Accuracy Scores from CSV Source: https://jwcalder.github.io/GraphLearning/ssl.html Loads accuracy scores from CSV files generated by `ssl_trials`. It calculates and returns summary statistics (mean and standard deviation of accuracy) across multiple trials for different numbers of training examples. ```python def load_accuracy(self, tag=''): """Loads accuracy scores from each trial from csv files created by `ssl_trials` and returns summary statistics (mean and standard deviation of accuracy). Parameters ---------- tag : str (optional), default='' An extra identifying tag to add to the accuracy filename. Returns ------- num_train : numpy array Number of training examples in each label rate experiment. acc_mean : numpy array Mean accuracy over all trials in each experiment. 
acc_stddev : numpy array Standard deviation of accuracy over all trials in each experiment. num_trials : int Number of trials for each label rate. """ accuracy_filename = os.path.join(results_dir, tag+self.get_accuracy_filename()) X = utils.csvread(accuracy_filename) num_train = np.unique(X[:,0]) acc_mean = [] acc_stddev = [] for n in num_train: Y = X[X[:,0]==n,1:] acc_mean += [np.mean(Y,axis=0)] acc_stddev += [np.std(Y,axis=0)] num_trials = int(len(X[:,0])/len(num_train)) acc_mean = np.array(acc_mean) acc_stddev = np.array(acc_stddev) return num_train, acc_mean, acc_stddev, num_trials ``` -------------------------------- ### Initialize AMLE SSL Algorithm Source: https://jwcalder.github.io/GraphLearning/ssl.html Initializes the AMLE SSL algorithm with parameters controlling the solver tolerance, iteration limit, and progress output. When `weighted=False` (the default), the graph is converted to a 0/1 adjacency matrix, which allows a much faster solver. ```python class amle(ssl): def __init__(self, W=None, class_priors=None, tol=1e-3, max_num_it=1e5, weighted=False, prog=False): """AMLE learning =================== Semi-supervised learning by the Absolutely Minimal Lipschitz Extension (AMLE). This is the same as p-Laplace with \(p=\infty\), except that AMLE has an option `weighted=False` that significantly accelerates the solver. Parameters ---------- W : numpy array, scipy sparse matrix, or graphlearning graph object (optional), default=None (n,n) Weight matrix representing the graph. class_priors : numpy array (optional), default=None Class priors (fraction of data belonging to each class). If provided, the predict function will attempt to automatically balance the label predictions to predict the correct number of nodes in each class. tol : float (optional), default=1e-3 Tolerance with which to solve the equation. 
max_num_it : int (optional), default=1e5 Maximum number of iterations. weighted : bool (optional), default=False When False, the graph is converted to a 0/1 adjacency matrix, which affords a much faster solver. prog : bool (optional), default=False Whether to print progress information. """ super().__init__(W, class_priors) self.tol = tol self.max_num_it = max_num_it self.weighted = weighted self.prog = prog self.onevsrest = True #Setup accuracy filename and model name self.accuracy_filename = '_amle' if not self.weighted: self.accuracy_filename += '_unweighted' self.name = 'AMLE' ```
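The effect of `weighted=False` described in the docstring above can be illustrated with a small, self-contained sketch. It is not the library's solver: `to_adjacency` and `amle_unweighted_sketch` are hypothetical helpers written for this example, and the update rule is the standard fixed-point form of the infinity-Laplace equation on an unweighted graph, where each unlabeled node takes the midpoint of the largest and smallest neighboring values. The assumption is that dropping the weights is what makes this simple max/min update applicable.

```python
import numpy as np
from scipy import sparse

def to_adjacency(W):
    """Hypothetical helper: 0/1 matrix with the same sparsity pattern as W."""
    A = sparse.csr_matrix(W, dtype=float, copy=True)
    A.eliminate_zeros()  # drop explicitly stored zeros so they do not become edges
    A.data[:] = 1.0      # every remaining edge gets unit weight
    return A

def amle_unweighted_sketch(A, train_ind, train_values, tol=1e-3, max_num_it=100000):
    """Toy Jacobi-style iteration u_i = (max_j u_j + min_j u_j) / 2 over the
    neighbors j of each unlabeled node i, with u held fixed on labeled nodes."""
    A = sparse.csr_matrix(A)
    n = A.shape[0]
    is_labeled = np.zeros(n, dtype=bool)
    is_labeled[train_ind] = True

    # Start unlabeled nodes at the mean of the known values
    u = np.full(n, float(np.mean(train_values)))
    u[train_ind] = train_values

    for _ in range(int(max_num_it)):
        u_new = u.copy()
        for i in np.flatnonzero(~is_labeled):
            nbrs = A.indices[A.indptr[i]:A.indptr[i + 1]]
            if len(nbrs) > 0:
                u_new[i] = 0.5 * (u[nbrs].max() + u[nbrs].min())
        change = np.max(np.abs(u_new - u))
        u = u_new
        if change < tol:  # stop once the largest update is below the tolerance
            break
    return u

# Path graph 0-1-2-3-4 with arbitrary positive edge weights
W = np.array([[0.0, 2.0, 0.0, 0.0, 0.0],
              [2.0, 0.0, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.0, 3.0, 0.0],
              [0.0, 0.0, 3.0, 0.0, 1.5],
              [0.0, 0.0, 0.0, 1.5, 0.0]])
A = to_adjacency(W)

# Label the two endpoints and extend to the interior nodes
u = amle_unweighted_sketch(A, train_ind=np.array([0, 4]),
                           train_values=np.array([0.0, 1.0]), tol=1e-8)
print(u)
```

On this path graph the edge weights no longer matter after binarization, so the extension interpolates linearly between the two labeled endpoints. With `onevsrest = True`, as set in the constructor above, a multi-class problem would presumably run one such scalar extension per class; that reading is an assumption rather than something the snippet states.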