### Quickstart: Run Model-X Knockoffs with Lasso

Source: https://amspector100.github.io/knockpy/installation.html

A quick example demonstrating how to use knockpy to perform variable selection with cross-validated lasso. This snippet generates synthetic data and applies the KnockoffFilter.

```python
import knockpy as kpy
from knockpy.knockoff_filter import KnockoffFilter

# Generate synthetic data from a Gaussian linear model
data_gen_process = kpy.dgp.DGP()
data_gen_process.sample_data(
        n=1500, # Number of datapoints
        p=500, # Dimensionality
        sparsity=0.1,
        x_dist='gaussian',
)
X = data_gen_process.X
y = data_gen_process.y
Sigma=data_gen_process.Sigma

# Run model-X knockoffs
kfilter = KnockoffFilter(
        fstat='lasso',
        ksampler='gaussian',
)
rejections = kfilter.forward(X=X, y=y, Sigma=Sigma)

```

--------------------------------

### Install choldate

Source: https://amspector100.github.io/knockpy/_sources/installation.rst.txt

Install the choldate library from GitHub using pip. This is a prerequisite for installing knockpy.

```bash
pip install git+git://github.com/jcrudy/choldate.git
```

--------------------------------

### Install scikit-dsdp for SDP knockoffs

Source: https://amspector100.github.io/knockpy/installation.html

Install scikit-dsdp to speed up computation for SDP knockoffs. This package can be challenging to install on non-Linux environments.

```bash
pip install scikit-dsdp
```

--------------------------------

### Install Cython using pip

Source: https://amspector100.github.io/knockpy/_sources/installation.rst.txt

Install or upgrade Cython using pip. This is a prerequisite for speeding up certain knockoff computations.

```bash
pip install cython>=0.29.14
```

--------------------------------

### Install lightweight knockpy

Source: https://amspector100.github.io/knockpy/_sources/installation.rst.txt

Install a lightweight version of knockpy that includes most functionality but may be slower for certain algorithms. This is recommended if the full installation fails.

```bash
pip install knockpy
```

--------------------------------

### Quickstart: Variable Selection with Knockpy

Source: https://amspector100.github.io/knockpy/_sources/installation.rst.txt

Demonstrates using knockpy with a cross-validated lasso feature statistic and Gaussian knockoffs for variable selection. Requires synthetic data generation.

```python
import knockpy as kpy
from knockpy.knockoff_filter import KnockoffFilter

# Generate synthetic data from a Gaussian linear model
data_gen_process = kpy.dgp.DGP()
data_gen_process.sample_data(
	n=1500, # Number of datapoints
	p=500, # Dimensionality
	sparsity=0.1,
	x_dist='gaussian',
)
X = data_gen_process.X
y = data_gen_process.y
Sigma=data_gen_process.Sigma

# Run model-X knockoffs
kfilter = KnockoffFilter(
	fstat='lasso',
	k المed='gaussian',
)
rejections = kfilter.forward(X=X, y=y, Sigma=Sigma)
```

--------------------------------

### Install knockpy with fast dependencies

Source: https://amspector100.github.io/knockpy/_sources/installation.rst.txt

Install the main knockpy package with the 'fast' extra, which includes optimized dependencies for faster computation.

```bash
pip install knockpy[fast]
```

--------------------------------

### Instantiate and Run KnockoffGGM

Source: https://amspector100.github.io/knockpy/_modules/knockpy/ggm.html

Example of fitting KnockoffGGM under the global null for a Gaussian graphical model with no edges. Requires fake data generation and specifies the LCD statistic with FX knockoffs.

```python
import numpy as np
from knockpy.ggm import KnockoffGGM

# Fake data-generating process for Gaussian graphical model
X = np.random.randn(300, 30)

# LCD statistic with FX knockoffs
gkf = KnockoffGGM(
    fstat='lcd',
    knockoff_kwargs={"method":"mvr"},
)
edges = gkf.forward(X=X, verbose=True)
```

--------------------------------

### Install Cython using conda

Source: https://amspector100.github.io/knockpy/_sources/installation.rst.txt

Install Cython using the conda package manager. This is an alternative if pip installation fails, as conda includes a C compiler.

```bash
conda install cython
```

--------------------------------

### Check PyTorch Availability

Source: https://amspector100.github.io/knockpy/_modules/knockpy/utilities.html

Verifies if PyTorch is installed and raises a ValueError with installation instructions if it's not available.

```python
def check_kpytorch_available(purpose):
    try:
        import torch as torch
    except ImportError as err:
        raise ValueError(
            f"Pytorch is required for {purpose}, but importing torch raised {err}. See https://pytorch.org/get-started/."
        )

```

--------------------------------

### Run KnockoffFilter with Model-X Approach

Source: https://amspector100.github.io/knockpy/usage.html

Applies the KnockoffFilter to the dataset using the forward method. This example demonstrates the model-X approach, requiring the covariance matrix (Sigma) as input, and specifies the desired false discovery rate (fdr).

```python
# Flags of whether each feature was rejected
rejections = kfilter.forward(
    X=X,
    y=y,
    Sigma=Sigma,
    fdr=0.1,  # desired level of false discovery rate control
)
```

--------------------------------

### Data Generation for AR1 Covariance Matrix Example

Source: https://amspector100.github.io/knockpy/mrcknock.html

This snippet sets up a data generation process for features with an AR1 covariance matrix, a scenario where SDP knockoffs are also expected to perform poorly. It samples data for a sparse Gaussian linear model.

```python
# Generate data where X is Gaussian with an AR1 covariance matrix
# where Cov(Xj, X_{j+1}) is sampled from Beta(a,b)
# and y | X is a sparse Gaussian linear model
np.random.seed(111)
data_gen_process = kpy.dgp.DGP()
data_gen_process.sample_data(
    method="ar1",
    a=3,
    b=1,
    n=650,  # number of data-points
    p=500,  # dimensionality
    sparsity=0.1,  # number of non-null coefficients
    coeff_dist="uniform",  # distribution of size of non-null coefficients
    coeff_size=1,  # size of non-null coefficients
    corr_signals=True,  # non-nulls features are correlated
)
X = data_gen_process.X
y = data_gen_process.y
beta = data_gen_process.beta
Sigma = data_gen_process.Sigma
```

--------------------------------

### Instantiate Knockoff Filters with Different Samplers

Source: https://amspector100.github.io/knockpy/usage.html

Shows how to instantiate KnockoffFilter with various knockoff generation methods ('maxent', 'sdp', 'mvr') and samplers ('gaussian', 'fx', 'artk').

```python
# This uses gaussian maxent knockoffs
kfilter1 = KnockoffFilter(ksampler="gaussian", knockoff_kwargs={"method": "maxent"})

# This uses fixed-X SDP knockoffs
kfilter2 = KnockoffFilter(ksampler="fx", knockoff_kwargs={"method": "sdp"})

# Metropolized sampler for heavy-tailed t markov chain using MVR-guided proposals
kfilter3 = KnockoffFilter(ksampler="artk", knockoff_kwargs={"method": "mvr"})
```

--------------------------------

### Initialize Knockoff Sampler

Source: https://amspector100.github.io/knockpy/_modules/knockpy/metro.html

Initializes the Knockoff Sampler by setting up Gibbs graphs, clique dictionaries, and divide-and-conquer strategies. It prepares data structures for sampling subsets of variables.

```python
if gibbs_graph[i, j] != 0:
                    self.cliques.append((i, j))
                    temp = gibbs_graph[i, j]
                    self.log_potentials.append(make_ising_logpotential(temp, i, j))

        # Maps variables to cliques they're part of
        # Clique key maps the clique back to the log-potential
        self.clique_dict = {i: [] for i in range(self.p)}
        for clique_key, clique in enumerate(self.cliques):
            for j in clique:
                self.clique_dict[j].append((clique_key, clique))

        # Split X up into blocks along the n-axis.
        # Each divconq key corresponds to one way to
        # divide/conquer the variables, and to one
        # of these blocks.
        self.dc_keys = []
        for div_type in ["row", "col"]:
            for trans in [max_width - 1, max_width - 2]:
                self.dc_keys.append(f"{div_type}-{trans}")
        rand_inds = np.arange(self.n)
        np.random.shuffle(rand_inds)
        self.X_ninds = {key: [] for key in self.dc_keys}
        for i, j in enumerate(rand_inds):
            key = self.dc_keys[j % len(self.dc_keys)]
            self.X_ninds[key].append(i)

        # Structure of self.divconq_info
        # (1) Dictionary takes a divide-and-conquery key
        # (translation + row/col)
        # (2) This maps to a list of dictionaries. Each
        # dictionary corresponds to one of the blocks
        # in the divide and conquer procedure.
        # (3) Each dictionary takes three keys: inds,
        # cliques, lps.
        # - inds is the list of ORIGINAL coordinates
        # of the block of variables. E.g. if block 1
        # corresponds to columns 1,5,6, in X,
        # then inds = [1,5,6]
        # - cliques is a list of cliques in the NEW coordinates
        # of the block. So for example, if 1,5 was a clique in
        # the prior example, then (0,1) would be in this list.
        # - lps are the log-potentials corresponding to
        # the cliques above.
        self.divconq_info = {}

        # This maps divconq key
        # to a list of separators (e.g. knockoffs = features)
        # for these indices.
        self.separators = {}

        for dc_key in self.dc_keys:
            div_type = dc_key.split("-")[0]
            trans = int(dc_key.split("-")[-1])
            seps, dict_list = self._divide_variables(
                dc_key=dc_key,
                translation=trans,
                max_width=max_width,
                div_type=div_type,
            )
            self.separators[dc_key] = seps
            self.divconq_info[dc_key] = dict_list

        # Initialize samplers. Each sampler will only sample
        # a subset of variables for a subset of the data
        # Structure: maps divconq key to list of n_inds, p_inds, sampler
        self.samplers = {key: [] for key in self.dc_keys}
        for key in self.dc_keys:
            # Fetch indicies
            n_inds = np.array(self.X_ninds[key])
            sep_inds = self.separators[key]

            # Helper for conditional cov matrices
            V11 = V[sep_inds][:, sep_inds]  # s x s
            V11_inv = utilities.chol2inv(V11)

            for div_group_dict in self.divconq_info[key]:
                p_inds = div_group_dict["inds"]

                # Find conditional covariance matrix V
                # for p_inds given
                # the conditioned-on-separators
                V22 = V[p_inds][:, p_inds]  # p_i x p_i
                V21 = V[p_inds][:, sep_inds]  # p_i x s
                Vcond = V22 - np.dot(
                    V21,
                    np.dot(V11_inv, V21.T),
                )

                sampler = MetropolizedKnockoffSampler(
                    lf=None,
                    X=X[n_inds][:, p_inds],
                    mu=mu[p_inds],
                    Sigma=Vcond,
                    undir_graph=gibbs_graph[p_inds][:, p_inds] != 0,
                    cliques=div_group_dict["cliques"],
                    log_potentials=div_group_dict["lps"],
                    buckets=self.buckets,
                    S=None,
                    **kwargs,
                )
                if sampler.width > max_width:
                    warnings.warn(
                        f"Treewidth heuristic inaccurate during divide/conquer, sampler for {p_inds} has width {sampler.width} > max_width {max_width}"
                    )

                self.samplers[key].append((n_inds, p_inds, sampler))
```

--------------------------------

### PSGDSolver Initialization

Source: https://amspector100.github.io/knockpy/_modules/knockpy/kpytorch/mrcgrad.html

Initializes the Projected Gradient Descent (PGD) solver for MRC knockoffs. It sets up parameters for optimization, including the correlation matrix, groups, loss calculator, and convergence criteria.

```python
    def __init__(
        self,
        Sigma,
        groups,
        losscalc=None,
        lr=1e-2,
        verbose=False,
        max_epochs=100,
        tol=1e-5,
        line_search_iter=10,
        convergence_tol=1e-1,
        **kwargs,
    ):
        # Add Sigma
        self.p = Sigma.shape[0]
        self.Sigma = Sigma
        self.groups = groups
        self.opt_S = None  # Output initialization
        self.opt_loss = np.inf

        # Save parameters for optimization
        self.lr = lr
        self.verbose = verbose
        self.max_epochs = max_epochs
        self.tol = tol
        self.line_search_iter = line_search_iter
        self.convergence_tol = convergence_tol

        # Sort by groups for ease of computation
        inds, inv_inds = utilities.permute_matrix_by_groups(groups)
        self.inds = inds
        self.inv_inds = inv_inds
        self.sorted_Sigma = self.Sigma[inds][:, inds]
        self.sorted_groups = self.groups[inds]

        # Loss calculator
        if losscalc is not None:
            self.losscalc = losscalc
```

--------------------------------

### check_pyglmnet_available

Source: https://amspector100.github.io/knockpy/_modules/knockpy/utilities.html

Checks if the pyglmnet library is installed and available. Raises a ValueError if pyglmnet is not found, providing instructions for installation.

```APIDOC
## check_pyglmnet_available

### Description
Checks if the pyglmnet library is installed and available. Raises a ValueError if pyglmnet is not found, providing instructions for installation.

### Parameters
- **purpose** (string) - A description of why pyglmnet is needed.
```

--------------------------------

### check_kpytorch_available

Source: https://amspector100.github.io/knockpy/_modules/knockpy/utilities.html

Checks if the PyTorch library is installed and available. Raises a ValueError if PyTorch is not found, providing instructions for installation.

```APIDOC
## check_kpytorch_available

### Description
Checks if the PyTorch library is installed and available. Raises a ValueError if PyTorch is not found, providing instructions for installation.

### Parameters
- **purpose** (string) - A description of why PyTorch is needed.
```

--------------------------------

### Initialize Knockoff Class

Source: https://amspector100.github.io/knockpy/_modules/knockpy/knockoffs.html

Initializes the Knockoff class with feature data and optional parameters for covariance estimation, grouping, and S-matrix computation. Defaults are provided for most parameters if not specified.

```python
def __init__(
        self,
        X,
        mu=None,
        Sigma=None,
        invSigma=None,
        groups=None,
        sample_tol=1e-5,
        S=None,
        method=None,
        verbose=False,
        **kwargs,
    ):
        # Save parameters with defaults
        self.X = X
        self.n = X.shape[0]
        self.p = X.shape[1]
        if mu is None:
            mu = X.mean(axis=0)
        self.mu = mu
        if Sigma is None:
            Sigma, invSigma = utilities.estimate_covariance(X, tol=1e-2)
        self.Sigma = Sigma
        if invSigma is None:
            invSigma = np.linalg.inv(Sigma)
        self.invSigma = invSigma
        if groups is None:
            groups = np.arange(1, self.p + 1, 1)
        self.groups = groups
        self.sample_tol = sample_tol
        self.verbose = verbose

        # Save S information and possibly compute S matrix
        self.S = S
        self.method = method
        self.kwargs = kwargs
        if self.S is None:
            if self.verbose:
                print("Computing knockoff S matrix...")
            self.S = smatrix.compute_smatrix(
                Sigma=self.Sigma, groups=self.groups, method=self.method, **self.kwargs
            )

```

--------------------------------

### GibbsGridSampler Initialization

Source: https://amspector100.github.io/knockpy/apiref.html

Initializes the GibbsGridSampler for discrete Gibbs grids using a divide-and-conquer approach combined with metropolized knockoff sampling. Requires design matrix X, Gibbs graph, and covariance matrix Sigma.

```python
GibbsGridSampler(_X_ , _gibbs_graph_ , _Sigma_ , _Q =None_, _mu =None_, _max_width =6_, _** kwargs_)
```

--------------------------------

### Initialize KnockoffFilter

Source: https://amspector100.github.io/knockpy/apiref.html

Initializes the KnockoffFilter with specified feature statistics and knockoff sampler. Uses LedoitWolf covariance estimation by default.

```python
from knockpy.knockoff_filter import KnockoffFilter
kfilter = KnockoffFilter(
    fstat='lcd',
    ksampler='gaussian',
    knockoff_kwargs={"method":"mvr"},
)
rejections = kfilter.forward(X=dgprocess.X, y=dgprocess.y)
```

--------------------------------

### Example Output for MVR Knockoffs

Source: https://amspector100.github.io/knockpy/mrcknock.html

This is an example of the output generated after running the power and false positive rate calculation for MVR knockoffs.

```text
MVR knockoffs made 55.0 discoveries.
MVR knockoffs had a power of 94.0% and false positive rate of 14.55%.
```

--------------------------------

### Sample Proposals (Continuous/Discrete)

Source: https://amspector100.github.io/knockpy/_modules/knockpy/metro.html

Generates a knockoff proposal, either continuous using random normal sampling or discrete by sampling from bucket probabilities. Allows passing pre-computed conditional mean and variance.

```python
    def sample_proposals(
        self,
        X,
        prev_proposals,
        cond_mean=None,
        cond_var=None,
    ):
        """
        Samples a continuous or discrete proposal given the design
        matrix and the previous proposals. Can pass in the conditional
        mean and variance of the new proposals, if cached, to save
        computation.
        """
        # These will be compatible as long as Sigma is
        if cond_mean is None or cond_var is None:
            cond_mean, cond_var = self.fetch_proposal_params(
                X=X, prev_proposals=prev_proposals
            )
        # Continuous sampling
        if self.buckets is None:
            proposal = np.sqrt(cond_var) * np.random.randn(self.n, 1) + cond_mean

        # Discrete sampling
        if self.buckets is not None:
            # Cumulative probability buckets
            bucket_probs = gaussian_log_likelihood(
                X=self.buckets,
                mu=cond_mean,
                var=cond_var,
            )
            bucket_probs = scipy.special.softmax(
                bucket_probs.astype(np.float32), axis=-1
            )
            cuml_bucket_probs = bucket_probs.cumsum(axis=-1)
```

--------------------------------

### many_ks_tests

Source: https://amspector100.github.io/knockpy/apiref.html

Gets p values by running ks tests and then does a multiple testing correction.

```APIDOC
## many_ks_tests(sample1s, sample2s)

### Description
Gets p values by running ks tests and then does a multiple testing correction.

### Parameters
#### Path Parameters
- **sample1s** (list of arrays) - The first list of samples.
- **sample2s** (list of arrays) - The second list of samples.
```

--------------------------------

### KnockoffFilter Class

Source: https://amspector100.github.io/knockpy/apiref.html

The KnockoffFilter class performs knockoff-based inference from start to finish, wrapping both the KnockoffSampler and FeatureStatistic classes.

```APIDOC
## class knockpy.knockoff_filter.KnockoffFilter(fstat='lasso', ksampler='gaussian', fstat_kwargs={}, knockoff_kwargs={})

Performs knockoff-based inference, from start to finish.
This wraps both the `knockoffs.KnockoffSampler` and `knockoff_stats.FeatureStatistic` classes.

### Parameters

*   **fstat** (str or knockpy.knockoff_stats.FeatureStatistic) - The feature statistic to use in the knockoff filter. Defaults to a lasso statistic. This may also be a string identifier, including:
    *   ‘lasso’ or ‘lcd’: cross-validated lasso coefficients differences
    *   ‘mlr’: the masked likelihood ratio (MLR) statistic, which has provable optimality properties for knockoffs
    *   ‘lsm’: signed maximum of the lasso path statistic as in Barber and Candes 2015
    *   ‘dlasso’: Cross-validated debiased lasso coefficients
    *   ‘ridge’: Cross validated ridge coefficients
    *   ‘ols’: Ordinary least squares coefficients
    *   ‘margcorr’: marginal correlations between features and response
    *   ‘deeppink’: The deepPINK statistic as in Lu et al. 2018
    *   ‘mlrspline’ or ‘spline’: The MLR statistic applied to regression spline basis functions.
    *   ‘randomforest’: A random forest with swap importances

    Note that when using FX knockoffs, the lcd statistic does not use cross-valdilidation and sets `alphas = sigma2 * sqrt(log(p) / n).`

*   **ksampler** (str or knockpy.knockoffs.KnockoffSampler) - The knockoff sampler to use in the knockoff filter. This may also be a string identifier, including:
    *   ‘gaussian’: Gaussian Model-X knockoffs
    *   ‘fx’: Fixed-X knockoffs.
    *   ‘metro’: Generic metropolized knockoff sampler
    *   ‘artk’: t-tailed Markov chain.
    *   ‘blockt’: Blocks of t-distributed
    *   ‘gibbs_grid’: Discrete gibbs grid

    An alternative to specifying the ksampler is to simply pass in a knockoff matrix during the `forward` call.

*   **fstat_kwargs** (dict) - Kwargs to pass to the feature statistic `fit` function, excluding the required arguments, defaults to {}

*   **knockoff_kwargs** (dict) - Kwargs for instantiating the knockoff sampler argument if the ksampler argument is a string identifier. This can be the empt dict for some identifiers such as “gaussian” or “fx”, but additional keyword arguments are required for complex samplers such as the “metro” identifier. Defaults to {}

### Example

Here we fit the KnockoffFilter on fake data from a Gaussian linear model:
```python
# Fake data-generating process for Gaussian linear model
import knockpy as kpy
dgprocess = kpy.dgp.DGP()
dgprocess.sample_data(n=500, p=500, sparsity=0.1)

# LCD statistic with Gaussian MX knockoffs
```
```

--------------------------------

### Initialize KnockoffFilter with LCD and Gaussian Knockoffs

Source: https://amspector100.github.io/knockpy/_modules/knockpy/knockoff_filter.html

Instantiate the KnockoffFilter using the LCD statistic and Gaussian knockoffs. This configuration is suitable for Gaussian linear models and utilizes LedoitWolf covariance estimation by default. Additional keyword arguments can be passed to the knockoff sampler.

```python
from knockpy.knockoff_filter import KnockoffFilter
kfilter = KnockoffFilter(
    fstat='lcd',
    ksampler='gaussian',
    knockoff_kwargs={"method":"mvr"},
)
```

--------------------------------

### Get Active Coordinate Helper

Source: https://amspector100.github.io/knockpy/_modules/knockpy/metro.html

A helper function for divide-and-conquer in Ising models, used to extract the active coordinate from a variable index.

```python
def _get_ac(self, i, div_type):
        """
        Helper function for divide-and-conquer
        in Ising model. Extracts active coordinate
        from variable i.
        Returns ac, lc, wc.
        """
        lc, wc = self.num2coords(i)
        if div_type == "row":
```

--------------------------------

### Convert Graph to Cliques

Source: https://amspector100.github.io/knockpy/_modules/knockpy/dgp.html

Transforms a connection graph into a dictionary of cliques, represented as binary connections. Used for Gibbs grid setup.

```python
def graph2cliques(Q):
    """
    Turns graph Q of connections into binary cliques for Gibbs grid
    """
    p = Q.shape[0]
    clique_dict = {i: [] for i in range(p)}
    for i in range(p):
        for j in range(i):
            if Q[i, j] != 0:
                clique_dict[i].append((i, j))
                clique_dict[j].append((j, i))
    # Remove duplicates
    for i in range(p):
        clique_dict[i] = list(set(clique_dict[i]))
    return clique_dict
```

--------------------------------

### Initialize GibbsGridSampler

Source: https://amspector100.github.io/knockpy/_modules/knockpy/metro.html

Initializes a GibbsGridSampler for discrete Gibbs grids using a divide-and-conquer approach combined with metropolized knockoff sampling. It infers parameters from the input data and graph, and sets up internal structures for clique learning and log-potentials.

```python
def __init__(self, X, gibbs_graph, Sigma, Q=None, mu=None, max_width=6, **kwargs):
        # Infer bucketization and V
        V = Sigma
        self.n = X.shape[0]
        self.p = X.shape[1]
        self.X = X
        self.V = V
        self.S = None
        self.gibbs_graph = gibbs_graph
        self.gridwidth = int(np.sqrt(self.p))
        if self.gridwidth**2 != self.p:
            raise ValueError(f"p {self.p} must be a square number for gibbs grid")
        self.buckets = np.unique(X)
        if mu is None:
            mu = X.mean(axis=0)

        # Div and conquer makes this nontrivial
        # (Note to self: maybe TODO?)
        if "S" in kwargs:
            kwargs.pop("S")
        if "invSigma" in kwargs:
            kwargs.pop("invSigma")

        # Dummy order / inv_order variables for consistency
        self.order = np.arange(self.p)
        self.inv_order = np.arange(self.p)

        def make_ising_logpotential(temp, i, j):
            def lp(X):
                return -1 * temp * np.abs(X[:, 0] - X[:, 1])

            return lp

        # Learn cliques, log-potentials
        self.cliques = []
        self.log_potentials = []
        for i in range(self.p):
            for j in range(i):
```

--------------------------------

### Preprocess Groups for Knockpy

Source: https://amspector100.github.io/knockpy/_modules/knockpy/utilities.html

Maps unique elements in a group array to consecutive integers starting from 1. This is often a prerequisite for other group-based operations.

```python
import numpy as np

def preprocess_groups(groups):
    """
    Maps the m unique elements of a 1D "groups" array to the integers from 1 to m.
    """
    unique_vals = np.sort(np.unique(groups))
    conversion = {unique_vals[i]: i for i in range(unique_vals.shape[0])}
    return np.array([conversion[x] + 1 for x in groups])
```

--------------------------------

### Compute q1 and q2

Source: https://amspector100.github.io/knockpy/apiref.html

Computes q1 and q2 as specified by page 33 of the paper. No specific setup is mentioned, but it relates to the metropolized sampler.

```python
log_q12(_x_flags_ , _j_)
```

--------------------------------

### DeepPinkModel Initialization

Source: https://amspector100.github.io/knockpy/_modules/knockpy/kpytorch/deeppink.html

Initializes the DeepPinkModel with specified dimensions and MLP architecture. Supports optional normalization of the initial sparse layer weights.

```python
class DeepPinkModel(nn.Module):
    def __init__(self, p, hidden_sizes=[64], y_dist="gaussian", normalize_Z=True):
        """
        Adapted from https://arxiv.org/pdf/1809.01185.pdf.

        The module has two components:
        1. A sparse linear layer with dimension 2*p to p.
        However, there are only 2*p weights (each feature
        and knockoff points only to their own unique node).
        This is (maybe?) followed by a ReLU activation.
        2. A multilayer perceptron (MLP)

        Parameters
        ----------
        p : int
            The dimensionality of the data
        hidden_sizes: list
            A list of hidden sizes for the mlp layer(s).
            Defaults to [64].
        normalize_Z : bool
            If True, the first sparse linear layer is normalized
            so the weights for each feature/knockoff pair have an
            l1 norm of 1. This can modestly improve power in some
            settings.
        """

        super().__init__()

        # Initialize weight for first layer
        self.p = p
        self.y_dist = y_dist
        self.Z_weight = nn.Parameter(torch.ones(2 * p))
        self.norm_Z_weight = normalize_Z

        # Save indices/reverse indices to prevent violations of FDR control
        self.inds, self.rev_inds = utilities.random_permutation_inds(2 * p)
        self.feature_inds = self.rev_inds[0 : self.p]
        self.ko_inds = self.rev_inds[self.p :]

        # Create MLP layers
        mlp_layers = [nn.Linear(p, hidden_sizes[0])]
        for i in range(len(hidden_sizes) - 1):
            mlp_layers.append(nn.ReLU())
            mlp_layers.append(nn.Linear(hidden_sizes[i], hidden_sizes[i + 1]))
        # Prepare for either MSE loss or cross entropy loss
        mlp_layers.append(nn.ReLU())
        if y_dist == "gaussian":
            mlp_layers.append(nn.Linear(hidden_sizes[-1], 1))
        else:
            mlp_layers.append(nn.Linear(hidden_sizes[-1], 2))

        # Then create MLP
        self.mlp = nn.Sequential(*mlp_layers)
```

--------------------------------

### Check pyglmnet Availability

Source: https://amspector100.github.io/knockpy/_modules/knockpy/utilities.html

Checks for the availability of the pyglmnet library, essential for certain functionalities. Provides an informative error message and installation link if the library is missing.

```python
def check_pyglmnet_available(purpose):
    try:
        import pyglmnet as pyglmnet
    except ImportError as err:
        raise ValueError(
            f"pyglmnet is required for {purpose}, but importing pyglmnet raised {err}. See https://github.com/glm-tools/pyglmnet/."
        )

```

--------------------------------

### Initialize KnockoffFilter with MetropolizedKnockoffSampler

Source: https://amspector100.github.io/knockpy/usage.html

Shows how to initialize a KnockoffFilter using a pre-configured MetropolizedKnockoffSampler instance. This allows using custom sampling methods within the filter.

```python
kfilter_metro = KnockoffFilter(ksampler=metrosampler, fstat="ridge")
```

--------------------------------

### SDP Solver Implementation in KnockPy

Source: https://amspector100.github.io/knockpy/_modules/knockpy/mac.html

This code block outlines the process of setting up and solving a semidefinite programming problem. It includes sorting the covariance matrix, creating block variables for the S matrix, defining constraints, and constructing the optimization objective based on different norms. The problem is then solved, and the resulting S matrix is unsorted and processed.

```python
        elif dsdp_warning:
            warnings.warn(constants.DSDP_WARNING)

    # Sort the covariance matrix according to the groups
    inds, inv_inds = utilities.permute_matrix_by_groups(groups)
    sortedSigma = Sigma[inds][:, inds]

    # Create blocks of semidefinite matrix S,
    # as well as the whole matrix S
    variables = []
    constraints = []
    S_rows = []
    shift = 0
    for j in range(m):
        # Create block variable
        gj = int(group_sizes[j])
        Sj = cp.Variable((gj, gj), symmetric=True)
        constraints += [Sj >> 0]
        variables.append(Sj)

        # Create row of S
        if shift == 0 and shift + gj < p:
            rowj = cp.hstack([Sj, cp.Constant(np.zeros((gj, p - gj)))])
        elif shift + gj < p:
            rowj = cp.hstack(
                [
                    cp.Constant(np.zeros((gj, shift))),
                    Sj,
                    cp.Constant(np.zeros((gj, p - gj - shift))),
                ]
            )
        elif shift + gj == p and shift > 0:
            rowj = cp.hstack([cp.Constant(np.zeros((gj, shift))), Sj])
        elif gj == p and shift == 0:
            rowj = cp.hstack([Sj])

        else:
            raise ValueError(
                f"shift ({shift}) and gj ({gj}) add up to more than p ({p})"
            )
        S_rows.append(rowj)

        # Incremenet shift
        shift += gj

    # Construct S and Grahm Matrix
    S = cp.atoms.affine.wraps.psd_wrap(
        cp.vstack(S_rows)
    )  # does this improve performance?
    sortedSigma = cp.Constant(sortedSigma)
    constraints += [2 * sortedSigma - S >> 0]

    # Construct optimization objective
    if objective == "abs":
        objective = cp.Minimize(cp.sum(cp.abs(sortedSigma - S)))
    elif objective == "pnorm":
        objective = cp.Minimize(cp.pnorm(sortedSigma - S, norm_type))
    elif objective == "norm":
        objective = cp.Minimize(cp.norm(sortedSigma - S, norm_type))
    # Note we already checked objective is one of these values earlier

    # Construct, solve the problem.
    problem = cp.Problem(objective, constraints)
    problem.solve(verbose=verbose, **kwargs)
    if verbose:
        print("Finished solving SDP!")

    # Unsort and get numpy
    S = S.value
    if S is None:
        raise ValueError(
            "SDP formulation is infeasible. Try decreasing the tol parameter."
        )
    S = S[inv_inds][:, inv_inds]

    # Clip 0 and 1 values
    for i in range(p):
        S[i, i] = max(tol, min(1 - tol, S[i, i]))

    # Scale to make this PSD using binary search
    S, gamma = utilities.scale_until_PSD(Sigma, S, tol, num_iter)

    # Return unsorted S value
    return S
```

--------------------------------

### Get Variable Ordering from Junction Tree

Source: https://amspector100.github.io/knockpy/_modules/knockpy/metro.html

Extracts a variable ordering and active frontier from a junction tree. This is used for the metro knockoff sampler and is adapted from external research.

```python
def get_ordering(T):
    """
    Takes a junction tree and returns a variable ordering for the metro
    knockoff sampler. The code from this function is adapted from
    the code distributed with https://arxiv.org/abs/1903.00434.

    Parameters
    ----------
    T : A networkx graph that is a junction tree.
    Nodes must be sets with elements 0,...,p-1.

    Returns
    -------
    order : a numpy array with unique elements 0,...,p-1
    active_frontier : list of lists
        a list of length p gwhere entry j is the set of entries > j
        that are in V_j. This specifies the conditional independence structure
        of a joint covariate distribution. See page 34 of
        https://arxiv.org/abs/1903.00434.
    """
    # Initialize
    T = T.copy()
    order = []
    active_frontier = []

    while T.number_of_nodes() > 0:
        # Loop through leaf nodes
        gen = (x for x in T.nodes() if T.degree(x) <= 1)
        active_node = next(gen)

        # Parent nodes of leaf nodes
        # active_vars get added to the order in this step
        # activated set are just the variables in the active node
        parents = list(T[active_node].keys())
        if len(parents) == 0:
            active_vars = set(active_node)
            activated_set = active_vars.copy()
        else:
            active_vars = set(active_node.difference(parents[0]))
            activated_set = active_vars.union(parents[0]).difference(set(order))
        for i in list(active_vars)[::-1]:  # This line was changed too
            order += [i]
            frontier = list(activated_set.difference(set(order)))
            active_frontier += [frontier]
        T.remove_node(active_node)

    return [np.array(order), active_frontier]
```

--------------------------------

### Initialize KnockoffFilter with Masked Likelihood Ratio (MLR) Statistics

Source: https://amspector100.github.io/knockpy/usage.html

Shows how to initialize a KnockoffFilter using MLR statistics. Use 'mlr' for Gaussian linear or binary regression models, and 'mlr_spline' for generalized additive models based on regression splines.

```python
# default MLR statistics; these work for both continuous and binary responses
kfilter_mlr = KnockoffFilter(ksampler="gaussian", fstat="mlr")
# masked likelihood ratio statistics based on regression splines
kfilter_mlr_splines = KnockoffFilter(ksampler="gaussian", fstat="mlr_spline")
```

--------------------------------

### Instantiate KnockoffFilter with LCD and Gaussian MX

Source: https://amspector100.github.io/knockpy/apiref.html

Instantiates the KnockoffFilter using the LCD statistic and Gaussian Model-X knockoffs. Requires prior data generation.

```python
import knockpy as kpy
dgprocess = kpy.dgp.DGP()
dgprocess.sample_data(n=500, p=500, sparsity=0.1)
```

--------------------------------

### Gaussian Graphical Model Edge Detection

Source: https://amspector100.github.io/knockpy/_sources/usage.ipynb.txt

Utilize KnockoffGGM for detecting edges in Gaussian Graphical models using LCD statistics with FX knockoffs. This example demonstrates a scenario with no edges under the global null.

```python
# Fake data-generating process for Gaussian graphical model
# under global null with no edges
import numpy as np
from knockpy.ggm import KnockoffGGM

X = np.random.randn(300, 30)

# LCD statistic with FX knockoffs
gkf = KnockoffGGM(
    fstat="lcd",
    knockoff_kwargs={"method": "mvr"},
)
edges = gkf.forward(X=X, verbose=False)
np.any(edges)
```

--------------------------------

### Get Variable Ordering

Source: https://amspector100.github.io/knockpy/apiref.html

Takes a junction tree and returns a variable ordering for the metro knockoff sampler. Code is adapted from a provided arXiv paper. Nodes must be sets with elements 0 to p-1.

```python
knockpy.metro.get_ordering(_T_)
```

--------------------------------

### Calculate DeepPINK Feature Statistics (W)

Source: https://amspector100.github.io/knockpy/_modules/knockpy/knockoff_stats.html

Computes feature statistics (W) using the DeepPINK model. Supports different feature importance calculation methods and group aggregation strategies. Ensure kpytorch is installed.

```python
utilities.check_kpytorch_available(purpose="deepPINK statistics")
from .kpytorch import deeppink

# Bind data
n = X.shape[0]
p = X.shape[1]
features = np.concatenate([X, Xk], axis=1)

# The deeppink model class will shuffle statistics,
# but for compatability we create indices anyway
self.inds = np.arange(2 * p)
self.rev_inds = self.inds

# By default, all variables are their own group
if groups is None:
    groups = np.arange(1, p + 1)
self.groups = groups

# Parse y_dist, hidden_sizes, initialize model
parse_y_dist(y)
if "hidden_sizes" not in kwargs:
    kwargs["hidden_sizes"] = [min(n, p)]
self.model = deeppink.DeepPinkModel(p=p, **kwargs)

# Train model
self.model.train()
self.model = deeppink.train_deeppink(self.model, features, y, **train_kwargs)
self.model.eval()

# Get Z statistics
feature_importance = str(feature_importance).lower()
if feature_importance == "deeppink":
    self.Z = self.model.feature_importances(weight_scores=True)
elif feature_importance == "unweighted":
    self.Z = self.model.feature_importances(weight_scores=False)
elif feature_importance == "swap":
    self.Z = self.swap_feature_importances(features, y)
elif feature_importance == "swapint":
    self.Z = self.swap_path_feature_importances(features, y)
else:
    raise ValueError(
        f"feature_importance {feature_importance} must be one of 'deeppink', 'unweighted', 'swap', 'swapint'"
    )

# Get W statistics
self.W = combine_Z_stats(
    self.Z, self.groups, antisym=antisym, group_agg=group_agg
)

# Possibly score model
self.cv_score_model(features=features, y=y, cv_score=cv_score)

return self.W
```

--------------------------------

### Run KnockoffFilter Forward Pass with Provided Knockoffs

Source: https://amspector100.github.io/knockpy/usage.html

Demonstrates how to manually pass generated knockoffs (Xk) to the KnockoffFilter's forward method for feature selection. This is an alternative to letting the sampler generate them internally.

```python
out = kfilter_metro.forward(X=X_metro, y=y_metro, Xk=Xk)
```

--------------------------------

### Sample Knockoffs with Divide-and-Conquer

Source: https://amspector100.github.io/knockpy/_modules/knockpy/metro.html

Samples knockoffs using a divide-and-conquer strategy. Initializes knockoff and acceptance matrices, then iterates through different variable separation methods. For each method, it processes blocks of variables, samples knockoffs using specialized samplers, and updates the main knockoff matrix. Finally, it checks the validity of the generated knockoffs.

```python
def sample_knockoffs(self, **kwargs):
        """
        Samples knockoffs using divide-and-conquer approach.

        Parameters
        ----------
        kwargs : dict
            Keyword args for ``MetropolizedKnockoffSampler.sample_knockoffs``.

        Returns
        -------
        Xk : np.ndarray
            A ``(n, p)``-shaped knockoff matrix in the original order
            the variables were passed in.
        """

        self.Xk = np.zeros((self.n, self.p))
        self.Xk[:] = np.nan
        self.final_acc_probs = self.Xk.copy()
        self.acceptances = self.Xk.copy()

        # Loop through different ways of separating variables
        for key in self.dc_keys:
            # N inds for this particular method of separation
            n_inds = np.array(self.X_ninds[key])

            # Initialize output
            Xkblock = np.zeros((len(n_inds), self.p))
            Xkblock[:] = np.nan
            accblock = Xkblock.copy()
            probblock = Xkblock.copy()

            # Set separating knockoffs = to their features
            sep_inds = self.separators[key]
            Xkblock[:, sep_inds] = self.X[n_inds][:, sep_inds]
            accblock[:, sep_inds] = 0
            probblock[:, sep_inds] = 0

            # Loop through blocks
            for n_inds, p_inds, sampler in self.samplers[key]:
                # Sample knockoffs
                Xkblock[:, p_inds] = sampler.sample_knockoffs(**kwargs)

                # Save final_acc_probs, acceptances
                accblock[:, p_inds] = sampler.final_acc_probs[:, sampler.inv_order]
                probblock[:, p_inds] = sampler.acceptances[:, sampler.inv_order]

            # Set Xk value for this block
            self.Xk[n_inds] = Xkblock
            self.acceptances[n_inds] = accblock
            self.final_acc_probs[n_inds] = probblock

        # Test validity
        self.check_xk_validity(self.X, self.Xk, testname="IsingSampler", alpha=1e-5)

        return self.Xk
```

--------------------------------

### KnockoffFilter Class Initialization

Source: https://amspector100.github.io/knockpy/apiref.html

Instantiate the KnockoffFilter with specified feature statistics and knockoff sampler options.

```APIDOC
## KnockoffFilter

### Description
Initializes the KnockoffFilter with specified feature statistics and knockoff sampler options. It can be configured with various statistics and sampling methods.

### Parameters

- **fstat** (FeatureStatistic) - The feature statistics to use for inference. Defaults to LedoitWolf covariance estimation.
- **ksampler** (KnockoffSampler) - The knockoff sampler to use during inference.
- **fstat_kwargs** (dict) - Dictionary of kwargs to pass to the `fit` call of `self.fstat`.
- **knockoff_kwargs** (dict) - If `ksampler` is not yet initialized, kwargs to pass to `ksampler`.

### Example
```python
from knockpy.knockoff_filter import KnockoffFilter
kfilter = KnockoffFilter(
    fstat='lcd',
    ksampler='gaussian',
    knockoff_kwargs={"method":"mvr"},
)
```
```

--------------------------------

### Generate Wishart Correlation Matrix

Source: https://amspector100.github.io/knockpy/_modules/knockpy/dgp.html

Generates a correlation matrix using the Wishart distribution. It starts with a random matrix W, computes V = W^T W, converts it to a correlation matrix, and ensures it's positive semi-definite.

```python
from . import utilities
from .utilities import cov2corr, shift_until_PSD

DEFAULT_DF_T = 3


[docs]
def Wishart(d=100, p=100, tol=1e-2):
    """
    Let W be a random `d` x `p` matrix with i.i.d. Gaussian
    entries. Then Sigma = ``cov2corr``(W^T W).
    """

    W = np.random.randn(d, p)
    V = np.dot(W.T, W)
    V = cov2corr(V)
    return cov2corr(shift_until_PSD(V, tol=tol))
```

--------------------------------

### Create Synthetic Dataset for Knockpy

Source: https://amspector100.github.io/knockpy/usage.html

Generates a synthetic dataset with a specified number of data points and features, including a random covariance matrix and sparse coefficients for the outcome variable. This setup is typical for testing variable selection methods.

```python
import numpy as np
import knockpy
import warnings

np.random.seed(123)
n = 300  # number of data points
p = 500  # number of features
Sigma = knockpy.dgp.AR1(p=p, rho=0.5)  # Stationary AR1 process with correlation 0.5

# Sample X
X = np.random.multivariate_normal(mean=np.zeros(p), cov=Sigma, size=(n,))

# Create random sparse coefficients
beta = knockpy.dgp.create_sparse_coefficients(p=p, sparsity=0.1)
y = np.dot(X, beta) + np.random.randn(n)
```

--------------------------------

### Initialize and Cache Optimal S in PSGDSolver

Source: https://amspector100.github.io/knockpy/_modules/knockpy/kpytorch/mrcgrad.html

Initializes the loss calculation and caches the optimal solution S. This is typically done once during solver initialization.

```python
        # Initialize cache of optimal S
        with torch.no_grad():
            init_loss = self.losscalc(smoothing=0)
            if init_loss < 0:
                init_loss = np.inf
        self.cache_S(init_loss)

        # Initialize attributes which save losses over time
        self.all_losses = []
        self.projected_losses = []
        self.improvements = []

    def cache_S(self, new_loss):
        # Cache optimal solution
        with torch.no_grad():
            self.prev_opt_S = self.opt_S
            self.prev_opt_loss = self.opt_loss
            self.opt_loss = new_loss
            self.opt_S = self.losscalc.pull_S().clone().detach().numpy()
```

--------------------------------

### FXSampler Constructor

Source: https://amspector100.github.io/knockpy/api/knockpy.knockoffs.FXSampler.html

Initializes the FXSampler with input data and optional parameters. See GaussianSampler documentation for argument descriptions.

```APIDOC
class FXSampler(_X_, _groups=None_, _sample_tol=1e-05_, _S=None_, _method=None_, _verbose=False_, _**kwargs_)
```