### Train GFlowNet with Python Loop

Source: https://github.com/gfnorg/torchgfn/blob/master/docs/source/guides/example.md

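This loop trains a GFlowNet for 1000 iterations, sampling 16 trajectories per iteration off-policy from a tempered policy (`temperature=1.5`). Each iteration zeroes the gradients, computes the loss, backpropagates, and steps the optimizer; tqdm shows progress and reports the loss every 25 iterations. The snippet assumes `sampler`, `env`, `gfn`, and `optimizer` are already defined, and depends on torch, tqdm, and torchgfn.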

```Python
for i in (pbar := tqdm(range(1000))):
    # Sample trajectories off-policy with tempered distribution.
    # Log probabilities are omitted; estimator outputs are saved for efficiency.
    trajectories = sampler.sample_trajectories(env=env, n=16, save_logprobs=False, save_estimator_outputs=True, temperature=1.5)
    optimizer.zero_grad()
    loss = gfn.loss(env, trajectories)
    loss.backward()
    optimizer.step()
    if i % 25 == 0:
        pbar.set_postfix({"loss": loss.item()})
```
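
The `temperature=1.5` argument flattens the sampling distribution to encourage exploration. As a minimal standalone sketch of the idea (plain Python, not torchgfn's internal implementation; the helper name `tempered_softmax` is hypothetical), dividing the logits by a temperature above 1 before the softmax shifts probability mass toward less likely actions:

```python
import math


def tempered_softmax(logits, temperature=1.0):
    """Softmax over logits / temperature (hypothetical helper for illustration)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
on_policy = tempered_softmax(logits, temperature=1.0)
tempered = tempered_softmax(logits, temperature=1.5)
print(on_policy)
print(tempered)
```

With a temperature above 1 the most likely action loses probability and the least likely action gains it, which is why such samples are off-policy.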

--------------------------------

### Install torchgfn with Optional Dependencies (Bash)

Source: https://github.com/gfnorg/torchgfn/blob/master/docs/source/README.md

Installs torchgfn with an optional dependency set: `dev` for development, `scripts` for running the example scripts, or `all` for everything. The command below installs the `scripts` extra; substitute another extra name as needed.

```bash
pip install "torchgfn[scripts]"
```

--------------------------------

### TorchGFN Environment Setup

Source: https://github.com/gfnorg/torchgfn/blob/master/tutorials/notebooks/intro_discrete.ipynb

This code snippet sets up the Python environment for using the torchgfn library. It imports necessary modules for actions, environments, estimators, GFlowNet models, preprocessors, states, and utility modules. Dependencies include the `gfn` library and its submodules.

```python
from typing import ClassVar, Tuple, cast

import gfn
from gfn.actions import Actions
from gfn.env import DiscreteEnv
from gfn.estimators import DiscretePolicyEstimator, ScalarEstimator
from gfn.gflownet import FMGFlowNet, TBGFlowNet
from gfn.preprocessors import IdentityPreprocessor
from gfn.states import DiscreteStates
from gfn.utils.modules import MLP
```

--------------------------------

### Setup Flow Matching Estimator and FMGFlowNet in Python

Source: https://github.com/gfnorg/torchgfn/blob/master/tutorials/notebooks/intro_discrete.ipynb

Creates an MLP module to estimate log edge flows, wraps it in a `DiscretePolicyEstimator`, and constructs an `FMGFlowNet` with this estimator. An Adam optimizer is instantiated, and the training loop is invoked to train the model on the face environment.

```python
# nn.Module that estimates _log_ edge flows.
module = MLP(
    input_dim=env.state_shape[-1],
    output_dim=env.n_actions,
    hidden_dim=n_hid_units,
    n_hidden_layers=1,
)
# This is our _log_ edge flow estimator.
estimator = DiscretePolicyEstimator(
    module=module,
    n_actions=env.n_actions,
)

# The gflownet class wraps our estimator (including sampler functionality).
gflownet = FMGFlowNet(estimator)
optimizer = torch.optim.Adam(gflownet.parameters(), lr=learning_rate)  # TODO: Verify.

visited_terminating_states, states_visited, losses = train(
    gflownet,
    optimizer,
    env,
    n_episodes=n_episodes * 10,
)
```
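
The flow matching objective asks that, at every intermediate state, the total incoming flow equals the total outgoing flow. A minimal plain-Python sketch of that condition for a single state follows; the function names are hypothetical, and torchgfn's `FMGFlowNet` loss may differ in detail (for example in how terminating flows are matched to rewards):

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def flow_matching_loss(log_inflows, log_outflows):
    """Squared log-ratio of total in-flow to total out-flow at one state."""
    return (logsumexp(log_inflows) - logsumexp(log_outflows)) ** 2


# Balanced state: in-flows 1 + 3 equal out-flows 2 + 2.
balanced = flow_matching_loss(
    [math.log(1.0), math.log(3.0)], [math.log(2.0), math.log(2.0)]
)
# Unbalanced state: in-flow 4 versus out-flow 1.
unbalanced = flow_matching_loss([math.log(4.0)], [math.log(1.0)])
print(balanced, unbalanced)
```

Working with log flows, as the estimator above does, keeps the loss well scaled when flows span several orders of magnitude.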

--------------------------------

### Setup experiment models and optimizer in Python

Source: https://github.com/gfnorg/torchgfn/blob/master/tutorials/notebooks/intro_continuous.ipynb

Creates forward and backward MLP models with a hidden dimension, initializes a log‑partition parameter, and configures an Adam optimizer with separate learning rates for the models and logZ. This prepares all learnable components for GFlowNet training.

```python
def setup_experiment(hid_dim=64, lr_model=1e-3, lr_logz=1e-1):
    """Generate the learned parameters and optimizer for an experiment.

    Forward and backward models are MLPs with two hidden layers. logZ is
    a single parameter. Note that we give logZ a higher learning rate, which is
    a common trick used when utilizing Trajectory Balance.
    """
    # Input = [x_position, n_steps], Output = [mus, standard_deviations].
    forward_model = torch.nn.Sequential(torch.nn.Linear(2, hid_dim),
                                        torch.nn.ELU(),
                                        torch.nn.Linear(hid_dim, hid_dim),
                                        torch.nn.ELU(),
                                        torch.nn.Linear(hid_dim, 2)).to(device)

    backward_model = torch.nn.Sequential(torch.nn.Linear(2, hid_dim),
                                        torch.nn.ELU(),
                                        torch.nn.Linear(hid_dim, hid_dim),
                                        torch.nn.ELU(),
                                        torch.nn.Linear(hid_dim, 2)).to(device)

    logZ = torch.nn.Parameter(torch.tensor(0.0, device=device))

    optimizer = torch.optim.Adam(
        [
            {'params': forward_model.parameters(), 'lr': lr_model},
            {'params': backward_model.parameters(), 'lr': lr_model},
            {'params': [logZ], 'lr': lr_logz},
        ]
    )

    return (forward_model, backward_model, logZ, optimizer)
```

--------------------------------

### Define step and state initialization functions in Python

Source: https://github.com/gfnorg/torchgfn/blob/master/tutorials/notebooks/intro_continuous.ipynb

Provides functions to perform a forward step in the environment and to initialize the starting state for a batch. These utilities handle updating the position and step counter, and set the initial x_position based on the environment configuration.

```python
def step(x, action):
    """Takes a forward step in the environment."""
    new_x = torch.zeros_like(x)
    new_x[:, 0] = x[:, 0] + action  # Add the action delta to the position.
    new_x[:, 1] = x[:, 1] + 1  # Increment the step counter.

    return new_x


def initalize_state(batch_size, device, env, randn=False):
    """Trajectory starts at state = (X_0, t=0)."""
    x = torch.zeros((batch_size, 2), device=device)
    x[:, 0] = env.init_value  # Set the initial x position; the step counter stays 0.

    return x
```

--------------------------------

### Training Trajectory Balance GFlowNet in Python

Source: https://github.com/gfnorg/torchgfn/blob/master/docs/source/guides/example.md

This code sets up a 4-dimensional HyperGrid environment and trains a Trajectory Balance GFlowNet whose forward and backward policy estimators share an MLP trunk. It requires torch, tqdm, and torchgfn. Trajectories are sampled on-policy from the forward policy, and the optimizer uses a learning rate of 1e-3 for the policy parameters and 1e-1 for logZ.

```Python
import torch
from tqdm import tqdm

from gfn.gflownet import TBGFlowNet
from gfn.gym import HyperGrid  # We use the hyper grid environment
from gfn.preprocessors import KHotPreprocessor
from gfn.estimators import DiscretePolicyEstimator
from gfn.samplers import Sampler
from gfn.utils.modules import MLP  # MLP is a simple multi-layer perceptron

# 1 - We define the environment.
env = HyperGrid(ndim=4, height=8)  # Grid of size 8x8x8x8
preprocessor = KHotPreprocessor(ndim=env.ndim, height=env.height)

# 2 - We define the needed modules (neural networks).
input_dim = preprocessor.output_dim if preprocessor.output_dim is not None else env.state_shape[-1]
module_PF = MLP(
    input_dim=input_dim,
    output_dim=env.n_actions
)  # Neural network for the forward policy, with as many outputs as there are actions

module_PB = MLP(
    input_dim=input_dim,
    output_dim=env.n_actions - 1,
    trunk=module_PF.trunk  # We share all the parameters of P_F and P_B, except for the last layer
)

# 3 - We define the estimators.
pf_estimator = DiscretePolicyEstimator(module_PF, env.n_actions, is_backward=False, preprocessor=preprocessor)
pb_estimator = DiscretePolicyEstimator(module_PB, env.n_actions, is_backward=True, preprocessor=preprocessor)

# 4 - We define the GFlowNet.
gfn = TBGFlowNet(pf=pf_estimator, pb=pb_estimator, init_logZ=0.0)  # We initialize logZ to 0

# 5 - We define the sampler and the optimizer.
sampler = Sampler(estimator=pf_estimator)  # We use an on-policy sampler, based on the forward policy

# Different policy parameters can have their own LR.
# Log Z gets dedicated learning rate (typically higher).
optimizer = torch.optim.Adam(gfn.pf_pb_parameters(), lr=1e-3)
optimizer.add_param_group({"params": gfn.logz_parameters(), "lr": 1e-1})

# 6 - We train the GFlowNet for 1000 iterations, with 16 trajectories per iteration
for i in (pbar := tqdm(range(1000))):

    # save_logprobs=True makes on-policy training faster
    trajectories = sampler.sample_trajectories(env=env, n=16, save_logprobs=True)
    optimizer.zero_grad()
    loss = gfn.loss(env, trajectories)
    loss.backward()
    optimizer.step()
    if i % 25 == 0:
        pbar.set_postfix({"loss": loss.item()})
```

--------------------------------

### Setup Policy Estimators and TBGFlowNet for Trajectory Balance

Source: https://github.com/gfnorg/torchgfn/blob/master/tutorials/notebooks/intro_discrete.ipynb

Initializes MLP modules and DiscretePolicyEstimators for forward and backward policies, then combines them into a TBGFlowNet. This is a prerequisite for training a GFlowNet using the trajectory balance objective.

```python
# nn.Modules for the forward and backward policy estimators.
pf_module = MLP(
    input_dim=env.state_shape[-1],
    output_dim=env.n_actions,
    hidden_dim=n_hid_units,
    n_hidden_layers=1,
)
pb_module = MLP(
    input_dim=env.state_shape[-1],
    output_dim=env.n_actions - 1,
    hidden_dim=n_hid_units,
    n_hidden_layers=1,
)
# Estimators for the forward and backward policies.
pf_estimator = DiscretePolicyEstimator(
    module=pf_module,
    n_actions=env.n_actions,
)
pb_estimator = DiscretePolicyEstimator(
    module=pb_module,
    n_actions=env.n_actions,
    is_backward=True,
)

# Our trajectory balance gflownet accepts both policy estimators.
gflownet = TBGFlowNet(
    pf=pf_estimator,
    pb=pb_estimator,
)
```
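
For reference, the trajectory balance objective compares the forward and backward probabilities of a whole trajectory. The sketch below computes the loss for one trajectory in plain Python; the function name and the toy numbers are assumptions for illustration, not torchgfn's implementation:

```python
import math


def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, log_reward):
    """(log Z + sum log P_F - log R - sum log P_B)^2 for one trajectory."""
    forward = log_z + sum(log_pf_steps)
    backward = log_reward + sum(log_pb_steps)
    return (forward - backward) ** 2


# Toy trajectory: two forward steps (the last one is the exit action) and
# one backward step, since the backward policy has no exit action.
loss_balanced = trajectory_balance_loss(
    log_z=math.log(4.0),
    log_pf_steps=[math.log(0.5), math.log(0.5)],
    log_pb_steps=[math.log(1.0)],
    log_reward=math.log(1.0),
)
loss_off = trajectory_balance_loss(
    log_z=0.0,
    log_pf_steps=[math.log(0.5), math.log(0.5)],
    log_pb_steps=[math.log(1.0)],
    log_reward=math.log(1.0),
)
print(loss_balanced, loss_off)
```

The loss is zero when `Z * P_F(trajectory)` equals `R(x) * P_B(trajectory)`, which is why logZ has to be learned alongside the two policies.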

--------------------------------

### Print Final Partition Function Estimate

Source: https://github.com/gfnorg/torchgfn/blob/master/tutorials/notebooks/intro_discrete.ipynb

Retrieves the learned 'logZ' parameter from the GFlowNet object, exponentiates it to get the partition function estimate, and prints the result formatted to two decimal places.

```python
print("The partition function estimate is Z={:.2f}".format(
    torch.exp(gflownet.logZ).item()
    )
)
```
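
When an environment is small enough to enumerate, the estimate can be checked against the true partition function, which is the sum of rewards over all terminating states. The sketch below uses made-up rewards and a made-up learned value standing in for `gflownet.logZ`; none of these numbers come from the tutorial:

```python
import math

# Hypothetical rewards of every terminating state in a tiny environment.
rewards = {"state_a": 1.0, "state_b": 2.0, "state_c": 5.0}
true_z = sum(rewards.values())

# Hypothetical learned value, standing in for gflownet.logZ after training.
learned_log_z = 2.05
estimate = math.exp(learned_log_z)

rel_error = abs(estimate - true_z) / true_z
print(f"Z estimate={estimate:.2f}, true Z={true_z:.2f}, relative error={rel_error:.1%}")
```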

--------------------------------

### Training Sub Trajectory Balance GFlowNet in Python

Source: https://github.com/gfnorg/torchgfn/blob/master/docs/source/guides/example.md

This code configures a Sub-Trajectory Balance GFlowNet on a HyperGrid environment, adding a `ScalarEstimator` for the log state flow logF alongside forward and backward policies that share an MLP trunk. Dependencies are torch, tqdm, and torchgfn. The `lamda` argument controls how sub-trajectories of different lengths are weighted, and the logF parameters get their own optimizer parameter group with a separate learning rate.

```Python
import torch
from tqdm import tqdm

from gfn.gflownet import SubTBGFlowNet
from gfn.gym import HyperGrid  # We use the hyper grid environment
from gfn.preprocessors import KHotPreprocessor
from gfn.estimators import DiscretePolicyEstimator, ScalarEstimator
from gfn.samplers import Sampler
from gfn.utils.modules import MLP  # MLP is a simple multi-layer perceptron (MLP)

# 1 - We define the environment.
env = HyperGrid(ndim=4, height=8)  # Grid of size 8x8x8x8
preprocessor = KHotPreprocessor(ndim=env.ndim, height=env.height)

# 2 - We define the needed modules (neural networks).
# The preprocessor converts each state to a k-hot encoding before it is fed to the estimators.
input_dim = preprocessor.output_dim if preprocessor.output_dim is not None else env.state_shape[-1]
module_PF = MLP(
    input_dim=input_dim,
    output_dim=env.n_actions
)  # Neural network for the forward policy, with as many outputs as there are actions

module_PB = MLP(
    input_dim=input_dim,
    output_dim=env.n_actions - 1,
    trunk=module_PF.trunk  # We share all the parameters of P_F and P_B, except for the last layer
)
module_logF = MLP(
    input_dim=input_dim,
    output_dim=1,  # Important for ScalarEstimators!
)

# 3 - We define the estimators.
pf_estimator = DiscretePolicyEstimator(module_PF, env.n_actions, is_backward=False, preprocessor=preprocessor)
pb_estimator = DiscretePolicyEstimator(module_PB, env.n_actions, is_backward=True, preprocessor=preprocessor)
logF_estimator = ScalarEstimator(module=module_logF, preprocessor=preprocessor)

# 4 - We define the GFlowNet.
gfn = SubTBGFlowNet(pf=pf_estimator, pb=pb_estimator, logF=logF_estimator, lamda=0.9)

# 5 - We define the sampler and the optimizer.
sampler = Sampler(estimator=pf_estimator)

# Different policy parameters can have their own LR.
# Log F gets dedicated learning rate (typically higher).
optimizer = torch.optim.Adam(gfn.pf_pb_parameters(), lr=1e-3)
optimizer.add_param_group({"params": gfn.logF_parameters(), "lr": 1e-2})
```
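
To give a feel for the `lamda` argument, the sketch below computes geometric weights over all sub-trajectories of a short trajectory, where a sub-trajectory spanning `j - i` steps gets weight proportional to `lamda ** (j - i)`. This is an assumption about one common weighting scheme; torchgfn supports other weighting options and its default may differ, and the helper name is hypothetical:

```python
def subtb_weights(n_states, lamda):
    """Normalized lamda ** (j - i) for every sub-trajectory (i, j) with i < j."""
    raw = {
        (i, j): lamda ** (j - i)
        for i in range(n_states)
        for j in range(i + 1, n_states)
    }
    total = sum(raw.values())
    return {pair: weight / total for pair, weight in raw.items()}


weights = subtb_weights(n_states=4, lamda=0.9)
for pair, weight in sorted(weights.items()):
    print(pair, round(weight, 4))
```

With `lamda` below 1, shorter sub-trajectories receive more weight than the full trajectory; as `lamda` approaches 1 all lengths are weighted equally.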

--------------------------------

### Train GFlowNet with Trajectory Balance on HyperGrid

Source: https://context7.com/gfnorg/torchgfn/llms.txt

This code demonstrates a complete workflow for training a GFlowNet using Trajectory Balance loss on the HyperGrid environment. It covers environment setup, neural network definition, policy estimator creation, GFlowNet initialization, sampler setup, optimizer configuration, the training loop with sampling and loss computation, and finally, sample generation from the trained model. Dependencies include torch, gfn.gflownet, gfn.gym, gfn.preprocessors, gfn.estimators, gfn.samplers, gfn.utils.modules, and gfn.utils.common.

```python
import torch
from gfn.gflownet import TBGFlowNet
from gfn.gym import HyperGrid
from gfn.preprocessors import KHotPreprocessor
from gfn.estimators import DiscretePolicyEstimator
from gfn.samplers import Sampler
from gfn.utils.modules import MLP
from gfn.utils.common import set_seed

# Set random seed
set_seed(42)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create environment: 4D hypergrid with height 8
env = HyperGrid(
    ndim=4,
    height=8,
    reward_fn_str="original",
    reward_fn_kwargs={"R0": 0.1, "R1": 0.5, "R2": 2.0},
    device=device,
    calculate_partition=True,  # Calculate true log partition function
    store_all_states=True,     # Store all states for validation
    check_action_validity=True
)
print(f"Environment has {env.n_states} states")
print(f"Environment log partition: {env.log_partition}")

# Preprocessor: convert states to k-hot encoding
preprocessor = KHotPreprocessor(height=env.height, ndim=env.ndim)

# Define neural network modules with shared trunk
module_PF = MLP(
    input_dim=preprocessor.output_dim,
    output_dim=env.n_actions,
    hidden_dim=256,
    n_hidden_layers=2
)
module_PB = MLP(
    input_dim=preprocessor.output_dim,
    output_dim=env.n_actions - 1,  # No exit action in backward
    trunk=module_PF.trunk  # Share weights with forward policy
)

# Create policy estimators
pf_estimator = DiscretePolicyEstimator(
    module_PF,
    env.n_actions,
    preprocessor=preprocessor,
    is_backward=False
)
pb_estimator = DiscretePolicyEstimator(
    module_PB,
    env.n_actions,
    preprocessor=preprocessor,
    is_backward=True
)

# Create GFlowNet with Trajectory Balance loss
gflownet = TBGFlowNet(pf=pf_estimator, pb=pb_estimator, init_logZ=0.0)
gflownet = gflownet.to(device)

# Create sampler for trajectory generation
sampler = Sampler(estimator=pf_estimator)

# Setup optimizer with separate learning rates
optimizer = torch.optim.Adam(gflownet.pf_pb_parameters(), lr=1e-3)
optimizer.add_param_group({"params": gflownet.logz_parameters(), "lr": 1e-1})

# Training loop
for iteration in range(1000):
    # Sample trajectories with epsilon-greedy exploration
    trajectories = sampler.sample_trajectories(
        env,
        n=16,  # Batch size
        save_logprobs=True,
        epsilon=0.1  # 10% random actions
    )

    # Compute loss and backpropagate
    optimizer.zero_grad()
    loss = gflownet.loss(env, trajectories, recalculate_all_logprobs=False)
    loss.backward()

    # Gradient clipping and parameter updates
    torch.nn.utils.clip_grad_norm_(gflownet.parameters(), 1.0)
    optimizer.step()

    if iteration % 100 == 0:
        print(f"Iteration {iteration}: Loss = {loss.item():.4f}, "
              f"LogZ = {gflownet.logZ.item():.4f}")

# Generate samples from trained model
with torch.no_grad():
    final_trajectories = sampler.sample_trajectories(env, n=1000)
    terminating_states = final_trajectories.terminating_states
    print(f"Generated {len(terminating_states)} terminal states")

```
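
The `epsilon=0.1` argument in the training loop adds exploration noise during sampling. One common way to realize this, sketched here in plain Python under the assumption that exploration mixes the policy with a uniform distribution over valid actions (the helper name `epsilon_mix` is hypothetical, and torchgfn's internals may differ), is:

```python
def epsilon_mix(policy_probs, valid_mask, epsilon):
    """Return (1 - epsilon) * policy + epsilon * uniform over valid actions."""
    n_valid = sum(valid_mask)
    if n_valid == 0:
        raise ValueError("at least one action must be valid")
    uniform = [1.0 / n_valid if ok else 0.0 for ok in valid_mask]
    return [
        (1.0 - epsilon) * p + epsilon * u
        for p, u in zip(policy_probs, uniform)
    ]


policy = [0.7, 0.3, 0.0]    # the third action is masked out
mask = [True, True, False]
mixed = epsilon_mix(policy, mask, epsilon=0.1)
print(mixed)
```

Masked actions keep zero probability, so exploration never proposes an invalid move.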

--------------------------------

### Initialize State in Python

Source: https://github.com/gfnorg/torchgfn/blob/master/tutorials/notebooks/intro_continuous.ipynb

This function builds the initial state tensor for a batch of trajectories: a `(batch_size, 2)` tensor whose first column is `env.init_value` and whose second column (the step counter) is zero. It requires PyTorch and an environment object exposing `init_value`. The `randn` argument is accepted but unused.

```python
def initalize_state(batch_size, device, env, randn=False):
    """Trajectory starts at state = (X_0, t=0)."""
    x = torch.zeros((batch_size, 2), device=device)
    x[:, 0] = env.init_value

    return x
```

--------------------------------

### Install torchgfn Core Package (Bash)

Source: https://github.com/gfnorg/torchgfn/blob/master/docs/source/README.md

Installs the latest stable version of the torchgfn package with its core dependencies using pip. This is the primary method for users to get started with the library.

```bash
pip install torchgfn
```
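
After installing, one way to confirm which version is present is to query the installed distribution metadata. This is a small standard-library sketch; the helper name `installed_version` is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if torchgfn is installed, otherwise None.
print(installed_version("torchgfn"))
```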

--------------------------------

### Import required libraries and configure device for GFlowNet (Python)

Source: https://github.com/gfnorg/torchgfn/blob/master/tutorials/notebooks/intro_graphs.ipynb

Sets up the Python environment by importing Torch, Matplotlib, and GFlowNet utilities. It also initializes a reproducible random seed and specifies the computation device. This snippet is required before building and training the graph model.

```python
import time

import matplotlib.pyplot as plt
import torch
from tensordict import TensorDict
from matplotlib import patches

from gfn.actions import GraphActionType
from gfn.containers import ReplayBuffer
from gfn.estimators import DiscreteGraphPolicyEstimator
from gfn.gflownet.trajectory_balance import TBGFlowNet
from gfn.gym.graph_building import GraphBuildingOnEdges
from gfn.states import GraphStates
from gfn.utils.common import set_seed
from gfn.utils.graphs import get_edge_indices
from gfn.utils.modules import GraphActionGNN


set_seed(7)
device = torch.device('cpu')
```

--------------------------------

### Configure GFlowNet Policies and Environment in Python

Source: https://github.com/gfnorg/torchgfn/blob/master/tutorials/notebooks/intro_graphs.ipynb

Initializes the seven-segment display environment, defines forward and backward policy estimators using GraphActionGNN, and sets up the TBGFlowNet for training. This prepares the GFlowNet model for learning to generate valid seven-segment display graphs.

```python
directed = False
n_nodes = 6  # 6 nodes for the seven-segment display
env = SevenSegmentGraphBuilding(
    n_nodes=n_nodes,
    state_evaluator=reward_function,
    directed=directed,
    device=device,
)


pf = DiscreteGraphPolicyEstimator(
    module=GraphActionGNN(
        num_node_classes=env.n_nodes,
        directed=directed,
        num_edge_classes=env.num_edge_classes,
    )
)
pb = DiscreteGraphPolicyEstimator(
    module=GraphActionGNN(
        num_node_classes=env.n_nodes,
        directed=directed,
        is_backward=True,
        num_edge_classes=env.num_edge_classes,
    ),
    is_backward=True,
)

gflownet = TBGFlowNet(pf, pb).to(device)

```

--------------------------------

### Define Custom Discrete Environment in Python

Source: https://context7.com/gfnorg/torchgfn/llms.txt

This Python code defines a custom discrete environment, `CustomGridEnv`, by inheriting from `gfn.env.DiscreteEnv`. It specifies the environment's dynamics, state transitions (step and backward_step), action masks, and reward function. The example usage demonstrates initializing the environment and resetting it to get initial states.

```python
import torch
from gfn.env import DiscreteEnv
from gfn.states import DiscreteStates
from gfn.actions import Actions

class CustomGridEnv(DiscreteEnv):
    """Custom 2D grid environment with custom reward function."""

    def __init__(self, size: int = 10, device: str = "cpu"):
        self.size = size

        # Define initial state (0, 0) and sink state (-1, -1)
        s0 = torch.zeros(2, dtype=torch.long, device=device)
        sf = torch.full((2,), fill_value=-1, dtype=torch.long, device=device)

        # Actions: 0=move right, 1=move up, 2=exit
        n_actions = 3
        state_shape = (2,)  # (x, y) coordinates

        super().__init__(
            n_actions=n_actions,
            s0=s0,
            state_shape=state_shape,
            sf=sf,
            check_action_validity=True
        )

    def step(self, states: DiscreteStates, actions: Actions) -> DiscreteStates:
        """Apply forward actions to states."""
        new_tensor = states.tensor.clone()

        # Action 0: increment x-coordinate
        mask_right = actions.tensor.squeeze(-1) == 0
        new_tensor[mask_right, 0] += 1

        # Action 1: increment y-coordinate
        mask_up = actions.tensor.squeeze(-1) == 1
        new_tensor[mask_up, 1] += 1

        return self.States(new_tensor)

    def backward_step(self, states: DiscreteStates, actions: Actions) -> DiscreteStates:
        """Apply backward actions to states."""
        new_tensor = states.tensor.clone()

        # Action 0: decrement x-coordinate
        mask_left = actions.tensor.squeeze(-1) == 0
        new_tensor[mask_left, 0] -= 1

        # Action 1: decrement y-coordinate
        mask_down = actions.tensor.squeeze(-1) == 1
        new_tensor[mask_down, 1] -= 1

        return self.States(new_tensor)

    def update_masks(self, states: DiscreteStates) -> None:
        """Update action masks based on current states."""
        # Cannot move right if x >= size-1
        states.forward_masks[:, 0] = states.tensor[:, 0] < self.size - 1
        # Cannot move up if y >= size-1
        states.forward_masks[:, 1] = states.tensor[:, 1] < self.size - 1
        # Can always exit (action 2)
        states.forward_masks[:, 2] = True

        # Backward masks (no exit action)
        states.backward_masks[:, 0] = states.tensor[:, 0] > 0  # Can move left
        states.backward_masks[:, 1] = states.tensor[:, 1] > 0  # Can move down

    def reward(self, states: DiscreteStates) -> torch.Tensor:
        """Compute rewards for terminal states."""
        # Example: reward peaks for states at distance 5 from the origin
        distances = torch.sqrt(
            (states.tensor[:, 0].float() ** 2) +
            (states.tensor[:, 1].float() ** 2)
        )
        rewards = torch.exp(-0.1 * (distances - 5.0) ** 2)
        return rewards

# Usage
env = CustomGridEnv(size=10, device="cpu")
initial_states = env.reset(batch_shape=(4,))
```
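
To make the mask and reward rules concrete, here is the same logic for a single `(x, y)` state in plain Python. It mirrors the tensor code above without depending on torchgfn; the standalone function names are hypothetical:

```python
import math


def forward_mask(x, y, size):
    """Valid forward actions: [move right, move up, exit]."""
    return [x < size - 1, y < size - 1, True]


def backward_mask(x, y):
    """Valid backward actions: [move left, move down] (no exit action)."""
    return [x > 0, y > 0]


def reward(x, y):
    """Reward that peaks when the state is at distance 5 from the origin."""
    distance = math.sqrt(x ** 2 + y ** 2)
    return math.exp(-0.1 * (distance - 5.0) ** 2)


print(forward_mask(9, 0, size=10))   # right edge: cannot move right
print(backward_mask(0, 0))           # origin: no backward moves
print(reward(3, 4), reward(0, 0))    # distance 5 versus distance 0
```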

--------------------------------

### Install torchgfn from Source with All Dependencies (Bash)

Source: https://github.com/gfnorg/torchgfn/blob/master/docs/source/README.md

Clones the torchgfn repository and installs the current development version from source in editable mode. The commands create and activate a Conda environment with Python 3.10, then install the package with all optional dependencies, which suits development or advanced usage.

```bash
git clone https://github.com/GFNOrg/torchgfn.git
conda create -n gfn python=3.10
conda activate gfn
cd torchgfn
pip install -e ".[all]"
```

--------------------------------

### Model Training

Source: https://github.com/gfnorg/torchgfn/blob/master/tutorials/notebooks/intro_continuous.ipynb

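Calls `train_with_exploration` with the seed, batch size, trajectory length, environment, device, initial exploration noise, and number of iterations. It returns the trained forward model, backward model, and logZ parameter.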

```Python
forward_model, backward_model, logZ = train_with_exploration(
    seed,
    batch_size,
    trajectory_length,
    env,
    device,
    init_exploration_noise,
    n_iterations=n_iterations,
)
```

--------------------------------

### Conditional GFlowNet Setup in Python

Source: https://context7.com/gfnorg/torchgfn/llms.txt

This Python code sets up a conditional GFlowNet using `ConditionalHyperGrid`, a custom environment inheriting from `HyperGrid`. It allows the reward function to be interpolated between a uniform reward and an original reward based on provided conditions. The setup includes defining the environment, a preprocessor, and specifying the condition dimension.

```python
import torch
from gfn.estimators import ConditionalDiscretePolicyEstimator
from gfn.gflownet import TBGFlowNet
from gfn.gym import HyperGrid
from gfn.preprocessors import KHotPreprocessor
from gfn.samplers import Sampler
from gfn.utils.modules import MLP

class ConditionalHyperGrid(HyperGrid):
    """HyperGrid with condition-dependent rewards."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.conditions = None
        self._original_reward_fn = self.reward_fn

    def set_conditions(self, conditions: torch.Tensor):
        """Set conditions for reward computation."""
        self.conditions = conditions

    def reward(self, states):
        """Interpolate between uniform and original reward."""
        original_rewards = self._original_reward_fn(states.tensor)

        if self.conditions is None:
            return original_rewards

        # Condition values: 0=uniform, 1=original
        cond = self.conditions.squeeze(-1)
        if cond.shape[0] == 1:
            cond = cond.expand(original_rewards.shape[0])

        # Linear interpolation
        uniform_reward = torch.ones_like(original_rewards)
        rewards = (1 - cond) * uniform_reward + cond * original_rewards
        return rewards

# Setup environment
device = torch.device("cpu")
env = ConditionalHyperGrid(
    ndim=2,
    height=8,
    reward_fn_str="original",
    device=device
)

# Preprocessor for states (k-hot encoding); the condition is a separate 1-D input
preprocessor = KHotPreprocessor(height=env.height, ndim=env.ndim)
condition_dim = 1
```
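
The interpolation in `reward` can be checked by hand for scalar values. The sketch below reproduces the same formula in plain Python; the function name is hypothetical:

```python
def interpolate_reward(original_reward, cond):
    """Blend a uniform reward of 1.0 with the original reward.

    cond = 0 gives the uniform reward, cond = 1 gives the original reward.
    """
    uniform_reward = 1.0
    return (1.0 - cond) * uniform_reward + cond * original_reward


for cond in (0.0, 0.5, 1.0):
    print(cond, interpolate_reward(2.5, cond))
```

Intermediate condition values move the reward linearly between the two extremes, so a single conditional model can cover the whole family of reward functions.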

--------------------------------

### Sample and Visualize Graph Validity Before Training in Python

Source: https://github.com/gfnorg/torchgfn/blob/master/tutorials/notebooks/intro_graphs.ipynb

Generates sample trajectories using an untrained GFlowNet and visualizes the distribution of valid seven-segment display graphs. It calculates and prints the percentage of valid graphs, highlighting the initial performance of the model.

```python
trajectories = gflownet.sample_trajectories(env, n=64)
terminating_states = trajectories.terminating_states
render_states(terminating_states[:8])  # type: ignore

# Distribution of valid digits before training
validity_before = reward_function(terminating_states) == 1.0 # type: ignore
num_valid_before = validity_before.sum().item()
num_total_before = len(terminating_states)

print(f"Before training: {num_valid_before} valid digits out of {num_total_before} samples ({num_valid_before/num_total_before*100:.2f}%)")

# For plotting, we can show a simple bar chart: valid vs. invalid
labels = ['Valid Digits', 'Invalid Graphs']
counts_before = [num_valid_before, num_total_before - num_valid_before]

plt.figure(figsize=(6, 4))
plt.title("Graph Validity - Before training")
plt.bar(labels, counts_before, color=['green', 'red'])
plt.ylabel("Count")
plt.show()

```

--------------------------------

### Get Unique Sets - Python

Source: https://github.com/gfnorg/torchgfn/blob/master/tutorials/notebooks/intro_discrete.ipynb

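Deduplicates a list of lists by treating each inner list as a set, so element order and repeats within an inner list are ignored. The unique sets are converted to tuples and returned as a sorted list, which is useful for de-duplicating configurations.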

```python
def get_unique(l: list):
    unique = []
    for i in map(set, l):
        if i not in unique:
            unique.append(i)

    return sorted(map(tuple, unique))
```
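
Because a set has no guaranteed iteration order, the element order inside each returned tuple can vary between runs. If a deterministic result is needed, a variant such as the following sketch sorts each inner set as well; `get_unique_sorted` is a hypothetical name, and it assumes the elements are hashable and mutually comparable:

```python
def get_unique_sorted(items):
    """Deduplicate inner lists as sets and return sorted tuples in sorted order."""
    return sorted({tuple(sorted(set(item))) for item in items})


configs = [["smile", "left_eb_up"], ["left_eb_up", "smile"], ["frown"]]
print(get_unique_sorted(configs))
# → [('frown',), ('left_eb_up', 'smile')]
```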

--------------------------------

### Initialize and Render Line Environment

Source: https://github.com/gfnorg/torchgfn/blob/master/tutorials/notebooks/intro_continuous.ipynb

Sets up a 'LineEnvironment' with specified parameters, including modes (mus), variances, and initial conditions. The environment is then rendered to visualize its configuration.

```python
env = LineEnvironment(
    mus=[-3, 4, 6, 10],
    variances=[0.2, 0.4, 1, 0.2],
    n_sd=4.5,
    init_value=0
)
render(env, tight=False)
```
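
To illustrate how the `mus` and `variances` arguments could shape the reward landscape, the sketch below evaluates an unnormalized mixture of Gaussian densities in plain Python. Treating the reward as a plain sum of component densities is an assumption made for illustration; the tutorial's `LineEnvironment` may define the reward differently:

```python
import math


def normal_pdf(x, mu, variance):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mu) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


def mixture_reward(x, mus, variances):
    """Assumed reward: unnormalized sum of the component densities at x."""
    return sum(normal_pdf(x, m, v) for m, v in zip(mus, variances))


mus = [-3, 4, 6, 10]
variances = [0.2, 0.4, 1, 0.2]
for x in (-3.0, 0.0, 4.0):
    print(x, mixture_reward(x, mus, variances))
```

Under this assumption the reward is high near each mode and close to zero at the initial value of 0, so the sampler has to travel away from its starting point to find reward.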

--------------------------------

### Import Necessary Libraries for GFlowNets

Source: https://github.com/gfnorg/torchgfn/blob/master/tutorials/notebooks/intro_discrete.ipynb

Imports essential Python libraries for GFlowNets, including PyTorch for neural networks and distributions, Matplotlib for plotting, NumPy for numerical operations, and tqdm for progress bars. These libraries are fundamental for building and training GFN models.

```python
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.cm as cm
import random

from torch.distributions.categorical import Categorical
import torch
import torch.nn as nn
from tqdm import tqdm, trange
```

--------------------------------

### Define Facial Features for Drawing

Source: https://github.com/gfnorg/torchgfn/blob/master/tutorials/notebooks/intro_discrete.ipynb

Defines a dictionary of facial features (smile, frown, eyebrows) as lambda functions that add graphical elements to a Matplotlib plot. These functions allow for the programmatic drawing of expressive faces, used later in the example.

```python
# These feature globals will be referred to throughout.
_mouth_kwargs = {"closed": False, "fill": False, "lw": 3}
FEATURES = {
    'smile': lambda: plt.gca().add_patch(plt.Polygon(
        np.stack(
            [np.linspace(0.2, 0.8), 0.3 - np.sin(np.linspace(0, 3.14)) * 0.15]
        ).T,
        **_mouth_kwargs
        ) 
    ),
    'frown': lambda: plt.gca().add_patch(plt.Polygon(
        np.stack(
            [np.linspace(0.2, 0.8), 0.15 + np.sin(np.linspace(0, 3.14)) * 0.15]
        ).T,
        **_mouth_kwargs,
        ) 
    ),
    'left_eb_down': lambda: plt.gca().add_line(plt.Line2D(
        [0.15, 0.35], [0.75, 0.7], color=(0, 0, 0))
    ),
    'right_eb_down': lambda: plt.gca().add_line(plt.Line2D(
        [0.65, 0.85], [0.7, 0.75], color=(0, 0, 0))
    ),
    'left_eb_up': lambda: plt.gca().add_line(plt.Line2D(
        [0.15, 0.35], [0.7, 0.75], color=(0, 0, 0))
    ),
    'right_eb_up': lambda: plt.gca().add_line(plt.Line2D(
        [0.65, 0.85], [0.75, 0.7], color=(0, 0, 0))
    ),
}
```

--------------------------------

### Visualize State Space Flows

Source: https://github.com/gfnorg/torchgfn/blob/master/tutorials/notebooks/intro_discrete.ipynb

Plots the learned edge flows across the state space, showing flow magnitudes at different states and highlighting invalid configurations.

```python
plot_state_space(model=F_sa)
```

--------------------------------

### Train GFlowNet using Trajectory Balance

Source: https://github.com/gfnorg/torchgfn/blob/master/tutorials/notebooks/intro_discrete.ipynb

Initiates the training process for the GFlowNet using the configured optimizer, environment, and a specified number of episodes. It returns visited states and losses for analysis.

```python
visited_terminating_states, states_visited, losses = train(
    gflownet,
    optimizer,
    env,
    n_episodes=n_episodes * 10,
)
```

--------------------------------

### Create custom policy mixin with diagnostics in PyTorch

Source: https://github.com/gfnorg/torchgfn/blob/master/docs/source/guides/estimator_policy_mixin.md

Advanced example extending PolicyMixin to inject custom diagnostics into the training loop. Overrides compute_dist and log_probs to track call counts and log probability statistics via ctx.extras. Enables debugging and monitoring of policy behavior during sampling.

```python
from typing import Any
from torch.distributions import Distribution
from gfn.estimators import PolicyMixin

class TracingPolicyMixin(PolicyMixin):
    def compute_dist(self, states_active, ctx, step_mask=None, save_estimator_outputs=False, **kw):
        dist, ctx = super().compute_dist(states_active, ctx, step_mask, save_estimator_outputs, **kw)
        # Count how many times the distribution has been computed.
        ctx.extras.setdefault("num_compute_calls", 0)
        ctx.extras["num_compute_calls"] += 1
        return dist, ctx

    def log_probs(self, actions_active, dist: Distribution, ctx: Any, step_mask=None, vectorized=False, save_logprobs=False):
        lp, ctx = super().log_probs(actions_active, dist, ctx, step_mask, vectorized, save_logprobs)
        # Overwrite on every call so the value reflects the most recent step
        # (setdefault would keep only the first value).
        ctx.extras["last_lp_mean"] = lp.mean().detach()
        return lp, ctx
```

--------------------------------

### Sample from Trained GFlowNet and Visualize (Python)

Source: https://github.com/gfnorg/torchgfn/blob/master/tutorials/notebooks/intro_graphs.ipynb

Samples 64 trajectories from the trained GFlowNet and renders the first eight terminating graphs. It then prints the proportion of valid digits generated after training and draws a grouped bar plot comparing the counts of valid and invalid graphs before and after training. The plot relies on `counts_before`, computed in the notebook's pre-training section, and uses Matplotlib for visualization.

```python
import matplotlib.pyplot as plt
import torch

trajectories = gflownet.sample_trajectories(env, n=64)
terminating_states = trajectories.terminating_states
render_states(terminating_states[:8])  # type: ignore


# Distribution of valid digits after training
validity_after = reward_function(terminating_states) == 1.0 # type: ignore
num_valid_after = validity_after.sum().item()
num_total_after = len(terminating_states)

print(f"After training: {num_valid_after} valid digits out of {num_total_after} samples ({num_valid_after/num_total_after*100:.2f}%)")

# Plotting comparison
labels = ['Valid Digits', 'Invalid Graphs']
counts_after = [num_valid_after, num_total_after - num_valid_after]
# `counts_before` must already be defined by the pre-training section;
# re-run that section first if it is no longer in scope.

width = 0.35  # the width of the bars
x = torch.arange(len(labels))  # the label locations

fig, ax = plt.subplots(figsize=(8, 5))
rects1 = ax.bar(x - width/2, counts_before, width, label='Before Training', color='salmon')
rects2 = ax.bar(x + width/2, counts_after, width, label='After Training', color='lightgreen')

# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Count')
ax.set_title('Graph Validity Comparison')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()

fig.tight_layout()
plt.show()
```

--------------------------------

### Import libraries for GFlowNets tutorial

Source: https://github.com/gfnorg/torchgfn/blob/master/tutorials/notebooks/intro_continuous.ipynb

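Imports the libraries used in the tutorial: matplotlib for plotting, torch (including `torch.distributions.Normal`) for tensor operations, numpy for numerical computations, `math` and `random` from the standard library, and `tqdm.trange` for progress bars. Sets the device to CUDA if available, otherwise CPU.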

```python
from matplotlib import pyplot as plt
from torch.distributions import Normal
import math
import numpy as np
import torch
import random
from tqdm import trange

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```

--------------------------------

### Environment Definition

Source: https://github.com/gfnorg/torchgfn/blob/master/tutorials/notebooks/intro_continuous.ipynb

Instantiates a `LineEnvironment` with modes (`mus`) at 2 and 5, a variance of 0.2 for each mode, `n_sd=4.5`, and an initial value of 0.

```Python
env = LineEnvironment(
    mus=[2, 5],
    variances=[0.2, 0.2],
    n_sd=4.5,
    init_value=0
)
```

--------------------------------

### Initialize Line Environment in Python

Source: https://github.com/gfnorg/torchgfn/blob/master/tutorials/notebooks/intro_continuous.ipynb

Initializes a `LineEnvironment` with modes (`mus`) at -3 and 3 and a variance of 0.2 for each, then renders it. This setup is used to create a more challenging distribution for training models. The `n_sd` parameter controls the standard deviation range, and `init_value` sets the initial state.

```python
env = LineEnvironment(mus=[-3, 3], variances=[0.2, 0.2], n_sd=4.5, init_value=0)
render(env)
```
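
To make the roles of `mus`, `variances`, and `n_sd` more concrete, here is a minimal standard-library sketch of a mixture-of-Gaussians reward on a line. It is not the notebook's `LineEnvironment`: the helper names (`normal_pdf`, `mixture_reward`, `support_bounds`) are hypothetical, and the reading of `n_sd` as "number of standard deviations beyond the outermost modes" is an assumption made for illustration.

```python
import math


def normal_pdf(x, mu, variance):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mu) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)


def mixture_reward(x, mus, variances):
    """Unnormalized reward: sum of the component densities at x (assumed form)."""
    return sum(normal_pdf(x, m, v) for m, v in zip(mus, variances))


def support_bounds(mus, variances, n_sd):
    """Assumed bounds: n_sd standard deviations beyond the outermost modes."""
    max_sigma = math.sqrt(max(variances))
    return min(mus) - n_sd * max_sigma, max(mus) + n_sd * max_sigma


mus, variances = [-3, 3], [0.2, 0.2]
lower, upper = support_bounds(mus, variances, n_sd=4.5)

# The reward is high at a mode and close to zero at the starting point 0,
# so a sampler has to travel away from the initial state to find reward.
reward_at_mode = mixture_reward(3.0, mus, variances)
reward_at_start = mixture_reward(0.0, mus, variances)
```

Under these assumptions, the initial value 0 sits between two well-separated modes, and the reward near the start is many orders of magnitude smaller than at either mode.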

--------------------------------

### Configuration: Set Fixed Hyperparameters

Source: https://github.com/gfnorg/torchgfn/blob/master/tutorials/notebooks/intro_discrete.ipynb

Sets and prints fixed hyperparameters for experiments, including the number of hidden units, episodes, learning rate, and random seed. These values are used consistently across runs.

```python
# Fixed hyperparameters.
n_hid_units = 512
n_episodes = 10_000
learning_rate = 3e-3
seed = 42

print("For all experiments, our hyperparameters will be:")
print("    + n_hid_units={}".format(n_hid_units))
print("    + n_episodes={}".format(n_episodes))
print("    + learning_rate={}".format(learning_rate))
print("    + seed={}".format(seed))
```
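
A fixed seed only makes runs repeatable if every random number generator in use is seeded with it. The sketch below shows one way this could be done; `seed_all` is a hypothetical helper, not the notebook's own seeding function, and it seeds NumPy and PyTorch only when those packages are installed.

```python
import importlib.util
import random


def seed_all(seed: int) -> None:
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    if importlib.util.find_spec("numpy") is not None:
        import numpy as np
        np.random.seed(seed)
    if importlib.util.find_spec("torch") is not None:
        import torch
        torch.manual_seed(seed)


# Re-seeding with the same value reproduces the same random draws.
seed_all(42)
first = [random.random() for _ in range(3)]
seed_all(42)
second = [random.random() for _ in range(3)]
```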

--------------------------------

### Instantiate and train FlowModel

Source: https://github.com/gfnorg/torchgfn/blob/master/tutorials/notebooks/intro_discrete.ipynb

This code block prepares for training the `FlowModel`. It sets a random seed for reproducibility, instantiates the `FlowModel` with specified hidden units, and initializes an Adam optimizer. It also includes a comment indicating that losses will be accumulated for later processing.

```python
set_seed(seed)

# Instantiate the model (with n_hid_units hidden units) and the optimizer.
F_sa = FlowModel(n_hid_units)
opt = torch.optim.Adam(F_sa.parameters(), lr=learning_rate)

# To not complicate the code, I'll just accumulate losses here and take a
```