### Initializing GraphNeT Installation Options UI in JavaScript
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/installation/quick-start.html
This JavaScript snippet defines the data arrays for PyTorch versions, operating systems, and CUDA capabilities. It then dynamically appends corresponding `div` elements to the HTML, creating interactive selection options for the user.
```JavaScript
var torchList = [ ['torch-2.2.0', 'PyTorch 2.2.*'], ['no_torch', 'w/o PyTorch'], ];
var osList = [ ['linux', 'Linux'], ['macos', 'Mac'], ];
var cudaList = [ ['cu118', '11.8'], ['cu121', '12.1'], ['cpu', 'CPU'], ];
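// Each option is a <div> whose id is the option key; the click handler reads that id.
torchList.forEach(x => $("#torch").append(`<div id="${x[0]}">${x[1]}</div>`));
osList.forEach(x => $("#os").append(`<div id="${x[0]}">${x[1]}</div>`));
cudaList.forEach(x => $("#cuda").append(`<div id="${x[0]}">${x[1]}</div>`));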
```
--------------------------------
### Generating GraphNeT Installation Commands in JavaScript
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/installation/quick-start.html
The `updateCommand` function generates and displays the GraphNeT installation command that matches the user's selections for OS, PyTorch, and CUDA. For unsupported combinations (CUDA on macOS, or CUDA without PyTorch) it shows an explanatory comment instead of a command. It uses `requirements/torch_macos.txt` on macOS and suggests the optional `jammy_flows` dependency.
```JavaScript
function updateCommand() {
  var torch = $("#command").attr("torch");
  var os = $("#command").attr("os");
  var package = $("#command").attr("package");
  var cuda = $("#command").attr("cuda");
  // Unsupported combinations: show an explanatory comment instead of a command
  if (os == "macos" && cuda != "cpu") {
    $("#command pre").text('# macOS binaries do not support CUDA');
  }
  if (cuda != "cpu" && torch == "no_torch") {
    $("#command pre").text('# GPU acceleration is not available without PyTorch.');
  }
  // Linux: the requirements file is selected by the CUDA tag
  if (os == "linux" && cuda != "cpu" && torch != "no_torch"){
    $("#command pre").text(`git clone https://github.com/graphnet-team/graphnet.git\ncd graphnet\n\npip install -r requirements/torch_${$("#command").attr("cuda")}.txt -e .[torch,develop]\n\n#Optionally, install jammy_flows for normalizing flow support:\npip install git+https://github.com/thoglu/jammy_flows.git`);
  } else if (os == "linux" && cuda == "cpu" && torch != "no_torch"){
    $("#command pre").text(`git clone https://github.com/graphnet-team/graphnet.git\ncd graphnet\n\npip install -r requirements/torch_${$("#command").attr("cuda")}.txt -e .[torch,develop]\n\n#Optionally, install jammy_flows for normalizing flow support:\npip install git+https://github.com/thoglu/jammy_flows.git`);
  } else if (os == "linux" && cuda == "cpu" && torch == "no_torch"){
    $("#command pre").text(`# Installations without PyTorch are intended for file conversion only\ngit clone https://github.com/graphnet-team/graphnet.git\ncd graphnet\n\npip install -r requirements/torch_${$("#command").attr("cuda")}.txt -e .[develop]\n\n#Optionally, install jammy_flows for normalizing flow support:\npip install git+https://github.com/thoglu/jammy_flows.git`);
  }
  // macOS: always CPU-only, using the dedicated requirements file
  if (os == "macos" && cuda == "cpu" && torch != "no_torch"){
    $("#command pre").text(`git clone https://github.com/graphnet-team/graphnet.git\ncd graphnet\n\npip install -r requirements/torch_macos.txt -e .[torch,develop]\n\n#Optionally, install jammy_flows for normalizing flow support:\npip install git+https://github.com/thoglu/jammy_flows.git`);
  }
  if (os == "macos" && cuda == "cpu" && torch == "no_torch"){
    $("#command pre").text(`# Installations without PyTorch are intended for file conversion only\ngit clone https://github.com/graphnet-team/graphnet.git\ncd graphnet\n\npip install -r requirements/torch_macos.txt -e .[develop]\n\n#Optionally, install jammy_flows for normalizing flow support:\npip install git+https://github.com/thoglu/jammy_flows.git`);
  }
}
```
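The branching above amounts to a small decision table. As a rough sketch, the same logic can be written as a pure Python function; `install_command`, `REPO`, and `JAMMY` are hypothetical names introduced only for illustration, and the sketch assumes the OS is either `linux` or `macos`.

```python
REPO = "https://github.com/graphnet-team/graphnet.git"
JAMMY = (
    "#Optionally, install jammy_flows for normalizing flow support:\n"
    "pip install git+https://github.com/thoglu/jammy_flows.git"
)


def install_command(os_name: str, torch: str, cuda: str) -> str:
    """Return the text the widget would display for one selection."""
    # Unsupported combinations. As in the JavaScript, the no-PyTorch message
    # wins when both conditions hold, because it is assigned last there.
    if cuda != "cpu" and torch == "no_torch":
        return "# GPU acceleration is not available without PyTorch."
    if os_name == "macos" and cuda != "cpu":
        return "# macOS binaries do not support CUDA"

    # macOS always uses torch_macos.txt; Linux picks the file by CUDA tag.
    requirements = "torch_macos" if os_name == "macos" else f"torch_{cuda}"
    extras = "develop" if torch == "no_torch" else "torch,develop"
    note = (
        "# Installations without PyTorch are intended for file conversion only\n"
        if torch == "no_torch"
        else ""
    )
    return (
        f"{note}git clone {REPO}\ncd graphnet\n\n"
        f"pip install -r requirements/{requirements}.txt -e .[{extras}]\n\n"
        f"{JAMMY}"
    )


print(install_command("linux", "torch-2.2.0", "cu121"))
```

Writing the logic this way makes it easy to check every OS, PyTorch, and CUDA combination without a browser.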
--------------------------------
### Handling UI Selections and Initializing GraphNeT Install Options in JavaScript
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/installation/quick-start.html
This JavaScript snippet defines a click event handler for the installation option `div` elements, which updates the selected state and triggers the `updateCommand` function to refresh the displayed installation command. It also includes initial calls to simulate clicks, setting default selections for PyTorch, OS, and CUDA upon page load.
```JavaScript
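$(".quick-start .content-column .row div").click(function() {
  // Mark the clicked option as the only selected one in its row
  $(this).parent().children().removeClass("selected");
  $(this).addClass("selected");
  // Store the choice on #command, keyed by the row id (torch, os, cuda, ...)
  $("#command").attr($(this).parent().attr("id"), $(this).attr("id"));
  updateCommand();
});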
$("#torch").children().get(0).click();
$("#linux").click();
$("#pip").click();
$("#cpu").click();
```
--------------------------------
### Example GraphNeT Model Configuration YAML
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This YAML snippet provides an example of a `ModelConfig` file, showcasing how the `architecture` (e.g., `DynEdge`) and `graph_definition` components are structured. It details their respective arguments and class names, illustrating the human-readable and portable format for defining GraphNeT models.
```yaml
arguments:
  architecture:
    ModelConfig:
      arguments:
        add_global_variables_after_pooling: false
        dynedge_layer_sizes: null
        features_subset: null
        global_pooling_schemes: [min, max, mean, sum]
        nb_inputs: 4
        nb_neighbours: 8
        post_processing_layer_sizes: null
        readout_layer_sizes: null
      class_name: DynEdge
  graph_definition:
    ModelConfig:
      arguments:
        columns: [0, 1, 2]
```
--------------------------------
### Demonstrating CLI Help and Execution for Data Reading in Python
Source: https://github.com/graphnet-team/graphnet/blob/main/examples/README.md
This snippet illustrates how to interact with GraphNeT example scripts via the command-line interface. It shows how to access the help documentation for the `01_read_dataset.py` script to understand its arguments and then demonstrates a basic execution of the script to read data in SQLite format.
```bash
$ python examples/02_data/01_read_dataset.py --help
(...)
Read a few events from data in an intermediate format.

positional arguments:
  {sqlite,parquet}

optional arguments:
  -h, --help        show this help message and exit
$ python examples/02_data/01_read_dataset.py sqlite
(...)
```
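A command-line interface of this shape can be sketched with the standard library's `argparse`. This is an illustrative sketch only, not the actual example script; the argument name `backend` and the helper `build_parser` are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Read a few events from data in an intermediate format."
    )
    # A positional argument restricted to fixed choices is rendered as
    # `{sqlite,parquet}` in the generated --help text.
    parser.add_argument("backend", choices=["sqlite", "parquet"])
    return parser


# Equivalent to running the script with the single argument `sqlite`.
args = build_parser().parse_args(["sqlite"])
print(args.backend)
```

Passing any value other than `sqlite` or `parquet` makes `argparse` exit with a usage error.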
--------------------------------
### Installing Pre-Commit Hooks for GraphNeT Development
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/contribute/contribute.rst
This command installs the pre-commit hooks configured for the GraphNeT project. Once installed, these hooks automatically format code using `black` and `docformatter`, and check for errors and style adherence with `flake8`, `mypy`, and `pydocstyle` every time a change is committed, ensuring consistent code quality and style.
```bash
pre-commit install
```
--------------------------------
### Energy Reconstruction Example with GraphNeT Configuration (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/models/models.rst
This comprehensive example demonstrates a full energy reconstruction workflow in GraphNeT using configuration files. It covers importing necessary modules, loading model and dataset configurations, building the model, constructing data loaders, training the model, making predictions on a test set, and saving results and the trained model.
```python
# Import(s)
import os
from graphnet.constants import CONFIG_DIR # Local path to graphnet/configs
from graphnet.data.dataloader import DataLoader
from graphnet.models import Model
from graphnet.utilities.config import DatasetConfig, ModelConfig
# Configuration
dataset_config_path = f"{CONFIG_DIR}/datasets/training_example_data_sqlite.yml"
model_config_path = f"{CONFIG_DIR}/models/example_energy_reconstruction_model.yml"
# Build model
model_config = ModelConfig.load(model_config_path)
model = Model.from_config(model_config, trust=True)
# Construct dataloaders
dataset_config = DatasetConfig.load(dataset_config_path)
dataloaders = DataLoader.from_dataset_config(
    dataset_config,
    batch_size=16,
    num_workers=1,
)
# Train model
model.fit(
    dataloaders["train"],
    dataloaders["validation"],
    gpus=[0],
    max_epochs=5,
)
# Predict on test set and return as pandas.DataFrame
results = model.predict_as_dataframe(
    dataloaders["test"],
    additional_attributes=model.target_labels + ["event_no"],
)
# Save predictions and model to file
outdir = "tutorial_output"
os.makedirs(outdir, exist_ok=True)
results.to_csv(f"{outdir}/results.csv")
model.save_state_dict(f"{outdir}/state_dict.pth")
model.save(f"{outdir}/model.pth")
```
--------------------------------
### Example Dataset Configuration File (YAML)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This YAML snippet provides a complete example of a `DatasetConfig` file used in GraphNeT. It defines the data source path, graph definition parameters (like node features and nearest neighbors), pulsemaps, features, truth variables, index column, truth table, and specific train/test/validation selections based on event numbers. This configuration is used to load and process data for training.
```yaml
path: $GRAPHNET/data/examples/sqlite/prometheus/prometheus-events.db
graph_definition:
  arguments:
    columns: [0, 1, 2]
    detector:
      arguments: {}
      class_name: Prometheus
    dtype: null
    nb_nearest_neighbours: 8
    node_definition:
      arguments: {}
      class_name: NodesAsPulses
    node_feature_names: [sensor_pos_x, sensor_pos_y, sensor_pos_z, t]
  class_name: KNNGraph
pulsemaps:
- total
features:
- sensor_pos_x
- sensor_pos_y
- sensor_pos_z
- t
truth:
- injection_energy
- injection_type
- injection_interaction_type
- injection_zenith
- injection_azimuth
- injection_bjorkenx
- injection_bjorkeny
- injection_position_x
- injection_position_y
- injection_position_z
- injection_column_depth
- primary_lepton_1_type
- primary_hadron_1_type
- primary_lepton_1_position_x
- primary_lepton_1_position_y
- primary_lepton_1_position_z
- primary_hadron_1_position_x
- primary_hadron_1_position_y
- primary_hadron_1_position_z
- primary_lepton_1_direction_theta
- primary_lepton_1_direction_phi
- primary_hadron_1_direction_theta
- primary_hadron_1_direction_phi
- primary_lepton_1_energy
- primary_hadron_1_energy
- total_energy
- dummy_pid
index_column: event_no
truth_table: mc_truth
seed: 21
selection:
  test: event_no % 5 == 0
  validation: event_no % 5 == 1
  train: event_no % 5 > 1
```
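The three `selection` expressions split events by `event_no` modulo 5. The sketch below only illustrates that arithmetic in plain Python; it does not show how GraphNeT evaluates the selection strings. `SELECTIONS`, `assign_split`, and the range of 1000 event numbers are hypothetical.

```python
from collections import Counter

# Selection expressions copied from the config above, written as predicates.
SELECTIONS = {
    "test": lambda event_no: event_no % 5 == 0,
    "validation": lambda event_no: event_no % 5 == 1,
    "train": lambda event_no: event_no % 5 > 1,
}


def assign_split(event_no: int) -> str:
    """Return the name of the single selection that matches `event_no`."""
    matches = [name for name, keep in SELECTIONS.items() if keep(event_no)]
    assert len(matches) == 1, "the three selections should be disjoint"
    return matches[0]


# Hypothetical event numbers 0..999, used only to show the proportions.
counts = Counter(assign_split(n) for n in range(1000))
print(dict(counts))  # {'test': 200, 'validation': 200, 'train': 600}
```

Every event number falls into exactly one split, giving roughly 20% test, 20% validation, and 60% train.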
--------------------------------
### Training and Predicting with GraphNeT for Energy Reconstruction (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This comprehensive example demonstrates the full workflow for training a GraphNeT model for energy reconstruction. It covers loading model and dataset configurations, constructing data loaders, training the model, making predictions on a test set, and saving both the results and the trained model artifacts.
```python
# Import(s)
import os
from graphnet.constants import CONFIG_DIR # Local path to graphnet/configs
from graphnet.data.dataloader import DataLoader
from graphnet.models import Model
from graphnet.utilities.config import DatasetConfig, ModelConfig
# Configuration
dataset_config_path = f"{CONFIG_DIR}/datasets/training_example_data_sqlite.yml"
model_config_path = f"{CONFIG_DIR}/models/example_energy_reconstruction_model.yml"
# Build model
model_config = ModelConfig.load(model_config_path)
model = Model.from_config(model_config, trust=True)
# Construct dataloaders
dataset_config = DatasetConfig.load(dataset_config_path)
dataloaders = DataLoader.from_dataset_config(
    dataset_config,
    batch_size=16,
    num_workers=1,
)
# Train model
model.fit(
    dataloaders["train"],
    dataloaders["validation"],
    gpus=[0],
    max_epochs=5,
)
# Predict on test set and return as pandas.DataFrame
results = model.predict_as_dataframe(
    dataloaders["test"],
    additional_attributes=model.target_labels + ["event_no"],
)
# Save predictions and model to file
outdir = "tutorial_output"
os.makedirs(outdir, exist_ok=True)
results.to_csv(f"{outdir}/results.csv")
model.save_state_dict(f"{outdir}/state_dict.pth")
model.save(f"{outdir}/model.pth")
```
--------------------------------
### Training DynEdge Model Programmatically (Bash)
Source: https://github.com/graphnet-team/graphnet/blob/main/examples/04_training/README.md
This snippet demonstrates how to train a DynEdge GNN model using the GraphNeT library by programmatically constructing the dataset and model, without relying on configuration files. It shows commands for displaying CLI help, initiating energy regression training, and utilizing single or multiple GPUs. This method is recommended for debugging and experimenting with model configurations.
```bash
# Show the CLI
(graphnet) $ python examples/04_training/01_train_dynedge.py --help
# Train energy regression model
(graphnet) $ python examples/04_training/01_train_dynedge.py
# Train using a single GPU
(graphnet) $ python examples/04_training/01_train_dynedge.py --gpus 0
# Train using multiple GPUs
(graphnet) $ python examples/04_training/01_train_dynedge.py --gpus 0 1
```
--------------------------------
### Example Dataset Configuration File in GraphNeT (YAML)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/datasets/datasets.rst
This YAML snippet provides a complete example of a `DatasetConfig` file used in GraphNeT. It defines the path to the input SQLite database, the graph definition (e.g., `KNNGraph` with `Prometheus` detector and `NodesAsPulses` node features), specified pulsemaps, and lists of features and truth variables to be loaded from the dataset. This configuration is crucial for setting up data processing pipelines.
```yaml
path: $GRAPHNET/data/examples/sqlite/prometheus/prometheus-events.db
graph_definition:
  arguments:
    columns: [0, 1, 2]
    detector:
      arguments: {}
      class_name: Prometheus
    dtype: null
    nb_nearest_neighbours: 8
    node_definition:
      arguments: {}
      class_name: NodesAsPulses
    node_feature_names: [sensor_pos_x, sensor_pos_y, sensor_pos_z, t]
  class_name: KNNGraph
pulsemaps:
- total
features:
- sensor_pos_x
- sensor_pos_y
- sensor_pos_z
- t
truth:
- injection_energy
- injection_type
- injection_interaction_type
- injection_zenith
- injection_azimuth
- injection_bjorkenx
- injection_bjorkeny
- injection_position_x
- injection_position_y
- injection_position_z
- injection_column_depth
- primary_lepton_1_type
- primary_hadron_1_type
- primary_lepton_1_position_x
- primary_lepton_1_position_y
- primary_lepton_1_position_z
- primary_hadron_1_position_x
- primary_hadron_1_position_y
- primary_hadron_1_position_z
- primary_lepton_1_direction_theta
- primary_lepton_1_direction_phi
- primary_hadron_1_direction_theta
- primary_hadron_1_direction_phi
- primary_lepton_1_energy
- primary_hadron_1_energy
```
--------------------------------
### Training DynEdge Model from Configuration Files (Bash)
Source: https://github.com/graphnet-team/graphnet/blob/main/examples/04_training/README.md
This snippet illustrates how to train a DynEdge GNN model using GraphNeT with configuration files for dataset loading and model definition. It provides commands for displaying CLI help and training models for energy, vertex position, and direction reconstruction, including handling 'kappa' values for uncertainty. This approach is recommended for standard model configurations due to its readability and shareability.
```bash
# Show the CLI
(graphnet) $ python examples/04_training/03_train_dynedge_from_config.py --help
# Train energy regression model
(graphnet) $ python examples/04_training/03_train_dynedge_from_config.py
# Same as above, as this is the default model config.
(graphnet) $ python examples/04_training/03_train_dynedge_from_config.py \
--model-config configs/models/example_energy_reconstruction_model.yml
# Train a vertex position reconstruction model
(graphnet) $ python examples/04_training/03_train_dynedge_from_config.py \
--model-config configs/models/example_vertex_position_reconstruction_model.yml
# Trains a direction (zenith, azimuth) reconstruction model. Note that the
# chosen `Task` in the model config file also returns estimated "kappa" values,
# i.e. inverse variance, for each predicted feature, meaning that we need to
# manually specify the names of these.
(graphnet) $ python examples/04_training/03_train_dynedge_from_config.py --gpus 0 \
--model-config configs/models/example_direction_reconstruction_model.yml \
--prediction-names zenith_pred zenith_kappa_pred azimuth_pred azimuth_kappa_pred
```
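The comments above describe each "kappa" value as an inverse variance. Taking that description at face value, a predicted kappa can be turned into an approximate standard deviation as sketched below. The helper `kappa_to_sigma` and the sample row are hypothetical, and the conversion is only an approximation of the angular uncertainty.

```python
import math


def kappa_to_sigma(kappa: float) -> float:
    """Convert an inverse-variance estimate into a standard deviation."""
    if kappa <= 0:
        raise ValueError("kappa must be positive")
    return 1.0 / math.sqrt(kappa)


# Hypothetical prediction row, using the column names passed to --prediction-names.
row = {"zenith_pred": 1.2, "zenith_kappa_pred": 4.0}
sigma = kappa_to_sigma(row["zenith_kappa_pred"])
print(f"zenith = {row['zenith_pred']} +/- {sigma} rad")
```

A larger kappa therefore corresponds to a more confident prediction.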
--------------------------------
### Installing GraphNeT in IceCube CVMFS Environment (Bash)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/installation/install.rst
This snippet provides a Bash script to install GraphNeT within an IceCube CVMFS environment. It first clones the GraphNeT repository, then sets up the CVMFS Python runtime with IceTray, updates pip, and finally installs GraphNeT with its dependencies as a user.
```bash
# Download GraphNeT
git clone https://github.com/graphnet-team/graphnet.git
cd graphnet
# Open your favorite CVMFS distribution
eval `/cvmfs/icecube.opensciencegrid.org/py3-v4.2.1/setup.sh`
/cvmfs/icecube.opensciencegrid.org/py3-v4.2.1/RHEL_7_x86_64/metaprojects/icetray/v1.5.1/env-shell.sh
# Update central utils
pip install --upgrade "pip>=20"  # quoted so the shell does not treat >= as a redirect
pip install wheel setuptools==59.5.0
# Install graphnet into the CVMFS as a user
pip install --user -r requirements/torch_cpu.txt -e .[torch,develop]
```
--------------------------------
### Specifying PyTorch CPU Wheel Source (Shell)
Source: https://github.com/graphnet-team/graphnet/blob/main/requirements/torch_macos.txt
This snippet provides a `--find-links` argument for `pip` to locate PyTorch CPU wheel files from the official PyTorch download server. This is typically used in `requirements.txt` or directly with `pip install` to ensure specific CPU-only versions are installed.
```Shell
--find-links https://download.pytorch.org/whl/cpu
```
--------------------------------
### Defining a StandardModel for Zenith Reconstruction in GraphNeT (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This snippet demonstrates how to import and configure various GraphNeT components (detector, graph, GNN, task, loss function) to construct a `StandardModel`. The example specifically builds a model for zenith angle reconstruction with uncertainties, utilizing `KNNGraph` for data representation, `DynEdge` as the GNN backbone, and `ZenithReconstructionWithKappa` for the physics task, along with `VonMisesFisher2DLoss` for training.
```python
# Choice of graph representation, GNN architecture, and physics task
from graphnet.models.detector.prometheus import Prometheus
from graphnet.models.graphs import KNNGraph
from graphnet.models.graphs.nodes import NodesAsPulses
from graphnet.models.gnn.dynedge import DynEdge
from graphnet.models.task.reconstruction import ZenithReconstructionWithKappa
# Choice of loss function and Model class
from graphnet.training.loss_functions import VonMisesFisher2DLoss
from graphnet.models import StandardModel
# Configuring the components
# Represent the data as a point-cloud graph in which each node is a
# pulse of Cherenkov radiation, with edges drawn to its 8 nearest neighbours
graph_definition = KNNGraph(
    detector=Prometheus(),
    node_definition=NodesAsPulses(),
    nb_nearest_neighbours=8,
)
# The backbone's input size is taken from the graph definition
backbone = DynEdge(
    nb_inputs=graph_definition.nb_outputs,
    global_pooling_schemes=["min", "max", "mean"],
)
task = ZenithReconstructionWithKappa(
    hidden_size=backbone.nb_outputs,
    target_labels="injection_zenith",
    loss_function=VonMisesFisher2DLoss(),
)
# Construct the Model
model = StandardModel(
    graph_definition=graph_definition,
    backbone=backbone,
    tasks=[task],
)
```
--------------------------------
### Specifying PyG CPU Wheel Source (Shell)
Source: https://github.com/graphnet-team/graphnet/blob/main/requirements/torch_macos.txt
This snippet provides a `--find-links` argument for `pip` to locate PyTorch Geometric (PyG) CPU wheel files from the PyG data server. This ensures that `pip` can find and install the correct CPU-compatible version of PyG, specifically for Torch 2.2.0.
```Shell
--find-links https://data.pyg.org/whl/torch-2.2.0+cpu.html
```
--------------------------------
### Example GraphNeT Model Configuration (YAML)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/models/models.rst
This YAML snippet provides a detailed example of a `ModelConfig` for an energy reconstruction model in GraphNeT. It defines the model's architecture (DynEdge), graph definition (KNNGraph with Prometheus detector and NodesAsPulses node definition), optimizer (Adam), scheduler (PiecewiseLinearLR), and a specific task (EnergyReconstruction with LogCoshLoss). This configuration can be used to instantiate a complex model programmatically.
```yaml
arguments:
  architecture:
    ModelConfig:
      arguments:
        add_global_variables_after_pooling: false
        dynedge_layer_sizes: null
        features_subset: null
        global_pooling_schemes: [min, max, mean, sum]
        nb_inputs: 4
        nb_neighbours: 8
        post_processing_layer_sizes: null
        readout_layer_sizes: null
      class_name: DynEdge
  graph_definition:
    ModelConfig:
      arguments:
        columns: [0, 1, 2]
        detector:
          ModelConfig:
            arguments: {}
            class_name: Prometheus
        dtype: null
        nb_nearest_neighbours: 8
        node_definition:
          ModelConfig:
            arguments: {}
            class_name: NodesAsPulses
        node_feature_names: [sensor_pos_x, sensor_pos_y, sensor_pos_z, t]
      class_name: KNNGraph
  optimizer_class: '!class torch.optim.adam Adam'
  optimizer_kwargs: {eps: 0.001, lr: 0.001}
  scheduler_class: '!class graphnet.training.callbacks PiecewiseLinearLR'
  scheduler_config: {interval: step}
  scheduler_kwargs:
    factors: [0.01, 1, 0.01]
    milestones: [0, 20.0, 80]
  tasks:
  - ModelConfig:
      arguments:
        hidden_size: 128
        loss_function:
          ModelConfig:
            arguments: {}
            class_name: LogCoshLoss
        loss_weight: null
        prediction_labels: null
        target_labels: total_energy
        transform_inference: '!lambda x: torch.pow(10,x)'
        transform_prediction_and_target: '!lambda x: torch.log10(x)'
        transform_support: null
        transform_target: null
      class_name: EnergyReconstruction
```
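The `scheduler_kwargs` above pair each milestone with a multiplicative factor for the learning rate. The sketch below shows one plausible reading of those numbers. It assumes the factor is linearly interpolated between milestones and held constant outside them, as the name `PiecewiseLinearLR` suggests, and that milestones count optimizer steps because `scheduler_config` sets `interval: step`. The function `piecewise_linear_factor` is a hypothetical helper, not GraphNeT's implementation.

```python
from typing import Sequence


def piecewise_linear_factor(
    step: float, milestones: Sequence[float], factors: Sequence[float]
) -> float:
    """Linearly interpolate a learning-rate factor at `step`."""
    # Outside the milestone range, hold the first or last factor.
    if step <= milestones[0]:
        return factors[0]
    if step >= milestones[-1]:
        return factors[-1]
    # Otherwise interpolate within the segment that contains `step`.
    points = list(zip(milestones, factors))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= step <= x1:
            return y0 + (y1 - y0) * (step - x0) / (x1 - x0)
    raise ValueError("milestones must be sorted in increasing order")


milestones = [0, 20.0, 80]  # from scheduler_kwargs
factors = [0.01, 1, 0.01]  # from scheduler_kwargs
base_lr = 0.001  # lr from optimizer_kwargs

for step in (0, 10, 20, 50, 80):
    print(step, base_lr * piecewise_linear_factor(step, milestones, factors))
```

Under these assumptions the learning rate warms up from 1% of `lr` to the full `lr` over the first 20 steps, then decays back to 1% by step 80.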
--------------------------------
### Converting Data to GraphNeT Backend with DataConverter (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/integration/integration.rst
This example demonstrates how to instantiate and use GraphNeT's `DataConverter` to process raw experimental data. It configures the converter with a custom `MyReader`, `ParquetWriter`, and `MyExtractor` instances, then runs the conversion process and optionally merges the output files.
```python
from graphnet.data.extractors.myexperiment import MyExtractor
from graphnet.data.dataconverter import DataConverter
from graphnet.data.readers import MyReader
from graphnet.data.writers import ParquetWriter
# Your settings
dir_with_files = '/home/my_files'
outdir = '/home/my_outdir'
num_workers = 5
# Instantiate DataConverter - exports data from MyExperiment to Parquet
converter = DataConverter(
    file_reader=MyReader(),
    save_method=ParquetWriter(),
    extractors=[MyExtractor('hits'), MyExtractor('truth')],
    outdir=outdir,
    num_workers=num_workers,
)
# Run Converter
converter(input_dir=dir_with_files)
# Merge files (Optional)
converter.merge_files()
```
--------------------------------
### Training GraphNeT StandardModel using `model.fit` (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/models/models.rst
This example illustrates the simplified training syntax for GraphNeT models inheriting from `StandardModel`. The `model.fit` method provides an `sklearn`-like interface for training, abstracting away much of the boilerplate code typically required for PyTorch-Lightning based training loops. It accepts a `train_dataloader` and training parameters like `max_epochs`.
```python
model = Model(...)
train_dataloader = DataLoader(...)
model.fit(train_dataloader=train_dataloader, max_epochs=10)
```
--------------------------------
### Configuring PyTorch and PyG CPU Wheel Find Links
Source: https://github.com/graphnet-team/graphnet/blob/main/requirements/torch_cpu.txt
This snippet provides `find-links` arguments, typically used with `pip install -r requirements.txt` or directly on the command line, to specify alternative locations for package wheels. It points to CPU-specific builds of PyTorch and PyTorch Geometric, ensuring compatibility for non-GPU environments. These links are crucial for resolving dependencies when standard PyPI packages are not suitable or available.
```Configuration
--find-links https://download.pytorch.org/whl/cpu
--find-links https://data.pyg.org/whl/torch-2.2.0+cpu.html
```
--------------------------------
### Configuring DataConverter for LiquidO H5 to Parquet Conversion (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/data_conversion/data_conversion.rst
This snippet demonstrates how to configure and use GraphNeT's `DataConverter` to convert `.h5` files from the LiquidO experiment into `.parquet` format. It initializes the converter with a `LiquidOReader`, `ParquetWriter`, and specific extractors, then executes the conversion and optionally merges the output files. This setup enables parallel processing of files.
```python
from graphnet.data.extractors.liquido import H5HitExtractor, H5TruthExtractor
from graphnet.data.dataconverter import DataConverter
from graphnet.data.readers import LiquidOReader
from graphnet.data.writers import ParquetWriter
# Your settings
dir_with_files = '/home/my_files'
outdir = '/home/my_outdir'
num_workers = 5
# Instantiate DataConverter - exports data from LiquidO to Parquet
converter = DataConverter(
    file_reader=LiquidOReader(),
    save_method=ParquetWriter(),
    extractors=[H5HitExtractor(), H5TruthExtractor()],
    outdir=outdir,
    num_workers=num_workers,
)
# Run Converter
converter(input_dir=dir_with_files)
# Merge files (Optional)
converter.merge_files()
```
--------------------------------
### Utilizing GraphNeT's Logger Class for Custom Messages (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This snippet demonstrates how to instantiate and use GraphNeT's `Logger` class to output various types of messages to the terminal and log files. It shows examples of `info`, `warning`, `warning_once`, `debug`, `error`, and `critical` logging levels, providing flexibility for custom logging within GraphNeT applications.
```python
from graphnet.utilities.logging import Logger
logger = Logger()
logger.info("My very informative message")
logger.warning("My warning shown every time")
logger.warning_once("My warning shown once")
logger.debug("My debug call")
logger.error("My error")
logger.critical("My critical call")
```
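The difference between `warning` and `warning_once` can be illustrated with the standard library's `logging` module. The `OnceLogger` class below is a hypothetical stand-in that remembers which messages it has already emitted. It is a sketch of the idea, not GraphNeT's `Logger` implementation.

```python
import logging


class OnceLogger:
    """Hypothetical wrapper that can emit a given warning only once."""

    def __init__(self, name: str = "sketch") -> None:
        self._logger = logging.getLogger(name)
        self._seen = set()  # messages already emitted via warning_once

    def warning(self, msg: str) -> None:
        # Emitted on every call.
        self._logger.warning(msg)

    def warning_once(self, msg: str) -> None:
        # Emitted only the first time this exact message is seen.
        if msg in self._seen:
            return
        self._seen.add(msg)
        self._logger.warning(msg)


logging.basicConfig(level=logging.WARNING)
logger = OnceLogger()
for _ in range(3):
    logger.warning("My warning shown every time")
    logger.warning_once("My warning shown once")
```

Running the loop logs the first message three times and the second message once.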
--------------------------------
### Combining Multiple GraphNeT Datasets with EnsembleDataset (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/datasets/datasets.rst
This example illustrates the use of `EnsembleDataset` to merge various GraphNeT `Dataset` instances, such as `SQLiteDataset` and `ParquetDataset`, into a single, cohesive dataset. This allows for unified data access across different storage formats.
```python
from graphnet.data import EnsembleDataset
from graphnet.data.parquet import ParquetDataset
from graphnet.data.sqlite import SQLiteDataset
dataset_1 = SQLiteDataset(...)
dataset_2 = SQLiteDataset(...)
dataset_3 = ParquetDataset(...)
ensemble_dataset = EnsembleDataset([dataset_1, dataset_2, dataset_3])
```
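Conceptually, an ensemble of datasets maps one global index onto the right member dataset and a local index within it. The sketch below shows that bookkeeping in plain Python. `ConcatSketch` is a hypothetical illustration that works on any sequences, and it is not necessarily how `EnsembleDataset` is implemented.

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatSketch:
    """Hypothetical stand-in: index several sequences as one."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        # Running totals of lengths, e.g. lengths 3, 2, 4 -> [3, 5, 9].
        self.cumulative = list(accumulate(len(d) for d in self.datasets))

    def __len__(self):
        return self.cumulative[-1] if self.cumulative else 0

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Find the first dataset whose running total exceeds `index`.
        which = bisect_right(self.cumulative, index)
        offset = self.cumulative[which - 1] if which else 0
        return self.datasets[which][index - offset]


# Plain lists stand in for the three datasets.
ensemble = ConcatSketch([["a0", "a1", "a2"], ["b0", "b1"], ["c0", "c1", "c2", "c3"]])
print(len(ensemble), ensemble[3], ensemble[8])
```

Here index 3 resolves to the first item of the second list and index 8 to the last item of the third.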
--------------------------------
### Specifying GPU Dependencies for Python Packages
Source: https://github.com/graphnet-team/graphnet/blob/main/requirements/torch_cu118.txt
This snippet lists the Python packages and pinned versions needed for GPU-enabled installations, specifically `torch` and `torchvision`. It also includes `--find-links` entries pointing to wheel indexes that match the specified CUDA version (cu118).
```Python Requirements
--find-links https://download.pytorch.org/whl/torch_stable.html
torch==2.2.0+cu118
torchvision==0.17.0+cu118
--find-links https://data.pyg.org/whl/torch-2.2.0+cu118.html
```
--------------------------------
### Training GraphNeT Models with PyTorch-Lightning (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This snippet shows how to train GraphNeT `Model`s using PyTorch-Lightning, leveraging its `Trainer` class for more granular control over the training loop. It configures the `Trainer` with parameters like GPU usage, max epochs, callbacks (e.g., `ProgressBar`), and logging, then initiates training by calling `trainer.fit`.
```python
from pytorch_lightning import Trainer
from graphnet.training.callbacks import ProgressBar
model = Model(...)
train_dataloader = DataLoader(...)
# Configure Trainer
trainer = Trainer(
    gpus=None,
    max_epochs=10,
    callbacks=[ProgressBar()],
    log_every_n_steps=1,
    logger=None,
    strategy="ddp",
)
# Train model
trainer.fit(model, train_dataloader)
```
--------------------------------
### Loading GraphNeT Model from Configuration and State Dictionary (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This snippet shows how to reconstruct a GraphNeT `Model` by first loading its definition from a `ModelConfig` YAML file, which initializes the model with random weights. Subsequently, the trained weights are loaded from a `.pth` file using `load_state_dict`, restoring the complete trained model.
```python
from graphnet.models import Model
from graphnet.utilities.config import ModelConfig
model_config = ModelConfig.load("model.yml")
model = Model.from_config(model_config) # With randomly initialised weights.
model.load_state_dict("state_dict.pth") # Now with trained weights.
```
--------------------------------
### Logging Configuration to Weights & Biases (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This snippet illustrates how to log configuration objects (training, model, and dataset configurations) to a Weights & Biases experiment run. Updating `wandb_logger.experiment.config` stores these settings in the run's config, which makes experiments easier to reproduce and compare.
```python
wandb_logger.experiment.config.update(training_config)
wandb_logger.experiment.config.update(model_config.as_dict())
wandb_logger.experiment.config.update(dataset_config.as_dict())
```
--------------------------------
### Loading Multiple Datasets from Config in Python
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This snippet shows how to load datasets previously defined and saved in a `dataset.yml` configuration file. The `Dataset.from_config` method returns a dictionary where keys correspond to the named selections (e.g., 'train', 'test') and values are the respective `Dataset` objects, facilitating access to the pre-defined splits.
```python
datasets = Dataset.from_config("dataset.yml")
# `datasets` is a dictionary keyed by selection name:
# {"train": Dataset(...),
#  "test": Dataset(...),}
```
--------------------------------
### Training GraphNeT Models with Built-in Fit Method (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This snippet demonstrates the simplified training process for GraphNeT `Model`s using the in-built `fit` method, similar to `sklearn`. It requires an initialized `Model` instance and a `DataLoader` for training data, and trains the model for a specified number of epochs.
```python
model = Model(...)
train_dataloader = DataLoader(...)
model.fit(train_dataloader=train_dataloader, max_epochs=10)
```
--------------------------------
### Defining Multiple Datasets with Selections in Python
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This snippet demonstrates how to define multiple datasets (e.g., 'train' and 'test') from the same data source using a single `DatasetConfig` file. It assigns different selection criteria to each dataset based on event numbers, then dumps the configuration to a YAML file. This allows for easy recreation of these specific dataset splits.
```python
dataset = Dataset(...)
dataset.config.selection = {
"train": "event_no % 2 == 0",
"test": "event_no % 2 == 1",
}
dataset.config.dump("dataset.yml")
```
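The two selection strings split events by the parity of `event_no`. The plain-Python sketch below applies the same two conditions to a made-up list of event numbers, with no GraphNeT involved, to show that the resulting splits are disjoint and together cover every event:

```python
# Hypothetical event numbers standing in for the `event_no` column.
event_numbers = list(range(10))

# Same conditions as in the selection strings above.
train = [n for n in event_numbers if n % 2 == 0]
test = [n for n in event_numbers if n % 2 == 1]

print("train:", train)  # even event numbers
print("test:", test)    # odd event numbers

# No event ends up in both splits, and none is dropped.
assert set(train).isdisjoint(test)
assert sorted(train + test) == event_numbers
```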
--------------------------------
### Loading GraphNeT Model from PyTorch-Lightning Checkpoint (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This snippet illustrates how to load a GraphNeT `Model` using PyTorch-Lightning's `load_from_checkpoint` method. It first builds the model structure from a `ModelConfig` and then directly loads the entire model state, including trained weights, from a `.ckpt` checkpoint file.
```python
from graphnet.models import Model
from graphnet.utilities.config import ModelConfig
model_config = ModelConfig.load("model.yml")
model = Model.from_config(model_config)  # With randomly initialised weights.
model.load_from_checkpoint("checkpoint.ckpt")  # Now with trained weights.
```
--------------------------------
### Build Docker Image for GraphNeT Benchmarking (Bash)
Source: https://github.com/graphnet-team/graphnet/blob/main/docker/NOTES.md
This command builds a Docker image named 'graphnet-benchmarking-image' using the Dockerfile located in the 'benchmarking/' directory. It tags the image for easy reference and future use.
```bash
$ docker build -f benchmarking/dockerfile -t graphnet-benchmarking-image benchmarking/
```
--------------------------------
### Recreating GraphNeT Dataset from YAML Configuration
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This snippet shows how to recreate a GraphNeT `Dataset` instance from a previously exported YAML configuration file using `Dataset.from_config()`. This method ensures that the dataset is loaded with the exact same settings, promoting reproducibility across different sessions or environments.
```python
from graphnet.data.dataset import Dataset
dataset = Dataset.from_config("dataset.yml")
```
--------------------------------
### Integrating Weights & Biases for Experiment Tracking (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This snippet demonstrates how to integrate Weights & Biases (W&B) for experiment tracking with GraphNeT models. It initializes a `WandbLogger` instance, specifying project, entity, and save directory, and then passes this logger to the `model.fit` method to enable automatic logging of training metrics and model artifacts to W&B.
```python
import os
from pytorch_lightning.loggers import WandbLogger
# Create wandb directory
wandb_dir = "./wandb/"
os.makedirs(wandb_dir, exist_ok=True)
# Initialise Weights & Biases (W&B) run
wandb_logger = WandbLogger(
project="example-script",
entity="graphnet-team",
save_dir=wandb_dir,
log_model=True,
)
# Fit Model
model = Model(...)
model.fit(
...,
logger=wandb_logger,
)
```
--------------------------------
### Exporting GraphNeT Dataset Configuration to YAML
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This snippet demonstrates how to export the configuration of a GraphNeT `Dataset` instance to a YAML file using `dataset.config.dump()`. This file captures details like input data paths, loaded tables/columns, and applied selections, enabling reproducible dataset creation.
```python
dataset = Dataset(...)
dataset.config.dump("dataset.yml")
```
--------------------------------
### Loading GraphNeT Model from YAML Configuration (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This snippet demonstrates how to reconstruct a GraphNeT `Model` architecture from a previously saved YAML configuration file. The `trust=True` argument is crucial as it allows for dynamically loading classes referenced within the configuration file, enabling the recreation of complex model structures.
```python
from graphnet.models import Model
# Indicate that you `trust` the config file after inspecting it, to allow for
# dynamically loading classes referenced in the file.
model = Model.from_config("model.yml", trust=True)
```
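To illustrate why an explicit `trust` flag is useful, the sketch below resolves a class from a dotted string only when the caller opts in. The `load_class` helper is hypothetical and uses only the standard library. It shows the general idea of gating dynamic class loading and does not reflect how GraphNeT implements it:

```python
import importlib
from typing import Any


def load_class(dotted_path: str, trust: bool = False) -> Any:
    """Resolve a `package.module.ClassName` string to the class it names.

    Hypothetical helper: importing names taken from a file runs code chosen
    by whoever wrote that file, so it is refused unless `trust=True`.
    """
    if not trust:
        raise ValueError(
            f"Refusing to load '{dotted_path}': pass trust=True after "
            "inspecting the config file."
        )
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# A class reference as it might appear in a config file (stdlib example).
reference = "collections.OrderedDict"

try:
    load_class(reference)
    refused = False
except ValueError:
    refused = True

loaded = load_class(reference, trust=True)
print(refused, loaded)
```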
--------------------------------
### Loading GraphNeT Model with Built-in Load Method (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This snippet demonstrates how to load a previously saved GraphNeT model using the `Model.load` classmethod. It reconstructs the entire model object from the specified file path, making it ready for inference or further training.
```python
from graphnet.models import Model
loaded_model = Model.load("model.pth")
```
--------------------------------
### Saving GraphNeT Model Configuration and State Dictionary (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This snippet demonstrates how to save the model's configuration to a YAML file and its trained weights (state dictionary) to a PyTorch `.pth` file. This method is recommended for version-proof model persistence, separating the model definition from its learned parameters.
```python
model.save_config('model.yml')
model.save_state_dict('state_dict.pth')
```
--------------------------------
### Run GraphNeT Benchmarking Docker Container with Mounted Data (Bash)
Source: https://github.com/graphnet-team/graphnet/blob/main/docker/NOTES.md
This command runs the 'graphnet-benchmarking-image' Docker container, mounting a local 'inference_data/' directory to '/data/' inside the container. It executes the 'apply.py' script within the container, processing input from '/data/input' and saving output to '/data/output'. The script assumes the mounted directory has an 'input/' directory and will create an 'output/' directory for results.
```bash
$ docker run --rm -it --mount type=bind,source=inference_data/,target=/data/ --name graphnet-benchmarking-container graphnet-benchmarking-image 'python apply.py /data/input /data/output graphnet_zenith 50'
```
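Since the script expects an `input/` directory below the mount point, it can help to prepare the layout before starting the container. The sketch below does this with `pathlib`. The temporary base directory is a placeholder for your own `inference_data/` location, and printing an absolute path is a convenience chosen here, not something stated in the source notes:

```python
import tempfile
from pathlib import Path

# Hypothetical location; in practice this would be your own `inference_data/`.
base_dir = Path(tempfile.mkdtemp()) / "inference_data"

# The container expects an `input/` directory below the mount point.
input_dir = base_dir / "input"
input_dir.mkdir(parents=True, exist_ok=True)

# An absolute path avoids ambiguity about where `docker run` is started from.
mount_source = base_dir.resolve()
print(f"--mount type=bind,source={mount_source},target=/data/")
```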
--------------------------------
### Initializing Weights & Biases Logger and Fitting Model (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/models/models.rst
This snippet demonstrates how to set up `WandbLogger` for experiment tracking in GraphNeT. It initializes a W&B run, creates a dedicated directory for logs, and then integrates the logger when fitting a `Model` instance. This enables automatic logging of training and validation metrics.
```python
import os
from pytorch_lightning.loggers import WandbLogger
# Create wandb directory
wandb_dir = "./wandb/"
os.makedirs(wandb_dir, exist_ok=True)
# Initialise Weights & Biases (W&B) run
wandb_logger = WandbLogger(
project="example-script",
entity="graphnet-team",
save_dir=wandb_dir,
log_model=True,
)
# Fit Model
model = Model(...)
model.fit(
...,
logger=wandb_logger,
)
```
--------------------------------
### Loading Multiple Datasets from Config in GraphNeT (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/datasets/datasets.rst
This snippet demonstrates how loading a `DatasetConfig` with multiple selections results in a dictionary of `Dataset` objects. When `Dataset.from_config()` is called on a configuration file containing named selections, it returns a dictionary where keys are selection names and values are the corresponding `Dataset` instances, allowing easy access to different data subsets.
```python
datasets = Dataset.from_config("dataset.yml")
>>> datasets
{"train": Dataset(...),
"test": Dataset(...),}
```
--------------------------------
### Loading Dataset from Configuration in GraphNeT (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/datasets/datasets.rst
This snippet shows how to recreate a `Dataset` object from a previously saved `DatasetConfig` YAML file. The `Dataset.from_config()` static method reads the configuration from `dataset.yml`, ensuring that the dataset is initialized with the exact same settings as when it was exported.
```python
from graphnet.data.dataset import Dataset
dataset = Dataset.from_config("dataset.yml")
```
--------------------------------
### Saving GraphNeT Model Configuration to YAML (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This snippet illustrates how to save the architectural configuration of a GraphNeT `Model` instance to a YAML file. This allows for the model's definition to be stored and recreated in different sessions, ensuring reproducibility of the model's structure without its trained weights.
```python
model = Model(...)
model.save_config("model.yml")
```
--------------------------------
### Logging Configuration Files with Weights & Biases (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/models/models.rst
This snippet shows how to log various configuration objects (`training_config`, `model_config`, `dataset_config`) to Weights & Biases using the `wandb_logger.experiment.config.update()` method. This practice significantly improves reproducibility and transparency by saving critical experiment parameters.
```python
wandb_logger.experiment.config.update(training_config)
wandb_logger.experiment.config.update(model_config.as_dict())
wandb_logger.experiment.config.update(dataset_config.as_dict())
```
--------------------------------
### Loading GraphNeT Model from Configuration and State Dictionary in Python
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/models/models.rst
This snippet demonstrates how to reconstruct a GraphNeT model from a saved `ModelConfig` file and then load its trained weights from a `state_dict`. This two-step process ensures that the model's definition is loaded first, followed by its specific parameters, offering robust versioning.
```python
from graphnet.models import Model
from graphnet.utilities.config import ModelConfig
model_config = ModelConfig.load("model.yml")
model = Model.from_config(model_config) # With randomly initialised weights.
model.load_state_dict("state_dict.pth")  # Now with trained weights.
```
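The same two-step pattern can be shown with a toy, standard-library-only example. `ToyModel`, the JSON config file, and the pickled weights below are all hypothetical stand-ins (GraphNeT itself uses YAML configs and PyTorch state dictionaries). The point is only that the architecture is rebuilt from the config first and the learned parameters are loaded into it afterwards:

```python
import json
import pickle
import tempfile
from pathlib import Path
from typing import Dict, List


class ToyModel:
    """Hypothetical stand-in for a model: a config plus learned weights."""

    def __init__(self, n_weights: int) -> None:
        self.config = {"n_weights": n_weights}
        self.weights: List[float] = [0.0] * n_weights  # "untrained" state

    def state_dict(self) -> Dict[str, List[float]]:
        return {"weights": list(self.weights)}

    def load_state_dict(self, state: Dict[str, List[float]]) -> None:
        if len(state["weights"]) != self.config["n_weights"]:
            raise ValueError("state does not match the configured architecture")
        self.weights = list(state["weights"])


trained = ToyModel(n_weights=3)
trained.weights = [0.1, -0.4, 2.0]

with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "model.json"
    state_path = Path(tmp_dir) / "state_dict.pickle"

    # Save the definition and the parameters to separate files.
    config_path.write_text(json.dumps(trained.config))
    state_path.write_bytes(pickle.dumps(trained.state_dict()))

    # Step 1: rebuild the architecture from the config alone.
    restored = ToyModel(**json.loads(config_path.read_text()))
    untrained_weights = list(restored.weights)

    # Step 2: load the learned parameters into it.
    restored.load_state_dict(pickle.loads(state_path.read_bytes()))

print(untrained_weights, restored.weights)
```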
--------------------------------
### Referencing External Selection Files in Python
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This snippet shows how to define dataset selections by referencing external CSV or JSON files. This approach allows for managing complex or large selection criteria outside the main configuration file, promoting reusability and easier updates of selection logic.
```python
dataset.config.selection = {
"train": "50000 random events ~ train_selection.csv",
"test": "test_selection.csv",
}
```
--------------------------------
### Saving GraphNeT Model with Built-in Save Method (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This snippet shows how to save an entire GraphNeT model, including its `state_dict`, using the convenient `model.save` method. The model is serialized to the specified file path, allowing for easy persistence and later retrieval.
```python
model.save("model.pth")
```
--------------------------------
### Combining Multiple GraphNeT Datasets with EnsembleDataset
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This snippet illustrates how to combine multiple GraphNeT `Dataset` instances (e.g., `SQLiteDataset`, `ParquetDataset`) into a single `EnsembleDataset`. This class allows for seamless aggregation and iteration over data from diverse sources, treating them as a unified dataset.
```python
from graphnet.data import EnsembleDataset
from graphnet.data.parquet import ParquetDataset
from graphnet.data.sqlite import SQLiteDataset
dataset_1 = SQLiteDataset(...)
dataset_2 = SQLiteDataset(...)
dataset_3 = ParquetDataset(...)
ensemble_dataset = EnsembleDataset([dataset_1, dataset_2, dataset_3])
```
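One common way to present several datasets as a single one, used for example by PyTorch's `ConcatDataset`, is to map a global index onto a (dataset, local index) pair via cumulative lengths. The sketch below shows that idea with a hypothetical `ConcatView` class over plain lists. It is a conceptual stand-in, and whether `EnsembleDataset` works exactly this way is not stated here:

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Any, List, Sequence


class ConcatView:
    """Hypothetical stand-in that indexes several sequences as one."""

    def __init__(self, datasets: List[Sequence[Any]]) -> None:
        self._datasets = datasets
        # Running totals of dataset lengths, e.g. [3, 5, 9].
        self._cumulative = list(accumulate(len(d) for d in datasets))

    def __len__(self) -> int:
        return self._cumulative[-1] if self._cumulative else 0

    def __getitem__(self, index: int) -> Any:
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Find which dataset the global index falls into.
        dataset_idx = bisect_right(self._cumulative, index)
        offset = self._cumulative[dataset_idx - 1] if dataset_idx > 0 else 0
        return self._datasets[dataset_idx][index - offset]


combined = ConcatView([["a0", "a1", "a2"], ["b0", "b1"], ["c0", "c1", "c2", "c3"]])
print(len(combined), combined[0], combined[3], combined[8])
```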
--------------------------------
### Defining a Basic PyTorch `nn.Module` (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/models/models.rst
This snippet presents a fundamental PyTorch neural network module, `MyModel`, inheriting from `torch.nn.Module`. It initializes a simple linear layer in its constructor and defines a `forward` method that applies this layer to an input tensor, demonstrating the basic structure of a PyTorch model.
```python
import torch


class MyModel(torch.nn.Module):
    def __init__(self, input_dim: int = 5, output_dim: int = 10):
        super().__init__()
        self._layer = torch.nn.Linear(input_dim, output_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self._layer(x)
```
--------------------------------
### Loading GraphNeT Model from PyTorch-Lightning Checkpoint in Python
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/models/models.rst
This snippet shows how to load a GraphNeT model using PyTorch-Lightning's `load_from_checkpoint` method. This approach leverages Lightning's built-in checkpointing capabilities to restore a model, including its trained weights, from a `.ckpt` file, often used for resuming training or inference.
```python
from graphnet.models import Model
from graphnet.utilities.config import ModelConfig
model_config = ModelConfig.load("model.yml")
model = Model.from_config(model_config)  # With randomly initialised weights.
model.load_from_checkpoint("checkpoint.ckpt")  # Now with trained weights.
```
--------------------------------
### Implementing MyReader for Pickle Files in GraphNeT (Python)
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/integration/integration.rst
This Python class, MyReader, extends GraphNeTFileReader to handle data stored in .pickle files from 'MyExperiment'. It defines accepted file extensions and extractors, and its __call__ method opens a specified pickle file, loads its content, and applies registered Extractor instances to process the data, returning a dictionary of extracted pandas DataFrames.
```python
from typing import List, Union, Dict
import pandas as pd
import pickle

# Import the generic file reader
from .graphnet_file_reader import GraphNeTFileReader
# Import your own extractor
from graphnet.data.extractors.myexperiment import MyExtractor


class MyReader(GraphNeTFileReader):
    """A class for reading my pickle files from MyExperiment."""

    _accepted_file_extensions = [".pickle"]
    _accepted_extractors = [MyExtractor]

    def __call__(self, file_path: str) -> Dict[str, pd.DataFrame]:
        """Extract data from single pickle file.

        Args:
            file_path: Path to pickle file.

        Returns:
            Extracted data.
        """
        # Open file in binary mode (required by pickle) and close it afterwards
        with open(file_path, "rb") as file:
            data = pickle.load(file)

        # Apply extractors
        outputs = {}
        for extractor in self._extractors:
            output = extractor(data)
            if output is not None:
                outputs[extractor._extractor_name] = output
        return outputs
```
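Pickle data is binary, so pickle files need to be opened in binary mode (`"wb"` for writing, `"rb"` for reading). The self-contained sketch below round-trips a made-up payload through a temporary `.pickle` file and then applies an extractor in the same loop structure as `MyReader.__call__`. The payload layout and the `pulse_count` extractor are assumptions for illustration only:

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, Optional

# Made-up file content standing in for a MyExperiment event file.
payload = {"event_no": [1, 2, 3], "charge": [0.5, 1.2, 0.9]}


def pulse_count(data: Dict[str, Any]) -> Optional[Dict[str, int]]:
    """Hypothetical extractor: report how many entries the file holds."""
    if "charge" not in data:
        return None
    return {"n_pulses": len(data["charge"])}


extractors: Dict[str, Callable[[Dict[str, Any]], Optional[Dict[str, int]]]] = {
    "pulse_count": pulse_count,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = Path(tmp_dir) / "events.pickle"
    with open(file_path, "wb") as file:  # binary write
        pickle.dump(payload, file)
    with open(file_path, "rb") as file:  # binary read
        data = pickle.load(file)

# Apply extractors, skipping any that return None.
outputs = {}
for name, extractor in extractors.items():
    output = extractor(data)
    if output is not None:
        outputs[name] = output

print(outputs)
```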
--------------------------------
### Selecting Random Subsets of Data in Python
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This snippet demonstrates how to select a random subset of events from a dataset using the `DatasetConfig`. The `N random events ~ ` syntax allows specifying a fixed number of random events that also satisfy a given condition, useful for creating smaller, representative datasets for testing or development.
```python
dataset = Dataset(...)
dataset.config.selection = "1000 random events ~ abs(injection_type) == 14"
```
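Conceptually, `N random events ~ <condition>` keeps the events that satisfy the condition and then draws `N` of them at random. The sketch below mimics that idea in plain Python with made-up events. It illustrates the semantics as described above rather than GraphNeT's implementation, and the fixed seed is an assumption added so the sketch is repeatable:

```python
import random

# Made-up events: (event_no, injection_type) pairs.
injection_types = [14, -14, 12, 16, -12, 14, -16, -14] * 50
events = list(enumerate(injection_types))

# Step 1: keep only events matching the condition `abs(injection_type) == 14`.
matching = [
    event_no for event_no, injection_type in events if abs(injection_type) == 14
]

# Step 2: draw a fixed number of them without replacement.
rng = random.Random(0)  # seeded only so the sketch is repeatable
n_requested = 100
subset = rng.sample(matching, k=min(n_requested, len(matching)))

print(len(matching), len(subset))
```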
--------------------------------
### Adding Custom Labels to a GraphNeT Dataset
Source: https://github.com/graphnet-team/graphnet/blob/main/docs/source/getting_started/getting_started.md
This snippet shows how to integrate a previously defined custom label (e.g., `MyCustomLabel`) into a GraphNeT `Dataset` instance using the `add_label` method. After adding, the custom label can be accessed like any other feature from a graph object retrieved from the dataset.
```python
dataset.add_label(MyCustomLabel())
graph = dataset[0]
graph["my_custom_label"]
>>> ...
```