### Install and Use RichProgressBar

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/common/progress_bar.rst

This example shows the basic setup for the RichProgressBar: the pip command that installs `rich`, and the Python code that creates a RichProgressBar and passes it to the Trainer.

```bash
pip install rich
```

```python
from lightning.pytorch import Trainer
from lightning.pytorch.callbacks import RichProgressBar

trainer = Trainer(callbacks=[RichProgressBar()])
```

--------------------------------

### Install Dependencies for MAML

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/examples/fabric/meta_learning/README.md

Install the libraries needed to run the MAML meta-learning examples: lightning, learn2learn, cherry-rl, and gym.

```bash
pip install lightning learn2learn cherry-rl 'gym<=0.22'
```

--------------------------------

### Minimal Manual Optimization Example

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/model/manual_optimization.rst

Demonstrates the basic setup for manual optimization: set `automatic_optimization=False`, then call `opt.zero_grad()`, `self.manual_backward()`, and `opt.step()` yourself in `training_step`.

```python
from lightning.pytorch import LightningModule


class MyModel(LightningModule):
    def __init__(self):
        super().__init__()
        # Important: This property activates manual optimization.
        self.automatic_optimization = False

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        opt.zero_grad()
        loss = self.compute_loss(batch)
        self.manual_backward(loss)
        opt.step()
```

--------------------------------

### Setup MNIST Dataset Splits in Python

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/data/datamodule.rst

The setup method is responsible for data operations that need to be performed on every GPU, such as splitting datasets or creating datasets.
This example demonstrates how to split the MNIST dataset into training and validation sets for the 'fit' stage and prepare the test set for the 'test' stage.

```python
import lightning as L
import torch
from torch.utils.data import random_split
from torchvision.datasets import MNIST
import torchvision.transforms as transforms


class MNISTDataModule(L.LightningDataModule):
    def setup(self, stage: str):
        # Assign Train/val split(s) for use in Dataloaders
        if stage == "fit":
            mnist_full = MNIST(self.data_dir, train=True, download=True, transform=self.transform)
            self.mnist_train, self.mnist_val = random_split(
                mnist_full, [55000, 5000], generator=torch.Generator().manual_seed(42)
            )

        # Assign Test split(s) for use in Dataloaders
        if stage == "test":
            self.mnist_test = MNIST(self.data_dir, train=False, download=True, transform=self.transform)
```

--------------------------------

### Full Example: PyTorch to Fabric Conversion

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-fabric/fundamentals/convert.rst

This diff shows a complete example of converting a PyTorch script to use Fabric, including initialization, setup, and replacing manual device calls and backward passes.

```diff
  import torch
  from lightning.pytorch.demos import WikiText2, Transformer
+ import lightning as L

- device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ fabric = L.Fabric(accelerator="cuda", devices=8, strategy="ddp")
+ fabric.launch()

  dataset = WikiText2()
  dataloader = torch.utils.data.DataLoader(dataset)
  model = Transformer(vocab_size=dataset.vocab_size)
  optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

- model = model.to(device)
+ model, optimizer = fabric.setup(model, optimizer)
+ dataloader = fabric.setup_dataloaders(dataloader)

  model.train()
  for epoch in range(20):
      for batch in dataloader:
          input, target = batch
-         input, target = input.to(device), target.to(device)
          optimizer.zero_grad()
          output = model(input, target)
          loss = torch.nn.functional.nll_loss(output, target.view(-1))
-         loss.backward()
+         fabric.backward(loss)
          optimizer.step()
```

--------------------------------

### Full FSDP Training Example with Language Model

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/advanced/model_parallel/fsdp.rst

A complete PyTorch Lightning example showcasing FSDP training for a large language model. It includes model definition, data loading, optimizer configuration, and trainer setup with FSDP enabled.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader

import lightning as L
from lightning.pytorch.strategies import FSDPStrategy
from lightning.pytorch.demos import Transformer, WikiText2


class LanguageModel(L.LightningModule):
    def __init__(self, vocab_size):
        super().__init__()
        self.model = Transformer(  # 1B parameters
            vocab_size=vocab_size,
            nlayers=32,
            nhid=4096,
            ninp=1024,
            nhead=64,
        )

    def training_step(self, batch):
        input, target = batch
        output = self.model(input, target)
        loss = F.nll_loss(output, target.view(-1))
        self.log("train_loss", loss, prog_bar=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.1)


L.seed_everything(42)

# Data
dataset = WikiText2()
train_dataloader = DataLoader(dataset)

# Model
model = LanguageModel(vocab_size=dataset.vocab_size)

# Trainer
trainer = L.Trainer(accelerator="cuda", devices=2, strategy=FSDPStrategy())
trainer.fit(model, train_dataloader)
trainer.print(torch.cuda.memory_summary())
```

--------------------------------

### Setup Multiple Models and Multiple Optimizers

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-fabric/advanced/multiple_setup.rst

Provides an example of managing independent model-optimizer pairs, such as those required for Generative Adversarial Networks (GANs).

```python
# Two models
generator = Generator()
discriminator = Discriminator()

# Two optimizers
optimizer_gen = torch.optim.SGD(generator.parameters(), lr=0.01)
optimizer_dis = torch.optim.SGD(discriminator.parameters(), lr=0.001)

# Set up generator
generator, optimizer_gen = fabric.setup(generator, optimizer_gen)

# Set up discriminator
discriminator, optimizer_dis = fabric.setup(discriminator, optimizer_dis)
```

--------------------------------

### Install PyTorch 2.3+

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/examples/pytorch/tensor_parallel/README.md

Installs the required version of PyTorch, which is necessary for leveraging advanced features like tensor parallelism and FSDP integration. This is a prerequisite for running the example.

```bash
pip install 'torch>=2.3'
```

--------------------------------

### Full Training Example with 2D Parallelism (Python)

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-fabric/advanced/model_parallel/tp_fsdp.rst

A complete PyTorch Lightning example demonstrating the setup and launch of a training process using 2D parallelism (Tensor Parallelism + FSDP). Requires at least 4 GPUs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.distributed.tensor.parallel import ColwiseParallel, RowwiseParallel
from torch.distributed.tensor.parallel import parallelize_module
from torch.distributed._composable.fsdp.fully_shard import fully_shard

import lightning as L
```

--------------------------------

### Setup Model, Optimizer, and Scheduler with Fabric

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-fabric/api/fabric_methods.rst

The `setup` method prepares models and optimizers for accelerated training, automatically moving them to the correct device. It can also register learning rate schedulers for compatible strategies.

```python
import torch.nn as nn
import torch.optim
from lightning.fabric import Fabric

fabric = Fabric()
model = nn.Linear(32, 64)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

# Set up model and optimizer for accelerated training
model, optimizer = fabric.setup(model, optimizer)
```

```python
import torch.nn as nn
import torch.optim
from lightning.fabric import Fabric

fabric = Fabric()
model = nn.Linear(32, 64)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

# If you don't want Fabric to set the device
model, optimizer = fabric.setup(model, optimizer, move_to_device=False)
```

```python
import torch.nn as nn
import torch.optim
from lightning.fabric import Fabric

fabric = Fabric()
model = nn.Linear(32, 64)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=1.0, end_factor=0.3, total_iters=10)

# If you want to additionally register a learning rate scheduler
model, optimizer, scheduler = fabric.setup(model, optimizer, scheduler)
```

--------------------------------

### Install PyTorch Lightning

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/index.rst

Use these commands to install the Lightning framework via pip or conda.

```bash
pip install lightning
```

```bash
conda install lightning -c conda-forge
```

--------------------------------

### Interactive SLURM Session (Bash)

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-fabric/guide/multi_node/slurm.rst

Example of how to start an interactive SLURM session for development and debugging. It uses `srun` with flags that allocate resources and start a bash shell, so training scripts can be run manually.

```bash
# make sure to set `--job-name "interactive"`
srun --account --pty bash --job-name "interactive" ...

# now run scripts normally
python train.py ...
```

--------------------------------

### Setup Multiple Models with One Optimizer

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-fabric/advanced/multiple_setup.rst

Illustrates how to group multiple models under a single nn.Module container to treat them as one entity during the setup process.

```python
class AutoEncoder(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = Encoder()
        self.decoder = Decoder()


# Instantiate the big model
autoencoder = AutoEncoder()
optimizer = ...

# Set up the model(s) and optimizer together
autoencoder, optimizer = fabric.setup(autoencoder, optimizer)
```

--------------------------------

### Run Tensor Parallel Example

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/examples/pytorch/tensor_parallel/README.md

Navigates to the tensor parallelism example directory and executes the training script. This command initiates the distributed training process, applying the configured parallelism strategies.

```bash
cd examples/pytorch/tensor_parallel
python train.py
```

--------------------------------

### Setup Models and DataLoaders with Fabric

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-fabric/fundamentals/convert.rst

Use the setup method to prepare your PyTorch model and optimizer, and setup_dataloaders for your data loaders. Fabric will handle moving them to the appropriate devices.

```python
model, optimizer = fabric.setup(model, optimizer)
dataloader = fabric.setup_dataloaders(dataloader)
```

--------------------------------

### Run PyTorch Profiler Example

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/examples/pytorch/basics/README.md

Execute the profiler example script to activate the PyTorch Profiler with Lightning. This script demonstrates how to integrate PyTorch Profiler.

```bash
python profiler_example.py
```

--------------------------------

### Install Lightning with Pip

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-fabric/index.rst

Installs the Lightning library using pip, the standard package installer for Python. This is the recommended method for most users.

```bash
pip install lightning
```

--------------------------------

### Install Lightning via Package Managers

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/starter/installation.rst

Standard installation methods for Lightning using pip or Conda. These commands assume a pre-configured Python or Conda environment.

```bash
python -m pip install lightning
```

```bash
conda install lightning -c conda-forge
```

--------------------------------

### Install and Configure Weights and Biases

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-fabric/guide/loggers/wandb.rst

Commands to install the required wandb package and authenticate the environment using an API key.

```bash
pip install wandb
wandb login
```

--------------------------------

### Install and Login to Weights and Biases

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-fabric/guide/loggers/wandb.rst

Instructions to install the wandb package and log in to your W&B account using your API key.

````APIDOC
## Install Weights and Biases

### Description
Install the `wandb` package using pip.

### Method
bash

### Request Example
```bash
pip install wandb
```

## Login to Weights and Biases

### Description
Log in to your Weights & Biases account using your API key.

### Method
bash

### Parameters
- (string) - Required - Your Weights & Biases API key.

### Request Example
```bash
wandb login
```
````

--------------------------------

### Install PyTorch Lightning

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/README.md

Commands to install PyTorch Lightning from the master branch source or from the testing PyPI repository.

```bash
pip install https://github.com/Lightning-AI/lightning/archive/refs/heads/master.zip -U
pip install -U -i https://test.pypi.org/simple/ pytorch-lightning
```

--------------------------------

### Install Stable Version from Source

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/README.md

Installs the latest stable version of PyTorch Lightning directly from the GitHub source repository.

```bash
pip install https://github.com/Lightning-AI/lightning/archive/refs/heads/release/stable.zip -U
```

--------------------------------

### Example Input Array

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/common/lightning_module.rst

How to set and use `example_input_array` for generating data or inputs within your module.

````APIDOC
## Example Input Array

### Description
`example_input_array` represents a single batch of data. It is often used to generate outputs, or as model input inside specific callbacks or methods.

### Method
Assign a tensor or a structure of tensors to `self.example_input_array` in your `__init__` method.

### Endpoint
N/A (This is a code pattern within a `LightningModule`)

### Parameters
N/A

### Request Body
N/A

### Request Example
```python
import torch
from lightning.pytorch import LightningModule


class MyModule(LightningModule):
    def __init__(self):
        super().__init__()
        # Define example_input_array
        self.example_input_array = torch.randn(32, 100)  # Example: batch size 32, feature size 100
        self.generator = GeneratorModel()

    def on_train_epoch_end(self):
        # Use example_input_array to generate outputs
        generated_data = self.generator(self.example_input_array)
        # ... further processing ...
```

### Response
N/A

### Response Example
N/A
````

--------------------------------

### Install PyTorch Lightning (Shell)

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/accelerators/tpu_basic.rst

Installs the PyTorch Lightning library, a framework that simplifies training PyTorch models on various hardware accelerators, including TPUs. The leading `!` runs the command from a notebook cell.

```shell
!pip install lightning
```

--------------------------------

### Setup Single Model and Optimizer

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-fabric/advanced/multiple_setup.rst

Demonstrates the standard approach to initializing a model and its corresponding optimizer using the fabric.setup method to ensure strategy compatibility.

```python
import torch
from lightning.fabric import Fabric

fabric = Fabric()

# Instantiate model and optimizer
model = LitModel()
optimizer = torch.optim.Adam(model.parameters())

# Set up the model and optimizer together
model, optimizer = fabric.setup(model, optimizer)
```

--------------------------------

### Use Accelerator.setup_device instead of setup_environment

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/upgrade/sections/1_9_devel.rst

The `Accelerator.setup_environment` method is deprecated. Use `Accelerator.setup_device` instead.

```python
Accelerator.setup_environment
```

```python
Accelerator.setup_device
```

--------------------------------

### Fabric.setup

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-fabric/api/fabric_methods.rst

Sets up a model and optimizer for accelerated training, moving them to the correct device.

```APIDOC
## setup

### Description
Prepares models and optimizers for accelerated training and precision handling.

### Parameters
- **model** (nn.Module) - Required - The model to set up.
- **optimizer** (Optimizer) - Optional - The optimizer to set up.
- **move_to_device** (bool) - Optional - Whether to move the model to the device automatically.
```

--------------------------------

### Integrate DataModule with Trainer

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/data/datamodule.rst

Examples of passing a DataModule to the Trainer for automated training, validation, testing, and prediction, or manual setup for model initialization.

```python
# Standard usage
dm = MNISTDataModule()
model = Model()
trainer.fit(model, datamodule=dm)
trainer.test(datamodule=dm)

# Manual setup for model initialization
dm = MNISTDataModule()
dm.prepare_data()
dm.setup(stage="fit")
model = Model(num_classes=dm.num_classes)
trainer.fit(model, dm)
```

--------------------------------

### Initialize Lightning Trainer in Notebooks

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/common/notebooks.rst

Demonstrates how to instantiate the Lightning Trainer within an interactive notebook environment using automatic hardware detection.

```python
import lightning as L

# Works in Jupyter, Colab and Kaggle!
trainer = L.Trainer(accelerator="auto", devices="auto")
```

--------------------------------

### Full FSDP Training Example with Transformer Model

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-fabric/advanced/model_parallel/fsdp.rst

A complete example showcasing FSDP training of a Transformer model on the WikiText2 dataset using PyTorch Lightning Fabric. It includes model and optimizer setup, data loading, training loop, and memory usage reporting.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

import lightning as L
from lightning.fabric.strategies import FSDPStrategy
from lightning.pytorch.demos import Transformer, WikiText2

fabric = L.Fabric(accelerator="cuda", devices=2, strategy=FSDPStrategy())
fabric.launch()

fabric.seed_everything(42)

with fabric.rank_zero_first():
    dataset = WikiText2()

# 1B parameters
model = Transformer(vocab_size=dataset.vocab_size, nlayers=32, nhid=4096, ninp=1024, nhead=64)

model = fabric.setup(model)
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
optimizer = fabric.setup_optimizers(optimizer)

for i in range(10):
    input, target = fabric.to_device(dataset[i])
    output = model(input.unsqueeze(0), target.unsqueeze(0))
    loss = F.nll_loss(output, target.view(-1))
    fabric.backward(loss)
    optimizer.step()
    optimizer.zero_grad()
    fabric.print(loss.item())

fabric.print(torch.cuda.memory_summary())
```

--------------------------------

### Start TensorBoard for Profiling

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/tuning/profiler_advanced.rst

Launch the TensorBoard server to visualize profiling data. Ensure the log directory is correctly set and the port matches the profiler's configuration for seamless integration.

```bash
tensorboard --logdir ./tensorboard --port 9001
```

--------------------------------

### Initialize Trainer with Built-in Strategies

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/advanced/strategy_registry.rst

Demonstrates how to configure the Trainer using pre-defined strategy strings like 'ddp', 'deepspeed_stage_3_offload', and 'xla_debug' alongside accelerator settings.

```python
# Training with the DDP Strategy
trainer = Trainer(strategy="ddp", accelerator="gpu", devices=4)

# Training with DeepSpeed ZeRO Stage 3 and CPU Offload
trainer = Trainer(strategy="deepspeed_stage_3_offload", accelerator="gpu", devices=3)

# Training with the TPU Spawn Strategy with `debug` as True
trainer = Trainer(strategy="xla_debug", accelerator="tpu", devices=8)
```

--------------------------------

### Launch Fabric Training (Python)

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-fabric/guide/multi_node/slurm.rst

Call the launch method on the initialized Fabric object to start the distributed training process. This method handles the communication setup between devices and nodes.

```python
fabric = Fabric(...)
fabric.launch()
```

--------------------------------

### Setup Training Components

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/src/lightning_fabric/README.md

Configures models, optimizers, and dataloaders for distributed training using Fabric primitives.

```APIDOC
## Fabric Setup

### Description
Prepares models, optimizers, and dataloaders for execution on the configured accelerator.

### Parameters
- **model** (object) - Required - The PyTorch model to be wrapped.
- **optimizer** (object) - Required - The optimizer to be wrapped.
- **dataloader** (object) - Required - The dataloader to be prepared.

### Request Example
model, optimizer = fabric.setup(model, optimizer)
dataloader = fabric.setup_dataloaders(dataloader)

### Returns
- **model** (object) - The distributed-ready model.
- **optimizer** (object) - The distributed-ready optimizer.
```

--------------------------------

### Configure Trainer for GPU Acceleration

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/advanced/speed.rst

Demonstrates how to configure the PyTorch Lightning Trainer to utilize GPUs for accelerated training. It shows examples for a single GPU, multiple GPUs with the DDP strategy, and multi-node, multi-GPU setups.

```python
# Single GPU
trainer = Trainer(accelerator="gpu", devices=1)

# 8 GPUs with the DDP strategy
trainer = Trainer(accelerator="gpu", devices=8, strategy="ddp")

# 4 nodes with 2 GPUs each
trainer = Trainer(accelerator="gpu", devices=2, num_nodes=4)
```

--------------------------------

### Configure Logger with Remote Save Directory (Python)

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/common/remote_fs.rst

This example shows how to configure a logger, specifically TensorBoardLogger, to save its logs to a remote filesystem. This requires the appropriate fsspec backend to be installed (e.g., 'fsspec[s3]' for S3).

```python
from lightning.pytorch.loggers import TensorBoardLogger
from lightning.pytorch import Trainer

logger = TensorBoardLogger(save_dir="s3://my_bucket/logs/")

trainer = Trainer(logger=logger)
trainer.fit(model)
```

--------------------------------

### Initialize and Train a LightningModule

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/common/lightning_module.rst

Demonstrates the basic workflow of instantiating a LightningModule and using the Trainer to fit the model to data.

```python
net = MyLightningModuleNet()
trainer = Trainer()
trainer.fit(net)
```

--------------------------------

### Initialize Fabric and Load Checkpoint

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-fabric/guide/checkpoint/distributed_checkpoint.rst

Demonstrates how to configure a Fabric instance for multi-GPU training, set up a model and optimizer, and load a checkpoint state. This approach ensures the model and optimizer are correctly distributed across devices.

```python
import lightning as L
import torch
from lightning.pytorch.demos import Transformer, WikiText2

# `strategy` is assumed to be defined earlier (it is not part of this excerpt)
fabric = L.Fabric(accelerator="cuda", devices=2, strategy=strategy)
fabric.launch()

with fabric.rank_zero_first():
    dataset = WikiText2()

model = Transformer(vocab_size=dataset.vocab_size, nlayers=32, nhid=4096, ninp=1024, nhead=64)
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
model, optimizer = fabric.setup(model, optimizer)

# The state dict names every object that fabric.load should restore in place
state = {"model": model, "optimizer": optimizer, "iteration": 0}

fabric.print("Loading checkpoint ...")
fabric.load("my-checkpoint.ckpt", state)
```

--------------------------------

### Configure Post-localSGD Hook with DDPStrategy in PyTorch Lightning

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/advanced/ddp_optimizations.rst

This example shows how to configure PyTorch Lightning's DDPStrategy to utilize the Post-localSGD communication hook.
It requires initializing PostLocalSGDState with parameters such as the start iteration and subgroup, and setting the model_averaging_period. The state, the hook, and the averaging period are then passed to the DDPStrategy.

```python
import lightning as L
from lightning.pytorch.strategies import DDPStrategy
from torch.distributed.algorithms.ddp_comm_hooks import post_localSGD_hook as post_localSGD

model = MyModel()
trainer = L.Trainer(
    accelerator="gpu",
    devices=4,
    strategy=DDPStrategy(
        ddp_comm_state=post_localSGD.PostLocalSGDState(
            process_group=None,
            subgroup=None,
            start_localSGD_iter=8,
        ),
        ddp_comm_hook=post_localSGD.post_localSGD_hook,
        model_averaging_period=4,
    ),
)
trainer.fit(model)
```

--------------------------------

### Configure Optimizers and Schedulers

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/model/manual_optimization.rst

Demonstrates how to return optimizers and schedulers from the configure_optimizers method and how to step a scheduler based on callback metrics. The excerpt starts at the scheduler, so the optimizer line below is an illustrative assumption.

```python
# Both functions are methods of a LightningModule
def configure_optimizers(self):
    # Assumption: any optimizer works here; Adam is only a placeholder
    optimizer = torch.optim.Adam(self.parameters())
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10)
    return [optimizer], [scheduler]


def on_train_epoch_end(self):
    sch = self.lr_schedulers()
    # ReduceLROnPlateau needs the monitored metric to decide when to lower the LR
    sch.step(self.trainer.callback_metrics["loss"])
```

--------------------------------

### Configure Fabric for Hardware Acceleration

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/README.md

Examples of initializing Lightning Fabric to utilize different hardware accelerators like GPUs (including multi-GPU and multi-node setups) and TPUs. The training logic stays the same; only the Fabric arguments change.

```python
from lightning.fabric import Fabric

# Use your available hardware
# no code changes needed
fabric = Fabric()

# Run on GPUs (CUDA or MPS)
fabric = Fabric(accelerator="gpu")

# 8 GPUs
fabric = Fabric(accelerator="gpu", devices=8)

# 256 GPUs, multi-node
fabric = Fabric(accelerator="gpu", devices=8, num_nodes=32)

# Run on TPUs
fabric = Fabric(accelerator="tpu")
```

--------------------------------

### Custom Strategy with Custom Accelerator and Plugins

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/extensions/strategy.rst

This example illustrates how to initialize a custom strategy along with custom accelerator and precision plugin instances. These custom components are passed to the CustomDDPStrategy constructor, and the resulting strategy is used to initialize the Trainer.

```python
# custom strategy, with new accelerator and plugins
accelerator = MyAccelerator()
precision_plugin = MyPrecisionPlugin()
strategy = CustomDDPStrategy(accelerator=accelerator, precision_plugin=precision_plugin)
trainer = Trainer(strategy=strategy)
```

--------------------------------

### Custom Trainer with Lightning Fabric

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/README.md

An example of building a custom trainer class using Lightning Fabric primitives. The class encapsulates Fabric initialization and the training loop (model and optimizer setup, data loading, and the training steps), so the same pipeline can be reused across models.

```python
import lightning as L


class MyCustomTrainer:
    def __init__(self, accelerator="auto", strategy="auto", devices="auto", precision="32-true"):
        self.fabric = L.Fabric(accelerator=accelerator, strategy=strategy, devices=devices, precision=precision)

    def fit(self, model, optimizer, dataloader, max_epochs):
        self.fabric.launch()

        model, optimizer = self.fabric.setup(model, optimizer)
        dataloader = self.fabric.setup_dataloaders(dataloader)
        model.train()

        for epoch in range(max_epochs):
            for batch in dataloader:
                input, target = batch
                optimizer.zero_grad()
                output = model(input)
                # loss_fn is not defined in this excerpt; supply your own loss function
                loss = loss_fn(output, target)
                self.fabric.backward(loss)
                optimizer.step()
```

--------------------------------

### Visualize Training with TensorBoard

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/starter/introduction.rst

Provides the command line instruction to launch the TensorBoard server for monitoring training logs.

```bash
tensorboard --logdir .
```

--------------------------------

### Install PyTorch Lightning via Package Managers

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/src/pytorch_lightning/README.md

Provides various methods to install the PyTorch Lightning library. Options include standard PyPI installation, Conda, and installing specific versions from source.

```bash
pip install pytorch-lightning
pip install pytorch-lightning['extra']
conda install pytorch-lightning -c conda-forge
pip install https://github.com/Lightning-AI/lightning/archive/refs/heads/release/stable.zip -U
pip install https://github.com/Lightning-AI/lightning/archive/refs/heads/master.zip -U
pip install -U -i https://test.pypi.org/simple/ pytorch-lightning
```

--------------------------------

### Install DeepSpeed

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/advanced/model_parallel/deepspeed.rst

Installs the DeepSpeed library using pip.
Ensure your PyTorch CUDA version matches your local CUDA installation.

```bash
pip install deepspeed
```

--------------------------------

### Setting up DataLoaders and Training

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/model/train_model_basic.rst

Configuring the MNIST dataset and executing the training process using the Lightning Trainer.

```python
import os
import lightning as L
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import MNIST

dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
train_loader = DataLoader(dataset)
autoencoder = LitAutoEncoder(Encoder(), Decoder())  # classes defined elsewhere in the guide
trainer = L.Trainer()
trainer.fit(model=autoencoder, train_dataloaders=train_loader)
```

--------------------------------

### Initializing Model Parameters in Half Precision

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/common/precision_intermediate.rst

Demonstrates how to initialize model parameters directly on the device and in half-precision (BFloat16 in this case) for faster initialization when using true half-precision training. The excerpt is cut off after the `with` line, so the body shown is a placeholder.

```python
from lightning.pytorch import Trainer

trainer = Trainer(precision="bf16-true")

# init the model directly on the device and with parameters in half-precision
with trainer.init_module():
    model = MyModel()  # placeholder: MyModel stands for your own LightningModule
```

--------------------------------

### Installation

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-fabric/guide/loggers/litlogger.rst

Install the litlogger package using pip.

```APIDOC
## Installation

### Description
Install the `litlogger` package using pip.

### Method
bash
```

--------------------------------

### Install Lightning with Conda

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-fabric/index.rst

Installs the Lightning library using Conda, a popular package and environment management system. This command installs Lightning from the conda-forge channel.

```bash
conda install lightning -c conda-forge
```

--------------------------------

### Install LightningCLI dependencies

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/cli/lightning_cli_intermediate.rst

Commands to install the packages required for LightningCLI functionality.

```bash
pip install "lightning[pytorch-extra]"
pip install "jsonargparse[signatures]"
```

--------------------------------

### Speed up model initialization with Fabric

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-fabric/advanced/model_parallel/fsdp.rst

Demonstrates how to use the `fabric.init_module` context manager to initialize large models directly on the GPU, including the recommended `empty_init=True` setting for FSDP to delay parameter allocation.

```python
# Fast: Creates the model on the GPU directly
with fabric.init_module():
    model = Transformer(vocab_size=dataset.vocab_size)

# Recommended for FSDP:
with fabric.init_module(empty_init=True):
    model = Transformer(vocab_size=dataset.vocab_size)
```

--------------------------------

### Install PyTorch Lightning

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/examples/pytorch/bug_report/bug_report_model.ipynb

Installs the pytorch-lightning package using pip within a Jupyter environment.

```python
! pip install -qU pytorch-lightning
```

--------------------------------

### Install Scikit-Learn Dependency

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/examples/fabric/kfold_cv/README.md

Install the scikit-learn library, which provides the K-Fold cross-validation splitting used by this example.
```bash
pip install scikit-learn
```

--------------------------------

### Configure FSDP sharding strategies

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-fabric/advanced/model_parallel/fsdp.rst

Shows how to instantiate FSDPStrategy with different sharding configurations to balance memory consumption and training speed.

```python
import lightning as L
from lightning.fabric.strategies import FSDPStrategy

# Pick exactly one option. The alternatives are commented out because a
# keyword argument may only be passed once per call.
strategy = FSDPStrategy(
    # Default: Shard weights, gradients, optimizer state (1 + 2 + 3)
    sharding_strategy="FULL_SHARD",
    # Shard gradients, optimizer state (2 + 3)
    # sharding_strategy="SHARD_GRAD_OP",
    # Full-shard within a machine, replicate across machines
    # sharding_strategy="HYBRID_SHARD",
    # Don't shard anything (similar to DDP)
    # sharding_strategy="NO_SHARD",
)
fabric = L.Fabric(..., strategy=strategy)
```

--------------------------------

### Install LitLogger

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-fabric/guide/loggers/litlogger.rst

Installs the litlogger package using pip. This is the first step to enable experiment tracking.

```bash
pip install litlogger
```

--------------------------------

### Install with Optional Dependencies

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/README.md

Installs PyTorch Lightning along with additional optional dependencies for extended functionality.

```bash
pip install "lightning[extra]"
```

--------------------------------

### Setup Single Model with Multiple Optimizers

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-fabric/advanced/multiple_setup.rst

Shows how to configure multiple optimizers for a single model, which is useful for applying different learning rates to specific layers or components.
```python
# Instantiate model and optimizers
model = LitModel()
optimizer1 = torch.optim.SGD(model.layer1.parameters(), lr=0.003)
optimizer2 = torch.optim.SGD(model.layer2.parameters(), lr=0.01)

# Set up the model and optimizers together
model, optimizer1, optimizer2 = fabric.setup(model, optimizer1, optimizer2)
```

--------------------------------

### Fabric Initialization and Precision

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-fabric/api/fabric_methods.rst

Demonstrates how to initialize modules on a specific device and precision using `fabric.init_module()`, and how to use `fabric.autocast()` for mixed precision operations.

```APIDOC
## Fabric Initialization and Precision

### Description
This section covers initializing PyTorch modules with specific device and precision settings, and applying mixed precision to code blocks.

### `fabric.init_module()`

#### Description
Instantiates an `nn.Module` directly on the target device and with the desired precision, bypassing the default CPU and float32 initialization. For distributed strategies like FSDP or DeepSpeed, it allocates parameters on a meta device first.

#### Example
```python
from lightning.fabric import Fabric

fabric = Fabric(accelerator="cuda", precision="16-true")

with fabric.init_module():
    # Models created here will be on GPU and in float16
    model = MyModel()
```

### `fabric.autocast()`

#### Description
Allows the precision backend to autocast a block of code. This is optional because Fabric automatically handles precision for the model's forward method after `fabric.setup()`.
#### Example
```python
model, optimizer = fabric.setup(model, optimizer)

# Fabric handles precision automatically for the model
output = model(inputs)

with fabric.autocast():  # optional
    loss = loss_function(output, target)

fabric.backward(loss)
```
```

--------------------------------

### Install Optimized Lightning Apps

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/starter/installation.rst

Installs the lightweight lightning-app package, which is optimized for production deployment workflows and has fewer dependencies.

```bash
pip install lightning-app
```

--------------------------------

### Combining Multiple Configuration Files in PyTorch Lightning

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/cli/lightning_cli_advanced.rst

Illustrates how to pass multiple configuration files to the Lightning CLI in sequence. Settings from later files override those from earlier files, allowing for layered configurations. The example shows merging trainer settings.

```bash
python main.py fit --config config_1.yaml --config config_2.yaml
```

--------------------------------

### Install Pandoc for Jupyter Notebooks

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/README.md

Command to install pandoc on Ubuntu, which is required for rendering Jupyter Notebooks within the documentation.

```bash
sudo apt-get install pandoc
```

--------------------------------

### PyTorch Lightning Quantization Example (Python)

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/advanced/post_training_quantization.rst

Demonstrates integrating Intel Neural Compressor with PyTorch Lightning for post-training quantization. It defines a custom evaluation function compatible with the compressor, applies the quantization process, and saves the quantized model.
```python
from neural_compressor.quantization import fit
from neural_compressor.config import PostTrainingQuantConfig

# `model` (a LightningModule wrapping `model.model`), `trainer`, and `dm`
# (a datamodule) are assumed to be defined earlier.


def eval_func_for_nc(model_n, trainer_n):
    # Swap the candidate model into the LightningModule, then validate it
    setattr(model, "model", model_n)
    result = trainer_n.validate(model=model, dataloaders=dm.val_dataloader())
    return result[0]["accuracy"]


def eval_func(candidate_model):
    return eval_func_for_nc(candidate_model, trainer)


conf = PostTrainingQuantConfig()
q_model = fit(model=model.model, conf=conf, calib_dataloader=dm.val_dataloader(), eval_func=eval_func)

q_model.save("./saved_model/")
```

--------------------------------

### Launch TensorBoard from Commandline

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/docs/source-pytorch/visualize/logging_basic.rst

Shows the command to launch TensorBoard from the command line to visualize logged metrics. Make sure TensorBoard is installed, and point `--logdir` at the directory where Lightning saves its logs.

```bash
tensorboard --logdir=lightning_logs/
```

--------------------------------

### Run Transformer Example

Source: https://github.com/lightning-ai/pytorch-lightning/blob/master/examples/pytorch/basics/README.md

Execute the transformer script for next-word prediction. This example uses a Transformer model on the WikiText2 dataset.

```bash
python transformer.py
```
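
--------------------------------

### Sketch: Override Order When Combining Configuration Files

The "Combining Multiple Configuration Files" entry above states that settings from later files override those from earlier files. The following is a minimal plain-Python sketch of that layering idea, not the Lightning CLI's actual implementation. The `merge_configs` helper and the dictionary contents standing in for `config_1.yaml` and `config_2.yaml` are hypothetical and exist only for illustration.

```python
from copy import deepcopy


def _merge_into(target, source):
    """Recursively copy `source` into `target`; values in `source` win."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            # Both sides are nested sections: merge them key by key
            _merge_into(target[key], value)
        else:
            # Scalars, lists, and new sections replace whatever was there
            target[key] = deepcopy(value)


def merge_configs(*configs):
    """Merge nested dicts left to right, so later configs override earlier ones."""
    merged = {}
    for config in configs:
        _merge_into(merged, config)
    return merged


# Hypothetical contents standing in for config_1.yaml and config_2.yaml
config_1 = {"trainer": {"max_epochs": 10, "accelerator": "cpu"}}
config_2 = {"trainer": {"max_epochs": 50}}

merged = merge_configs(config_1, config_2)
print(merged)  # {'trainer': {'max_epochs': 50, 'accelerator': 'cpu'}}
```

In this sketch, keys that appear only in the earlier config survive the merge, keys that appear in both take the later value, and the input dictionaries are left unmodified.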