### Fast Setup: ViT-L/16 Pretraining on ImageNet-1k

Source: https://context7.com/facebookresearch/dinov3/llms.txt

Launches ViT-L/16 pretraining on ImageNet-1k on 4 nodes with 8 GPUs per node (32 GPUs in total). Point `root` and `extra` in `train.dataset_path` at your dataset directory.

```bash
PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \
  --nodes 4 \
  --config-file dinov3/configs/train/vitl_im1k_lin834.yaml \
  --output-dir /outputs/vitl_im1k \
  train.dataset_path=ImageNet22k:root=/data/imagenet:extra=/data/imagenet
```
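The `train.dataset_path` override packs a dataset name and its options into a single colon-separated string. The sketch below shows one way such a string can be split, assuming every token after the first is a `key=value` pair. `parse_dataset_spec` is a hypothetical helper written for illustration, not the repository's own parser.

```python
def parse_dataset_spec(spec: str) -> tuple[str, dict[str, str]]:
    """Split 'Name:key=value:key=value' into (name, options)."""
    name, *tokens = spec.split(":")
    options: dict[str, str] = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {token!r}")
        options[key] = value
    return name, options


name, options = parse_dataset_spec(
    "ImageNet22k:root=/data/imagenet:extra=/data/imagenet"
)
print(name)     # ImageNet22k
print(options)  # {'root': '/data/imagenet', 'extra': '/data/imagenet'}
```

Because the colon is the separator, a path that itself contains a colon would not survive this simple split.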

--------------------------------

### Setup Environment and DINOv3 Location

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/segmentation_tracking.ipynb

Imports the required libraries and sets the DINOv3 location from the `DINOV3_LOCATION` environment variable (a local clone), falling back to the GitHub repository loaded via torch hub.

```python
import datetime
import functools
import io
import logging
import math
import os
from pathlib import Path
import tarfile
import time
import urllib

import lovely_tensors
import matplotlib.pyplot as plt
import mediapy as mp
import numpy as np
from PIL import Image
import torch
import torch.nn.functional as F
import torchvision.transforms as TVT
import torchvision.transforms.functional as TVTF
from torch import Tensor, nn
from tqdm import tqdm

DISPLAY_HEIGHT = 200
lovely_tensors.monkey_patch()
torch.set_grad_enabled(False)
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")

DINOV3_GITHUB_LOCATION = "facebookresearch/dinov3"

if os.getenv("DINOV3_LOCATION") is not None:
    DINOV3_LOCATION = os.getenv("DINOV3_LOCATION")
else:
    DINOV3_LOCATION = DINOV3_GITHUB_LOCATION

print(f"DINOv3 location set to {DINOV3_LOCATION}")
```
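The location chosen above also determines which `source` value `torch.hub.load` needs: `"github"` for the default repository name, `"local"` for a cloned directory. A minimal sketch of that decision follows. `resolve_hub_location` is a hypothetical helper, and unlike the snippet above it treats an empty `DINOV3_LOCATION` as unset.

```python
import os

DINOV3_GITHUB_LOCATION = "facebookresearch/dinov3"


def resolve_hub_location(env=os.environ) -> tuple[str, str]:
    """Return (repo_or_dir, source) suitable for torch.hub.load."""
    # Assumption: an empty string counts as "not set".
    location = env.get("DINOV3_LOCATION") or DINOV3_GITHUB_LOCATION
    source = "github" if location == DINOV3_GITHUB_LOCATION else "local"
    return location, source


print(resolve_hub_location({}))
# ('facebookresearch/dinov3', 'github')
print(resolve_hub_location({"DINOV3_LOCATION": "/opt/dinov3"}))
# ('/opt/dinov3', 'local')
```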

--------------------------------

### Setup DINOv3 Repository Location

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/pca.ipynb

Imports necessary libraries and sets the DINOv3 repository location, either from an environment variable or via torch hub.

```python
import pickle
import os
import urllib

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

import torch
import torchvision.transforms.functional as TF
from sklearn.decomposition import PCA
from scipy import signal

DINOV3_GITHUB_LOCATION = "facebookresearch/dinov3"

if os.getenv("DINOV3_LOCATION") is not None:
    DINOV3_LOCATION = os.getenv("DINOV3_LOCATION")
else:
    DINOV3_LOCATION = DINOV3_GITHUB_LOCATION

print(f"DINOv3 location set to {DINOV3_LOCATION}")
```

--------------------------------

### Setup DINOv3 Conda Environment

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Create and activate the `dinov3` Conda environment with micromamba. The environment is built from `conda.yaml`, which declares the required dependencies.

```shell
micromamba env create -f conda.yaml
micromamba activate dinov3
```

--------------------------------

### Setup ImageNet Validation Dataset

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/dinotxt_inference.ipynb

Configure the root directory for the ImageNet validation dataset and create a dataset object with preprocessing. Ensure the model is in evaluation mode and moved to the GPU.

```python
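from torchvision.datasets import ImageFolder

# `image_preprocess` and `model` are defined in earlier notebook cells.
imagenet_val_root_dir = "<PATH/TO/IMAGENET/VAL/ROOT_DIR>"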
val_dataset = ImageFolder(imagenet_val_root_dir, image_preprocess)
model = model.eval().cuda()
```

--------------------------------

### Run DINOv3 Segmentation Inference

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Runs full inference on the ADE20K dataset with the provided segmentor. Replace the `<PATH/TO/DATASET>` and `<PATH/TO/OUTPUT/DIR>` placeholders with the dataset root and the output directory.

```shell
PYTHONPATH=. python -m dinov3.run.submit dinov3/eval/segmentation/run.py \
config=dinov3/eval/segmentation/configs/config-ade20k-m2f-inference.yaml  \
datasets.root=<PATH/TO/DATASET> \
load_from=dinov3_vit7b16_ms \
--output-dir <PATH/TO/OUTPUT/DIR>
```

--------------------------------

### Full 3-Stage ViT-7B/16 Training: Stage 1 Pretraining

Source: https://context7.com/facebookresearch/dinov3/llms.txt

This command starts Stage 1 pretraining for the ViT-7B/16 model using 256 GPUs. It requires a custom dataset path and outputs to a specified directory.

```bash
PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \
  --nodes 32 \
  --config-file dinov3/configs/train/dinov3_vit7b16_pretrain.yaml \
  --output-dir /outputs/vit7b_stage1 \
  train.dataset_path=CustomDataset:root=/data/lvd:extra=/data/lvd
```

--------------------------------

### Full Example: DINOv3 Depther Inference and Visualization

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Demonstrates loading an image, applying transformations, performing depth estimation using the DINOv3 Depther model, and visualizing the results. Requires PIL, PyTorch, torchvision, matplotlib, and requests. `REPO_DIR` must point to a local clone of the repository, and the `torch.autocast('cuda', ...)` context assumes a CUDA device.

```python
from PIL import Image
import torch
from torchvision.transforms import v2
import matplotlib.pyplot as plt
from matplotlib import colormaps

def get_img():
    import requests
    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
    return image

def make_transform(resize_size: int | list[int] = 768):
    to_tensor = v2.ToImage()
    resize = v2.Resize((resize_size, resize_size), antialias=True)
    to_float = v2.ToDtype(torch.float32, scale=True)
    normalize = v2.Normalize(
        mean=(0.485, 0.456, 0.406),
        std=(0.229, 0.224, 0.225),
    )
    return v2.Compose([to_tensor, resize, to_float, normalize])

depther = torch.hub.load(REPO_DIR, 'dinov3_vit7b16_dd', source="local", weights=<DEPTHER/CHECKPOINT/URL/OR/PATH>, backbone_weights=<BACKBONE/CHECKPOINT/URL/OR/PATH>)

img_size = 1024
img = get_img()
transform = make_transform(img_size)
with torch.inference_mode():
    with torch.autocast('cuda', dtype=torch.bfloat16):
        batch_img = transform(img)[None]  # add a batch dimension: [1, 3, H, W]
        depths = depther(batch_img)

plt.figure(figsize=(12, 6))
plt.subplot(121)
plt.imshow(img)
plt.axis("off")
plt.subplot(122)
plt.imshow(depths[0,0].cpu(), cmap=colormaps["Spectral"])
plt.axis("off")

```

--------------------------------

### Train Text Alignment on DINOv3

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Initiate text alignment training on DINOv3 using the provided configuration. This example demonstrates training on 4 nodes with 8 GPUs each. Adapt the trainer config file and dataset as needed.

```shell
# An example config for text alignment is here: dinov3/eval/text/configs/dinov3_vitl_text.yaml
PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/eval/text/train_dinotxt.py \
  --nodes 4 \
  trainer_config_file="<PATH/TO/DINOv3/TEXT/CONFIG>" \
  output-dir=<PATH/TO/OUTPUT/DIR>
```

--------------------------------

### Import Python Libraries for Dataset Exploration

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/chmv2_dataset_exploration.ipynb

Imports necessary libraries for working with the dataset, including boto3 for S3 access, rasterio for geospatial data, and geopandas for vector data. Ensure these libraries are installed.

```python
# This notebook requires the following additional libraries
# (please install using the preferred method for your environment, e.g. pip, conda):
#
# boto3 >= 1.38.23
# matplotlib >= 3.10.3 
# rasterio >= 1.5.0
# geopandas >= 1.1.3

# Import the libraries required for this notebook
# Built-ins
import json
from pprint import pprint
import tempfile
import os
# Installed libraries
import boto3
import matplotlib.pyplot as plt
from botocore import UNSIGNED
from botocore.config import Config
import rasterio
import rasterio.mask
from rasterio.merge import merge
from rasterio.warp import calculate_default_transform, reproject, Resampling
import geopandas as gp
import numpy as np
```

--------------------------------

### Load NEON Dataset Images as Tensors

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/chmv2_inference.ipynb

Loads images from provided URIs and converts them into PyTorch tensors. Ensure rasterio and torch are installed and imported.

```python
import urllib.request
import rasterio
import io
import torch

# Using test samples from the NEON dataset that can be downloaded following instructions in
# https://github.com/facebookresearch/HighResCanopyHeight
# Original dataset: National Ecological Observatory Network (NEON), 2022. Ecosystem Structure
# URL: https://data.neonscience.org/data-products/DP3.30015.001.
neon_images_uri = [
    "https://dl.fbaipublicfiles.com/dinov3/notebooks/chmv2/2017_WLOU_1_NEON_D13_WLOU_DP3_419000_4416000_RGB.tif_1_1.tif",
    "https://dl.fbaipublicfiles.com/dinov3/notebooks/chmv2/2018_GUAN_1_NEON_D04_GUAN_DP3_725000_1985000_RGB.tif_2_1.tif",
    "https://dl.fbaipublicfiles.com/dinov3/notebooks/chmv2/2019_HOPB_3_NEON_D01_HOPB_DP3_717000_4705000_RGB.tif_1_1.tif",
    "https://dl.fbaipublicfiles.com/dinov3/notebooks/chmv2/2019_REDB_2_NEON_D15_REDB_DP3_433000_4516000_RGB.tif_2_2.tif",
    "https://dl.fbaipublicfiles.com/dinov3/notebooks/chmv2/2019_WLOU_2_NEON_D13_WLOU_DP3_420000_4417000_RGB.tif_0_0.tif",

]
neon_images_list = []


def load_image_as_tensor(uri: str) -> torch.Tensor:
    """Load a rasterio image from URI as a PyTorch tensor."""
    with urllib.request.urlopen(uri) as response:
        data = response.read()
    with rasterio.open(io.BytesIO(data)) as src:
        img = src.read()
        return torch.from_numpy(img)

for neon_image_uri in neon_images_uri:
    neon_images_list.append(load_image_as_tensor(neon_image_uri))
```

--------------------------------

### Single Frame Propagation Example

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/segmentation_tracking.ipynb

Demonstrates the 'propagate' function using features from the first frame as context for the second frame. It involves extracting features, marking dynamic tensors, and calling the propagation function.

```python
torch._dynamo.maybe_mark_dynamic(first_frame, (1, 2))
first_feats = forward(model, first_frame)  # [h, w, D]
print(f"First feats:   {first_feats.shape}")

frame_idx = 1
current_frame_pil = frames[frame_idx]
current_frame = transform(current_frame_pil).to("cuda")  # [3, H, W]
torch._dynamo.maybe_mark_dynamic(current_frame, (1, 2))
current_feats = forward(model, current_frame)  # [h", w", D]
print(f"Current feats: {current_feats.shape}")

current_probs = propagate(
    current_feats,  # [h", w", D]
    context_features=first_feats.unsqueeze(0),  # [1, h, w, D]
    context_probs=first_probs.unsqueeze(0),  # [1, h, w, M]
    neighborhood_mask=neighborhood_mask,  # [h", w", h, w]
    topk=5,
    temperature=0.2,
)  # [h", w", M]
print(f"Current probs:  {current_probs}")

```
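The `topk` and `temperature` arguments suggest a soft nearest-neighbour vote: each current patch looks up its most similar context patches and averages their mask probabilities. The NumPy sketch below illustrates that idea under stated assumptions. It flattens the spatial dimensions, uses cosine similarity, and omits the neighborhood mask. `propagate_sketch` is a hypothetical stand-in, not the notebook's `propagate`.

```python
import numpy as np


def propagate_sketch(current_feats, context_feats, context_probs, topk=5, temperature=0.2):
    """current_feats [N, D], context_feats [K, D], context_probs [K, M] -> [N, M]."""
    # Cosine similarity between every current patch and every context patch.
    cur = current_feats / np.linalg.norm(current_feats, axis=-1, keepdims=True)
    ctx = context_feats / np.linalg.norm(context_feats, axis=-1, keepdims=True)
    sim = cur @ ctx.T  # [N, K]

    # Keep the top-k most similar context patches for each current patch.
    topk_idx = np.argsort(-sim, axis=-1)[:, :topk]          # [N, topk]
    topk_sim = np.take_along_axis(sim, topk_idx, axis=-1)   # [N, topk]

    # Temperature-scaled softmax over the kept similarities.
    logits = topk_sim / temperature
    logits -= logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)

    # Weighted average of the neighbours' mask probabilities.
    return np.einsum("nk,nkm->nm", weights, context_probs[topk_idx])


rng = np.random.default_rng(0)
context_feats = rng.normal(size=(12, 8))              # 12 context patches, D = 8
context_probs = np.eye(3)[rng.integers(0, 3, 12)]     # one-hot over M = 3 masks
current_feats = context_feats[:4] + 0.01 * rng.normal(size=(4, 8))
current_probs = propagate_sketch(current_feats, context_feats, context_probs)
print(current_probs.shape)  # (4, 3)
```

A lower temperature sharpens the softmax so that the single best match dominates, while a larger `topk` smooths the result over more neighbours.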

--------------------------------

### Extract Image Features with Hugging Face Pipeline

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Demonstrates how to extract image features using the Hugging Face Transformers pipeline. Ensure the 'transformers' library is installed.

```python
from transformers import pipeline
from transformers.image_utils import load_image

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
image = load_image(url)

feature_extractor = pipeline(
    model="facebook/dinov3-convnext-tiny-pretrain-lvd1689m",
    task="image-feature-extraction",
)
features = feature_extractor(image)
```

--------------------------------

### Iterate Through Target Tiles

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/chmv2_dataset_exploration.ipynb

Initializes an empty list to store TIFF file paths and begins iterating through the identified target tiles. This is a setup for further processing of each tile.

```python
tifs = []
for ii, row in target_tiles.iterrows():
    ...  # per-tile processing is not shown in this excerpt
```

--------------------------------

### Load DINOv3 Backbone from URL using PyTorch Hub

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Use this snippet to load a DINOv3 backbone directly from a weights URL. A CUDA-enabled PyTorch install is recommended for performance.

```python
import torch

# Example: Load a ViT-S/16 distilled backbone from a URL
# Replace with the actual URL from the model weights list
# The hub entry name follows the `dinov3_*` names listed in the model card (e.g. dinov3_vits16)
backbone_url = "<URL_TO_BACKBONE_WEIGHTS>"
backbone = torch.hub.load("facebookresearch/dinov3", "dinov3_vits16", pretrained=True, weights=backbone_url)

# Example: Load a ConvNeXt-Tiny backbone from a URL
# Replace with the actual URL from the model weights list
# backbone_url = "<URL_TO_BACKBONE_WEIGHTS>"
# backbone = torch.hub.load("facebookresearch/dinov3", "dinov3_convnext_tiny", pretrained=True, weights=backbone_url)
```

--------------------------------

### Reproduce DINOv3 Depth Estimation Results (NYUv2)

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Command to reproduce DINOv3 depth estimation results on NYUv2 using the SYNTHMIX-trained Depther model. Requires dataset setup and correct paths for dataset root and output directory. This uses the `dinov3.run.submit` utility.

```shell
PYTHONPATH=. python -m dinov3.run.submit dinov3/eval/depth/run.py \
config=dinov3/eval/depth/configs/config-nyu-synthmix-dpt-inference.yaml \
datasets.root=<PATH/TO/DATASET> \
load_from=dinov3_vit7b16_dd \
--output-dir <PATH/TO/OUTPUT/DIR>
```

--------------------------------

### Initialize S3 Client and List Bucket Prefixes

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/chmv2_dataset_exploration.ipynb

Sets up the S3 bucket and path for the dataset and initializes a boto3 client. It lists the top-level prefixes in the specified S3 path, requiring unsigned configuration for public buckets.

```python
# Location of the S3 bucket for this dataset
bucket = "dataforgood-fb-data"
path = "forests/v2/global/dinov3_global_chm_v2_ml3/"

# List the top level of the bucket using boto3. Because this is a public bucket, we don't need to sign requests.
# Here we set the signature version to unsigned, which is required for public buckets.
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))

# Print the items in the top-level prefixes
for item in s3.list_objects_v2(Bucket=bucket, Prefix=path, Delimiter='/')['CommonPrefixes']:
    print(item['Prefix'])
```
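A single `list_objects_v2` call returns at most 1000 entries, and the `CommonPrefixes` key is absent when a response has no sub-prefixes. The sketch below follows pagination with boto3's paginator and tolerates the missing key. `iter_common_prefixes` is a hypothetical helper, and the client is passed in so the function can be used with the unsigned client created above.

```python
def iter_common_prefixes(s3_client, bucket: str, prefix: str, delimiter: str = "/"):
    """Yield every prefix directly under `prefix`, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter=delimiter):
        # A page without sub-prefixes has no "CommonPrefixes" key.
        for item in page.get("CommonPrefixes", []):
            yield item["Prefix"]


# Usage with the client, bucket, and path defined above:
# for top_level_prefix in iter_common_prefixes(s3, bucket, path):
#     print(top_level_prefix)
```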

--------------------------------

### Run Multi-Distillation Training

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Use this command to initiate multi-distillation training. Ensure the PYTHONPATH is set correctly and specify the configuration file and dataset paths.

```shell
PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \
  --nodes 1 \
  --config-file dinov3/configs/train/multi_distillation_test.yaml \
  --output-dir <PATH/TO/OUTPUT/DIR> \
  --multi-distillation \
  train.dataset_path=<DATASET>:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>
```

--------------------------------

### Create Dataset Directory

Source: https://github.com/facebookresearch/dinov3/blob/main/DATASETS.md

Sets up the root directory for depth datasets and creates a specific folder for the NYU dataset.

```bash
export DEPTH_DATASETS_ROOT=${HOME}/datasets
mkdir -p ${DEPTH_DATASETS_ROOT}/NYU
```

--------------------------------

### Launch DINOv3 Training Jobs

Source: https://context7.com/facebookresearch/dinov3/llms.txt

Launches DINOv3 pretraining or multi-distillation jobs on SLURM clusters using submitit, or locally with python/torchrun. Configuration is managed via YAML files and omegaconf.

```bash
# dinov3/train/train.py
```

--------------------------------

### Initialize Quantization Filter and Resize Transform

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/dense_sparse_matching.ipynb

Sets up a convolutional filter for patch quantization and a transform for resizing images to be divisible by the patch size. Ensure PATCH_SIZE and IMAGE_SIZE are defined.

```python
patch_quant_filter = torch.nn.Conv2d(1, 1, PATCH_SIZE, stride=PATCH_SIZE, bias=False)
patch_quant_filter.weight.data.fill_(1.0 / (PATCH_SIZE * PATCH_SIZE))
```

```python
def resize_transform(
    mask_image: Image,
    image_size: int = IMAGE_SIZE,
    patch_size: int = PATCH_SIZE,
) -> torch.Tensor:
    w, h = mask_image.size
    h_patches = int(image_size / patch_size)
    w_patches = int((w * image_size) / (h * patch_size))
    return TF.to_tensor(TF.resize(mask_image, (h_patches * patch_size, w_patches * patch_size)))
```
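The arithmetic in `resize_transform` fixes the height at `image_size` and scales the width by the aspect ratio, rounding down to a whole number of patches. The sketch below reproduces only that size computation in plain Python. The defaults `image_size=768` and `patch_size=16` are assumptions for illustration, since the values of `IMAGE_SIZE` and `PATCH_SIZE` are not shown here.

```python
def patch_aligned_size(width: int, height: int, image_size: int = 768, patch_size: int = 16):
    """Return the (height, width) that resize_transform would resize to."""
    h_patches = int(image_size / patch_size)
    w_patches = int((width * image_size) / (height * patch_size))
    return h_patches * patch_size, w_patches * patch_size


print(patch_aligned_size(640, 480))  # (768, 1024)
print(patch_aligned_size(333, 500))  # (768, 496)
```

Both output dimensions are multiples of the patch size, so a stride-`PATCH_SIZE` filter such as `patch_quant_filter` covers the resized mask with no leftover pixels.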

--------------------------------

### Get Image Embeddings with Hugging Face AutoModel

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Shows how to obtain image embeddings using Hugging Face's AutoImageProcessor and AutoModel. The model can be specified from a list of available pretrained models.

```python
import torch
from transformers import AutoImageProcessor, AutoModel
from transformers.image_utils import load_image

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = load_image(url)

pretrained_model_name = "facebook/dinov3-convnext-tiny-pretrain-lvd1689m"
processor = AutoImageProcessor.from_pretrained(pretrained_model_name)
model = AutoModel.from_pretrained(
    pretrained_model_name,
    device_map="auto",
)

inputs = processor(images=image, return_tensors="pt").to(model.device)
with torch.inference_mode():
    outputs = model(**inputs)

pooled_output = outputs.pooler_output
print("Pooled output shape:", pooled_output.shape)
```

--------------------------------

### Load DINOv3 Model and Get Attributes

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/segmentation_tracking.ipynb

Loads a specified DINOv3 model (e.g., ViT-L) and retrieves its patch size and embedding dimension. It also reports peak GPU memory usage.

```python
# examples of available DINOv3 models:
MODEL_DINOV3_VITS = "dinov3_vits16"
MODEL_DINOV3_VITSP = "dinov3_vits16plus"
MODEL_DINOV3_VITB = "dinov3_vitb16"
MODEL_DINOV3_VITL = "dinov3_vitl16"
MODEL_DINOV3_VITHP = "dinov3_vith16plus"
MODEL_DINOV3_VIT7B = "dinov3_vit7b16"

# we take DINOv3 ViT-L
MODEL_NAME = MODEL_DINOV3_VITL

model = torch.hub.load(
    repo_or_dir=DINOV3_LOCATION,
    model=MODEL_NAME,
    source="local" if DINOV3_LOCATION != DINOV3_GITHUB_LOCATION else "github",
)
model.to("cuda")
model.eval()

patch_size = model.patch_size
embed_dim = model.embed_dim
print(f"Patch size: {patch_size}")
print(f"Embedding dimension: {embed_dim}")
print(f"Peak GPU memory: {torch.cuda.max_memory_allocated() / 2**30:.1f} GB")
```

--------------------------------

### Import DINOv3 Dependencies

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/dinotxt_segmentation_inference.ipynb

Imports necessary libraries and sets the DINOv3 repository path. Ensure DINOv3_REPO_DIR is correctly set to your local repository path.

```python
import dataclasses
import math
import warnings
from typing import Callable
import os

import lovely_tensors
import numpy as np
import PIL.Image
import torch
import torch.nn.functional as F
import torchvision.transforms as TVT
import torchvision.transforms.functional as TVTF
import tqdm
from omegaconf import OmegaConf
from torch import Tensor, nn
from torchmetrics.classification import MulticlassJaccardIndex

DINOv3_REPO_DIR = "" # Please add here the path to your DINOv3 repository
```

--------------------------------

### Gram Anchoring for DINOv3 ViT-7B/16

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Performs Gram anchoring for DINOv3 ViT-7B/16 training on 32 nodes (256 GPUs). Requires the path to the Gram teacher checkpoint from the previous step.

```shell
PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \
  --nodes 32 \
  --config-file dinov3/configs/train/dinov3_vit7b16_gram_anchor.yaml \
  --output-dir <PATH/TO/OUTPUT/DIR> \
  train.dataset_path=<DATASET>:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET> \
  gram.ckpt=<PATH/TO/GRAM_TEACHER_FROM_PREVIOUS_STEP>
```

--------------------------------

### ImageNet-1k Linear Classifier with DINOv3-ViT-7B

Source: https://context7.com/facebookresearch/dinov3/llms.txt

Loads a DINOv3 ViT-7B backbone with a pre-trained linear classification head for ImageNet-1k. This setup combines CLS tokens and mean-pooled patch tokens for 1000-class inference. Ensure correct paths for model weights and backbone weights.

```python
import torch
from torchvision.transforms import v2
from PIL import Image
from dinov3.hub.classifiers import dinov3_vit7b16_lc

# Load full linear classifier (backbone + head)
model = torch.hub.load(
    "/path/to/dinov3",
    "dinov3_vit7b16_lc",
    source="local",
    weights="/path/to/dinov3_vit7b16_imagenet1k_linear_head-90d8ed92.pth",
    backbone_weights="/path/to/dinov3_vit7b16_pretrain_lvd1689m-a955f4ea.pth",
)
model.eval().cuda()

transform = v2.Compose([
    v2.ToImage(),
    v2.Resize(256, antialias=True),
    v2.CenterCrop(224),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

x = transform(Image.open("cat.jpg").convert("RGB")).unsqueeze(0).cuda()

with torch.inference_mode():
    logits = model(x)                            # (1, 1000)
    pred_class = logits.argmax(dim=-1).item()
    confidence = logits.softmax(dim=-1).max().item()
    print(f"Predicted class: {pred_class}, confidence: {confidence:.3f}")
    # Illustrative output only, actual values depend on the image:
    # Predicted class: 281, confidence: 0.892  (e.g., tabby cat)
```
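The description above says the classifier combines the CLS token with mean-pooled patch tokens. A toy NumPy sketch of that feature construction follows. The dimensions and the random weights are assumptions for illustration, with NumPy standing in for PyTorch, and this is not the repository's classifier code.

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim, num_patches, num_classes = 8, 196, 1000  # toy embed_dim, not the real model width

cls_token = rng.normal(size=(embed_dim,))
patch_tokens = rng.normal(size=(num_patches, embed_dim))

# Concatenate the CLS token with the mean of the patch tokens.
features = np.concatenate([cls_token, patch_tokens.mean(axis=0)])  # (2 * embed_dim,)

# Linear head: random here, learned in a real classifier.
weight = rng.normal(size=(num_classes, 2 * embed_dim))
bias = np.zeros(num_classes)
logits = weight @ features + bias  # (num_classes,)

# Softmax turns the logits into class probabilities.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(features.shape, logits.shape)  # (16,) (1000,)
```

The head therefore takes an input twice as wide as the backbone embedding.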

--------------------------------

### Train DINOv3 ViT-L/16 on ImageNet-1k

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Launches DINOv3 pre-training for ViT-L/16 on ImageNet-1k using 4 nodes (32 GPUs) in a SLURM environment. Training takes approximately 14 hours.

```shell
PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \
  --nodes 4 \
  --config-file dinov3/configs/train/vitl_im1k_lin834.yaml \
  --output-dir <PATH/TO/OUTPUT/DIR> \
  train.dataset_path=ImageNet22k:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>
```

--------------------------------

### Load Canopy Height Map Model (Hugging Face Transformers)

Source: https://context7.com/facebookresearch/dinov3/llms.txt

Loads a Canopy Height Map v2 (CHMv2) model using Hugging Face Transformers for satellite imagery. Requires installing the transformers library. The model is set to evaluation mode.

```python
import torch
from PIL import Image
from transformers import AutoModelForDepthEstimation, AutoImageProcessor

# Via Hugging Face Transformers
processor = AutoImageProcessor.from_pretrained("facebook/dinov3-vitl16-chmv2-dpt-head")
model = AutoModelForDepthEstimation.from_pretrained("facebook/dinov3-vitl16-chmv2-dpt-head")
model.eval()
```

```python
image = Image.open("forest_satellite.tif").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

depth = processor.post_process_depth_estimation(
    outputs, target_sizes=[(image.height, image.width)]
)[0]["predicted_depth"]
print("Canopy height map shape:", depth.shape)      # (H, W)
print("Height range (m):", depth.min().item(), "–", depth.max().item())
```

--------------------------------

### Load DINOv3 ViT Models (Satellite Imagery) with PyTorch Hub

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Loads DINOv3 ViT models pretrained on satellite imagery using torch.hub. Requires specifying the checkpoint path or URL.

```python
dinov3_vitl16 = torch.hub.load(REPO_DIR, 'dinov3_vitl16', source='local', weights=<CHECKPOINT/URL/OR/PATH>)
dinov3_vit7b16 = torch.hub.load(REPO_DIR, 'dinov3_vit7b16', source='local', weights=<CHECKPOINT/URL/OR/PATH>)
```

--------------------------------

### Run DINOv3 Depth Estimation Directly

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Alternative command to run DINOv3 depth estimation without `dinov3.run.submit`, using Python directly or torchrun. Ensure correct paths for dataset root and output directory are provided.

```shell
PYTHONPATH=. python dinov3/eval/depth/run.py \
config=dinov3/eval/depth/configs/config-nyu-synthmix-dpt-inference.yaml \
datasets.root=<PATH/TO/DATASET> \
load_from=dinov3_vit7b16_dd \
output_dir=<PATH/TO/OUTPUT/DIR>
```

--------------------------------

### Load and Visualize First Frame Mask

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/segmentation_tracking.ipynb

Loads the instance segmentation mask for the first video frame from a URL and visualizes it alongside the original frame. Prints the mask dimensions and the number of detected masks.

```python
first_mask_np = np.array(
    load_image_from_url(
        "https://dl.fbaipublicfiles.com/dinov3/notebooks/segmentation_tracking/first_video_frame_mask.png"
    )
)

mask_height, mask_width = first_mask_np.shape  # Abbreviated as [H', W']
print(f"Mask size: {[mask_height, mask_width]}")

num_masks = int(first_mask_np.max() + 1)  # Abbreviated as M
print(f"Number of masks: {num_masks}")

mp.show_images(
    [frames[0], mask_to_rgb(first_mask_np, num_masks)],
    titles=["Frame", "Mask"],
    height=DISPLAY_HEIGHT,
)
```

--------------------------------

### Create Image Preprocessing Pipelines

Source: https://context7.com/facebookresearch/dinov3/llms.txt

Generates standard image preprocessing pipelines for evaluation or inference using torchvision v2 transforms. Supports configurable resize, crop, and normalization for classification or custom tasks.

```python
from dinov3.data.transforms import (
    make_classification_eval_transform,
    make_eval_transform,
    IMAGENET_DEFAULT_MEAN,
    IMAGENET_DEFAULT_STD,
)
from torchvision.transforms import v2
from PIL import Image
import torch

# Standard ImageNet eval transform (resize to 256, center-crop to 224)
eval_transform = make_classification_eval_transform(
    resize_size=256,
    crop_size=224,
    interpolation=v2.InterpolationMode.BICUBIC,
    mean=IMAGENET_DEFAULT_MEAN,
    std=IMAGENET_DEFAULT_STD,
)

# Custom: satellite imagery, square resize, no center-crop
sat_transform = make_eval_transform(
    resize_size=256,
    crop_size=None,             # skip center crop
    resize_square=True,         # force square output
    mean=(0.430, 0.411, 0.296),
    std=(0.213, 0.156, 0.143),
)

img = Image.open("photo.jpg").convert("RGB")
x_eval = eval_transform(img)   # torch.Tensor (3, 224, 224)
x_sat  = sat_transform(img)    # torch.Tensor (3, 256, 256)
print(x_eval.shape, x_eval.dtype)   # torch.Size([3, 224, 224]) torch.float32
print(x_sat.shape)                   # torch.Size([3, 256, 256])
```

--------------------------------

### Load Dataset and Prepare Text Features

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/dinotxt_segmentation_inference.ipynb

Loads the specified dataset, defines image transformations, and prepares text features by encoding class names using various prompt templates. It averages prompt embeddings and normalizes them for use with the model.

```python
# Load dataset
transform = TVT.Compose(
    [
        ShortSideResize(cfg.resize, TVT.InterpolationMode.BICUBIC),
        TVT.ToTensor(),
        NORMALIZE_IMAGENET,
    ]
)
dataset = DATASETS[cfg.dataset](transform)
class_names = dataset.CLASS_NAMES
print(f"Dataset: {len(dataset)} images, {len(class_names)} classes")
dataloder = torch.utils.data.DataLoader(
    dataset,
    batch_size=None, # TODO Adapt
    num_workers=1,
    shuffle=False,
    pin_memory=True,
    multiprocessing_context="spawn",
)

# Prepare text features: prompts x class names
text_feats = []
for class_name in tqdm.tqdm(class_names, desc="Class names", unit="name", ncols=0):
    text = [template.format(class_name) for template in PROMPT_TEMPLATES]
    tokens = tokenizer(text).to("cuda", non_blocking=True)
    feats = model.encode_text(tokens)  # [num_prompts, 2D]
    feats = feats[:, feats.shape[1] // 2 :]  # The 1st half of the features corresponds to the CLS token, drop it
    feats = F.normalize(feats, p=2, dim=-1)  # Normalize each text embedding
    feats = feats.mean(dim=0)  # Average over all prompt embeddings per class
    feats = F.normalize(feats, p=2, dim=-1)  # Normalize again
    text_feats.append(feats)
text_feats = torch.stack(text_feats)  # [num_classes, D]
print(f"Text features: {text_feats}")

```
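The normalize, average, normalize sequence above is a prompt ensemble: each prompt embedding is scaled to unit length so that no single prompt dominates the mean, and the mean is rescaled so it can be compared by cosine similarity. A small NumPy sketch of the same three steps follows, with NumPy standing in for PyTorch and `ensemble_text_features` as a hypothetical helper name.

```python
import numpy as np


def ensemble_text_features(prompt_feats: np.ndarray) -> np.ndarray:
    """prompt_feats [num_prompts, D] -> unit-norm class embedding [D]."""
    feats = prompt_feats / np.linalg.norm(prompt_feats, axis=-1, keepdims=True)
    mean = feats.mean(axis=0)
    return mean / np.linalg.norm(mean)


# Two prompt embeddings with very different lengths.
prompt_feats = np.array([[3.0, 0.0], [0.0, 0.5]])
class_feat = ensemble_text_features(prompt_feats)
print(class_feat)  # [0.70710678 0.70710678]
```

Without the first normalization the longer vector would dominate: the raw mean `[1.5, 0.25]` normalizes to roughly `[0.986, 0.164]`.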

--------------------------------

### Download NYU Dataset Splits

Source: https://github.com/facebookresearch/dinov3/blob/main/DATASETS.md

Downloads the train and test split files for the NYU dataset, which are required for training and evaluation.

```bash
wget https://github.com/cleinc/bts/raw/master/train_test_inputs/nyudepthv2_train_files_with_gt.txt -O ${DEPTH_DATASETS_ROOT}/NYU/nyu_train.txt
wget https://github.com/cleinc/bts/raw/master/train_test_inputs/nyudepthv2_test_files_with_gt.txt -O ${DEPTH_DATASETS_ROOT}/NYU/nyu_test.txt
```

--------------------------------

### Load DINOv3 Model using Torch Hub

Source: https://github.com/facebookresearch/dinov3/blob/main/MODEL_CARD.md

Load a DINOv3 model from Torch Hub by specifying the repository, model name, and checkpoint path or URL. Replace `<MODEL_NAME>` and `<PATH/OR/URL/TO/CHECKPOINT>` with the appropriate values.

```python
import torch

model = torch.hub.load(
    repo_or_dir='facebookresearch/dinov3',
    model='<MODEL_NAME>',
    weights='<PATH/OR/URL/TO/CHECKPOINT>',
)
```

```python
# where MODEL_NAME can be one of:
# - dinov3_vits16
# - dinov3_vits16plus
# - dinov3_vitb16
# - dinov3_vitl16
# - dinov3_vith16plus
# - dinov3_vit7b16
# - dinov3_convnext_tiny
# - dinov3_convnext_small
# - dinov3_convnext_base
# - dinov3_convnext_large
```

```python
# For instance
dinov3_vits16 = torch.hub.load(
    repo_or_dir='facebookresearch/dinov3',
    model='dinov3_vits16',
    weights='<PATH/OR/URL/TO/DINOV3/VITS16/LVD1689M/CHECKPOINT>',
)
```

--------------------------------

### Load DINOv3 Adapter from URL using PyTorch Hub

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Load a DINOv3 adapter model directly from a URL. This is useful for applying specific task-oriented adapters to a base backbone.

```python
import torch

# Example: Load a specific adapter from a URL
# Replace with the actual URL from the model weights list
adapter_url = "<URL_TO_ADAPTER_WEIGHTS>"
adapter = torch.hub.load("facebookresearch/dinov3", "dinov3_adapter_vit_small_patch16_224", pretrained=True, weights=adapter_url)

# Note: The adapter model name might vary based on the specific adapter and backbone it's designed for.
```

--------------------------------

### Load DINOv3 Backbone from Local Path using PyTorch Hub

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Load a DINOv3 backbone from a local directory where the repository has been cloned. Point `torch.hub.load` to the local path with `source="local"` and specify the weights.

```python
import torch

REPO_DIR = "/path/to/your/local/dinov3/repo"

# Example: Load a ViT-S/16 distilled backbone from a local path
# Replace with the actual path to your downloaded weights
# The hub entry name follows the `dinov3_*` names listed in the model card (e.g. dinov3_vits16)
local_weights_path = "<PATH/TO/LOCAL/BACKBONE_WEIGHTS>"
backbone = torch.hub.load(REPO_DIR, "dinov3_vits16", source="local", pretrained=True, weights=local_weights_path)

# Example: Load a ConvNeXt-Tiny backbone from a local path
# Replace with the actual path to your downloaded weights
# local_weights_path = "<PATH/TO/LOCAL/BACKBONE_WEIGHTS>"
# backbone = torch.hub.load(REPO_DIR, "dinov3_convnext_tiny", source="local", pretrained=True, weights=local_weights_path)
```

--------------------------------

### Pretrain DINOv3 ViT-7B/16

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Initiates DINOv3 ViT-7B/16 pretraining on 32 nodes (256 GPUs) in a SLURM cluster. Requires the 'dinov3' package in the Python module search path.

```shell
PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \
  --nodes 32 \
  --config-file dinov3/configs/train/dinov3_vit7b16_pretrain.yaml \
  --output-dir <PATH/TO/OUTPUT/DIR> \
  train.dataset_path=<DATASET>:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>
```
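
The `train.dataset_path` override packs a dataset name and its `key=value` options into one colon-separated string. The sketch below shows how such a string breaks down. The function `parse_dataset_spec` is hypothetical and is not the parser used by the repository, and it assumes that paths contain no colons.

```python
def parse_dataset_spec(spec: str) -> tuple[str, dict[str, str]]:
    """Split '<NAME>:key=value:key=value' into (name, options)."""
    name, *pairs = spec.split(":")
    options: dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"Expected key=value, got {pair!r}")
        options[key] = value
    return name, options


# Illustrative dataset name and paths, not real locations.
name, options = parse_dataset_spec("CustomDataset:root=/data/lvd:extra=/data/lvd")
print(name, options)
```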

--------------------------------

### Load DINOv3 Adapter from Local Path using PyTorch Hub

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Loads a DINOv3 adapter model from a local clone of the repository and a local weights file, which allows offline use and fits local development workflows.

```python
import torch

REPO_DIR = "/path/to/your/local/dinov3/repo"

# Example: Load a specific adapter from a local path
# Replace with the actual path to your downloaded weights
local_weights_path = "<PATH/TO/LOCAL/ADAPTER_WEIGHTS>"
adapter = torch.hub.load(REPO_DIR, "dinov3_adapter_vit_small_patch16_224", source="local", pretrained=True, weights=local_weights_path)

# Note: The entry-point name above is illustrative; it depends on the adapter and the backbone it was built for.
# Check hubconf.py in the repository for the names that are actually exposed.
```

--------------------------------

### Display Sample Video Frames

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/segmentation_tracking.ipynb

Selects evenly spaced sample frames from a list of video frames and displays them. Requires `mediapy` (imported as `mp`) and assumes that `frames`, `num_frames`, and `DISPLAY_HEIGHT` were defined in earlier cells.

```python
num_selected_frames = 4
selected_frames = np.linspace(0, num_frames - 1, num_selected_frames, dtype=int)

mp.show_images(
    [frames[int(i)] for i in selected_frames],
    titles=[f"Frame {i}" for i in selected_frames],
    height=DISPLAY_HEIGHT,
)
```
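
To see which indices `np.linspace(..., dtype=int)` picks, the short example below runs the same call on two assumed clip lengths. The first and last frames are always included, and fractional positions are rounded down.

```python
import numpy as np

num_selected_frames = 4

# Assumed clip lengths, chosen only for illustration.
for num_frames in (10, 8):
    selected_frames = np.linspace(0, num_frames - 1, num_selected_frames, dtype=int)
    print(num_frames, selected_frames.tolist())
# 10 [0, 3, 6, 9]
# 8 [0, 2, 4, 7]
```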

--------------------------------

### Load DINOv3 ConvNeXt Models with PyTorch Hub

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Loads various DINOv3 ConvNeXt models using torch.hub. Requires specifying the checkpoint path or URL.

```python
import torch

REPO_DIR = "/path/to/your/local/dinov3/repo"  # local clone of the repository

dinov3_convnext_tiny = torch.hub.load(REPO_DIR, 'dinov3_convnext_tiny', source='local', weights="<CHECKPOINT/URL/OR/PATH>")
dinov3_convnext_small = torch.hub.load(REPO_DIR, 'dinov3_convnext_small', source='local', weights="<CHECKPOINT/URL/OR/PATH>")
dinov3_convnext_base = torch.hub.load(REPO_DIR, 'dinov3_convnext_base', source='local', weights="<CHECKPOINT/URL/OR/PATH>")
dinov3_convnext_large = torch.hub.load(REPO_DIR, 'dinov3_convnext_large', source='local', weights="<CHECKPOINT/URL/OR/PATH>")
```

--------------------------------

### High-Resolution Adaptation for DINOv3 ViT-7B/16

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Runs the high-resolution adaptation stage of DINOv3 ViT-7B/16 training on 32 nodes (256 GPUs). Both `gram.ckpt` and `student.resume_from_teacher_chkpt` take the path to the teacher checkpoint produced by the Gram anchoring stage.

```shell
PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \
  --nodes 32 \
  --config-file dinov3/configs/train/dinov3_vit7b16_high_res_adapt.yaml \
  --output-dir <PATH/TO/OUTPUT/DIR> \
  train.dataset_path=<DATASET>:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET> \
  gram.ckpt=<PATH/TO/TEACHER_FROM_GRAM> \
  student.resume_from_teacher_chkpt=<PATH/TO/TEACHER_FROM_GRAM>
```
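
When a run is launched from a script rather than typed by hand, it can help to assemble the command as a list of arguments. The sketch below rebuilds the command above that way. The helper `build_submit_command` is hypothetical, every path is a placeholder, and only the flags and config keys that appear in the command above are used.

```python
import shlex


def build_submit_command(nodes: int, config_file: str, output_dir: str,
                         overrides: dict[str, str]) -> list[str]:
    """Assemble the argv for the submit command shown above (sketch only)."""
    argv = [
        "python", "-m", "dinov3.run.submit", "dinov3/train/train.py",
        "--nodes", str(nodes),
        "--config-file", config_file,
        "--output-dir", output_dir,
    ]
    # Config overrides are passed as trailing key=value tokens.
    argv += [f"{key}={value}" for key, value in overrides.items()]
    return argv


argv = build_submit_command(
    nodes=32,
    config_file="dinov3/configs/train/dinov3_vit7b16_high_res_adapt.yaml",
    output_dir="/path/to/output",  # placeholder
    overrides={
        "train.dataset_path": "<DATASET>:root=/path/to/dataset:extra=/path/to/dataset",  # placeholder
        "gram.ckpt": "/path/to/teacher.pth",  # placeholder
        "student.resume_from_teacher_chkpt": "/path/to/teacher.pth",  # placeholder
    },
)
print(shlex.join(argv))
```

The list can be handed to `subprocess.run(argv, env=...)` with `PYTHONPATH` set in `env`, which avoids shell quoting issues.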

--------------------------------

### Load Image and Perform Foreground Segmentation

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/foreground_segmentation.ipynb

Loads a test image from a URL, resizes and normalizes it, and extracts patch features with a DINOv3 model. A previously trained classifier (`clf`) then predicts a foreground probability per patch, a 3×3 median filter smooths the result, and the input image, raw scores, and filtered scores are plotted side by side.

```python
test_image_fpath = "https://dl.fbaipublicfiles.com/dinov3/notebooks/foreground_segmentation/test_image.jpg"

def load_image_from_url(url: str) -> Image.Image:
    with urllib.request.urlopen(url) as f:
        return Image.open(f).convert("RGB")


test_image = load_image_from_url(test_image_fpath)
test_image_resized = resize_transform(test_image)
test_image_normalized = TF.normalize(test_image_resized, mean=IMAGENET_MEAN, std=IMAGENET_STD)

with torch.inference_mode():
    with torch.autocast(device_type='cuda', dtype=torch.float32):
        feats = model.get_intermediate_layers(test_image_normalized.unsqueeze(0).cuda(), n=range(n_layers), reshape=True, norm=True)
        x = feats[-1].squeeze().detach().cpu()
        dim = x.shape[0]
        x = x.view(dim, -1).permute(1, 0)

h_patches, w_patches = [int(d / PATCH_SIZE) for d in test_image_resized.shape[1:]]

fg_score = clf.predict_proba(x)[:, 1].reshape(h_patches, w_patches)
fg_score_mf = torch.from_numpy(signal.medfilt2d(fg_score, kernel_size=3))

plt.figure(figsize=(9, 3), dpi=300)
plt.subplot(1, 3, 1)
plt.axis('off')
plt.imshow(test_image_resized.permute(1, 2, 0))
plt.title('input image')
plt.subplot(1, 3, 2)
plt.axis('off')
plt.imshow(fg_score)
plt.title('foreground score')
plt.subplot(1, 3, 3)
plt.axis('off')
plt.imshow(fg_score_mf)
plt.title('+ median filter')
plt.show()
```
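
The median filter step removes isolated outliers from the per-patch score map. The NumPy sketch below shows what a 3×3 median filter with zero padding computes on a toy score map. It is meant to illustrate the `signal.medfilt2d(fg_score, kernel_size=3)` call above, not to replace it. The function name `median_filter_3x3` and the toy values are assumptions, and the zero padding follows how SciPy documents `medfilt2d`.

```python
import numpy as np


def median_filter_3x3(score: np.ndarray) -> np.ndarray:
    """3x3 median filter with zero padding at the border."""
    h, w = score.shape
    padded = np.pad(score, 1, mode="constant", constant_values=0.0)
    # Stack the nine shifted views so each pixel sees its 3x3 neighbourhood.
    windows = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    return np.median(windows, axis=0)


fg = np.full((4, 5), 0.9)  # toy map: mostly confident foreground
fg[1, 2] = 0.0             # one isolated low-score patch
filtered = median_filter_3x3(fg)
print(filtered[1, 2], filtered[0, 0])  # 0.9 0.0
```

The isolated patch is restored to its neighbours' value, while the corner drops to zero because five of its nine window entries come from the padding.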

--------------------------------

### Full 3-Stage ViT-7B/16 Training: Stage 2 Gram Anchoring

Source: https://context7.com/facebookresearch/dinov3/llms.txt

This command runs Stage 2 (Gram anchoring) of ViT-7B/16 training. It takes the teacher checkpoint produced by Stage 1 through `gram.ckpt` and trains on a custom dataset.

```bash
PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \
  --nodes 32 \
  --config-file dinov3/configs/train/dinov3_vit7b16_gram_anchor.yaml \
  --output-dir /outputs/vit7b_stage2 \
  train.dataset_path=CustomDataset:root=/data/lvd:extra=/data/lvd \
  gram.ckpt=/outputs/vit7b_stage1/eval/training_xxx/teacher_checkpoint.pth
```

--------------------------------

### Compute Foreground Probability and Visualize

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/pca.ipynb

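Computes a per-patch foreground probability with a classifier, smooths it with a 3×3 median filter, and plots the original image next to the filtered score map. Assumes that `image`, `image_resized`, the patch features `x`, the classifier `clf`, and `PATCH_SIZE` were defined in earlier cells.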

```python
h_patches, w_patches = [int(d / PATCH_SIZE) for d in image_resized.shape[1:]]

fg_score = clf.predict_proba(x)[:, 1].reshape(h_patches, w_patches)
fg_score_mf = torch.from_numpy(signal.medfilt2d(fg_score, kernel_size=3))

plt.rcParams.update({
    "xtick.labelsize": 5,
    "ytick.labelsize": 5,
    "axes.labelsize": 5,
    "axes.titlesize": 4,
})

plt.figure(figsize=(4, 2), dpi=300)
plt.subplot(1, 2, 1)
plt.imshow(image)
plt.axis('off')
plt.title(f"Image, Size {image.size}")
plt.subplot(1, 2, 2)
plt.imshow(fg_score_mf)
plt.title(f"Foreground Score, Size {tuple(fg_score_mf.shape)}")
plt.colorbar()
plt.axis('off')
plt.show()
```
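
The reshape in the first lines relies on the patches being ordered row by row. The toy example below, with an assumed image size and made-up scores, shows how a flat vector of per-patch probabilities maps onto the patch grid.

```python
import numpy as np

PATCH_SIZE = 16
height, width = 32, 48  # assumed resized image size, both multiples of PATCH_SIZE

h_patches, w_patches = height // PATCH_SIZE, width // PATCH_SIZE  # 2 rows, 3 columns

# Stand-in for clf.predict_proba(x)[:, 1]: one probability per patch, row-major.
flat_scores = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
fg_score = flat_scores.reshape(h_patches, w_patches)

# Patch p sits at row p // w_patches, column p % w_patches.
p = 4
row, col = divmod(p, w_patches)
print(fg_score.shape, (row, col), fg_score[row, col])  # (2, 3) (1, 1) 0.5
```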

--------------------------------

### ConvNeXt Multi-Distillation (Single Node Test)

Source: https://context7.com/facebookresearch/dinov3/llms.txt

This command runs a test for ConvNeXt multi-distillation on a single node. It utilizes a specific configuration file and enables multi-distillation mode.

```bash
PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \
  --nodes 1 \
  --config-file dinov3/configs/train/distillation_convnext/convnext_tiny_p16.yaml \
  --output-dir /outputs/convnext_tiny \
  --multi-distillation \
  train.dataset_path=ImageNet22k:root=/data/imagenet:extra=/data/imagenet
```

--------------------------------

### Load DINOv3 ViT Models with PyTorch Hub

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Loads various DINOv3 ViT models using torch.hub. Requires specifying the checkpoint path or URL.

```python
import torch

REPO_DIR = "/path/to/your/local/dinov3/repo"  # local clone of the repository

dinov3_vits16 = torch.hub.load(REPO_DIR, 'dinov3_vits16', source='local', weights="<CHECKPOINT/URL/OR/PATH>")
dinov3_vits16plus = torch.hub.load(REPO_DIR, 'dinov3_vits16plus', source='local', weights="<CHECKPOINT/URL/OR/PATH>")
dinov3_vitb16 = torch.hub.load(REPO_DIR, 'dinov3_vitb16', source='local', weights="<CHECKPOINT/URL/OR/PATH>")
dinov3_vitl16 = torch.hub.load(REPO_DIR, 'dinov3_vitl16', source='local', weights="<CHECKPOINT/URL/OR/PATH>")
dinov3_vith16plus = torch.hub.load(REPO_DIR, 'dinov3_vith16plus', source='local', weights="<CHECKPOINT/URL/OR/PATH>")
dinov3_vit7b16 = torch.hub.load(REPO_DIR, 'dinov3_vit7b16', source='local', weights="<CHECKPOINT/URL/OR/PATH>")
```
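
Since the six calls differ only in the entry-point name and the checkpoint, they can be wrapped in one function that validates the name first. This is a minimal sketch: `load_vit_backbone` and `DINOV3_VIT_ENTRYPOINTS` are hypothetical names, and the tuple simply repeats the entry points listed above.

```python
DINOV3_VIT_ENTRYPOINTS = (
    "dinov3_vits16", "dinov3_vits16plus", "dinov3_vitb16",
    "dinov3_vitl16", "dinov3_vith16plus", "dinov3_vit7b16",
)


def load_vit_backbone(name: str, repo_dir: str, weights: str):
    """Load one of the ViT entry points listed above from a local clone."""
    if name not in DINOV3_VIT_ENTRYPOINTS:
        raise ValueError(
            f"Unknown entry point {name!r}; expected one of {DINOV3_VIT_ENTRYPOINTS}"
        )
    import torch  # imported lazily so the name check works without PyTorch

    return torch.hub.load(repo_dir, name, source="local", weights=weights)
```

A typo such as `dinov3_vits14` then fails immediately with a readable message instead of an error from inside `torch.hub`.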

--------------------------------

### Initialize Feature and Probability Queues

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/segmentation_tracking.ipynb

Initializes empty lists to store features and probabilities for context frames. These queues will be used for propagating segmentation information.

```python
features_queue: list[Tensor] = []
probs_queue: list[Tensor] = []
```
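
The snippet above uses plain lists. If only the most recent context frames should be kept, one alternative is a bounded `collections.deque`, sketched below. This is a swapped-in technique rather than the notebook's own code: `MAX_CONTEXT_LENGTH` is a hypothetical limit, and strings stand in for the per-frame tensors.

```python
from collections import deque

MAX_CONTEXT_LENGTH = 3  # hypothetical limit on stored context frames

features_queue: deque = deque(maxlen=MAX_CONTEXT_LENGTH)
probs_queue: deque = deque(maxlen=MAX_CONTEXT_LENGTH)

for frame_idx in range(5):
    # Placeholders stand in for per-frame feature and probability tensors.
    features_queue.append(f"features_{frame_idx}")
    probs_queue.append(f"probs_{frame_idx}")

# The oldest entries were dropped automatically once the limit was reached.
print(list(features_queue))  # ['features_2', 'features_3', 'features_4']
```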

--------------------------------

### Load DINOv3 ViT Backbone (SAT-493M)

Source: https://context7.com/facebookresearch/dinov3/llms.txt

Loads a DINOv3 Vision Transformer backbone (ViT-L) pretrained on the SAT-493M satellite imagery dataset. Inputs to this model should be normalized with satellite-specific statistics rather than the ImageNet defaults; this snippet covers model loading only.

```python
import torch
from torchvision.transforms import v2
from dinov3.hub.backbones import dinov3_vitl16, Weights

# Load ViT-L pretrained on satellite data.
# `weights` can be passed only once: either a local checkpoint path (used here)
# or the `Weights.SAT493M` enum member.
model = dinov3_vitl16(
    pretrained=True,
    weights="/path/to/dinov3_vitl16_pretrain_sat493m-eadcf0ff.pth",
    # weights=Weights.SAT493M,
)
model.eval().cuda()
```

--------------------------------

### Load DINOv3 ConvNeXt Backbone (LVD-1689M)

Source: https://context7.com/facebookresearch/dinov3/llms.txt

Loads a DINOv3 ConvNeXt backbone (Base variant) pretrained on the LVD-1689M dataset, applies a standard resize, center-crop, and ImageNet normalization pipeline to an input image, and prints the shape of each stage's feature map.

```python
import torch
from PIL import Image
from torchvision.transforms import v2

# Load ConvNeXt-Base pretrained on web images
REPO_DIR = "/path/to/dinov3"
model = torch.hub.load(
    REPO_DIR,
    "dinov3_convnext_base",
    source="local",
    weights="/path/to/dinov3_convnext_base_pretrain_lvd1689m-801f2ba9.pth",
)
model.eval().cuda()

transform = v2.Compose([
    v2.ToImage(),
    v2.Resize((256, 256), antialias=True),
    v2.CenterCrop(224),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

x = transform(Image.open("dog.jpg").convert("RGB")).unsqueeze(0).cuda()

with torch.inference_mode():
    # ConvNeXt returns a list of stage features
    out = model(x)
    print("Output type:", type(out))  # list
    for i, feat in enumerate(out):
        print(f"  Stage {i}: {feat.shape}")
    # Stage 0: torch.Size([1, 128, 56, 56])
    # Stage 1: torch.Size([1, 256, 28, 28])
    # Stage 2: torch.Size([1, 512, 14, 14])
    # Stage 3: torch.Size([1, 1024, 7, 7])
```
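
The spatial sizes in the comments above match total downsampling factors of 4, 8, 16, and 32 on a 224×224 input. The short sketch below reproduces that arithmetic. The strides are inferred from the listed shapes and are an assumption here, not values read from the model code, and `stage_resolutions` is a hypothetical helper.

```python
def stage_resolutions(input_size: int, strides=(4, 8, 16, 32)) -> list[int]:
    """Spatial side length of each stage's feature map for a square input."""
    return [input_size // stride for stride in strides]


print(stage_resolutions(224))  # [56, 28, 14, 7]
```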