### Fast Setup: ViT-L/16 Pretraining on ImageNet-1k Source: https://context7.com/facebookresearch/dinov3/llms.txt This command initiates a fast setup for ViT-L/16 pretraining on ImageNet-1k using 4 nodes and 8 GPUs per node. Ensure the dataset path is correctly configured. ```bash PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \ --nodes 4 \ --config-file dinov3/configs/train/vitl_im1k_lin834.yaml \ --output-dir /outputs/vitl_im1k \ train.dataset_path=ImageNet22k:root=/data/imagenet:extra=/data/imagenet ``` -------------------------------- ### Setup Environment and DINOv3 Location Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/segmentation_tracking.ipynb Imports necessary libraries and sets up the DINOv3 location, either from a local path or via torch hub. ```python import datetime import functools import io import logging import math import os from pathlib import Path import tarfile import time import urllib import lovely_tensors import matplotlib.pyplot as plt import mediapy as mp import numpy as np from PIL import Image import torch import torch.nn.functional as F import torchvision.transforms as TVT import torchvision.transforms.functional as TVTF from torch import Tensor, nn from tqdm import tqdm DISPLAY_HEIGHT = 200 lovely_tensors.monkey_patch() torch.set_grad_enabled(False) logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s") DINOV3_GITHUB_LOCATION = "facebookresearch/dinov3" if os.getenv("DINOV3_LOCATION") is not None: DINOV3_LOCATION = os.getenv("DINOV3_LOCATION") else: DINOV3_LOCATION = DINOV3_GITHUB_LOCATION print(f"DINOv3 location set to {DINOV3_LOCATION}") ``` -------------------------------- ### Setup DINOv3 Repository Location Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/pca.ipynb Imports necessary libraries and sets the DINOv3 repository location, either from an environment variable or via torch hub. 
```python import pickle import os import urllib import numpy as np import matplotlib.pyplot as plt from PIL import Image import torch import torchvision.transforms.functional as TF from sklearn.decomposition import PCA from scipy import signal DINOV3_GITHUB_LOCATION = "facebookresearch/dinov3" if os.getenv("DINOV3_LOCATION") is not None: DINOV3_LOCATION = os.getenv("DINOV3_LOCATION") else: DINOV3_LOCATION = DINOV3_GITHUB_LOCATION print(f"DINOv3 location set to {DINOV3_LOCATION}") ``` -------------------------------- ### Setup DinoV3 Conda Environment Source: https://github.com/facebookresearch/dinov3/blob/main/README.md Commands to create and activate a Conda environment for DinoV3 using micromamba. This ensures all necessary dependencies are installed. ```shell micromamba env create -f conda.yaml micromamba activate dinov3 ``` -------------------------------- ### Setup ImageNet Validation Dataset Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/dinotxt_inference.ipynb Configure the root directory for the ImageNet validation dataset and create a dataset object with preprocessing. Ensure the model is in evaluation mode and moved to the GPU. ```python imagenet_val_root_dir = "" val_dataset = ImageFolder(imagenet_val_root_dir, image_preprocess) model = model.eval().cuda() ``` -------------------------------- ### Run DinoV3 Segmentation Inference Source: https://github.com/facebookresearch/dinov3/blob/main/README.md Example command to perform full inference on the ADE20K dataset using a provided segmentor. Ensure to replace placeholders for dataset root and output directory. ```shell PYTHONPATH=. 
python -m dinov3.run.submit dinov3/eval/segmentation/run.py \ config=dinov3/eval/segmentation/configs/config-ade20k-m2f-inference.yaml \ datasets.root= \ load_from=dinov3_vit7b16_ms \ --output-dir ``` -------------------------------- ### Full 3-Stage ViT-7B/16 Training: Stage 1 Pretraining Source: https://context7.com/facebookresearch/dinov3/llms.txt This command starts Stage 1 pretraining for the ViT-7B/16 model using 256 GPUs. It requires a custom dataset path and outputs to a specified directory. ```bash PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \ --nodes 32 \ --config-file dinov3/configs/train/dinov3_vit7b16_pretrain.yaml \ --output-dir /outputs/vit7b_stage1 \ train.dataset_path=CustomDataset:root=/data/lvd:extra=/data/lvd ``` -------------------------------- ### Full Example: DINOv3 Depther Inference and Visualization Source: https://github.com/facebookresearch/dinov3/blob/main/README.md Demonstrates loading an image, applying transformations, performing depth estimation using the DINOv3 Depther model, and visualizing the results. Requires PIL, PyTorch, torchvision, and matplotlib. Ensure CUDA is available for autocast. 
```python
from PIL import Image
import torch
from torchvision.transforms import v2
import matplotlib.pyplot as plt
from matplotlib import colormaps

def get_img():
    import requests
    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
    return image

def make_transform(resize_size: int | list[int] = 768):
    to_tensor = v2.ToImage()
    resize = v2.Resize((resize_size, resize_size), antialias=True)
    to_float = v2.ToDtype(torch.float32, scale=True)
    normalize = v2.Normalize(
        mean=(0.485, 0.456, 0.406),
        std=(0.229, 0.224, 0.225),
    )
    return v2.Compose([to_tensor, resize, to_float, normalize])

# REPO_DIR must point to a local clone of the dinov3 repository.
# Replace the placeholder strings with your checkpoint URLs or local paths.
depther = torch.hub.load(
    REPO_DIR,
    'dinov3_vit7b16_dd',
    source="local",
    weights="<DEPTHER/CHECKPOINT/URL/OR/PATH>",
    backbone_weights="<BACKBONE/CHECKPOINT/URL/OR/PATH>",
)

img_size = 1024
img = get_img()
transform = make_transform(img_size)

with torch.inference_mode():
    with torch.autocast('cuda', dtype=torch.bfloat16):
        batch_img = transform(img)[None]
        batch_img = batch_img
        depths = depther(batch_img)

plt.figure(figsize=(12, 6))
plt.subplot(121)
plt.imshow(img)
plt.axis("off")
plt.subplot(122)
plt.imshow(depths[0,0].cpu(), cmap=colormaps["Spectral"])
plt.axis("off")
```

--------------------------------

### Train Text Alignment on DINOv3

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Start text alignment training on DINOv3 with the provided configuration. This example trains on 4 nodes with 8 GPUs each. Adapt the trainer config file and dataset as needed.
```shell
# An example config for text alignment is here: dinov3/eval/text/configs/dinov3_vitl_text.yaml
# (kept above the command: a comment between continuation lines would cut the command short)
PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/eval/text/train_dinotxt.py \
  --nodes 4 \
  trainer_config_file="<PATH/TO/DINOv3/TEXT/CONFIG>" \
  output-dir=<PATH/TO/OUTPUT/DIR>
```

--------------------------------

### Import Python Libraries for Dataset Exploration

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/chmv2_dataset_exploration.ipynb

Imports the libraries used to work with the dataset, including boto3 for S3 access, rasterio for geospatial raster data, and geopandas for vector data. Ensure these libraries are installed.

```python
# This notebook requires the following additional libraries
# (please install using the preferred method for your environment, e.g. pip, conda):
#
# boto3 >= 1.38.23
# matplotlib >= 3.10.3
# rasterio >= 1.5.0
# geopandas >= 1.1.3

# Import the libraries required for this notebook
# Built-ins
import json
from pprint import pprint
import tempfile
import os

# Installed libraries
import boto3
import matplotlib.pyplot as plt
from botocore import UNSIGNED
from botocore.config import Config
import rasterio
import rasterio.mask
from rasterio.merge import merge
from rasterio.warp import calculate_default_transform, reproject, Resampling
import geopandas as gp
import numpy as np
```

--------------------------------

### Load NEON Dataset Images as Tensors

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/chmv2_inference.ipynb

Loads images from the provided URIs and converts them into PyTorch tensors. Ensure rasterio and torch are installed.

```python
import urllib.request
import rasterio
import io
import torch

# Using test samples from the NEON dataset that can be downloaded following instructions in
# https://github.com/facebookresearch/HighResCanopyHeight
# Original dataset: National Ecological Observatory Network (NEON), 2022. Ecosystem Structure
# URL: https://data.neonscience.org/data-products/DP3.30015.001.
neon_images_uri = [ "https://dl.fbaipublicfiles.com/dinov3/notebooks/chmv2/2017_WLOU_1_NEON_D13_WLOU_DP3_419000_4416000_RGB.tif_1_1.tif", "https://dl.fbaipublicfiles.com/dinov3/notebooks/chmv2/2018_GUAN_1_NEON_D04_GUAN_DP3_725000_1985000_RGB.tif_2_1.tif", "https://dl.fbaipublicfiles.com/dinov3/notebooks/chmv2/2019_HOPB_3_NEON_D01_HOPB_DP3_717000_4705000_RGB.tif_1_1.tif", "https://dl.fbaipublicfiles.com/dinov3/notebooks/chmv2/2019_REDB_2_NEON_D15_REDB_DP3_433000_4516000_RGB.tif_2_2.tif", "https://dl.fbaipublicfiles.com/dinov3/notebooks/chmv2/2019_WLOU_2_NEON_D13_WLOU_DP3_420000_4417000_RGB.tif_0_0.tif", ] neon_images_list = [] def load_image_as_tensor(uri: str) -> torch.Tensor: """Load a rasterio image from URI as a PyTorch tensor.""" with urllib.request.urlopen(uri) as response: data = response.read() with rasterio.open(io.BytesIO(data)) as src: img = src.read() return torch.from_numpy(img) for neon_image_uri in neon_images_uri: neon_images_list.append(load_image_as_tensor(neon_image_uri)) ``` -------------------------------- ### Single Frame Propagation Example Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/segmentation_tracking.ipynb Demonstrates the 'propagate' function using features from the first frame as context for the second frame. It involves extracting features, marking dynamic tensors, and calling the propagation function. 
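The `propagate` function itself is not shown in this excerpt. As a rough illustration of what top-k, temperature-scaled label propagation can look like for a single query patch, here is a plain-Python sketch. It is not the notebook's implementation: `propagate_one` is a hypothetical helper, it ignores the neighborhood mask, and it assumes the features are already L2-normalized so that a dot product acts as cosine similarity.

```python
import math


def propagate_one(query, context_feats, context_probs, topk=5, temperature=0.2):
    """Propagate label probabilities to one query patch.

    query:         list of D floats (feature of the current patch)
    context_feats: N lists of D floats (features of context patches)
    context_probs: N lists of M floats (label probabilities of context patches)
    """
    # Similarity of the query to every context patch.
    sims = [sum(q * k for q, k in zip(query, feat)) for feat in context_feats]
    # Keep only the indices of the top-k most similar context patches.
    top = sorted(range(len(sims)), key=sims.__getitem__, reverse=True)[:topk]
    # Softmax over the kept similarities, sharpened by the temperature.
    weights = [math.exp(sims[i] / temperature) for i in top]
    total = sum(weights)
    num_labels = len(context_probs[0])
    # Weighted average of the context label probabilities.
    return [
        sum(w * context_probs[i][m] for w, i in zip(weights, top)) / total
        for m in range(num_labels)
    ]


context_feats = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]]
context_probs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(propagate_one([1.0, 0.0], context_feats, context_probs, topk=2))
```

A lower temperature concentrates the weight on the single most similar context patch, while a larger `topk` lets more context patches vote.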
```python torch._dynamo.maybe_mark_dynamic(first_frame, (1, 2)) first_feats = forward(model, first_frame) # [h, w, D] print(f"First feats: {first_feats.shape}") frame_idx = 1 current_frame_pil = frames[frame_idx] current_frame = transform(current_frame_pil).to("cuda") # [3, H, W] torch._dynamo.maybe_mark_dynamic(current_frame, (1, 2)) current_feats = forward(model, current_frame) # [h", w", D] print(f"Current feats: {current_feats.shape}") current_probs = propagate( current_feats, # [h", w", D] context_features=first_feats.unsqueeze(0), # [1, h, w, D] context_probs=first_probs.unsqueeze(0), # [1, h, w, M] neighborhood_mask=neighborhood_mask, # [h", w", h, w] topk=5, temperature=0.2, ) # [h", w", M] print(f"Current probs: {current_probs}") ``` -------------------------------- ### Extract Image Features with Hugging Face Pipeline Source: https://github.com/facebookresearch/dinov3/blob/main/README.md Demonstrates how to extract image features using the Hugging Face Transformers pipeline. Ensure the 'transformers' library is installed. ```python from transformers import pipeline from transformers.image_utils import load_image url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg" image = load_image(url) feature_extractor = pipeline( model="facebook/dinov3-convnext-tiny-pretrain-lvd1689m", task="image-feature-extraction", ) features = feature_extractor(image) ``` -------------------------------- ### Iterate Through Target Tiles Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/chmv2_dataset_exploration.ipynb Initializes an empty list to store TIFF file paths and begins iterating through the identified target tiles. This is a setup for further processing of each tile. 
```python
tifs = []
for ii, row in target_tiles.iterrows():
    ...  # loop body not included in this excerpt
```

--------------------------------

### Load DINOv3 Backbone from URL using PyTorch Hub

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Use this snippet to load a DINOv3 backbone directly from a URL. The model names follow the `dinov3_*` entry points listed elsewhere in this document. Ensure PyTorch is installed with CUDA support for better performance.

```python
import torch

# Example: Load a ViT-S/16 distilled backbone from a URL
# Replace with the actual URL from the model weights list
backbone_url = ""
backbone = torch.hub.load("facebookresearch/dinov3", "dinov3_vits16", pretrained=True, weights=backbone_url)

# Example: Load a ConvNeXt-Tiny backbone from a URL
# Replace with the actual URL from the model weights list
# backbone_url = ""
# backbone = torch.hub.load("facebookresearch/dinov3", "dinov3_convnext_tiny", pretrained=True, weights=backbone_url)
```

--------------------------------

### Reproduce DINOv3 Depth Estimation Results (NYUv2)

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Command to reproduce DINOv3 depth estimation results on NYUv2 using the SYNTHMIX-trained Depther model. Requires the dataset to be set up and the dataset root and output directory placeholders to be replaced. This uses the `dinov3.run.submit` utility.

```shell
PYTHONPATH=. python -m dinov3.run.submit dinov3/eval/depth/run.py \
  config=dinov3/eval/depth/configs/config-nyu-synthmix-dpt-inference.yaml \
  datasets.root=<PATH/TO/DATASET> \
  load_from=dinov3_vit7b16_dd \
  --output-dir <PATH/TO/OUTPUT/DIR>
```

--------------------------------

### Initialize S3 Client and List Bucket Prefixes

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/chmv2_dataset_exploration.ipynb

Sets the S3 bucket and path for the dataset and initializes a boto3 client. It then lists the top-level prefixes under that path. Because the bucket is public, the client is configured to send unsigned requests.
```python # Location of the S3 bucket for this dataset bucket = "dataforgood-fb-data" path = "forests/v2/global/dinov3_global_chm_v2_ml3/" # List the top level of the bucket using boto3. Because this is a public bucket, we don't need to sign requests. # Here we set the signature version to unsigned, which is required for public buckets. s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED)) # Print the items in the top-level prefixes for item in s3.list_objects_v2(Bucket=bucket, Prefix=path, Delimiter='/')['CommonPrefixes']: print(item['Prefix']) ``` -------------------------------- ### Run Multi-Distillation Training Source: https://github.com/facebookresearch/dinov3/blob/main/README.md Use this command to initiate multi-distillation training. Ensure the PYTHONPATH is set correctly and specify the configuration file and dataset paths. ```shell PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \ --nodes 1 \ --config-file dinov3/configs/train/multi_distillation_test.yaml \ --output-dir \ --multi-distillation \ train.dataset_path=:root=:extra= ``` -------------------------------- ### Create Dataset Directory Source: https://github.com/facebookresearch/dinov3/blob/main/DATASETS.md Sets up the root directory for depth datasets and creates a specific folder for the NYU dataset. ```bash export DEPTH_DATASETS_ROOT=${HOME}/datasets mkdir -p ${DEPTH_DATASETS_ROOT}/NYU ``` -------------------------------- ### Launch DINOv3 Training Jobs Source: https://context7.com/facebookresearch/dinov3/llms.txt Launches DINOv3 pretraining or multi-distillation jobs on SLURM clusters using submitit, or locally with python/torchrun. Configuration is managed via YAML files and omegaconf. 
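Alongside the YAML file, the launch commands in this document pass the dataset as a compact string such as `CustomDataset:root=/data/lvd:extra=/data/lvd`. The sketch below shows one way such a `Name:key=value` string could be split using only the standard library. `parse_dataset_str` is a hypothetical helper written for this illustration; it is not claimed to be the parser DINOv3 uses internally.

```python
def parse_dataset_str(dataset_str: str) -> tuple[str, dict[str, str]]:
    """Split 'Name:key=value:key=value' into (name, {key: value})."""
    name, *pairs = dataset_str.split(":")
    if not name:
        raise ValueError(f"missing dataset name in {dataset_str!r}")
    options: dict[str, str] = {}
    for pair in pairs:
        # partition keeps any further '=' characters inside the value.
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        options[key] = value
    return name, options


name, options = parse_dataset_str("CustomDataset:root=/data/lvd:extra=/data/lvd")
print(name, options)
```

Because the string is split on `:`, this sketch would mis-handle values that themselves contain a colon (for example Windows drive letters or URLs).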
```bash # dinov3/train/train.py ``` -------------------------------- ### Initialize Quantization Filter and Resize Transform Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/dense_sparse_matching.ipynb Sets up a convolutional filter for patch quantization and a transform for resizing images to be divisible by the patch size. Ensure PATCH_SIZE and IMAGE_SIZE are defined. ```python patch_quant_filter = torch.nn.Conv2d(1, 1, PATCH_SIZE, stride=PATCH_SIZE, bias=False) patch_quant_filter.weight.data.fill_(1.0 / (PATCH_SIZE * PATCH_SIZE)) ``` ```python def resize_transform( mask_image: Image, image_size: int = IMAGE_SIZE, patch_size: int = PATCH_SIZE, ) -> torch.Tensor: w, h = mask_image.size h_patches = int(image_size / patch_size) w_patches = int((w * image_size) / (h * patch_size)) return TF.to_tensor(TF.resize(mask_image, (h_patches * patch_size, w_patches * patch_size))) ``` -------------------------------- ### Get Image Embeddings with Hugging Face AutoModel Source: https://github.com/facebookresearch/dinov3/blob/main/README.md Shows how to obtain image embeddings using Hugging Face's AutoImageProcessor and AutoModel. The model can be specified from a list of available pretrained models. 
```python import torch from transformers import AutoImageProcessor, AutoModel from transformers.image_utils import load_image url = "http://images.cocodataset.org/val2017/000000039769.jpg" image = load_image(url) pretrained_model_name = "facebook/dinov3-convnext-tiny-pretrain-lvd1689m" processor = AutoImageProcessor.from_pretrained(pretrained_model_name) model = AutoModel.from_pretrained( pretrained_model_name, device_map="auto", ) inputs = processor(images=image, return_tensors="pt").to(model.device) with torch.inference_mode(): outputs = model(**inputs) pooled_output = outputs.pooler_output print("Pooled output shape:", pooled_output.shape) ``` -------------------------------- ### Load DINOv3 Model and Get Attributes Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/segmentation_tracking.ipynb Loads a specified DINOv3 model (e.g., ViT-L) and retrieves its patch size and embedding dimension. It also reports peak GPU memory usage. ```python # examples of available DINOv3 models: MODEL_DINOV3_VITS = "dinov3_vits16" MODEL_DINOV3_VITSP = "dinov3_vits16plus" MODEL_DINOV3_VITB = "dinov3_vitb16" MODEL_DINOV3_VITL = "dinov3_vitl16" MODEL_DINOV3_VITHP = "dinov3_vith16plus" MODEL_DINOV3_VIT7B = "dinov3_vit7b16" # we take DINOv3 ViT-L MODEL_NAME = MODEL_DINOV3_VITL model = torch.hub.load( repo_or_dir=DINOV3_LOCATION, model=MODEL_NAME, source="local" if DINOV3_LOCATION != DINOV3_GITHUB_LOCATION else "github", ) model.to("cuda") model.eval() patch_size = model.patch_size embed_dim = model.embed_dim print(f"Patch size: {patch_size}") print(f"Embedding dimension: {embed_dim}") print(f"Peak GPU memory: {torch.cuda.max_memory_allocated() / 2**30:.1f} GB") ``` -------------------------------- ### Import DINOv3 Dependencies Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/dinotxt_segmentation_inference.ipynb Imports necessary libraries and sets the DINOv3 repository path. 
Ensure DINOv3_REPO_DIR is correctly set to your local repository path. ```python import dataclasses import math import warnings from typing import Callable import os import lovely_tensors import numpy as np import PIL.Image import torch import torch.nn.functional as F import torchvision.transforms as TVT import torchvision.transforms.functional as TVTF import tqdm from omegaconf import OmegaConf from torch import Tensor, nn from torchmetrics.classification import MulticlassJaccardIndex DINOv3_REPO_DIR = "" # Please add here the path to your DINOv3 repository ``` -------------------------------- ### Gram Anchoring for DINOv3 ViT-7B/16 Source: https://github.com/facebookresearch/dinov3/blob/main/README.md Performs Gram anchoring for DINOv3 ViT-7B/16 training on 32 nodes (256 GPUs). Requires the path to the Gram teacher checkpoint from the previous step. ```shell PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \ --nodes 32 \ --config-file dinov3/configs/train/dinov3_vit7b16_gram_anchor.yaml \ --output-dir \ train.dataset_path=:root=:extra= \ gram.ckpt= ``` -------------------------------- ### ImageNet-1k Linear Classifier with DINOv3-ViT-7B Source: https://context7.com/facebookresearch/dinov3/llms.txt Loads a DINOv3 ViT-7B backbone with a pre-trained linear classification head for ImageNet-1k. This setup combines CLS tokens and mean-pooled patch tokens for 1000-class inference. Ensure correct paths for model weights and backbone weights. 
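To make "combines CLS tokens and mean-pooled patch tokens" concrete, the sketch below builds such a feature vector from plain Python lists. It assumes the two parts are simply concatenated before the linear head, which is an assumption made for illustration; the helper name and the toy dimensions are hypothetical.

```python
def cls_plus_mean_patch(cls_token: list[float], patch_tokens: list[list[float]]) -> list[float]:
    """Concatenate the CLS token with the element-wise mean of the patch tokens."""
    num_patches = len(patch_tokens)
    # zip(*patch_tokens) iterates over feature dimensions across all patches.
    mean_patch = [sum(column) / num_patches for column in zip(*patch_tokens)]
    return cls_token + mean_patch


cls_token = [1.0, 2.0]                               # D = 2
patch_tokens = [[0.0, 4.0], [2.0, 0.0], [4.0, 2.0]]  # 3 patches, D = 2
features = cls_plus_mean_patch(cls_token, patch_tokens)
print(features)  # [1.0, 2.0, 2.0, 2.0]
```

Under this assumption, a linear head fed with the combined vector takes `2 * D` inputs for a backbone with embedding dimension `D`.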
```python import torch from torchvision.transforms import v2 from PIL import Image from dinov3.hub.classifiers import dinov3_vit7b16_lc # Load full linear classifier (backbone + head) model = torch.hub.load( "/path/to/dinov3", "dinov3_vit7b16_lc", source="local", weights="/path/to/dinov3_vit7b16_imagenet1k_linear_head-90d8ed92.pth", backbone_weights="/path/to/dinov3_vit7b16_pretrain_lvd1689m-a955f4ea.pth", ) model.eval().cuda() transform = v2.Compose([ v2.ToImage(), v2.Resize(256, antialias=True), v2.CenterCrop(224), v2.ToDtype(torch.float32, scale=True), v2.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)), ]) x = transform(Image.open("cat.jpg").convert("RGB")).unsqueeze(0).cuda() with torch.inference_mode(): logits = model(x) # (1, 1000) pred_class = logits.argmax(dim=-1).item() confidence = logits.softmax(dim=-1).max().item() print(f"Predicted class: {pred_class}, confidence: {confidence:.3f}") # Predicted class: 281, confidence: 0.892 (e.g., tabby cat) ``` -------------------------------- ### Train DINOv3 ViT-L/16 on ImageNet-1k Source: https://github.com/facebookresearch/dinov3/blob/main/README.md Launches DINOv3 pre-training for ViT-L/16 on ImageNet-1k using 4 nodes (32 GPUs) in a SLURM environment. Training takes approximately 14 hours. ```shell PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \ --nodes 4 \ --config-file dinov3/configs/train/vitl_im1k_lin834.yaml \ --output-dir \ train.dataset_path=ImageNet22k:root=:extra= ``` -------------------------------- ### Load Canopy Height Map Model (Hugging Face Transformers) Source: https://context7.com/facebookresearch/dinov3/llms.txt Loads a Canopy Height Map v2 (CHMv2) model using Hugging Face Transformers for satellite imagery. Requires installing the transformers library. The model is set to evaluation mode. 
```python
import torch
from PIL import Image
from transformers import AutoModelForDepthEstimation, AutoImageProcessor

# Via Hugging Face Transformers
processor = AutoImageProcessor.from_pretrained("facebook/dinov3-vitl16-chmv2-dpt-head")
model = AutoModelForDepthEstimation.from_pretrained("facebook/dinov3-vitl16-chmv2-dpt-head")
model.eval()
```

```python
image = Image.open("forest_satellite.tif").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

depth = processor.post_process_depth_estimation(
    outputs, target_sizes=[(image.height, image.width)]
)[0]["predicted_depth"]

print("Canopy height map shape:", depth.shape)  # (H, W)
print("Height range (m):", depth.min().item(), "–", depth.max().item())
```

--------------------------------

### Load DINOv3 ViT Models (Satellite Imagery) with PyTorch Hub

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Loads DINOv3 ViT models pretrained on satellite imagery using torch.hub. Requires specifying the checkpoint path or URL.

```python
import torch

# REPO_DIR must point to a local clone of the dinov3 repository.
# Replace the placeholder strings with the checkpoint URL or local path.
dinov3_vitl16 = torch.hub.load(REPO_DIR, 'dinov3_vitl16', source='local', weights="<CHECKPOINT/URL/OR/PATH>")
dinov3_vit7b16 = torch.hub.load(REPO_DIR, 'dinov3_vit7b16', source='local', weights="<CHECKPOINT/URL/OR/PATH>")
```

--------------------------------

### Run DINOv3 Depth Estimation Directly

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Alternative command that runs DINOv3 depth estimation without `dinov3.run.submit`, using Python directly or torchrun. Provide the correct dataset root and output directory.

```shell
PYTHONPATH=. \
python dinov3/eval/depth/run.py \
  config=dinov3/eval/depth/configs/config-nyu-synthmix-dpt-inference.yaml \
  datasets.root=<PATH/TO/DATASET> \
  load_from=dinov3_vit7b16_dd \
  output_dir=<PATH/TO/OUTPUT/DIR>
```

--------------------------------

### Load and Visualize First Frame Mask

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/segmentation_tracking.ipynb

Loads the instance segmentation mask for the first video frame from a URL and shows it next to the original frame. Prints the mask dimensions and the number of masks.

```python
first_mask_np = np.array(
    load_image_from_url(
        "https://dl.fbaipublicfiles.com/dinov3/notebooks/segmentation_tracking/first_video_frame_mask.png"
    )
)

mask_height, mask_width = first_mask_np.shape  # Abbreviated as [H', W']
print(f"Mask size: {[mask_height, mask_width]}")

num_masks = int(first_mask_np.max() + 1)  # Abbreviated as M
print(f"Number of masks: {num_masks}")

mp.show_images(
    [frames[0], mask_to_rgb(first_mask_np, num_masks)],
    titles=["Frame", "Mask"],
    height=DISPLAY_HEIGHT,
)
```

--------------------------------

### Create Image Preprocessing Pipelines

Source: https://context7.com/facebookresearch/dinov3/llms.txt

Generates standard image preprocessing pipelines for evaluation or inference using torchvision v2 transforms. Supports configurable resize, crop, and normalization for classification or custom tasks.
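The geometry behind a "resize to 256, center-crop to 224" pipeline can be worked out by hand. The sketch below assumes the short side is resized to `resize_size` with the aspect ratio preserved and that the crop is taken from the center; `resize_and_center_crop_box` is a hypothetical helper, and rounding conventions differ slightly between libraries, so treat the numbers as illustrative.

```python
def resize_and_center_crop_box(width, height, resize_size=256, crop_size=224):
    """Return ((new_w, new_h), (left, top, right, bottom)) for a
    short-side resize followed by a center crop."""
    # Scale so that the shorter side becomes resize_size.
    scale = resize_size / min(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    # Center the crop window inside the resized image.
    left = (new_w - crop_size) // 2
    top = (new_h - crop_size) // 2
    return (new_w, new_h), (left, top, left + crop_size, top + crop_size)


print(resize_and_center_crop_box(640, 480))
# ((341, 256), (58, 16, 282, 240))
```

For a 640x480 input, the image becomes 341x256 and the crop keeps the central 224x224 window, discarding more of the long side than of the short side.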
```python from dinov3.data.transforms import ( make_classification_eval_transform, make_eval_transform, IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD, ) from torchvision.transforms import v2 from PIL import Image import torch # Standard ImageNet eval transform (resize to 256, center-crop to 224) eval_transform = make_classification_eval_transform( resize_size=256, crop_size=224, interpolation=v2.InterpolationMode.BICUBIC, mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD, ) # Custom: satellite imagery, square resize, no center-crop sat_transform = make_eval_transform( resize_size=256, crop_size=None, # skip center crop resize_square=True, # force square output mean=(0.430, 0.411, 0.296), std=(0.213, 0.156, 0.143), ) img = Image.open("photo.jpg").convert("RGB") x_eval = eval_transform(img) # torch.Tensor (3, 224, 224) x_sat = sat_transform(img) # torch.Tensor (3, 256, 256) print(x_eval.shape, x_eval.dtype) # torch.Size([3, 224, 224]) torch.float32 print(x_sat.shape) # torch.Size([3, 256, 256]) ``` -------------------------------- ### Load Dataset and Prepare Text Features Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/dinotxt_segmentation_inference.ipynb Loads the specified dataset, defines image transformations, and prepares text features by encoding class names using various prompt templates. It averages prompt embeddings and normalizes them for use with the model. 
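The normalize, average, normalize sequence described here can be illustrated without a model. The sketch below uses plain Python lists as stand-ins for prompt embeddings; `l2_normalize` and `class_embedding` are hypothetical helpers written for this example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def class_embedding(prompt_embeddings):
    """Normalize each prompt embedding, average them, then normalize the mean."""
    unit = [l2_normalize(e) for e in prompt_embeddings]
    mean = [sum(column) / len(unit) for column in zip(*unit)]
    return l2_normalize(mean)


# Two toy "prompt embeddings" with very different magnitudes.
emb = class_embedding([[3.0, 4.0], [0.0, 2.0]])
print(emb, math.hypot(*emb))
```

Normalizing before averaging gives every prompt the same weight regardless of its magnitude, and normalizing again afterwards restores unit length so that dot products with image features behave like cosine similarities.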
```python # Load dataset transform = TVT.Compose( [ ShortSideResize(cfg.resize, TVT.InterpolationMode.BICUBIC), TVT.ToTensor(), NORMALIZE_IMAGENET, ] ) dataset = DATASETS[cfg.dataset](transform) class_names = dataset.CLASS_NAMES print(f"Dataset: {len(dataset)} images, {len(class_names)} classes") dataloder = torch.utils.data.DataLoader( dataset, batch_size=None, # TODO Adapt num_workers=1, shuffle=False, pin_memory=True, multiprocessing_context="spawn", ) # Prepare text features: prompts x class names text_feats = [] for class_name in tqdm.tqdm(class_names, desc="Class names", unit="name", ncols=0): text = [template.format(class_name) for template in PROMPT_TEMPLATES] tokens = tokenizer(text).to("cuda", non_blocking=True) feats = model.encode_text(tokens) # [num_prompts, 2D] feats = feats[:, feats.shape[1] // 2 :] # The 1st half of the features corresponds to the CLS token, drop it feats = F.normalize(feats, p=2, dim=-1) # Normalize each text embedding feats = feats.mean(dim=0) # Average over all prompt embeddings per class feats = F.normalize(feats, p=2, dim=-1) # Normalize again text_feats.append(feats) text_feats = torch.stack(text_feats) # [num_classes, D] print(f"Text features: {text_feats}") ``` -------------------------------- ### Download NYU Dataset Splits Source: https://github.com/facebookresearch/dinov3/blob/main/DATASETS.md Downloads the train and test split files for the NYU dataset, which are required for training and evaluation. 
```bash
# Use the raw file URLs: the /blob/ page URLs return an HTML page instead of the text file.
wget https://github.com/cleinc/bts/raw/master/train_test_inputs/nyudepthv2_train_files_with_gt.txt -O ${DEPTH_DATASETS_ROOT}/NYU/nyu_train.txt
wget https://github.com/cleinc/bts/raw/master/train_test_inputs/nyudepthv2_test_files_with_gt.txt -O ${DEPTH_DATASETS_ROOT}/NYU/nyu_test.txt
```

--------------------------------

### Load DINOv3 Model using Torch Hub

Source: https://github.com/facebookresearch/dinov3/blob/main/MODEL_CARD.md

Load a DINOv3 model from Torch Hub by specifying the repository, model name, and checkpoint path or URL. Replace `<MODEL_NAME>` and `<PATH/OR/URL/TO/CHECKPOINT>` with the appropriate values.

```python
import torch

model = torch.hub.load(
    repo_or_dir='facebookresearch/dinov3',
    model='<MODEL_NAME>',
    weights='<PATH/OR/URL/TO/CHECKPOINT>',
)
```

```python
# where MODEL_NAME can be one of:
# - dinov3_vits16
# - dinov3_vits16plus
# - dinov3_vitb16
# - dinov3_vitl16
# - dinov3_vith16plus
# - dinov3_vit7b16
# - dinov3_convnext_tiny
# - dinov3_convnext_small
# - dinov3_convnext_base
# - dinov3_convnext_large
```

```python
# For instance
dinov3_vits16 = torch.hub.load(
    repo_or_dir='facebookresearch/dinov3',
    model='dinov3_vits16',
    weights='<PATH/OR/URL/TO/CHECKPOINT>',
)
```

--------------------------------

### Load DINOv3 Adapter from URL using PyTorch Hub

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Load a DINOv3 adapter model directly from a URL. This is useful for applying task-specific adapters to a base backbone.

```python
import torch

# Example: Load a specific adapter from a URL
# Replace with the actual URL from the model weights list
adapter_url = ""
adapter = torch.hub.load("facebookresearch/dinov3", "dinov3_adapter_vit_small_patch16_224", pretrained=True, weights=adapter_url)

# Note: The adapter model name might vary based on the specific adapter and backbone it's designed for.
```

--------------------------------

### Load DINOv3 Backbone from Local Path using PyTorch Hub

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Load a DINOv3 backbone from a local directory where the repository has been cloned. Point `torch.hub.load` to the local path and specify the weights. The model names follow the `dinov3_*` entry points listed elsewhere in this document.

```python
import torch

REPO_DIR = "/path/to/your/local/dinov3/repo"

# Example: Load a ViT-S/16 distilled backbone from a local path
# Replace with the actual path to your downloaded weights
local_weights_path = ""
backbone = torch.hub.load(REPO_DIR, "dinov3_vits16", source="local", pretrained=True, weights=local_weights_path)

# Example: Load a ConvNeXt-Tiny backbone from a local path
# Replace with the actual path to your downloaded weights
# local_weights_path = ""
# backbone = torch.hub.load(REPO_DIR, "dinov3_convnext_tiny", source="local", pretrained=True, weights=local_weights_path)
```

--------------------------------

### Pretrain DINOv3 ViT-7B/16

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Starts DINOv3 ViT-7B/16 pretraining on 32 nodes (256 GPUs) in a SLURM cluster. Requires the 'dinov3' package in the Python module search path.

```shell
PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \
  --nodes 32 \
  --config-file dinov3/configs/train/dinov3_vit7b16_pretrain.yaml \
  --output-dir <PATH/TO/OUTPUT/DIR> \
  train.dataset_path=<DATASET>:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET>
```

--------------------------------

### Load DINOv3 Adapter from Local Path using PyTorch Hub

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Load a DINOv3 adapter model from a local directory. This allows for offline use or integration into local development workflows.
```python import torch REPO_DIR = "/path/to/your/local/dinov3/repo" # Example: Load a specific adapter from a local path # Replace with the actual path to your downloaded weights local_weights_path = "" adapter = torch.hub.load(REPO_DIR, "dinov3_adapter_vit_small_patch16_224", source="local", pretrained=True, weights=local_weights_path) # Note: The adapter model name might vary based on the specific adapter and backbone it's designed for. ``` -------------------------------- ### Display Sample Video Frames Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/segmentation_tracking.ipynb Selects and displays a specified number of sample frames from a list of video frames. Requires the 'mp' library for image display. ```python num_selected_frames = 4 selected_frames = np.linspace(0, num_frames - 1, num_selected_frames, dtype=int) mp.show_images( [frames[int(i)] for i in selected_frames], titles=[f"Frame {i}" for i in selected_frames], height=DISPLAY_HEIGHT, ) ``` -------------------------------- ### Load DINOv3 ConvNeXt Models with PyTorch Hub Source: https://github.com/facebookresearch/dinov3/blob/main/README.md Loads various DINOv3 ConvNeXt models using torch.hub. Requires specifying the checkpoint path or URL. ```python dinov3_convnext_tiny = torch.hub.load(REPO_DIR, 'dinov3_convnext_tiny', source='local', weights=) dinov3_convnext_small = torch.hub.load(REPO_DIR, 'dinov3_convnext_small', source='local', weights=) dinov3_convnext_base = torch.hub.load(REPO_DIR, 'dinov3_convnext_base', source='local', weights=) dinov3_convnext_large = torch.hub.load(REPO_DIR, 'dinov3_convnext_large', source='local', weights=) ``` -------------------------------- ### High-Resolution Adaptation for DINOv3 ViT-7B/16 Source: https://github.com/facebookresearch/dinov3/blob/main/README.md Conducts high-resolution adaptation for DINOv3 ViT-7B/16 training on 32 nodes (256 GPUs). Requires paths to the Gram teacher checkpoint and student resume checkpoint. 
```shell
PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \
  --nodes 32 \
  --config-file dinov3/configs/train/dinov3_vit7b16_high_res_adapt.yaml \
  --output-dir <PATH/TO/OUTPUT/DIR> \
  train.dataset_path=<DATASET>:root=<PATH/TO/DATASET>:extra=<PATH/TO/DATASET> \
  gram.ckpt=<PATH/TO/GRAM/TEACHER/CHECKPOINT> \
  student.resume_from_teacher_chkpt=<PATH/TO/STUDENT/RESUME/CHECKPOINT>
```

--------------------------------

### Load Image and Perform Foreground Segmentation

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/foreground_segmentation.ipynb

Loads a test image from a URL, preprocesses it, and extracts patch features with a DINOv3 model. A classifier then predicts a foreground probability per patch, a 3x3 median filter smooths the result, and the input image, raw scores, and filtered scores are plotted side by side. Assumes `model`, `clf`, `resize_transform`, `n_layers`, `PATCH_SIZE`, `IMAGENET_MEAN`, and `IMAGENET_STD` are defined earlier in the notebook.

```python
import urllib.request  # make sure the `request` submodule is loaded

test_image_fpath = "https://dl.fbaipublicfiles.com/dinov3/notebooks/foreground_segmentation/test_image.jpg"

def load_image_from_url(url: str) -> Image.Image:
    with urllib.request.urlopen(url) as f:
        return Image.open(f).convert("RGB")

test_image = load_image_from_url(test_image_fpath)
test_image_resized = resize_transform(test_image)
test_image_normalized = TF.normalize(test_image_resized, mean=IMAGENET_MEAN, std=IMAGENET_STD)

with torch.inference_mode():
    with torch.autocast(device_type='cuda', dtype=torch.float32):
        feats = model.get_intermediate_layers(test_image_normalized.unsqueeze(0).cuda(), n=range(n_layers), reshape=True, norm=True)
        # Last layer's features, flattened to (num_patches, dim)
        x = feats[-1].squeeze().detach().cpu()
        dim = x.shape[0]
        x = x.view(dim, -1).permute(1, 0)

h_patches, w_patches = [int(d / PATCH_SIZE) for d in test_image_resized.shape[1:]]

# Per-patch foreground probability, reshaped to the patch grid
fg_score = clf.predict_proba(x)[:, 1].reshape(h_patches, w_patches)
fg_score_mf = torch.from_numpy(signal.medfilt2d(fg_score, kernel_size=3))

plt.figure(figsize=(9, 3), dpi=300)
plt.subplot(1, 3, 1)
plt.axis('off')
plt.imshow(test_image_resized.permute(1, 2, 0))
plt.title('input image')
plt.subplot(1, 3, 2)
plt.axis('off')
plt.imshow(fg_score)
plt.title('foreground score')
plt.subplot(1, 3, 3)
plt.axis('off')
plt.imshow(fg_score_mf)
plt.title('+ median filter')
plt.show()
```
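The patch-grid reshaping and median smoothing used in the snippet above can be tried without a model or GPU. The following is a minimal standard-library sketch, not the notebook's code: `patch_grid`, `to_grid`, and `median_filter_3x3` are hypothetical helper names, the patch size of 16 is an assumption matching the `/16` models, and the filter is a stand-in for `signal.medfilt2d` that assumes zeros outside the grid.

```python
from statistics import median

PATCH_SIZE = 16  # assumption: patch size of the ViT-*/16 models


def patch_grid(height: int, width: int, patch_size: int = PATCH_SIZE) -> tuple[int, int]:
    """Number of patches along each axis (same result as int(d / PATCH_SIZE))."""
    return height // patch_size, width // patch_size


def to_grid(flat_scores: list[float], h: int, w: int) -> list[list[float]]:
    """Reshape a row-major list of per-patch scores into an h x w grid."""
    if len(flat_scores) != h * w:
        raise ValueError("number of scores must equal h * w")
    return [flat_scores[r * w:(r + 1) * w] for r in range(h)]


def median_filter_3x3(grid: list[list[float]]) -> list[list[float]]:
    """3x3 median filter; cells outside the grid are treated as 0.0 (assumed padding)."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [
                grid[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0.0
                for rr in (r - 1, r, r + 1)
                for cc in (c - 1, c, c + 1)
            ]
            out[r][c] = median(window)
    return out


# Synthetic scores for a 64x64 image: all foreground except one outlier patch.
h_patches, w_patches = patch_grid(64, 64)
scores = [0.9] * (h_patches * w_patches)
scores[5] = 0.0  # isolated low score at row 1, column 1
fg_score = to_grid(scores, h_patches, w_patches)
fg_score_mf = median_filter_3x3(fg_score)
print(fg_score[1][1], fg_score_mf[1][1])
```

The isolated outlier is replaced by the value of its neighbours, which is the effect the median filter has on speckle in the foreground map. With zero padding, corner patches are pulled toward zero because most of their 3x3 window lies outside the grid.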

--------------------------------

### Full 3-Stage ViT-7B/16 Training: Stage 2 Gram Anchoring

Source: https://context7.com/facebookresearch/dinov3/llms.txt

Runs Stage 2 (Gram anchoring) of ViT-7B/16 training. It takes the teacher checkpoint produced by Stage 1 as `gram.ckpt` and trains on a custom dataset.

```bash
PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \
  --nodes 32 \
  --config-file dinov3/configs/train/dinov3_vit7b16_gram_anchor.yaml \
  --output-dir /outputs/vit7b_stage2 \
  train.dataset_path=CustomDataset:root=/data/lvd:extra=/data/lvd \
  gram.ckpt=/outputs/vit7b_stage1/eval/training_xxx/teacher_checkpoint.pth
```

--------------------------------

### Compute Foreground Probability and Visualize

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/pca.ipynb

Computes a per-patch foreground probability with a classifier, smooths it with a 3x3 median filter, and plots the original image next to the filtered foreground score. Assumes `image`, `image_resized`, `x`, `clf`, and `PATCH_SIZE` are defined earlier in the notebook.

```python
h_patches, w_patches = [int(d / PATCH_SIZE) for d in image_resized.shape[1:]]

# Per-patch foreground probability, reshaped to the patch grid and median-filtered
fg_score = clf.predict_proba(x)[:, 1].reshape(h_patches, w_patches)
fg_score_mf = torch.from_numpy(signal.medfilt2d(fg_score, kernel_size=3))

plt.rcParams.update({
    "xtick.labelsize": 5,
    "ytick.labelsize": 5,
    "axes.labelsize": 5,
    "axes.titlesize": 4,
})

plt.figure(figsize=(4, 2), dpi=300)
plt.subplot(1, 2, 1)
plt.imshow(image)
plt.axis('off')
plt.title(f"Image, Size {image.size}")
plt.subplot(1, 2, 2)
plt.imshow(fg_score_mf)
plt.title(f"Foreground Score, Size {tuple(fg_score_mf.shape)}")
plt.colorbar()
plt.axis('off')
plt.show()
```

--------------------------------

### ConvNeXt Multi-Distillation (Single Node Test)

Source: https://context7.com/facebookresearch/dinov3/llms.txt

Runs a single-node test of ConvNeXt multi-distillation, using the ConvNeXt-Tiny distillation config and the `--multi-distillation` flag.
```bash
PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \
  --nodes 1 \
  --config-file dinov3/configs/train/distillation_convnext/convnext_tiny_p16.yaml \
  --output-dir /outputs/convnext_tiny \
  --multi-distillation \
  train.dataset_path=ImageNet22k:root=/data/imagenet:extra=/data/imagenet
```

--------------------------------

### Load DINOv3 ViT Models with PyTorch Hub

Source: https://github.com/facebookresearch/dinov3/blob/main/README.md

Loads each DINOv3 ViT variant with `torch.hub.load`. Replace every `weights` placeholder with the checkpoint path or URL for that model.

```python
import torch

REPO_DIR = "/path/to/your/local/dinov3/repo"

dinov3_vits16 = torch.hub.load(REPO_DIR, 'dinov3_vits16', source='local', weights="<CHECKPOINT/URL/OR/PATH>")
dinov3_vits16plus = torch.hub.load(REPO_DIR, 'dinov3_vits16plus', source='local', weights="<CHECKPOINT/URL/OR/PATH>")
dinov3_vitb16 = torch.hub.load(REPO_DIR, 'dinov3_vitb16', source='local', weights="<CHECKPOINT/URL/OR/PATH>")
dinov3_vitl16 = torch.hub.load(REPO_DIR, 'dinov3_vitl16', source='local', weights="<CHECKPOINT/URL/OR/PATH>")
dinov3_vith16plus = torch.hub.load(REPO_DIR, 'dinov3_vith16plus', source='local', weights="<CHECKPOINT/URL/OR/PATH>")
dinov3_vit7b16 = torch.hub.load(REPO_DIR, 'dinov3_vit7b16', source='local', weights="<CHECKPOINT/URL/OR/PATH>")
```

--------------------------------

### Initialize Feature and Probability Queues

Source: https://github.com/facebookresearch/dinov3/blob/main/notebooks/segmentation_tracking.ipynb

Initializes two empty lists that hold the features and mask probabilities of the context frames. These queues are used to propagate segmentation information from earlier frames to later ones.

```python
features_queue: list[Tensor] = []
probs_queue: list[Tensor] = []
```

--------------------------------

### Load DINOv3 ViT Backbone (SAT-493M)

Source: https://context7.com/facebookresearch/dinov3/llms.txt

Loads a DINOv3 ViT-L/16 backbone pretrained on the SAT-493M satellite imagery dataset. Inputs to this model should be normalized with satellite-specific statistics rather than the ImageNet ones.
```python
import torch
from torchvision.transforms import v2
from dinov3.hub.backbones import dinov3_vitl16

# Load ViT-L pretrained on satellite data.
# `weights` can be passed only once; here it is the path to the local checkpoint.
model = dinov3_vitl16(
    pretrained=True,
    weights="/path/to/dinov3_vitl16_pretrain_sat493m-eadcf0ff.pth",
)
model.eval().cuda()

# Satellite-specific normalization. These are the SAT-493M statistics as given in
# the DINOv3 README; verify them against the repository before relying on them.
transform = v2.Compose([
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=(0.430, 0.411, 0.296), std=(0.213, 0.156, 0.143)),
])
```

--------------------------------

### Load DINOv3 ConvNeXt Backbone (LVD-1689M)

Source: https://context7.com/facebookresearch/dinov3/llms.txt

Loads a DINOv3 ConvNeXt-Base backbone pretrained on the LVD-1689M dataset, applies a standard ImageNet-style preprocessing pipeline, and prints the shape of each stage's features.

```python
import torch
from PIL import Image
from torchvision.transforms import v2

# Load ConvNeXt-Base pretrained on web images
REPO_DIR = "/path/to/dinov3"
model = torch.hub.load(
    REPO_DIR,
    "dinov3_convnext_base",
    source="local",
    weights="/path/to/dinov3_convnext_base_pretrain_lvd1689m-801f2ba9.pth",
)
model.eval().cuda()

# Resize, center-crop to 224x224, and normalize with ImageNet statistics
transform = v2.Compose([
    v2.ToImage(),
    v2.Resize((256, 256), antialias=True),
    v2.CenterCrop(224),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

x = transform(Image.open("dog.jpg").convert("RGB")).unsqueeze(0).cuda()

with torch.inference_mode():
    # ConvNeXt returns a list of stage features
    out = model(x)

print("Output type:", type(out))  # list
for i, feat in enumerate(out):
    print(f"  Stage {i}: {feat.shape}")
# Expected for a 224x224 input:
# Stage 0: torch.Size([1, 128, 56, 56])
# Stage 1: torch.Size([1, 256, 28, 28])
# Stage 2: torch.Size([1, 512, 14, 14])
# Stage 3: torch.Size([1, 1024, 7, 7])
```
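The stage shapes listed in the comments above follow a simple pattern: each stage halves the spatial resolution of the previous one, starting from a 4x reduction. The sketch below reproduces that arithmetic for other input sizes. It is an illustration only: `convnext_stage_shapes` is a hypothetical helper, and the strides (4, 8, 16, 32) and channel widths (128, 256, 512, 1024) are assumptions read off the shapes quoted above, not values taken from the DINOv3 code.

```python
def convnext_stage_shapes(
    batch: int,
    height: int,
    width: int,
    channels: tuple[int, ...] = (128, 256, 512, 1024),  # assumed ConvNeXt-Base widths
    strides: tuple[int, ...] = (4, 8, 16, 32),          # assumed cumulative strides
) -> list[tuple[int, int, int, int]]:
    """Expected (N, C, H, W) shape of each stage's feature map."""
    return [
        (batch, c, height // s, width // s)
        for c, s in zip(channels, strides)
    ]


shapes = convnext_stage_shapes(1, 224, 224)
for i, shape in enumerate(shapes):
    print(f"Stage {i}: {shape}")
```

For a 224x224 input this yields 56, 28, 14, and 7 pixels per side, matching the listing above. Comparing the printed shapes from the real model against this helper is a quick sanity check when changing the crop size.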