### Install Training Dependencies

Source: https://github.com/microsoft/moge/blob/main/docs/train.md

Install the extra Python packages required for training and fine-tuning MoGe. These are needed in addition to the dependencies listed in `pyproject.toml`.

```bash
pip install accelerate sympy mlflow
```

--------------------------------

### Launch Training Script

Source: https://github.com/microsoft/moge/blob/main/docs/train.md

Launch the main training script with `accelerate`, which handles distributed training. Adjust parameters such as `--num_processes`, `--gradient_accumulation_steps`, and `--batch_size_forward` to match your hardware.

```bash
accelerate launch \
    --num_processes 8 \
    moge/scripts/train.py \
    --config configs/train/v1.json \
    --workspace workspace/debug \
    --gradient_accumulation_steps 2 \
    --batch_size_forward 2 \
    --checkpoint latest \
    --enable_gradient_checkpointing True \
    --vis_every 1000 \
    --enable_mlflow True
```

--------------------------------

### Clone Repository and Install Dependencies

Source: https://github.com/microsoft/moge/blob/main/README.md

Clone the MoGe repository and install all necessary dependencies using the provided requirements file. This method is suitable for development or if you need direct access to the source code.

```bash
git clone https://github.com/microsoft/MoGe.git
cd MoGe
pip install -r requirements.txt
```

--------------------------------

### Infer Baselines with Example Images

Source: https://github.com/microsoft/moge/blob/main/docs/eval.md

Use this script to run inference on a small set of example images for a given baseline. This is useful for checking baseline implementation correctness. It includes options for specifying the baseline, input directory, output directory, pretrained model, and output map/ply formats.

```bash
python moge/scripts/infer_baselines.py --baseline baselines/moge.py --input example_images/ --output infer_output/moge --pretrained Ruicheng/moge-vitl --maps --ply
```

--------------------------------

### Install MoGe via pip

Source: https://github.com/microsoft/moge/blob/main/README.md

Use this command to install the MoGe library directly from GitHub using pip. Ensure you have a compatible Python environment.

```bash
pip install git+https://github.com/microsoft/MoGe.git
```

--------------------------------

### Launch Gradio Web Demo

Source: https://context7.com/microsoft/moge/llms.txt

Start an interactive web demo for uploading images and viewing 3D reconstructions. Options include enabling public sharing, using FP16 for faster inference, and selecting specific model versions.

```bash
moge app
```

```bash
moge app --share
```

```bash
moge app --fp16
```

```bash
moge app --version v1
```

```bash
moge app --pretrained Ruicheng/moge-2-vitb-normal
```

--------------------------------

### MoGe Training Configuration Example

Source: https://github.com/microsoft/moge/blob/main/docs/train.md

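This annotated JSON object defines the hyperparameters for training MoGe: data sampling and augmentation, model architecture, optimizer, learning rate scheduler, and per-label-type loss functions. The `#` comments are explanatory only; remove them before using the file, because JSON does not allow comments.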

```json
{
    "data": {
        "aspect_ratio_range": [0.5, 2.0],               # Range of aspect ratio of sampled images
        "area_range": [250000, 1000000],                # Range of sampled image area in pixels
        "clamp_max_depth": 1000.0,                      # Maximum far/near
        "center_augmentation": 0.5,                     # Ratio of center crop augmentation
        "fov_range_absolute": [1, 179],                 # Absolute range of FOV in degrees
        "fov_range_relative": [0.01, 1.0],              # Relative range of FOV to the original FOV
        "image_augmentation": ["jittering", "jpeg_loss", "blurring"],       # List of image augmentation techniques
        "datasets": [
            {
                "name": "TartanAir",                    # Name of the dataset. Name it as you like.
                "path": "data/TartanAir",               # Path to the dataset
                "label_type": "synthetic",              # Label type for this dataset. Losses will be applied accordingly. see "loss" config
                "weight": 4.8,                          # Probability of sampling this dataset
                "index": ".index.txt",                  # File name of the index file.  Defaults to .index.txt
                "depth": "depth.png",                   # File name of depth images. Defaults to depth.png
                "center_augmentation": 0.25,            # Below are dataset-specific hyperparameters. Overriding the global ones above.
                "fov_range_absolute": [30, 150],
                "fov_range_relative": [0.5, 1.0],
                "image_augmentation": ["jittering", "jpeg_loss", "blurring", "shot_noise"]
            }
        ]
    },
    "model_version": "v1",                 # Model version. If you have multiple model variants, you can use this to switch between them.
    "model": {                             # Model hyperparameters. Will be passed to Model __init__() as kwargs.
        "encoder": "dinov2_vitl14",
        "remap_output": "exp",
        "intermediate_layers": 4,
        "dim_upsample": [256, 128, 64],
        "dim_times_res_block_hidden": 2,
        "num_res_blocks": 2,
        "num_tokens_range": [1200, 2500],
        "last_conv_channels": 32,
        "last_conv_size": 1
    },
    "optimizer": {                          # Reflection-like optimizer configuration. See build_optimizer() in moge/train/utils.py for details.
        "type": "AdamW",
        "params": [
            {"params": {"include": ["*"], "exclude": ["*backbone.*"]}, "lr": 1e-4},
            {"params": {"include": ["*backbone.*"]}, "lr": 1e-5}
        ]
    },
    "lr_scheduler": {                       # Reflection-like lr_scheduler configuration. See build_lr_scheduler() in moge/train/utils.py for details.
        "type": "SequentialLR",
        "params": {
            "schedulers": [
                {"type": "LambdaLR", "params": {"lr_lambda": ["1.0", "max(0.0, min(1.0, (epoch - 1000) / 1000))"]}},
                {"type": "StepLR", "params": {"step_size": 25000, "gamma": 0.5}}
            ],
            "milestones": [2000]
        }
    },
    "low_resolution_training_steps": 50000, # Total number of low-resolution training steps. It makes the early stage training faster. Later stage training on varying size images will be slower.
    "loss": {
        "invalid": {},
        "synthetic": {                      # Below are loss hyperparameters
            "global": {"function": "affine_invariant_global_loss", "weight": 1.0, "params": {"align_resolution": 32}},
            "patch_4": {"function": "affine_invariant_local_loss", "weight": 1.0, "params": {"level": 4, "align_resolution": 16, "num_patches": 16}},
            "patch_16": {"function": "affine_invariant_local_loss", "weight": 1.0, "params": {"level": 16, "align_resolution": 8, "num_patches": 256}},
            "patch_64": {"function": "affine_invariant_local_loss", "weight": 1.0, "params": {"level": 64, "align_resolution": 4, "num_patches": 4096}},
            "normal": {"function": "normal_loss", "weight": 1.0},
            "mask": {"function": "mask_l2_loss", "weight": 1.0}
        },
        "sfm": {
            "global": {"function": "affine_invariant_global_loss", "weight": 1.0, "params": {"align_resolution": 32}},
            "patch_4": {"function": "affine_invariant_local_loss", "weight": 1.0, "params": {"level": 4, "align_resolution": 16, "num_patches": 16}},
            "patch_16": {"function": "affine_invariant_local_loss", "weight": 1.0, "params": {"level": 16, "align_resolution": 8, "num_patches": 256}},
            "mask": {"function": "mask_l2_loss", "weight": 1.0}
        }
    }
}
```
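To see what the `optimizer` and `lr_scheduler` sections above produce, the sketch below evaluates the resulting learning rate per parameter group in plain Python. It assumes that the two `lr_lambda` strings apply to the two parameter groups in order (non-backbone first, backbone second), that `epoch` in the lambda counts scheduler steps taken once per training iteration, and that, as with PyTorch's `SequentialLR`, the `StepLR` count starts from zero at the milestone. `learning_rate` and `lambda_factor` are hypothetical helpers, not MoGe code.

```python
# Base learning rates of the two AdamW parameter groups in the example config:
# group 0 = everything except the backbone, group 1 = the backbone.
BASE_LRS = {"head": 1e-4, "backbone": 1e-5}
MILESTONE = 2000   # SequentialLR hands over from LambdaLR to StepLR here
STEP_SIZE = 25000  # StepLR step_size
GAMMA = 0.5        # StepLR gamma


def lambda_factor(group: str, step: int) -> float:
    """Multiplier from the lr_lambda strings (assumed to map to param groups in order)."""
    if group == "head":
        return 1.0  # "1.0"
    # "max(0.0, min(1.0, (epoch - 1000) / 1000))": zero for 1000 steps, then linear warm-up
    return max(0.0, min(1.0, (step - 1000) / 1000))


def learning_rate(group: str, step: int) -> float:
    """Approximate learning rate of a parameter group after `step` scheduler steps."""
    base = BASE_LRS[group]
    if step < MILESTONE:
        return base * lambda_factor(group, step)
    # After the milestone, StepLR counts from zero and halves the rate every STEP_SIZE steps.
    return base * GAMMA ** ((step - MILESTONE) // STEP_SIZE)


if __name__ == "__main__":
    for s in (0, 1000, 1500, 2000, 27000, 52000):
        print(s, learning_rate("head", s), learning_rate("backbone", s))
```

Under these assumptions the backbone receives no updates for the first 1000 steps, warms up linearly until step 2000, and from then on both groups are halved every 25000 steps.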

--------------------------------

### Saving 3D Outputs with Utility Functions

Source: https://context7.com/microsoft/moge/llms.txt

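Save inference results as 3D mesh files (GLB) or point clouds (PLY) using the provided utility functions. This first part loads an image, runs inference, and removes depth edges from the mask to improve mesh quality; the mesh construction and the `save_glb`/`save_ply` calls continue in the "Build Mesh from Depth/Point Map" example.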

```python
import cv2
import torch
import numpy as np
import utils3d
from moge.model.v2 import MoGeModel
from moge.utils.io import save_glb, save_ply

device = torch.device("cuda")
model = MoGeModel.from_pretrained("Ruicheng/moge-2-vitl-normal").to(device)

# Load and process image
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
height, width = image.shape[:2]
image_tensor = torch.tensor(image / 255, dtype=torch.float32, device=device).permute(2, 0, 1)

output = model.infer(image_tensor, use_fp16=True)
points = output["points"].cpu().numpy()
depth = output["depth"].cpu().numpy()
mask = output["mask"].cpu().numpy()
normal = output.get("normal", None)
if normal is not None:
    normal = normal.cpu().numpy()

# Clean edges for better mesh quality
mask_cleaned = mask & ~utils3d.np.depth_map_edge(depth, rtol=0.04)

```
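`utils3d.np.depth_map_edge` flags pixels at depth discontinuities so that mesh triangles do not stretch across object boundaries. Its exact rule is defined in utils3d; as a rough, hypothetical stand-in, the sketch below marks a pixel when the depth range in its 3×3 neighborhood, relative to the neighborhood minimum, exceeds `rtol`. It assumes finite, positive depth values and is not the library's implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def relative_depth_edges(depth: np.ndarray, rtol: float) -> np.ndarray:
    """Hypothetical edge test: True where the 3x3 depth range exceeds rtol times the local minimum."""
    padded = np.pad(depth, 1, mode="edge")          # replicate borders so output keeps (H, W)
    windows = sliding_window_view(padded, (3, 3))   # shape (H, W, 3, 3)
    d_max = windows.max(axis=(-2, -1))
    d_min = windows.min(axis=(-2, -1))
    return (d_max - d_min) / d_min > rtol


if __name__ == "__main__":
    depth = np.ones((5, 5))
    depth[:, 3:] = 2.0  # a depth step between columns 2 and 3
    print(relative_depth_edges(depth, rtol=0.04).astype(int))
```

With a mask from MoGe, the cleaned mask would then be `mask & ~relative_depth_edges(depth, rtol=0.04)`, after replacing any non-finite depth values.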

--------------------------------

### Export MoGe Models to ONNX

Source: https://context7.com/microsoft/moge/llms.txt

Export MoGe models to ONNX format for deployment. This involves disabling xformers for compatibility and using `torch.onnx.export`. Examples show exporting with dynamic input shapes and fixed input shapes.

```python
import os
os.environ['XFORMERS_DISABLED'] = '1'  # Disable xformers for ONNX compatibility

import torch
from moge.model.v2 import MoGeModel

# Load model
model = MoGeModel.from_pretrained("Ruicheng/moge-2-vitl-normal")
model.onnx_compatible_mode = True  # Enable ONNX compatible mode

# Export with dynamic input shape and variable tokens
torch.onnx.export(
    model,
    (torch.rand(1, 3, 518, 518), torch.tensor(1800)),
    "moge-2-vitl-normal.onnx",
    input_names=['image', 'num_tokens'],
    output_names=['points', 'normal', 'mask', 'metric_scale'],
    dynamic_axes={'image': {0: 'batch_size', 2: 'height', 3: 'width'}},
    opset_version=14
)

# Export with fixed input shape for optimized inference
class MoGeStatic(MoGeModel):
    def forward(self, image: torch.Tensor):
        return super().forward(image, 1800)  # Fixed token count

model_static = MoGeStatic.from_pretrained("Ruicheng/moge-2-vits-normal")
model_static.onnx_compatible_mode = True

torch.onnx.export(
    model_static,
    (torch.rand(1, 3, 518, 518),),
    "moge-2-static.onnx",
    input_names=['image'],
    output_names=['points', 'normal', 'mask', 'metric_scale'],
    opset_version=14
)
```

--------------------------------

### Launch Finetuning Script

Source: https://github.com/microsoft/moge/blob/main/docs/train.md

Use this command to fine-tune a pretrained MoGe model. Pass the path to the downloaded checkpoint via `--checkpoint`. For fine-tuning, lower learning rates and a total batch size of at least 32 are recommended.

```bash
accelerate launch \
    --num_processes 8 \
    moge/scripts/train.py \
    --config configs/train/v1.json \
    --workspace workspace/debug \
    --gradient_accumulation_steps 2 \
    --batch_size_forward 2 \
    --checkpoint pretrained/moge-vitl.pt \
    --enable_gradient_checkpointing True \
    --vis_every 1000 \
    --enable_mlflow True
```
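Assuming `--batch_size_forward` is the per-process batch of each forward pass and that `accelerate` runs one data-parallel replica per process, the effective batch size per optimizer step is the product of the three settings; the command above gives 8 × 2 × 2 = 32. The hypothetical helpers below check this, or pick the accumulation steps needed to reach a target on different hardware.

```python
def effective_batch_size(num_processes: int, grad_accum_steps: int, batch_size_forward: int) -> int:
    """Samples contributing to one optimizer step under data-parallel training."""
    return num_processes * grad_accum_steps * batch_size_forward


def accumulation_steps_for(target: int, num_processes: int, batch_size_forward: int) -> int:
    """Smallest gradient_accumulation_steps that reaches at least `target` samples per step."""
    per_step = num_processes * batch_size_forward
    return -(-target // per_step)  # ceiling division


if __name__ == "__main__":
    print(effective_batch_size(8, 2, 2))        # 32
    print(accumulation_steps_for(32, 4, 2))     # 4 GPUs with 2 samples each need 4 steps
```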

--------------------------------

### Launch Distributed Training

Source: https://context7.com/microsoft/moge/llms.txt

Initiates distributed training for MoGe models using the accelerate library. Requires a specific dataset structure and configuration files.

```bash
# Prepare dataset structure:
# dataset/
# ├── .index.txt          # List of instance paths
# ├── folder1/
# │   ├── instance1/
# │   │   ├── image.jpg   # RGB image
# │   │   ├── depth.png   # 16-bit depth (use moge.utils.io.write_depth)
# │   │   └── meta.json   # {"intrinsics": [[fx,0,cx],[0,fy,cy],[0,0,1]]}

# Launch distributed training on 8 GPUs
accelerate launch \
    --num_processes 8 \
    moge/scripts/train.py \
    --config configs/train/v2.json \
    --workspace workspace/moge-v2 \
    --gradient_accumulation_steps 2 \
    --batch_size_forward 2 \
    --checkpoint latest \
    --enable_gradient_checkpointing True \
    --vis_every 1000 \
    --enable_mlflow True
```
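The `.index.txt` file lists the instance paths. A minimal standard-library sketch for generating it, assuming each line holds an instance folder path relative to the dataset root and that a folder counts as an instance when it contains `image.jpg`, `depth.png`, and `meta.json`. `write_index` is a hypothetical helper; check the exact index format against `docs/train.md`.

```python
import tempfile
from pathlib import Path

# Files every instance folder must contain (per the layout above).
REQUIRED_FILES = ("image.jpg", "depth.png", "meta.json")


def write_index(dataset_root, index_name=".index.txt"):
    """Write the relative paths of complete instance folders to <dataset_root>/<index_name>."""
    root = Path(dataset_root)
    instances = sorted(
        meta.parent.relative_to(root).as_posix()
        for meta in root.rglob("meta.json")
        if all((meta.parent / name).is_file() for name in REQUIRED_FILES)
    )
    text = "\n".join(instances) + ("\n" if instances else "")
    (root / index_name).write_text(text)
    return instances


if __name__ == "__main__":
    # Build a tiny throwaway dataset and index it.
    with tempfile.TemporaryDirectory() as tmp:
        complete = Path(tmp, "folder1", "instance1")
        complete.mkdir(parents=True)
        for name in REQUIRED_FILES:
            (complete / name).touch()
        incomplete = Path(tmp, "folder1", "instance2")
        incomplete.mkdir()
        (incomplete / "meta.json").touch()  # missing image and depth, so it is skipped
        print(write_index(tmp))  # ['folder1/instance1']
```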

--------------------------------

### Run MoGe Gradio Demo

Source: https://github.com/microsoft/moge/blob/main/README.md

Launch the interactive Gradio demo for MoGe. By default, it runs the MoGe-2 demo. The demo can also be run directly from the repository.

```bash
# Using the command line tool
moge app        # will run MoGe-2 demo by default.

# In this repo
python moge/scripts/app.py   # --share for Gradio public sharing

```

--------------------------------

### Unzip Downloaded Datasets

Source: https://github.com/microsoft/moge/blob/main/docs/eval.md

After downloading, navigate to the `data/eval` directory and unzip all downloaded files. Optionally, remove the zip files if they are no longer needed.

```bash
cd data/eval
unzip '*.zip'
# rm *.zip  # only if you don't need to keep the zip files
```

--------------------------------

### Download Datasets with Hugging Face CLI

Source: https://github.com/microsoft/moge/blob/main/docs/eval.md

Use this command to download the processed evaluation datasets from Hugging Face into the `data/eval` directory.

```bash
mkdir -p data/eval
huggingface-cli download Ruicheng/monocular-geometry-evaluation --repo-type dataset --local-dir data/eval --local-dir-use-symlinks False
```

--------------------------------

### Visualize Instance Data

Source: https://github.com/microsoft/moge/blob/main/docs/train.md

Use this script to visualize instance data and check data quality. It exports the instance as a PLY file for point cloud visualization. Specify the path to the instance and optionally an output directory.

```bash
python moge/scripts/vis_data.py PATH_TO_INSTANCE --ply [-o SOMEWHERE_ELSE_TO_SAVE_VIS]
```

--------------------------------

### MoGe Infer Command Help

Source: https://github.com/microsoft/moge/blob/main/README.md

Displays detailed options for the `moge infer` command, including input/output paths, model parameters, and output formats.

```text
Usage: moge infer [OPTIONS]

  Inference script

Options:
  -i, --input PATH            Input image or folder path. "jpg" and "png" are
                              supported.
  --fov_x FLOAT               If camera parameters are known, set the
                              horizontal field of view in degrees. Otherwise,
                              MoGe will estimate it.
  -o, --output PATH           Output folder path
  --pretrained TEXT           Pretrained model name or path. If not provided,
                              the corresponding default model will be chosen.
  --version [v1|v2]           Model version. Defaults to "v2"
  --device TEXT               Device name (e.g. "cuda", "cuda:0", "cpu").
                              Defaults to "cuda"
  --fp16                      Use fp16 precision for much faster inference.
  --resize INTEGER            Resize the image(s) & output maps to a specific
                              size. Defaults to None (no resizing).
  --resolution_level INTEGER  An integer [0-9] for the resolution level for
                              inference. Higher value means more tokens and
                              the finer details will be captured, but
                              inference can be slower. Defaults to 9. Note
                              that it is irrelevant to the output size, which
                              is always the same as the input size.
                              `resolution_level` actually controls
                              `num_tokens`. See `num_tokens` for more details.
  --num_tokens INTEGER        Number of tokens used for inference. An integer
                              in the (suggested) range of `[1200, 2500]`.
                              `resolution_level` will be ignored if
                              `num_tokens` is provided. Default: None
  --threshold FLOAT           Threshold for removing edges. Defaults to 0.01.
                              Smaller value removes more edges. "inf" means no
                              thresholding.
  --maps                      Whether to save the output maps (image, point
                              map, depth map, normal map, mask) and fov.
  --glb                       Whether to save the output as a .glb file. The
                              color will be saved as a texture.
  --ply                       Whether to save the output as a .ply file. The
                              color will be saved as vertex colors.
  --show                      Whether to show the output in a window. Note that
                              this requires pyglet<2 installed, as required by
                              trimesh.
  --help                      Show this message and exit.
```

--------------------------------

### Depth Map I/O Utilities

Source: https://context7.com/microsoft/moge/llms.txt

Provides utilities for reading and writing depth maps in MoGe's optimized 16-bit PNG format. Supports logarithmic encoding and handles special values like NaN and Inf.

```python
import os

import numpy as np
from moge.utils.io import read_depth, write_depth

# Create sample depth map with special values
depth = np.random.uniform(0.5, 50.0, (480, 640)).astype(np.float32)
depth[100:150, 200:250] = np.nan   # Invalid/unknown regions
depth[300:320, 400:420] = np.inf   # Infinite depth (sky)

# Write depth map (stores in logarithmic scale, handles NaN/Inf)
os.makedirs("output", exist_ok=True)  # the output folder must exist before writing
write_depth("output/depth.png", depth, max_range=1e5, compression_level=7)

# Read depth map back
depth_loaded = read_depth("output/depth.png")

# Values are preserved including special cases
print(f"NaN preserved: {np.isnan(depth_loaded[125, 225])}")
print(f"Inf preserved: {np.isinf(depth_loaded[310, 410])}")
print(f"Max error: {np.nanmax(np.abs(depth - depth_loaded)[np.isfinite(depth)]):.6f}")
```
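The exact on-disk encoding is defined by `write_depth`; the sketch below only illustrates why logarithmic quantization suits depth. It maps a fixed range `[near, far]` onto 16-bit codes spaced uniformly in log space, so the worst-case *relative* error is constant (about `ln(far / near) / (2 × 65533)`) rather than growing with distance. Reserving code 0 for NaN and 65535 for +inf is an assumption of this sketch, not necessarily MoGe's scheme, and `encode_log_depth`/`decode_log_depth` are hypothetical names.

```python
import numpy as np


def encode_log_depth(depth, near, far):
    """Quantize depth to uint16 on a log scale: 0 = NaN/invalid, 1..65534 = finite, 65535 = +inf."""
    depth = np.asarray(depth, dtype=np.float64)
    code = np.zeros(depth.shape, dtype=np.uint16)        # default: invalid
    finite = np.isfinite(depth)
    d = np.clip(depth[finite], near, far)
    t = np.log(d / near) / np.log(far / near)            # position in [0, 1] on a log scale
    code[finite] = (np.round(t * 65533) + 1).astype(np.uint16)
    code[np.isposinf(depth)] = 65535
    return code


def decode_log_depth(code, near, far):
    """Invert encode_log_depth (up to quantization error)."""
    code = np.asarray(code)
    t = (code.astype(np.float64) - 1) / 65533
    depth = near * np.exp(t * np.log(far / near))
    depth[code == 0] = np.nan
    depth[code == 65535] = np.inf
    return depth


if __name__ == "__main__":
    depth = np.linspace(0.5, 50.0, 1001)
    back = decode_log_depth(encode_log_depth(depth, 0.5, 50.0), 0.5, 50.0)
    print("max relative error:", np.max(np.abs(back - depth) / depth))
```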

--------------------------------

### Evaluation Script Usage

Source: https://github.com/microsoft/moge/blob/main/docs/eval.md

This shows the available options for the `eval_baseline.py` script, including paths for baseline, configuration, and output, as well as flags for oracle mode and dumping predictions or ground truth.

```text
Usage: eval_baseline.py [OPTIONS]

  Evaluation script.

Options:
  --baseline PATH  Path to the baseline model python code.
  --config PATH    Path to the evaluation configurations. Defaults to
                   "configs/eval/all_benchmarks.json".
  --output PATH    Path to the output json file.
  --oracle         Use oracle mode for evaluation, i.e., use the GT intrinsics
                   input.
  --dump_pred      Dump prediction results.
  --dump_gt        Dump ground truth.
  --help           Show this message and exit.
```

--------------------------------

### Evaluate Depth Anything V2 Model

Source: https://github.com/microsoft/moge/blob/main/docs/eval.md

Evaluate the Depth Anything V2 model with the evaluation script. Note that this model predicts affine-invariant disparity rather than depth. Specify the baseline script, configuration, and output path.

```bash
python moge/scripts/eval_baseline.py --baseline baselines/da_v2.py --config configs/eval/all_benchmarks.json --output eval_output/da_v2.json
```

--------------------------------

### Evaluate MoGe Performance on Benchmarks

Source: https://context7.com/microsoft/moge/llms.txt

Evaluates MoGe model performance across various depth estimation benchmarks. Requires downloading evaluation datasets and provides options for oracle mode and dumping predictions.

```bash
# Download evaluation datasets
mkdir -p data/eval
huggingface-cli download Ruicheng/monocular-geometry-evaluation \
    --repo-type dataset \
    --local-dir data/eval \
    --local-dir-use-symlinks False
(cd data/eval && unzip '*.zip')   # subshell, so the commands below still run from the repo root

# Evaluate MoGe on all benchmarks
python moge/scripts/eval_baseline.py \
    --baseline baselines/moge.py \
    --config configs/eval/all_benchmarks.json \
    --output eval_output/moge.json \
    --pretrained Ruicheng/moge-2-vitl-normal \
    --resolution_level 9

# Evaluate with oracle mode (ground truth intrinsics)
python moge/scripts/eval_baseline.py \
    --baseline baselines/moge.py \
    --config configs/eval/all_benchmarks.json \
    --output eval_output/moge_oracle.json \
    --pretrained Ruicheng/moge-2-vitl-normal \
    --oracle

# Dump predictions for visualization
python moge/scripts/eval_baseline.py \
    --baseline baselines/moge.py \
    --config configs/eval/all_benchmarks.json \
    --output eval_output/moge.json \
    --pretrained Ruicheng/moge-2-vitl-normal \
    --dump_pred
```

--------------------------------

### Load Pretrained MoGe Model

Source: https://context7.com/microsoft/moge/llms.txt

Load MoGe models from Hugging Face Hub or a local checkpoint using the `from_pretrained` method. Custom model configurations can be applied using `model_kwargs`.

```python
import torch
from moge.model.v2 import MoGeModel

# Load MoGe-2 with normal estimation (full capabilities)
device = torch.device("cuda")
model = MoGeModel.from_pretrained("Ruicheng/moge-2-vitl-normal").to(device)

# Alternative models available:
# - "Ruicheng/moge-2-vitl"        # MoGe-2 without normal head (326M params)
# - "Ruicheng/moge-2-vitb-normal" # Smaller model (104M params)
# - "Ruicheng/moge-2-vits-normal" # Smallest model (35M params)
# - "Ruicheng/moge-vitl"          # MoGe-1 (314M params)

# Load from local checkpoint
model = MoGeModel.from_pretrained("/path/to/model.pt").to(device)

# Load with custom model configuration overrides
model = MoGeModel.from_pretrained(
    "Ruicheng/moge-2-vitl-normal",
    model_kwargs={"num_tokens_range": [1000, 3000]}
).to(device)
```

--------------------------------

### Load MoGe Model and Infer on Image

Source: https://github.com/microsoft/moge/blob/main/README.md

Load a pretrained MoGe model from Hugging Face and run inference on a single image. The input must be a `(3, H, W)` float tensor with RGB values normalized to [0, 1]. The output contains the point map, depth map, mask, and camera intrinsics, plus a normal map for the `-normal` model variants.

```python
import cv2
import torch
# from moge.model.v1 import MoGeModel
from moge.model.v2 import MoGeModel # Let's try MoGe-2

device = torch.device("cuda")

# Load the model from huggingface hub (or load from local).
model = MoGeModel.from_pretrained("Ruicheng/moge-2-vitl-normal").to(device)                             

# Read the input image and convert to tensor (3, H, W) with RGB values normalized to [0, 1]
input_image = cv2.cvtColor(cv2.imread("PATH_TO_IMAGE.jpg"), cv2.COLOR_BGR2RGB)                       
input_image = torch.tensor(input_image / 255, dtype=torch.float32, device=device).permute(2, 0, 1)    

# Infer
output = model.infer(input_image)
"""
`output` has keys "points", "depth", "mask", "normal" (optional) and "intrinsics".
The maps are the same size as the input image.
{
    "points": (H, W, 3),    # point map in OpenCV camera coordinate system (x right, y down, z forward). For MoGe-2, the point map is in metric scale.
    "depth": (H, W),        # depth map
    "normal": (H, W, 3),    # normal map in OpenCV camera coordinate system (available for MoGe-2-normal)
    "mask": (H, W),         # a binary mask for valid pixels
    "intrinsics": (3, 3),   # normalized camera intrinsics
}
"""

```
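The `intrinsics` entry is normalized. Assuming the convention used by utils3d (focal length and principal point divided by the image width for x and by the image height for y), it can be converted back to pixel units and to fields of view as sketched below. Both helpers are illustrative, not MoGe functions.

```python
import numpy as np


def denormalize_intrinsics(k_norm, width, height):
    """Scale a normalized 3x3 intrinsics matrix to pixel units (row 0 by width, row 1 by height)."""
    scale = np.diag([float(width), float(height), 1.0])
    return scale @ np.asarray(k_norm, dtype=np.float64)


def fov_degrees(k_norm):
    """Horizontal and vertical field of view (degrees) from normalized focal lengths."""
    fx, fy = k_norm[0][0], k_norm[1][1]
    fov_x = 2 * np.degrees(np.arctan(0.5 / fx))
    fov_y = 2 * np.degrees(np.arctan(0.5 / fy))
    return float(fov_x), float(fov_y)


if __name__ == "__main__":
    # With MoGe: k_norm = output["intrinsics"].cpu().numpy()
    k_norm = [[0.5, 0.0, 0.5], [0.0, 320 / 480, 0.5], [0.0, 0.0, 1.0]]
    print(denormalize_intrinsics(k_norm, 640, 480))
    print(fov_degrees(k_norm))
```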

--------------------------------

### Command Line Inference with MoGe

Source: https://context7.com/microsoft/moge/llms.txt

Use the `moge infer` command for image processing and saving outputs like depth maps, point clouds, and 3D meshes. Options include specifying input/output paths, enabling different output formats, using FP16 for faster inference, setting camera FOV, adjusting resolution, selecting model versions, and controlling edge removal thresholds.

```bash
moge infer -i ./images/ -o ./output/ --maps --glb --ply
```

```bash
moge infer -i photo.jpg -o ./output/ --fp16 --maps
```

```bash
moge infer -i photo.jpg -o ./output/ --fov_x 60.0 --maps --glb
```

```bash
moge infer -i ./images/ -o ./output/ --resolution_level 7 --fp16 --maps
```

```bash
moge infer -i ./images/ -o ./output/ --resize 800 --maps --glb
```

```bash
moge infer -i ./images/ -o ./output/ --version v1 --maps
```

```bash
moge infer -i ./images/ -o ./output/ --pretrained Ruicheng/moge-2-vitb-normal --maps
```

```bash
moge infer -i photo.jpg -o ./output/ --threshold 0.02 --glb
```

```bash
moge infer -i photo.jpg -o ./output/ --show
```

--------------------------------

### Evaluate MoGe Model

Source: https://github.com/microsoft/moge/blob/main/docs/eval.md

Run the evaluation script for the MoGe model. This command specifies the baseline script, configuration file, output path, and pretrained model details.

```bash
python moge/scripts/eval_baseline.py --baseline baselines/moge.py --config configs/eval/all_benchmarks.json --output eval_output/moge.json --pretrained Ruicheng/moge-vitl --resolution_level 9
```

--------------------------------

### Build Mesh from Depth/Point Map

Source: https://context7.com/microsoft/moge/llms.txt

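This snippet continues the "Saving 3D Outputs with Utility Functions" example, which defines `points`, `image`, `normal`, `mask_cleaned`, `height`, `width`, and the imports. It builds a triangle mesh from the point map, image colors, a UV map, and optional normals, converts it to the OpenGL coordinate convention for export, and saves it as a GLB mesh or a PLY point cloud.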

```python
if normal is not None:
    faces, vertices, vertex_colors, vertex_uvs, vertex_normals = utils3d.np.build_mesh_from_map(
        points,
        image.astype(np.float32) / 255,
        utils3d.np.uv_map(height, width),
        normal,
        mask=mask_cleaned,
        tri=True
    )
else:
    faces, vertices, vertex_colors, vertex_uvs = utils3d.np.build_mesh_from_map(
        points,
        image.astype(np.float32) / 255,
        utils3d.np.uv_map(height, width),
        mask=mask_cleaned,
        tri=True
    )
    vertex_normals = None

# Convert to OpenGL coordinate system for export
vertices = vertices * [1, -1, -1]
vertex_uvs = vertex_uvs * [1, -1] + [0, 1]
if vertex_normals is not None:
    vertex_normals = vertex_normals * [1, -1, -1]

# Save as GLB mesh with texture
save_glb("output/mesh.glb", vertices, faces, vertex_uvs, image, vertex_normals)

# Save as PLY point cloud with colors
save_ply("output/pointcloud.ply", vertices, np.zeros((0, 3), dtype=np.int32), vertex_colors, vertex_normals)
```
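`build_mesh_from_map` turns the pixel grid into triangles. Its exact behavior is defined in utils3d; the hypothetical helper below sketches only the basic idea for a boolean mask: each 2×2 block of valid pixels becomes two triangles whose vertex indices refer to `points.reshape(-1, 3)`. Unlike the library function, it does not drop unused vertices or attach colors, UVs, or normals.

```python
import numpy as np


def grid_triangles(mask: np.ndarray) -> np.ndarray:
    """Return (F, 3) vertex indices: two triangles per fully valid 2x2 pixel block."""
    h, w = mask.shape
    idx = np.arange(h * w).reshape(h, w)        # vertex index of each pixel
    tl, tr = idx[:-1, :-1], idx[:-1, 1:]        # top-left / top-right corners of each block
    bl, br = idx[1:, :-1], idx[1:, 1:]          # bottom-left / bottom-right corners
    valid = mask[:-1, :-1] & mask[:-1, 1:] & mask[1:, :-1] & mask[1:, 1:]
    tri_a = np.stack([tl, bl, tr], axis=-1)[valid]
    tri_b = np.stack([tr, bl, br], axis=-1)[valid]
    return np.concatenate([tri_a, tri_b], axis=0)


if __name__ == "__main__":
    mask = np.ones((3, 3), dtype=bool)
    print(grid_triangles(mask))          # 4 blocks -> 8 triangles
    mask[1, 1] = False                   # the center pixel touches every block
    print(grid_triangles(mask).shape)    # (0, 3)
```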

--------------------------------

### Run MoGe Inference

Source: https://github.com/microsoft/moge/blob/main/README.md

Use this command to run inference on a single image or a folder of images and save the results to the output folder. The `--show` option displays the results in a window and requires `pyglet<2`.

```bash
moge infer -i IMAGES_FOLDER_OR_IMAGE_PATH -o OUTPUT_FOLDER --show
```

--------------------------------

### Fine-tune Pretrained MoGe Model

Source: https://context7.com/microsoft/moge/llms.txt

Fine-tunes a pretrained MoGe checkpoint on a custom dataset using lower learning rates. Ensure pretrained weights are placed in the specified directory.

```bash
# Download pretrained weights
# Place in: pretrained/moge-2-vitl-normal.pt

# Fine-tune with lower learning rates (recommended: 1e-5 for head, 1e-6 for backbone),
# set in the optimizer section of the training config
accelerate launch \
    --num_processes 8 \
    moge/scripts/train.py \
    --config configs/train/v2.json \
    --workspace workspace/moge-finetuned \
    --gradient_accumulation_steps 2 \
    --batch_size_forward 2 \
    --checkpoint pretrained/moge-2-vitl-normal.pt \
    --enable_gradient_checkpointing True \
    --vis_every 1000 \
    --enable_mlflow True
```
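The launch command has no learning-rate flag, and the example config sets per-group `lr` values in its `optimizer` section; the recommended fine-tuning rates are ten times lower than the 1e-4/1e-5 used in the v1 example. One way to apply them is to derive a fine-tuning config, sketched below. It assumes `configs/train/v2.json` uses the same `optimizer.params` list of `{"params": ..., "lr": ...}` groups as the v1 example and contains no comments; the helpers and the output path are hypothetical.

```python
import json
from pathlib import Path


def scale_learning_rates(config: dict, factor: float) -> dict:
    """Return a copy of a training config with every optimizer param-group lr scaled."""
    new_config = json.loads(json.dumps(config))  # deep copy of plain JSON data
    for group in new_config["optimizer"]["params"]:
        group["lr"] *= factor
    return new_config


def write_finetune_config(src: str, dst: str, factor: float = 0.1) -> None:
    """Read a JSON training config, scale its learning rates, and write it to dst."""
    config = json.loads(Path(src).read_text())
    Path(dst).write_text(json.dumps(scale_learning_rates(config, factor), indent=4))


if __name__ == "__main__":
    # Hypothetical output path; pass it to train.py via --config.
    write_finetune_config("configs/train/v2.json", "configs/train/v2_finetune.json")
```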

--------------------------------

### Run MoGe Panorama Inference

Source: https://github.com/microsoft/moge/blob/main/README.md

This experimental command processes 360° panorama images by splitting them into multiple views, running inference on each, and merging the results into a panorama depth map and point map. The input image must use a spherical parameterization (e.g., an equirectangular image).

```bash
moge infer_panorama --help
```

--------------------------------

### Run MoGe Inference via Command Line

Source: https://github.com/microsoft/moge/blob/main/README.md

Execute the inference script from the command line to process images. This command allows saving various output formats including maps, GLB, and PLY files.

```bash
# Save the output [maps], [glb] and [ply] files
moge infer -i IMAGES_FOLDER_OR_IMAGE_PATH -o OUTPUT_FOLDER --maps --glb --ply

```

--------------------------------

### MoGe Inference API

Source: https://github.com/microsoft/moge/blob/main/README.md

Reference for the `moge infer` command-line options. Displaying results in a window with `--show` requires `pyglet<2`.

```APIDOC
## moge infer (command-line interface)

### Description
Runs inference on an input image or a folder of images, saves the requested outputs, and optionally displays the results in a window. This is a local command-line tool, not an HTTP endpoint.

### Usage
moge infer [OPTIONS]

### Options
- **-i, --input** (PATH) - Required - Input image or folder path. "jpg" and "png" are supported.
- **-o, --output** (PATH) - Required - Output folder path.
- **--fov_x** (FLOAT) - Optional - If camera parameters are known, set the horizontal field of view in degrees. Otherwise, MoGe will estimate it.
- **--pretrained** (TEXT) - Optional - Pretrained model name or path. If not provided, the corresponding default model will be chosen.
- **--version** (v1|v2) - Optional - Model version. Defaults to "v2".
- **--device** (TEXT) - Optional - Device name (e.g. "cuda", "cuda:0", "cpu"). Defaults to "cuda".
- **--fp16** (FLAG) - Optional - Use fp16 precision for much faster inference.
- **--resize** (INTEGER) - Optional - Resize the image(s) and output maps to a specific size. Defaults to None (no resizing).
- **--resolution_level** (INTEGER) - Optional - An integer in [0, 9] for the inference resolution level. Higher values use more tokens and capture finer details, but inference can be slower. Defaults to 9. It does not affect the output size, which always matches the input size. `resolution_level` actually controls `num_tokens`.
- **--num_tokens** (INTEGER) - Optional - Number of tokens used for inference, an integer in the (suggested) range of `[1200, 2500]`. `resolution_level` is ignored if `num_tokens` is provided. Default: None.
- **--threshold** (FLOAT) - Optional - Threshold for removing edges. Defaults to 0.01. Smaller values remove more edges. "inf" means no thresholding.
- **--maps** (FLAG) - Optional - Save the output maps (image, point map, depth map, normal map, mask) and FOV.
- **--glb** (FLAG) - Optional - Save the output as a .glb file, with color stored as a texture.
- **--ply** (FLAG) - Optional - Save the output as a .ply file, with color stored as vertex colors.
- **--show** (FLAG) - Optional - Show the output in a window. Requires pyglet<2, as required by trimesh.

### Example
moge infer -i /path/to/images/ -o /path/to/output/ --show
```

--------------------------------

### Enable Gradient Checkpointing for MoGe Training

Source: https://context7.com/microsoft/moge/llms.txt

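Reduce GPU memory usage during MoGe training by enabling gradient checkpointing, which recomputes intermediate activations during the backward pass instead of storing them (trading extra compute for memory), and by switching to PyTorch's native scaled dot-product attention.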

```python
import torch
from moge.model.v2 import MoGeModel

device = torch.device("cuda")
model = MoGeModel.from_pretrained("Ruicheng/moge-2-vitl-normal").to(device)  # model and inputs must be on the same device
model.enable_gradient_checkpointing()  # Reduces memory usage during training
model.enable_pytorch_native_sdpa()     # Use PyTorch native scaled dot-product attention

# Now training will use less GPU memory
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Forward pass with gradient checkpointing enabled
image = torch.rand(2, 3, 512, 512, device="cuda")
output = model(image, num_tokens=1800)
loss = output["points"].mean()  # Example loss
loss.backward()  # Gradients computed with checkpointing
```
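
To make the compute-for-memory trade concrete, here is a minimal pure-Python sketch of segment-wise gradient checkpointing on a chain of scalar functions. It illustrates the general technique and is not how MoGe or PyTorch implement it. Only every `k`-th activation is kept during the forward pass, and each segment's intermediate activations are recomputed from its checkpoint during the backward pass. The `LAYERS` chain and the helper names are hypothetical.

```python
import math

# Hypothetical toy network: each layer is a (function, derivative) pair on scalars.
LAYERS = [(math.sin, math.cos), (math.tanh, lambda x: 1 - math.tanh(x) ** 2)] * 4

def grad_full(x, layers):
    """Standard backprop: store every activation (O(n) memory)."""
    acts = [x]
    for f, _ in layers:
        acts.append(f(acts[-1]))
    g = 1.0
    # Chain rule: multiply each layer's derivative evaluated at its input
    for (_, df), a_in in zip(reversed(layers), reversed(acts[:-1])):
        g *= df(a_in)
    return g, len(acts)

def grad_checkpointed(x, layers, k):
    """Keep only segment inputs; recompute each segment during backward."""
    checkpoints = {0: x}  # layer index -> activation entering that layer
    a = x
    for i, (f, _) in enumerate(layers):
        a = f(a)
        if (i + 1) % k == 0 and i + 1 < len(layers):
            checkpoints[i + 1] = a
    g, recomputed = 1.0, 0
    end = len(layers)
    for start in sorted(checkpoints, reverse=True):
        # Recompute the inputs of layers start .. end-1 from the checkpoint
        seg_acts = [checkpoints[start]]
        for f, _ in layers[start:end - 1]:
            seg_acts.append(f(seg_acts[-1]))
            recomputed += 1
        for (_, df), a_in in zip(reversed(layers[start:end]), reversed(seg_acts)):
            g *= df(a_in)
        end = start
    return g, len(checkpoints), recomputed
```

With `k` near √n, the number of stored activations drops from about n to about 2√n, while each layer's forward computation runs at most one extra time.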

--------------------------------

### Visualize Depth and Normal Maps with MoGe

Source: https://context7.com/microsoft/moge/llms.txt

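Convert MoGe depth and normal predictions into colorized images with the `colorize_depth` and `colorize_normal` helpers from `moge.utils.vis`. OpenCV handles image I/O and loads images in BGR order, so convert the input to RGB before inference and convert the visualizations back to BGR before saving.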

```python
import cv2
import torch
from moge.model.v2 import MoGeModel
from moge.utils.vis import colorize_depth, colorize_normal

device = torch.device("cuda")
model = MoGeModel.from_pretrained("Ruicheng/moge-2-vitl-normal").to(device)

# Run inference
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
image_tensor = torch.tensor(image / 255, dtype=torch.float32, device=device).permute(2, 0, 1)
output = model.infer(image_tensor, use_fp16=True)

depth = output["depth"].cpu().numpy()
mask = output["mask"].cpu().numpy()
normal = output["normal"].cpu().numpy()

# Colorize depth (masked-out pixels are rendered black)
depth_vis = colorize_depth(depth, mask=mask)  # Returns (H, W, 3) RGB uint8
cv2.imwrite("depth_colored.png", cv2.cvtColor(depth_vis, cv2.COLOR_RGB2BGR))

# Colorize normal map (RGB encoding: R=X, G=Y, B=Z)
normal_vis = colorize_normal(normal)  # Returns (H, W, 3) RGB uint8
cv2.imwrite("normal_colored.png", cv2.cvtColor(normal_vis, cv2.COLOR_RGB2BGR))
```
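
As a rough illustration of what such helpers do, the sketch below maps a unit normal with components in [-1, 1] to an 8-bit RGB triple using the common `(n + 1) / 2` encoding. It also normalizes valid depths to [0, 1], which is the step that precedes applying a colormap. Both functions are hypothetical; `moge.utils.vis` may flip axis signs, colorize inverse depth, or clip outliers differently.

```python
import math

def encode_normal(n):
    """Map a unit normal (x, y, z) in [-1, 1]^3 to an (R, G, B) tuple in 0..255."""
    return tuple(round((max(-1.0, min(1.0, c)) * 0.5 + 0.5) * 255) for c in n)

def normalize_depth(depth, mask):
    """Scale valid depths to [0, 1]; masked-out or non-finite pixels become None (drawn black)."""
    valid = [
        d
        for row, mrow in zip(depth, mask)
        for d, m in zip(row, mrow)
        if m and math.isfinite(d)
    ]
    lo, hi = min(valid), max(valid)
    span = (hi - lo) or 1.0  # Avoid division by zero for constant depth
    return [
        [(d - lo) / span if m and math.isfinite(d) else None for d, m in zip(row, mrow)]
        for row, mrow in zip(depth, mask)
    ]
```

Excluding masked pixels before computing the range keeps the `inf` values that `infer()` writes into invalid pixels from collapsing the normalization.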

--------------------------------

### MoGe Panorama Inference API

Source: https://github.com/microsoft/moge/blob/main/README.md

Experimental script for inference on 360-degree panorama images. It splits each panorama into multiple perspective views.

```APIDOC
## POST /api/infer_panorama

### Description
Processes 360-degree panorama images by splitting them into multiple perspective views, inferring on each, and combining the results into panorama depth and point maps. The input image must have spherical parameterization (e.g., environment maps or equirectangular images).

### Method
POST

### Endpoint
/api/infer_panorama

### Parameters
#### Query Parameters
- **-i, --input** (PATH) - Required - Input panorama image path.
- **-o, --output** (PATH) - Required - Output folder path.
- **--fov_x** (FLOAT) - Optional - If camera parameters are known, set the horizontal field of view in degrees. Otherwise, MoGe will estimate it.
- **--pretrained** (TEXT) - Optional - Pretrained model name or path. If not provided, the corresponding default model will be chosen.
- **--version** (v1|v2) - Optional - Model version. Defaults to "v2".
- **--device** (TEXT) - Optional - Device name (e.g. "cuda", "cuda:0", "cpu"). Defaults to "cuda".
- **--fp16** (BOOLEAN) - Optional - Use fp16 precision for much faster inference.
- **--resize** (INTEGER) - Optional - Resize the image(s) & output maps to a specific size. Defaults to None (no resizing).
- **--resolution_level** (INTEGER) - Optional - An integer in [0, 9] setting the resolution level for inference. Higher values use more tokens and capture finer details, but inference is slower. Defaults to 9. This does not affect the output size, which always matches the input size; `resolution_level` actually controls `num_tokens`.
- **--num_tokens** (INTEGER) - Optional - Number of tokens used for inference. An integer in the (suggested) range `[1200, 2500]`. `resolution_level` is ignored if `num_tokens` is provided. Defaults to None.
- **--threshold** (FLOAT) - Optional - Threshold for removing edges. Defaults to 0.01. Smaller values remove more edges. "inf" means no thresholding.
- **--maps** (BOOLEAN) - Optional - Whether to save the output maps (image, point map, depth map, normal map, mask) and fov.
- **--glb** (BOOLEAN) - Optional - Whether to save the output as a `.glb` file. The color will be saved as a texture.
- **--ply** (BOOLEAN) - Optional - Whether to save the output as a `.ply` file. The color will be saved as vertex colors.

### Request Example
```json
{
  "input": "/path/to/panorama.jpg",
  "output": "/path/to/panorama_output/"
}
```

### Response
#### Success Response (200)
- **message** (string) - Panorama inference completed successfully.

#### Response Example
```json
{
  "message": "Panorama inference completed successfully."
}
```
```
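
One plausible reading of "`resolution_level` actually controls `num_tokens`" is a linear interpolation over a token range. The sketch below assumes a hypothetical range of `[1200, 2500]`, borrowed from the suggested `--num_tokens` range, with levels 0 to 9. The range and rounding that MoGe uses internally may differ. An explicit `num_tokens` takes precedence, matching the documented behavior.

```python
from typing import Optional

# Assumed token range, taken from the suggested --num_tokens range, not from MoGe's source.
MIN_TOKENS, MAX_TOKENS = 1200, 2500

def resolve_num_tokens(resolution_level: int = 9, num_tokens: Optional[int] = None) -> int:
    """Return the token count to use; an explicit num_tokens overrides resolution_level."""
    if num_tokens is not None:
        return num_tokens
    if not 0 <= resolution_level <= 9:
        raise ValueError("resolution_level must be an integer in [0, 9]")
    # Linearly interpolate between the minimum and maximum token counts
    return int(MIN_TOKENS + resolution_level / 9 * (MAX_TOKENS - MIN_TOKENS))
```

Under these assumptions, the default level 9 maps to the largest token budget, which explains why it is the slowest setting.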

--------------------------------

### MoGe-2 Project Citation

Source: https://github.com/microsoft/moge/blob/main/README.md

Cite this paper for MoGe-2, which focuses on accurate monocular geometry with metric scale and sharp details. This includes arXiv details.

```bibtex
@misc{wang2025moge2,
      title={MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details},
      author={Ruicheng Wang and Sicheng Xu and Yue Dong and Yu Deng and Jianfeng Xiang and Zelong Lv and Guangzhong Sun and Xin Tong and Jiaolong Yang},
      year={2025},
      eprint={2507.02546},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.02546}
}
```

--------------------------------

### Run Inference on a Single Image

Source: https://context7.com/microsoft/moge/llms.txt

Use the `infer()` method for single-image inference, which includes automatic post-processing like depth recovery and camera intrinsics estimation. Ensure the input image is converted to a tensor with values in [0, 1] and permuted to (3, H, W).

```python
import cv2
import torch
from moge.model.v2 import MoGeModel

device = torch.device("cuda")
model = MoGeModel.from_pretrained("Ruicheng/moge-2-vitl-normal").to(device)

# Read and prepare image
input_image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
input_tensor = torch.tensor(input_image / 255, dtype=torch.float32, device=device).permute(2, 0, 1)

# Run inference
output = model.infer(
    input_tensor,
    resolution_level=9,      # 0-9, higher = more detail but slower
    force_projection=True,   # Recompute points from depth for consistency
    apply_mask=True,         # Mask invalid pixels with inf
    use_fp16=True            # Use mixed precision for speed
)

# Access outputs
points = output["points"]        # (H, W, 3) - 3D point cloud in metric scale
depth = output["depth"]          # (H, W) - depth map in meters
mask = output["mask"]            # (H, W) - binary validity mask
normal = output["normal"]        # (H, W, 3) - surface normals (only for models with a normal head)
intrinsics = output["intrinsics"]  # (3, 3) - estimated normalized camera intrinsics

print(f"Depth range: {depth[mask].min():.2f}m to {depth[mask].max():.2f}m")
```
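
Because the intrinsics are normalized, they must be scaled by the image size before they can be used with pixel coordinates. The sketch below assumes the common normalized convention, where `fx` and `cx` are divided by the image width and `fy` and `cy` by the height, and where pixel centers sit at `u + 0.5`. It converts the matrix to pixel units and back-projects one pixel with its depth, which is the kind of depth-to-points relationship that `force_projection=True` relies on. Both helper names are hypothetical.

```python
def denormalize_intrinsics(K, width, height):
    """Convert a normalized 3x3 intrinsics matrix (nested lists) to pixel units."""
    (fx, _, cx), (_, fy, cy), _ = K
    return [[fx * width, 0.0, cx * width],
            [0.0, fy * height, cy * height],
            [0.0, 0.0, 1.0]]

def unproject_pixel(u, v, depth, K_pixels):
    """Back-project pixel (u, v) with z-depth into camera coordinates (x, y, z)."""
    fx, cx = K_pixels[0][0], K_pixels[0][2]
    fy, cy = K_pixels[1][1], K_pixels[1][2]
    # Use the pixel center, assuming (0, 0) is the top-left corner of the first pixel
    x = (u + 0.5 - cx) * depth / fx
    y = (v + 0.5 - cy) * depth / fy
    return (x, y, depth)
```

Applied to every pixel, this pinhole back-projection turns a depth map into a point map.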

--------------------------------

### 360 Panorama Inference

Source: https://context7.com/microsoft/moge/llms.txt

Process 360-degree panorama images by splitting them into perspective views and merging the results. Options include a custom batch size to manage GPU memory, saving the intermediate split images, and resizing the output.

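```bash
# Save the output maps and a textured .glb mesh
moge infer_panorama -i panorama.jpg -o ./output/ --maps --glb
```

```bash
# Use a smaller batch size to reduce GPU memory use
moge infer_panorama -i panorama.jpg -o ./output/ --batch_size 2 --maps
```

```bash
# Also save the intermediate split images
moge infer_panorama -i panorama.jpg -o ./output/ --splitted --maps --ply
```

```bash
# Resize the image and output maps to a specific size
moge infer_panorama -i panorama.jpg -o ./output/ --resize 1920 --maps --glb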
```
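
Splitting a panorama relies on the spherical (equirectangular) parameterization. Each panorama pixel corresponds to a viewing direction on the unit sphere, and each perspective view resamples the directions inside its frustum. The sketch below shows only that pixel-to-direction mapping and its inverse. It uses an assumed convention: longitude runs from -π to π left to right, latitude runs from π/2 at the top to -π/2 at the bottom, and y points up. `moge infer_panorama` may use different axes and chooses its own set of view directions.

```python
import math

def equirect_to_direction(u, v, width, height):
    """Map pixel (u, v) of an equirectangular image to a unit direction (x, y, z)."""
    lon = ((u + 0.5) / width) * 2 * math.pi - math.pi   # [-pi, pi), left to right
    lat = math.pi / 2 - ((v + 0.5) / height) * math.pi  # (pi/2, -pi/2), top to bottom
    return (math.cos(lat) * math.sin(lon), math.sin(lat), math.cos(lat) * math.cos(lon))

def direction_to_equirect(x, y, z, width, height):
    """Inverse mapping: unit direction back to continuous pixel coordinates."""
    lon = math.atan2(x, z)
    lat = math.asin(max(-1.0, min(1.0, y)))  # Clamp to guard against rounding past +/-1
    u = (lon + math.pi) / (2 * math.pi) * width - 0.5
    v = (math.pi / 2 - lat) / math.pi * height - 0.5
    return (u, v)
```

To build a perspective view, you would cast a ray through each pixel of the virtual camera, map the ray direction to `(u, v)` with `direction_to_equirect`, and sample the panorama there.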

--------------------------------

### Export MoGe Model with Static Shape and Fixed Tokens to ONNX

Source: https://github.com/microsoft/moge/blob/main/docs/onnx.md

This snippet exports a MoGe model to ONNX with a static input shape and a fixed number of tokens. The `MoGeStatic` subclass hard-codes the token count in `forward()`, so the image is the only graph input. The exported ONNX model contains only the raw forward pass, without the post-processing performed by `infer()`.

```python
import os
os.environ['XFORMERS_DISABLED'] = '1'   # Disable xformers
import numpy as np
import torch
from moge.model.v2 import MoGeModel

class MoGeStatic(MoGeModel):
    def forward(self, image: torch.Tensor):
        return super().forward(image, NUM_TOKENS)

NUM_TOKENS = 1800
FIXED_IMAGE_INPUT = torch.rand(1, 3, 518, 518)
PRETRAINED_MODEL = 'Ruicheng/moge-2-vits-normal.pt'
ONNX_FILE = 'moge-2-vits-normal.onnx'

model = MoGeStatic.from_pretrained(PRETRAINED_MODEL)
model.onnx_compatible_mode = True  # Enable ONNX compatible mode

torch.onnx.export(
    model, 
    (FIXED_IMAGE_INPUT,),
    ONNX_FILE,
    input_names=['image'],
    output_names=['points', 'normal', 'mask', 'metric_scale'],
    dynamic_axes=None,
    opset_version=14
)
```
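
With a fixed `NUM_TOKENS`, the exported graph always runs the encoder on the same patch grid. The sketch below shows how a token budget can map to a grid that preserves the input aspect ratio. It assumes a ViT patch size of 14 pixels (the DINOv2 patch size) and rounds each side to the nearest integer. MoGe's exact rounding and resizing may differ, and `token_grid` is a hypothetical helper.

```python
import math

PATCH_SIZE = 14  # Assumed ViT patch size (DINOv2 backbones use 14x14 patches)

def token_grid(num_tokens, height, width):
    """Return (rows, cols, resized_height, resized_width) for a token budget.

    rows * cols is approximately num_tokens, and cols / rows ≈ width / height.
    """
    aspect = width / height
    rows = max(1, round(math.sqrt(num_tokens / aspect)))
    cols = max(1, round(math.sqrt(num_tokens * aspect)))
    return rows, cols, rows * PATCH_SIZE, cols * PATCH_SIZE
```

Under these assumptions, the square 518×518 input with 1800 tokens gives a 42×42 grid (1,764 tokens). A wide 720×1280 image with the same budget would instead get a 32×57 grid.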

--------------------------------

### Run Inference with Known Camera FOV

Source: https://context7.com/microsoft/moge/llms.txt

If the horizontal camera field of view (FOV) is known, pass it in degrees as `fov_x` to `infer()`. The model then uses this value instead of estimating the FOV, which can improve accuracy.

```python
import torch
from moge.model.v2 import MoGeModel

device = torch.device("cuda")
model = MoGeModel.from_pretrained("Ruicheng/moge-2-vitl-normal").to(device)

# Prepare image tensor (3, H, W) with values in [0, 1]
image_tensor = torch.rand(3, 720, 1280, device=device)

# Run inference with known horizontal FOV (degrees)
output = model.infer(
    image_tensor,
    fov_x=60.0,              # Known horizontal field of view in degrees
    resolution_level=9,
    use_fp16=True
)

# The model will use provided FOV instead of estimating it
depth = output["depth"]
points = output["points"]
```
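
If you know the focal length in pixels (for example, from calibration) rather than the FOV, convert it first. The sketch below implements the standard pinhole relation `fov_x = 2 * atan(width / (2 * fx))` and its inverse. These helpers are hypothetical and not part of MoGe.

```python
import math

def focal_to_fov_x(fx_pixels, width):
    """Horizontal FOV in degrees from a focal length in pixels (pinhole model)."""
    return math.degrees(2 * math.atan(width / (2 * fx_pixels)))

def fov_x_to_focal(fov_x_degrees, width):
    """Focal length in pixels from a horizontal FOV in degrees."""
    return (width / 2) / math.tan(math.radians(fov_x_degrees) / 2)
```

For the 1280-pixel-wide example above, `fov_x=60.0` corresponds to a focal length of about 1108.5 pixels.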

--------------------------------

### Batch Inference

Source: https://context7.com/microsoft/moge/llms.txt

Process multiple images concurrently for higher throughput by stacking them into a batch tensor (B, 3, H, W). Ensure all images in the batch are resized to the same dimensions before stacking.

```python
import cv2
import torch
import numpy as np
from moge.model.v2 import MoGeModel

device = torch.device("cuda")
model = MoGeModel.from_pretrained("Ruicheng/moge-2-vitl-normal").to(device)

# Load multiple images
image_paths = ["image1.jpg", "image2.jpg", "image3.jpg"]
images = []
for path in image_paths:
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (640, 480))  # Resize to same dimensions for batching
    images.append(img)

# Stack into batch tensor (B, 3, H, W)
batch = np.stack(images, axis=0)
batch_tensor = torch.tensor(batch / 255, dtype=torch.float32, device=device).permute(0, 3, 1, 2)

# Run batch inference
output = model.infer(batch_tensor, resolution_level=7, use_fp16=True)
depths = output["depth"]  # (B, H, W) - one depth map per image in the batch
```
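
Resizing every image to 640×480 distorts images that have a different aspect ratio. An alternative that avoids resizing is to group images that already share the same `(H, W)` and run one `infer()` call per group, with each call capped at a maximum batch size. The sketch below assumes that approach. `bucket_by_shape` is a hypothetical helper that only plans the batches and does not call MoGe.

```python
from collections import defaultdict

def bucket_by_shape(shapes, max_batch_size=4):
    """Group image indices by (height, width) and split each group into batches.

    `shapes` is a list of (height, width) tuples. Returns a list of
    ((height, width), [indices]) pairs, each batch at most max_batch_size long.
    """
    groups = defaultdict(list)
    for index, shape in enumerate(shapes):
        groups[shape].append(index)
    batches = []
    for shape, indices in groups.items():  # Insertion order: first-seen shape first
        for start in range(0, len(indices), max_batch_size):
            batches.append((shape, indices[start:start + max_batch_size]))
    return batches
```

Each planned batch can then be stacked and passed to `model.infer()` as in the example above. The trade-off is smaller batches, and therefore lower throughput, when image sizes vary widely.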

--------------------------------

### Export MoGe Model with Dynamic Shape and Variable Tokens to ONNX

Source: https://github.com/microsoft/moge/blob/main/docs/onnx.md

Use this snippet to export a MoGe model to ONNX format with dynamic input shapes and a variable number of tokens. Ensure ONNX compatible mode is enabled on the model before exporting. The exported model will output intermediate predictions, requiring separate post-processing.

```python
import os
os.environ['XFORMERS_DISABLED'] = '1'   # Disable xformers
import numpy as np
import torch
from moge.model.v2 import MoGeModel

PRETRAINED_MODEL = 'Ruicheng/moge-2-vits-normal.pt'
ONNX_FILE = 'moge-2-vits-normal.onnx'

model = MoGeModel.from_pretrained(PRETRAINED_MODEL)
model.onnx_compatible_mode = True  # Enable ONNX compatible mode

torch.onnx.export(
    model, 
    (torch.rand(1, 3, 518, 518), torch.tensor(1800)),
    ONNX_FILE,
    input_names=['image', 'num_tokens'],
    output_names=['points', 'normal', 'mask', 'metric_scale'],
    dynamic_axes={
        'image': {0: 'batch_size', 2: 'height', 3: 'width'},
    },
    opset_version=14
)
```
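
The last steps of that post-processing are easy to sketch: thresholding the mask and applying the predicted metric scale. The sketch below assumes that `mask` holds probabilities in [0, 1] thresholded at 0.5, and that `points` have already been converted from the raw prediction by focal and shift recovery, which is omitted here. Check these assumptions against `MoGeModel.infer()` before relying on them. `finalize_points` is a hypothetical helper operating on nested lists.

```python
import math

def finalize_points(points, mask, metric_scale, mask_threshold=0.5):
    """Scale shift-corrected points to metric units and invalidate masked pixels.

    points: H x W nested lists of (x, y, z); mask: H x W probabilities.
    Invalid pixels become (inf, inf, inf), mirroring the inf masking that
    infer() applies with apply_mask=True. All conventions here are assumptions.
    """
    invalid = (math.inf, math.inf, math.inf)
    return [
        [tuple(c * metric_scale for c in p) if m > mask_threshold else invalid
         for p, m in zip(prow, mrow)]
        for prow, mrow in zip(points, mask)
    ]
```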