### Install Training Dependencies

Source: https://github.com/microsoft/moge/blob/main/docs/train.md

Installs the Python packages required for training and finetuning the MoGe model. Install these in addition to the dependencies listed in pyproject.toml.

```bash
pip install accelerate sympy mlflow
```

--------------------------------

### Launch Training Script

Source: https://github.com/microsoft/moge/blob/main/docs/train.md

Use this command to launch the main training script. accelerate must be installed for distributed training. Adjust parameters such as num_processes and batch_size_forward to match your hardware.

```bash
accelerate launch \
    --num_processes 8 \
    moge/scripts/train.py \
    --config configs/train/v1.json \
    --workspace workspace/debug \
    --gradient_accumulation_steps 2 \
    --batch_size_forward 2 \
    --checkpoint latest \
    --enable_gradient_checkpointing True \
    --vis_every 1000 \
    --enable_mlflow True
```

--------------------------------

### Clone Repository and Install Dependencies

Source: https://github.com/microsoft/moge/blob/main/README.md

Clone the MoGe repository and install all necessary dependencies using the provided requirements file. This method is suitable for development or if you need direct access to the source code.

```bash
git clone https://github.com/microsoft/MoGe.git
cd MoGe
pip install -r requirements.txt
```

--------------------------------

### Infer Baselines with Example Images

Source: https://github.com/microsoft/moge/blob/main/docs/eval.md

Use this script to run inference on a small set of example images for a given baseline. This is useful for checking that a baseline is implemented correctly. Options specify the baseline, input directory, output directory, pretrained model, and whether to save maps and PLY files.
```bash
python moge/scripts/infer_baselines.py --baseline baselines/moge.py --input example_images/ --output infer_output/moge --pretrained Ruicheng/moge-vitl --maps --ply
```

--------------------------------

### Install MoGe via pip

Source: https://github.com/microsoft/moge/blob/main/README.md

Use this command to install the MoGe library directly from GitHub using pip. Ensure you have a compatible Python environment.

```bash
pip install git+https://github.com/microsoft/MoGe.git
```

--------------------------------

### Launch Gradio Web Demo

Source: https://context7.com/microsoft/moge/llms.txt

Start an interactive web demo for uploading images and viewing 3D reconstructions. Options include enabling public sharing, using FP16 for faster inference, and selecting specific model versions.

```bash
moge app
```

```bash
moge app --share
```

```bash
moge app --fp16
```

```bash
moge app --version v1
```

```bash
moge app --pretrained Ruicheng/moge-2-vitb-normal
```

--------------------------------

### MoGe Training Configuration Example

Source: https://github.com/microsoft/moge/blob/main/docs/train.md

This JSON object defines the hyperparameters for training the MoGe model. It includes settings for data augmentation, model architecture, optimizer, learning rate scheduler, and various loss functions. The `#` annotations are explanatory only; they are not valid JSON and must be removed from a real config file.

```json
{
    "data": {
        "aspect_ratio_range": [0.5, 2.0],       # Range of aspect ratio of sampled images
        "area_range": [250000, 1000000],        # Range of sampled image area in pixels
        "clamp_max_depth": 1000.0,              # Maximum far/near
        "center_augmentation": 0.5,             # Ratio of center crop augmentation
        "fov_range_absolute": [1, 179],         # Absolute range of FOV in degrees
        "fov_range_relative": [0.01, 1.0],      # Relative range of FOV to the original FOV
        "image_augmentation": ["jittering", "jpeg_loss", "blurring"],   # List of image augmentation techniques
        "datasets": [
            {
                "name": "TartanAir",            # Name of the dataset. Name it as you like.
                "path": "data/TartanAir",       # Path to the dataset
                "label_type": "synthetic",      # Label type for this dataset. Losses will be applied accordingly. see "loss" config
                "weight": 4.8,                  # Probability of sampling this dataset
                "index": ".index.txt",          # File name of the index file. Defaults to .index.txt
                "depth": "depth.png",           # File name of depth images. Defaults to depth.png
                "center_augmentation": 0.25,    # Below are dataset-specific hyperparameters. Overriding the global ones above.
                "fov_range_absolute": [30, 150],
                "fov_range_relative": [0.5, 1.0],
                "image_augmentation": ["jittering", "jpeg_loss", "blurring", "shot_noise"]
            }
        ]
    },
    "model_version": "v1",                      # Model version. If you have multiple model variants, you can use this to switch between them.
    "model": {                                  # Model hyperparameters. Will be passed to Model __init__() as kwargs.
        "encoder": "dinov2_vitl14",
        "remap_output": "exp",
        "intermediate_layers": 4,
        "dim_upsample": [256, 128, 64],
        "dim_times_res_block_hidden": 2,
        "num_res_blocks": 2,
        "num_tokens_range": [1200, 2500],
        "last_conv_channels": 32,
        "last_conv_size": 1
    },
    "optimizer": {                              # Reflection-like optimizer configurations. See moge.train.utils.py build_optimizer() for details.
        "type": "AdamW",
        "params": [
            {"params": {"include": ["*"], "exclude": ["*backbone.*"]}, "lr": 1e-4},
            {"params": {"include": ["*backbone.*"]}, "lr": 1e-5}
        ]
    },
    "lr_scheduler": {                           # Reflection-like lr_scheduler configurations. See moge.train.utils.py build_lr_scheduler() for details.
        "type": "SequentialLR",
        "params": {
            "schedulers": [
                {"type": "LambdaLR", "params": {"lr_lambda": ["1.0", "max(0.0, min(1.0, (epoch - 1000) / 1000))"]}},
                {"type": "StepLR", "params": {"step_size": 25000, "gamma": 0.5}}
            ],
            "milestones": [2000]
        }
    },
    "low_resolution_training_steps": 50000,     # Total number of low-resolution training steps. It makes the early stage training faster. Later stage training on varying size images will be slower.
    "loss": {
        "invalid": {},
        "synthetic": {                          # Below are loss hyperparameters
            "global": {"function": "affine_invariant_global_loss", "weight": 1.0, "params": {"align_resolution": 32}},
            "patch_4": {"function": "affine_invariant_local_loss", "weight": 1.0, "params": {"level": 4, "align_resolution": 16, "num_patches": 16}},
            "patch_16": {"function": "affine_invariant_local_loss", "weight": 1.0, "params": {"level": 16, "align_resolution": 8, "num_patches": 256}},
            "patch_64": {"function": "affine_invariant_local_loss", "weight": 1.0, "params": {"level": 64, "align_resolution": 4, "num_patches": 4096}},
            "normal": {"function": "normal_loss", "weight": 1.0},
            "mask": {"function": "mask_l2_loss", "weight": 1.0}
        },
        "sfm": {
            "global": {"function": "affine_invariant_global_loss", "weight": 1.0, "params": {"align_resolution": 32}},
            "patch_4": {"function": "affine_invariant_local_loss", "weight": 1.0, "params": {"level": 4, "align_resolution": 16, "num_patches": 16}},
            "patch_16": {"function": "affine_invariant_local_loss", "weight": 1.0, "params": {"level": 16, "align_resolution": 8, "num_patches": 256}},
            "mask": {"function": "mask_l2_loss", "weight": 1.0}
        }
    }
}
```

--------------------------------

### Saving 3D Outputs with Utility Functions

Source: https://context7.com/microsoft/moge/llms.txt

Save inference results as 3D mesh files (GLB) or point clouds (PLY) using the provided utility functions. This example loads an image, runs inference, and removes depth-edge pixels from the mask for better mesh quality.
```python
import cv2
import torch
import numpy as np
import utils3d
from moge.model.v2 import MoGeModel
from moge.utils.io import save_glb, save_ply

device = torch.device("cuda")
model = MoGeModel.from_pretrained("Ruicheng/moge-2-vitl-normal").to(device)

# Load and process image
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
height, width = image.shape[:2]
image_tensor = torch.tensor(image / 255, dtype=torch.float32, device=device).permute(2, 0, 1)

output = model.infer(image_tensor, use_fp16=True)
points = output["points"].cpu().numpy()
depth = output["depth"].cpu().numpy()
mask = output["mask"].cpu().numpy()
normal = output.get("normal", None)
if normal is not None:
    normal = normal.cpu().numpy()

# Clean edges for better mesh quality
mask_cleaned = mask & ~utils3d.np.depth_map_edge(depth, rtol=0.04)
```

--------------------------------

### Export MoGe Models to ONNX

Source: https://context7.com/microsoft/moge/llms.txt

Export MoGe models to ONNX format for deployment. This involves disabling xformers for compatibility and using `torch.onnx.export`. The examples show one export with dynamic input shapes and one with a fixed input shape.
```python
import os
os.environ['XFORMERS_DISABLED'] = '1'  # Disable xformers for ONNX compatibility

import torch
from moge.model.v2 import MoGeModel

# Load model
model = MoGeModel.from_pretrained("Ruicheng/moge-2-vitl-normal")
model.onnx_compatible_mode = True  # Enable ONNX compatible mode

# Export with dynamic input shape and variable tokens
torch.onnx.export(
    model,
    (torch.rand(1, 3, 518, 518), torch.tensor(1800)),
    "moge-2-vitl-normal.onnx",
    input_names=['image', 'num_tokens'],
    output_names=['points', 'normal', 'mask', 'metric_scale'],
    dynamic_axes={'image': {0: 'batch_size', 2: 'height', 3: 'width'}},
    opset_version=14
)

# Export with fixed input shape for optimized inference
class MoGeStatic(MoGeModel):
    def forward(self, image: torch.Tensor):
        return super().forward(image, 1800)  # Fixed token count

model_static = MoGeStatic.from_pretrained("Ruicheng/moge-2-vits-normal")
model_static.onnx_compatible_mode = True
torch.onnx.export(
    model_static,
    (torch.rand(1, 3, 518, 518),),
    "moge-2-static.onnx",
    input_names=['image'],
    output_names=['points', 'normal', 'mask', 'metric_scale'],
    opset_version=14
)
```

--------------------------------

### Launch Finetuning Script

Source: https://github.com/microsoft/moge/blob/main/docs/train.md

This command finetunes a pretrained MoGe model. Specify the path to your downloaded checkpoint using the --checkpoint argument. Lower learning rates and a minimum batch size of 32 are recommended for finetuning.

```bash
accelerate launch \
    --num_processes 8 \
    moge/scripts/train.py \
    --config configs/train/v1.json \
    --workspace workspace/debug \
    --gradient_accumulation_steps 2 \
    --batch_size_forward 2 \
    --checkpoint pretrained/moge-vitl.pt \
    --enable_gradient_checkpointing True \
    --vis_every 1000 \
    --enable_mlflow True
```

--------------------------------

### Launch Distributed Training

Source: https://context7.com/microsoft/moge/llms.txt

Initiates distributed training for MoGe models using the accelerate library. Requires a specific dataset structure and configuration files.

```bash
# Prepare dataset structure:
# dataset/
# ├── .index.txt              # List of instance paths
# ├── folder1/
# │   ├── instance1/
# │   │   ├── image.jpg       # RGB image
# │   │   ├── depth.png       # 16-bit depth (use moge.utils.io.write_depth)
# │   │   └── meta.json       # {"intrinsics": [[fx,0,cx],[0,fy,cy],[0,0,1]]}

# Launch distributed training on 8 GPUs
accelerate launch \
    --num_processes 8 \
    moge/scripts/train.py \
    --config configs/train/v2.json \
    --workspace workspace/moge-v2 \
    --gradient_accumulation_steps 2 \
    --batch_size_forward 2 \
    --checkpoint latest \
    --enable_gradient_checkpointing True \
    --vis_every 1000 \
    --enable_mlflow True
```

--------------------------------

### Run MoGe Gradio Demo

Source: https://github.com/microsoft/moge/blob/main/README.md

Launch the interactive Gradio demo for MoGe. By default, it runs the MoGe-2 demo. The demo can also be run directly from the repository.

```bash
# Using the command line tool
moge app   # will run MoGe-2 demo by default.

# In this repo
python moge/scripts/app.py   # --share for Gradio public sharing
```

--------------------------------

### Unzip Downloaded Datasets

Source: https://github.com/microsoft/moge/blob/main/docs/eval.md

After downloading, navigate to the `data/eval` directory and unzip all downloaded files. Optionally, remove the zip files if they are no longer needed.

```bash
cd data/eval
unzip '*.zip'
# rm *.zip  # if you don't keep the zip files
```

--------------------------------

### Download Datasets with Huggingface CLI

Source: https://github.com/microsoft/moge/blob/main/docs/eval.md

Use this command to download processed datasets from Huggingface. Ensure the data is placed in the `data/eval` directory.
```bash
mkdir -p data/eval
huggingface-cli download Ruicheng/monocular-geometry-evaluation --repo-type dataset --local-dir data/eval --local-dir-use-symlinks False
```

--------------------------------

### Visualize Instance Data

Source: https://github.com/microsoft/moge/blob/main/docs/train.md

Use this script to visualize instance data and check data quality. It exports the instance as a PLY file for point cloud visualization. Specify the path to the instance and optionally an output directory.

```bash
python moge/scripts/vis_data.py PATH_TO_INSTANCE --ply [-o SOMEWHERE_ELSE_TO_SAVE_VIS]
```

--------------------------------

### MoGe Infer Command Help

Source: https://github.com/microsoft/moge/blob/main/README.md

Displays detailed options for the `moge infer` command, including input/output paths, model parameters, and output formats.

```bash
Usage: moge infer [OPTIONS]

  Inference script

Options:
  -i, --input PATH            Input image or folder path. "jpg" and "png" are
                              supported.
  --fov_x FLOAT               If camera parameters are known, set the
                              horizontal field of view in degrees. Otherwise,
                              MoGe will estimate it.
  -o, --output PATH           Output folder path
  --pretrained TEXT           Pretrained model name or path. If not provided,
                              the corresponding default model will be chosen.
  --version [v1|v2]           Model version. Defaults to "v2"
  --device TEXT               Device name (e.g. "cuda", "cuda:0", "cpu").
                              Defaults to "cuda"
  --fp16                      Use fp16 precision for much faster inference.
  --resize INTEGER            Resize the image(s) & output maps to a specific
                              size. Defaults to None (no resizing).
  --resolution_level INTEGER  An integer [0-9] for the resolution level for
                              inference. Higher value means more tokens and
                              the finer details will be captured, but
                              inference can be slower. Defaults to 9. Note
                              that it is irrelevant to the output size, which
                              is always the same as the input size.
                              `resolution_level` actually controls
                              `num_tokens`. See `num_tokens` for more details.
  --num_tokens INTEGER        number of tokens used for inference. A integer
                              in the (suggested) range of `[1200, 2500]`.
                              `resolution_level` will be ignored if
                              `num_tokens` is provided. Default: None
  --threshold FLOAT           Threshold for removing edges. Defaults to 0.01.
                              Smaller value removes more edges. "inf" means no
                              thresholding.
  --maps                      Whether to save the output maps (image, point
                              map, depth map, normal map, mask) and fov.
  --glb                       Whether to save the output as a .glb file. The
                              color will be saved as a texture.
  --ply                       Whether to save the output as a .ply file. The
                              color will be saved as vertex colors.
  --show                      Whether show the output in a window. Note that
                              this requires pyglet<2 installed as required by
                              trimesh.
  --help                      Show this message and exit.
```

--------------------------------

### Depth Map I/O Utilities

Source: https://context7.com/microsoft/moge/llms.txt

Provides utilities for reading and writing depth maps in MoGe's optimized 16-bit PNG format. Supports logarithmic encoding and handles special values like NaN and Inf.

```python
from moge.utils.io import read_depth, write_depth
import numpy as np

# Create sample depth map with special values
depth = np.random.uniform(0.5, 50.0, (480, 640)).astype(np.float32)
depth[100:150, 200:250] = np.nan  # Invalid/unknown regions
depth[300:320, 400:420] = np.inf  # Infinite depth (sky)

# Write depth map (stores in logarithmic scale, handles NaN/Inf)
write_depth("output/depth.png", depth, max_range=1e5, compression_level=7)

# Read depth map back
depth_loaded = read_depth("output/depth.png")

# Values are preserved including special cases
print(f"NaN preserved: {np.isnan(depth_loaded[125, 225])}")
print(f"Inf preserved: {np.isinf(depth_loaded[310, 410])}")
print(f"Max error: {np.nanmax(np.abs(depth - depth_loaded)[np.isfinite(depth)]):.6f}")
```

--------------------------------

### Evaluation Script Usage

Source: https://github.com/microsoft/moge/blob/main/docs/eval.md

This shows the available options for the `eval_baseline.py` script, including paths for baseline, configuration, and output, as well as flags for oracle mode and dumping predictions or ground truth.

```bash
Usage: eval_baseline.py [OPTIONS]

  Evaluation script.

Options:
  --baseline PATH  Path to the baseline model python code.
  --config PATH    Path to the evaluation configurations. Defaults to
                   "configs/eval/all_benchmarks.json".
  --output PATH    Path to the output json file.
  --oracle         Use oracle mode for evaluation, i.e., use the GT intrinsics
                   input.
  --dump_pred      Dump predition results.
  --dump_gt        Dump ground truth.
  --help           Show this message and exit.
```

--------------------------------

### Evaluate Depth Anything V2 Model

Source: https://github.com/microsoft/moge/blob/main/docs/eval.md

Evaluate the Depth Anything V2 model using the provided script. Note that this model uses affine disparity. Specify the baseline script, configuration, and output path.

```bash
python moge/scripts/eval_baseline.py --baseline baselines/da_v2.py --config configs/eval/all_benchmarks.json --output eval_output/da_v2.json
```

--------------------------------

### Evaluate MoGe Performance on Benchmarks

Source: https://context7.com/microsoft/moge/llms.txt

Evaluates MoGe model performance across various depth estimation benchmarks. Requires downloading the evaluation datasets first, and provides options for oracle mode and dumping predictions.
```bash
# Download evaluation datasets
mkdir -p data/eval
huggingface-cli download Ruicheng/monocular-geometry-evaluation \
    --repo-type dataset \
    --local-dir data/eval \
    --local-dir-use-symlinks False
cd data/eval && unzip '*.zip'

# Evaluate MoGe on all benchmarks
python moge/scripts/eval_baseline.py \
    --baseline baselines/moge.py \
    --config configs/eval/all_benchmarks.json \
    --output eval_output/moge.json \
    --pretrained Ruicheng/moge-2-vitl-normal \
    --resolution_level 9

# Evaluate with oracle mode (ground truth intrinsics)
python moge/scripts/eval_baseline.py \
    --baseline baselines/moge.py \
    --config configs/eval/all_benchmarks.json \
    --output eval_output/moge_oracle.json \
    --pretrained Ruicheng/moge-2-vitl-normal \
    --oracle

# Dump predictions for visualization
python moge/scripts/eval_baseline.py \
    --baseline baselines/moge.py \
    --config configs/eval/all_benchmarks.json \
    --output eval_output/moge.json \
    --pretrained Ruicheng/moge-2-vitl-normal \
    --dump_pred
```

--------------------------------

### Load Pretrained MoGe Model

Source: https://context7.com/microsoft/moge/llms.txt

Load MoGe models from Hugging Face Hub or a local checkpoint using the `from_pretrained` method. Custom model configurations can be applied using `model_kwargs`.
```python
import torch
from moge.model.v2 import MoGeModel

# Load MoGe-2 with normal estimation (full capabilities)
device = torch.device("cuda")
model = MoGeModel.from_pretrained("Ruicheng/moge-2-vitl-normal").to(device)

# Alternative models available:
# - "Ruicheng/moge-2-vitl"         # MoGe-2 without normal head (326M params)
# - "Ruicheng/moge-2-vitb-normal"  # Smaller model (104M params)
# - "Ruicheng/moge-2-vits-normal"  # Smallest model (35M params)
# - "Ruicheng/moge-vitl"           # MoGe-1 (314M params)

# Load from local checkpoint
model = MoGeModel.from_pretrained("/path/to/model.pt").to(device)

# Load with custom model configuration overrides
model = MoGeModel.from_pretrained(
    "Ruicheng/moge-2-vitl-normal",
    model_kwargs={"num_tokens_range": [1000, 3000]}
).to(device)
```

--------------------------------

### Load MoGe Model and Infer on Image

Source: https://github.com/microsoft/moge/blob/main/README.md

Load a pretrained MoGe model from Hugging Face and perform inference on a single image. Ensure the input image is preprocessed to a tensor with RGB values normalized to [0, 1]. The output contains point maps, depth, mask, and optionally normal maps and intrinsics.

```python
import cv2
import torch
# from moge.model.v1 import MoGeModel
from moge.model.v2 import MoGeModel  # Let's try MoGe-2

device = torch.device("cuda")

# Load the model from huggingface hub (or load from local).
model = MoGeModel.from_pretrained("Ruicheng/moge-2-vitl-normal").to(device)

# Read the input image and convert to tensor (3, H, W) with RGB values normalized to [0, 1]
input_image = cv2.cvtColor(cv2.imread("PATH_TO_IMAGE.jpg"), cv2.COLOR_BGR2RGB)
input_image = torch.tensor(input_image / 255, dtype=torch.float32, device=device).permute(2, 0, 1)

# Infer
output = model.infer(input_image)
"""
`output` has keys "points", "depth", "mask", "normal" (optional) and "intrinsics".
The maps are in the same size as the input image.
{
    "points": (H, W, 3),    # point map in OpenCV camera coordinate system (x right, y down, z forward). For MoGe-2, the point map is in metric scale.
    "depth": (H, W),        # depth map
    "normal": (H, W, 3),    # normal map in OpenCV camera coordinate system. (available for MoGe-2-normal)
    "mask": (H, W),         # a binary mask for valid pixels.
    "intrinsics": (3, 3),   # normalized camera intrinsics
}
"""
```

--------------------------------

### Command Line Inference with MoGe

Source: https://context7.com/microsoft/moge/llms.txt

Use the `moge infer` command to process images and save outputs such as depth maps, point clouds, and 3D meshes. Options include specifying input/output paths, enabling different output formats, using FP16 for faster inference, setting camera FOV, adjusting resolution, selecting model versions, and controlling edge removal thresholds.

```bash
moge infer -i ./images/ -o ./output/ --maps --glb --ply
```

```bash
moge infer -i photo.jpg -o ./output/ --fp16 --maps
```

```bash
moge infer -i photo.jpg -o ./output/ --fov_x 60.0 --maps --glb
```

```bash
moge infer -i ./images/ -o ./output/ --resolution_level 7 --fp16 --maps
```

```bash
moge infer -i ./images/ -o ./output/ --resize 800 --maps --glb
```

```bash
moge infer -i ./images/ -o ./output/ --version v1 --maps
```

```bash
moge infer -i ./images/ -o ./output/ --pretrained Ruicheng/moge-2-vitb-normal --maps
```

```bash
moge infer -i photo.jpg -o ./output/ --threshold 0.02 --glb
```

```bash
moge infer -i photo.jpg -o ./output/ --show
```

--------------------------------

### Evaluate MoGe Model

Source: https://github.com/microsoft/moge/blob/main/docs/eval.md

Run the evaluation script for the MoGe model. This command specifies the baseline script, configuration file, output path, and pretrained model details.
```bash
python moge/scripts/eval_baseline.py --baseline baselines/moge.py --config configs/eval/all_benchmarks.json --output eval_output/moge.json --pretrained Ruicheng/moge-vitl --resolution_level 9
```

--------------------------------

### Build Mesh from Depth/Point Map

Source: https://context7.com/microsoft/moge/llms.txt

Constructs a 3D mesh from the point map, the input image, and an optional normal map. Handles coordinate system conversion for OpenGL export and saves the mesh as GLB or the point cloud as PLY. This snippet continues from the "Saving 3D Outputs with Utility Functions" example and reuses its variables (`points`, `image`, `normal`, `mask_cleaned`, `height`, `width`).

```python
if normal is not None:
    faces, vertices, vertex_colors, vertex_uvs, vertex_normals = utils3d.np.build_mesh_from_map(
        points,
        image.astype(np.float32) / 255,
        utils3d.np.uv_map(height, width),
        normal,
        mask=mask_cleaned,
        tri=True
    )
else:
    faces, vertices, vertex_colors, vertex_uvs = utils3d.np.build_mesh_from_map(
        points,
        image.astype(np.float32) / 255,
        utils3d.np.uv_map(height, width),
        mask=mask_cleaned,
        tri=True
    )
    vertex_normals = None

# Convert to OpenGL coordinate system for export
vertices = vertices * [1, -1, -1]
vertex_uvs = vertex_uvs * [1, -1] + [0, 1]
if vertex_normals is not None:
    vertex_normals = vertex_normals * [1, -1, -1]

# Save as GLB mesh with texture
save_glb("output/mesh.glb", vertices, faces, vertex_uvs, image, vertex_normals)

# Save as PLY point cloud with colors
save_ply("output/pointcloud.ply", vertices, np.zeros((0, 3), dtype=np.int32), vertex_colors, vertex_normals)
```

--------------------------------

### Run MoGe Inference

Source: https://github.com/microsoft/moge/blob/main/README.md

Use this command to perform inference on an input image or a folder of images and show the result in a window. The output will be saved to the specified folder. The `--show` option requires pyglet < 2.0.

```bash
moge infer -i IMAGES_FOLDER_OR_IMAGE_PATH -o OUTPUT_FOLDER --show
```

--------------------------------

### Fine-tune Pretrained MoGe Model

Source: https://context7.com/microsoft/moge/llms.txt

Fine-tunes a pretrained MoGe checkpoint on a custom dataset using lower learning rates. Ensure pretrained weights are placed in the specified directory.

```bash
# Download pretrained weights
# Place in: pretrained/moge-2-vitl-normal.pt

# Fine-tune with lower learning rate (recommended: 1e-5 for head, 1e-6 for backbone)
accelerate launch \
    --num_processes 8 \
    moge/scripts/train.py \
    --config configs/train/v2.json \
    --workspace workspace/moge-finetuned \
    --gradient_accumulation_steps 2 \
    --batch_size_forward 2 \
    --checkpoint pretrained/moge-2-vitl-normal.pt \
    --enable_gradient_checkpointing True \
    --vis_every 1000 \
    --enable_mlflow True
```

--------------------------------

### Run MoGe Panorama Inference

Source: https://github.com/microsoft/moge/blob/main/README.md

This experimental script processes 360° panorama images by splitting them into multiple views, inferring on each, and combining the results into a panorama depth and point map. The input image must be in spherical parameterization.

```bash
moge infer_panorama --help
```

--------------------------------

### Run MoGe Inference via Command Line

Source: https://github.com/microsoft/moge/blob/main/README.md

Execute the inference script from the command line to process images. This command saves several output formats: maps, GLB, and PLY files.

```bash
# Save the output [maps], [glb] and [ply] files
moge infer -i IMAGES_FOLDER_OR_IMAGE_PATH -o OUTPUT_FOLDER --maps --glb --ply
```

--------------------------------

### MoGe Inference API

Source: https://github.com/microsoft/moge/blob/main/README.md

Option reference for the `moge infer` command-line script, which processes images and can show results in a window. Showing results requires pyglet < 2.0.

```APIDOC
## moge infer

### Description
Performs inference on an input image or a folder of images and optionally displays the results in a window. This is a command-line tool, not an HTTP endpoint.

### Usage
moge infer [OPTIONS]

### Options
- **-i, --input** (PATH) - Required - Input image or folder path. "jpg" and "png" are supported.
- **-o, --output** (PATH) - Required - Output folder path
- **--fov_x** (FLOAT) - Optional - If camera parameters are known, set the horizontal field of view in degrees. Otherwise, MoGe will estimate it.
- **--pretrained** (TEXT) - Optional - Pretrained model name or path. If not provided, the corresponding default model will be chosen.
- **--version** (v1|v2) - Optional - Model version. Defaults to "v2"
- **--device** (TEXT) - Optional - Device name (e.g. "cuda", "cuda:0", "cpu"). Defaults to "cuda"
- **--fp16** (flag) - Optional - Use fp16 precision for much faster inference.
- **--resize** (INTEGER) - Optional - Resize the image(s) & output maps to a specific size. Defaults to None (no resizing).
- **--resolution_level** (INTEGER) - Optional - An integer [0-9] for the resolution level for inference. Higher value means more tokens and the finer details will be captured, but inference can be slower. Defaults to 9. Note that it is irrelevant to the output size, which is always the same as the input size. `resolution_level` actually controls `num_tokens`.
- **--num_tokens** (INTEGER) - Optional - Number of tokens used for inference. An integer in the (suggested) range of `[1200, 2500]`. `resolution_level` will be ignored if `num_tokens` is provided. Default: None
- **--threshold** (FLOAT) - Optional - Threshold for removing edges. Defaults to 0.01. Smaller value removes more edges. "inf" means no thresholding.
- **--maps** (flag) - Optional - Whether to save the output maps (image, point map, depth map, normal map, mask) and fov.
- **--glb** (flag) - Optional - Whether to save the output as a .glb file. The color will be saved as a texture.
- **--ply** (flag) - Optional - Whether to save the output as a .ply file. The color will be saved as vertex colors.
- **--show** (flag) - Optional - Whether to show the output in a window. Note that this requires pyglet<2 installed as required by trimesh.

### Example
moge infer -i /path/to/images/ -o /path/to/output/ --show
```

--------------------------------

### Enable Gradient Checkpointing for MoGe Training

Source: https://context7.com/microsoft/moge/llms.txt

Reduce GPU memory usage during MoGe model training by enabling gradient checkpointing and PyTorch native scaled dot-product attention. Gradient checkpointing trades compute for memory.

```python
import torch
from moge.model.v2 import MoGeModel

# Move the model to the GPU so it matches the device of the input tensor below
model = MoGeModel.from_pretrained("Ruicheng/moge-2-vitl-normal").to("cuda")
model.enable_gradient_checkpointing()  # Reduces memory usage during training
model.enable_pytorch_native_sdpa()     # Use PyTorch native scaled dot-product attention

# Now training will use less GPU memory
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Forward pass with gradient checkpointing enabled
image = torch.rand(2, 3, 512, 512, device="cuda")
output = model(image, num_tokens=1800)
loss = output["points"].mean()  # Example loss
loss.backward()  # Gradients computed with checkpointing
```

--------------------------------

### Visualize Depth and Normal Maps with MoGe

Source: https://context7.com/microsoft/moge/llms.txt

Convert depth and normal predictions from the MoGe model into colorized visualizations and write them with OpenCV. Ensure the input image is in RGB format before processing.
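The general idea behind colorizing a normal map can be sketched without the model. The snippet below is a minimal illustration, not the implementation of `colorize_normal`: the helper name `normals_to_rgb` is hypothetical, and the mapping (each component of a unit normal scaled from [-1, 1] to [0, 255], with R=X, G=Y, B=Z) is an assumption that may differ from the library in axis signs or rounding.

```python
import numpy as np

def normals_to_rgb(normal: np.ndarray) -> np.ndarray:
    """Map (H, W, 3) normals to uint8 RGB, assuming R=X, G=Y, B=Z (hypothetical helper)."""
    # Normalize each vector so every component lies in [-1, 1].
    length = np.linalg.norm(normal, axis=-1, keepdims=True)
    unit = normal / np.clip(length, 1e-8, None)
    # Shift and scale [-1, 1] to [0, 255].
    return np.round((unit + 1.0) * 127.5).astype(np.uint8)

# Tiny 2x2 normal map; the last vector is deliberately not unit length.
normals = np.array([
    [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]],
    [[0.0, -1.0, 0.0], [0.0, 0.0, -2.0]],
])
rgb = normals_to_rgb(normals)
```

Under this mapping a component of zero lands at mid-gray, so flat surfaces facing along one axis appear as a single saturated tint.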
```python
import cv2
import torch
from moge.model.v2 import MoGeModel
from moge.utils.vis import colorize_depth, colorize_normal

device = torch.device("cuda")
model = MoGeModel.from_pretrained("Ruicheng/moge-2-vitl-normal").to(device)

# Run inference
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
image_tensor = torch.tensor(image / 255, dtype=torch.float32, device=device).permute(2, 0, 1)
output = model.infer(image_tensor, use_fp16=True)

depth = output["depth"].cpu().numpy()
mask = output["mask"].cpu().numpy()
normal = output["normal"].cpu().numpy()

# Colorize depth (turbo colormap, masked regions shown as black)
depth_vis = colorize_depth(depth, mask=mask)  # Returns (H, W, 3) RGB uint8
cv2.imwrite("depth_colored.png", cv2.cvtColor(depth_vis, cv2.COLOR_RGB2BGR))

# Colorize normal map (RGB encoding: R=X, G=Y, B=Z)
normal_vis = colorize_normal(normal)  # Returns (H, W, 3) RGB uint8
cv2.imwrite("normal_colored.png", cv2.cvtColor(normal_vis, cv2.COLOR_RGB2BGR))
```

--------------------------------

### MoGe Panorama Inference API

Source: https://github.com/microsoft/moge/blob/main/README.md

Option reference for the experimental `moge infer_panorama` command-line script, which infers on 360-degree panorama images by splitting them into multiple views.

```APIDOC
## moge infer_panorama

### Description
Processes 360-degree panorama images by splitting them into multiple perspective views, inferring on each, and combining the results into panorama depth and point maps. The input image must have spherical parameterization (e.g., environment maps or equirectangular images). This is a command-line tool, not an HTTP endpoint.

### Usage
moge infer_panorama [OPTIONS]

### Options
- **-i, --input** (PATH) - Required - Input panorama image path.
- **-o, --output** (PATH) - Required - Output folder path.
- **--fov_x** (FLOAT) - Optional - If camera parameters are known, set the horizontal field of view in degrees. Otherwise, MoGe will estimate it.
- **--pretrained** (TEXT) - Optional - Pretrained model name or path. If not provided, the corresponding default model will be chosen.
- **--version** (v1|v2) - Optional - Model version. Defaults to "v2"
- **--device** (TEXT) - Optional - Device name (e.g. "cuda", "cuda:0", "cpu"). Defaults to "cuda"
- **--fp16** (flag) - Optional - Use fp16 precision for much faster inference.
- **--resize** (INTEGER) - Optional - Resize the image(s) & output maps to a specific size. Defaults to None (no resizing).
- **--resolution_level** (INTEGER) - Optional - An integer [0-9] for the resolution level for inference. Higher value means more tokens and the finer details will be captured, but inference can be slower. Defaults to 9. Note that it is irrelevant to the output size, which is always the same as the input size. `resolution_level` actually controls `num_tokens`.
- **--num_tokens** (INTEGER) - Optional - Number of tokens used for inference. An integer in the (suggested) range of `[1200, 2500]`. `resolution_level` will be ignored if `num_tokens` is provided. Default: None
- **--threshold** (FLOAT) - Optional - Threshold for removing edges. Defaults to 0.01. Smaller value removes more edges. "inf" means no thresholding.
- **--maps** (flag) - Optional - Whether to save the output maps (image, point map, depth map, normal map, mask) and fov.
- **--glb** (flag) - Optional - Whether to save the output as a .glb file. The color will be saved as a texture.
- **--ply** (flag) - Optional - Whether to save the output as a .ply file. The color will be saved as vertex colors.

### Example
moge infer_panorama -i /path/to/panorama.jpg -o /path/to/panorama_output/
```

--------------------------------

### MoGe-2 Project Citation

Source: https://github.com/microsoft/moge/blob/main/README.md

Cite this paper for MoGe-2, which focuses on accurate monocular geometry with metric scale and sharp details. The entry includes the arXiv identifier.

```bibtex
@misc{wang2025moge2,
    title={MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details},
    author={Ruicheng Wang and Sicheng Xu and Yue Dong and Yu Deng and Jianfeng Xiang and Zelong Lv and Guangzhong Sun and Xin Tong and Jiaolong Yang},
    year={2025},
    eprint={2507.02546},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2507.02546}
}
```

--------------------------------

### Run Inference on a Single Image

Source: https://context7.com/microsoft/moge/llms.txt

Use the `infer()` method for single-image inference, which includes automatic post-processing such as depth recovery and camera intrinsics estimation. Ensure the input image is converted to a tensor with values in [0, 1] and permuted to (3, H, W).
```python
import cv2
import torch
from moge.model.v2 import MoGeModel

device = torch.device("cuda")
model = MoGeModel.from_pretrained("Ruicheng/moge-2-vitl-normal").to(device)

# Read and prepare image
input_image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
input_tensor = torch.tensor(input_image / 255, dtype=torch.float32, device=device).permute(2, 0, 1)

# Run inference
output = model.infer(
    input_tensor,
    resolution_level=9,     # 0-9, higher = more detail but slower
    force_projection=True,  # Recompute points from depth for consistency
    apply_mask=True,        # Mask invalid pixels with inf
    use_fp16=True           # Use mixed precision for speed
)

# Access outputs
points = output["points"]          # (H, W, 3) - 3D point cloud in metric scale
depth = output["depth"]            # (H, W) - depth map in meters
mask = output["mask"]              # (H, W) - binary validity mask
normal = output["normal"]          # (H, W, 3) - surface normals (if available)
intrinsics = output["intrinsics"]  # (3, 3) - estimated camera intrinsics

print(f"Depth range: {depth[mask].min():.2f}m to {depth[mask].max():.2f}m")
```

--------------------------------

### 360 Panorama Inference

Source: https://context7.com/microsoft/moge/llms.txt

Process 360-degree panorama images by splitting them into perspective views and merging results. Options include custom batch size for GPU memory management, saving intermediate split images, and resizing output resolution.
```bash
moge infer_panorama -i panorama.jpg -o ./output/ --maps --glb
```

```bash
moge infer_panorama -i panorama.jpg -o ./output/ --batch_size 2 --maps
```

```bash
moge infer_panorama -i panorama.jpg -o ./output/ --splitted --maps --ply
```

```bash
moge infer_panorama -i panorama.jpg -o ./output/ --resize 1920 --maps --glb
```

--------------------------------

### Export MoGe Model with Static Shape and Fixed Tokens to ONNX

Source: https://github.com/microsoft/moge/blob/main/docs/onnx.md

This snippet demonstrates exporting a MoGe model to ONNX with static input shapes and a fixed number of tokens. A custom class `MoGeStatic` is used to handle the fixed token count. The exported ONNX model will contain only the raw forward pass.

```python
import os
os.environ['XFORMERS_DISABLED'] = '1'  # Disable xformers
import numpy as np
import torch
from moge.model.v2 import MoGeModel


class MoGeStatic(MoGeModel):
    def forward(self, image: torch.Tensor):
        return super().forward(image, NUM_TOKENS)


NUM_TOKENS = 1800
FIXED_IMAGE_INPUT = torch.rand(1, 3, 518, 518)
PRETRAINED_MODEL = 'Ruicheng/moge-2-vits-normal.pt'
ONNX_FILE = 'moge-2-vits-normal.onnx'

model = MoGeStatic.from_pretrained(PRETRAINED_MODEL)
model.onnx_compatible_mode = True  # Enable ONNX compatible mode

torch.onnx.export(
    model,
    (FIXED_IMAGE_INPUT,),
    ONNX_FILE,
    input_names=['image'],
    output_names=['points', 'normal', 'mask', 'metric_scale'],
    dynamic_axes=None,
    opset_version=14
)
```

--------------------------------

### Run Inference with Known Camera FOV

Source: https://context7.com/microsoft/moge/llms.txt

Improve accuracy by providing the known horizontal camera field of view (FOV) in degrees to the `infer()` method. This bypasses the model's estimation of FOV.
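As background for the `fov_x` argument, the horizontal FOV of an ideal pinhole camera fixes its focal length in pixels: half the image width subtends half the FOV, so `fx = (W / 2) / tan(fov_x / 2)`. The sketch below only illustrates that relationship. `fov_x_to_intrinsics` is a hypothetical helper, not part of MoGe, and it assumes square pixels and a principal point at the image center.

```python
import math


def fov_x_to_intrinsics(fov_x_deg: float, width: int, height: int) -> list[list[float]]:
    """Hypothetical helper (not a MoGe API): pinhole intrinsics from horizontal FOV.

    Assumes square pixels (fy == fx) and a principal point at the image center.
    """
    # Half of the image width subtends half of the horizontal FOV.
    fx = (width / 2) / math.tan(math.radians(fov_x_deg) / 2)
    fy = fx                          # assumption: square pixels
    cx, cy = width / 2, height / 2   # assumption: centered principal point
    return [
        [fx, 0.0, cx],
        [0.0, fy, cy],
        [0.0, 0.0, 1.0],
    ]


K = fov_x_to_intrinsics(60.0, width=1280, height=720)
print(f"fx = {K[0][0]:.1f} px")
```

The `intrinsics` returned by `infer()` may follow a different convention (for example, normalized by image size), so treat this only as the geometry behind `fov_x`. The snippet that follows passes the FOV to `infer()` directly.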
```python
import torch
from moge.model.v2 import MoGeModel

device = torch.device("cuda")
model = MoGeModel.from_pretrained("Ruicheng/moge-2-vitl-normal").to(device)

# Prepare image tensor (3, H, W) with values in [0, 1]
image_tensor = torch.rand(3, 720, 1280, device=device)

# Run inference with known horizontal FOV (degrees)
output = model.infer(
    image_tensor,
    fov_x=60.0,  # Known horizontal field of view in degrees
    resolution_level=9,
    use_fp16=True
)

# The model will use provided FOV instead of estimating it
depth = output["depth"]
points = output["points"]
```

--------------------------------

### Batch Inference

Source: https://context7.com/microsoft/moge/llms.txt

Process multiple images concurrently for higher throughput by stacking them into a batch tensor (B, 3, H, W). Ensure all images in the batch are resized to the same dimensions before stacking.

```python
import cv2
import torch
import numpy as np
from moge.model.v2 import MoGeModel

device = torch.device("cuda")
model = MoGeModel.from_pretrained("Ruicheng/moge-2-vitl-normal").to(device)

# Load multiple images
image_paths = ["image1.jpg", "image2.jpg", "image3.jpg"]
images = []
for path in image_paths:
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (640, 480))  # Resize to same dimensions for batching
    images.append(img)

# Stack into batch tensor (B, 3, H, W)
batch = np.stack(images, axis=0)
batch_tensor = torch.tensor(batch / 255, dtype=torch.float32, device=device).permute(0, 3, 1, 2)

# Run batch inference
output = model.infer(batch_tensor, resolution_level=7, use_fp16=True)
```

--------------------------------

### Export MoGe Model with Dynamic Shape and Variable Tokens to ONNX

Source: https://github.com/microsoft/moge/blob/main/docs/onnx.md

Use this snippet to export a MoGe model to ONNX format with dynamic input shapes and a variable number of tokens. Ensure ONNX compatible mode is enabled on the model before exporting.
The exported model will output intermediate predictions, requiring separate post-processing.

```python
import os
os.environ['XFORMERS_DISABLED'] = '1'  # Disable xformers
import numpy as np
import torch
from moge.model.v2 import MoGeModel

PRETRAINED_MODEL = 'Ruicheng/moge-2-vits-normal.pt'
ONNX_FILE = 'moge-2-vits-normal.onnx'

model = MoGeModel.from_pretrained(PRETRAINED_MODEL)
model.onnx_compatible_mode = True  # Enable ONNX compatible mode

torch.onnx.export(
    model,
    (torch.rand(1, 3, 518, 518), torch.tensor(1800)),
    ONNX_FILE,
    input_names=['image', 'num_tokens'],
    output_names=['points', 'normal', 'mask', 'metric_scale'],
    dynamic_axes={
        'image': {0: 'batch_size', 2: 'height', 3: 'width'},
    },
    opset_version=14
)
```
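
To make the `num_tokens` input more concrete, the sketch below estimates a token grid for a given image shape. It rests on an assumption that the source does not state: that the grid is chosen so rows × columns stays at or below `num_tokens` while columns / rows roughly matches the image aspect ratio. `approx_token_grid` is a hypothetical helper, not part of MoGe; consult the model code for the exact rule.

```python
import math


def approx_token_grid(num_tokens: int, height: int, width: int) -> tuple[int, int]:
    """Hypothetical helper: (rows, cols) with rows * cols <= num_tokens.

    Assumption: the grid keeps the image aspect ratio (cols / rows ~ width / height).
    """
    aspect_ratio = width / height
    rows = int(math.sqrt(num_tokens / aspect_ratio))
    cols = int(math.sqrt(num_tokens * aspect_ratio))
    return rows, cols


# Square 518x518 input with the 1800-token budget used in the export snippets
rows, cols = approx_token_grid(1800, height=518, width=518)
print(rows, cols, rows * cols)
```

Under this assumption, a wider image with the same token budget gets more columns and fewer rows, which is one way to reason about how a single `num_tokens` value can serve inputs of different shapes.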