### Test Installation (Bash)

Source: https://github.com/apple/ml-sharp/blob/main/README.md

Verifies the project installation by running the command-line interface (CLI) help command. This ensures the 'sharp' command is recognized and functional.

```bash
sharp --help
```

--------------------------------

### Install Project Dependencies (Bash)

Source: https://github.com/apple/ml-sharp/blob/main/README.md

Installs the project's Python dependencies by reading from the 'requirements.txt' file. This command should be run after creating the Python environment.

```bash
pip install -r requirements.txt
```

--------------------------------

### Download Model Checkpoint (Bash)

Source: https://github.com/apple/ml-sharp/blob/main/README.md

Downloads the SHARP model checkpoint directly from a provided URL. This is an alternative to automatic downloading during prediction.

```bash
wget https://ml-site.cdn-apple.com/models/sharp/sharp_2572gikvuh.pt
```

--------------------------------

### Run Prediction with Auto Download (Bash)

Source: https://github.com/apple/ml-sharp/blob/main/README.md

Executes the SHARP model for prediction using images from the specified input directory and saves the resulting 3D Gaussian splats to the output directory. The model checkpoint is automatically downloaded and cached if not found locally.

```bash
sharp predict -i /path/to/input/images -o /path/to/output/gaussians
```

--------------------------------

### Run Prediction with Manual Checkpoint (Bash)

Source: https://github.com/apple/ml-sharp/blob/main/README.md

Executes the SHARP model for prediction, specifying a manually downloaded checkpoint file using the '-c' flag. This allows for using a specific version of the model.

```bash
sharp predict -i /path/to/input/images -o /path/to/output/gaussians -c sharp_2572gikvuh.pt
```

--------------------------------

### CLI - Render Trajectory Videos with SHARP

Source: https://context7.com/apple/ml-sharp/llms.txt

Command-line interface for rendering camera trajectory videos from pre-computed 3D Gaussian .ply files generated by SHARP. Supports rendering from single or multiple .ply files and generates both .mp4 video and .depth.mp4 depth visualizations.

```bash
# Render video from single .ply file
sharp render -i ./output/scene.ply -o ./videos

# Render videos from all .ply files in directory
sharp render -i ./output -o ./videos -v

# Expected output: Creates .mp4 video and .depth.mp4 depth visualization
# Example: output/scene.ply -> videos/scene.mp4, videos/scene.depth.mp4
# Default trajectory: 60 frames rotating forward with camera movement
```

--------------------------------

### Create Python Environment (Bash)

Source: https://github.com/apple/ml-sharp/blob/main/README.md

Creates a new Conda Python environment named 'sharp' with Python version 3.13. This is the first step to set up the project's dependencies.

```bash
conda create -n sharp python=3.13
```

--------------------------------

### Render Videos from Gaussians (Bash)

Source: https://github.com/apple/ml-sharp/blob/main/README.md

Predicts 3D Gaussian splats and then renders videos from these splats using a camera trajectory. This functionality requires a CUDA GPU and uses the '--render' option.

```bash
sharp predict -i /path/to/input/images -o /path/to/output/gaussians --render
```

--------------------------------

### Render Videos from Intermediate Gaussians (Bash)

Source: https://github.com/apple/ml-sharp/blob/main/README.md

Renders videos directly from pre-computed 3D Gaussian splat files located in the specified output directory. This command is used after the prediction step has generated the Gaussian data.

```bash
sharp render -i /path/to/output/gaussians -o /path/to/output/renderings
```

--------------------------------

### Python API: Configure Model Parameters for Neural Networks

Source: https://context7.com/apple/ml-sharp/llms.txt

Customizes neural network architecture and training hyperparameters using the SHARP Python API. This involves setting up initializer, monodepth, and Gaussian decoder parameters, as well as learning rate factors and color space configurations. The output is a configured predictor object.

```python
from sharp.models import create_predictor, PredictorParams
from sharp.models.params import (
    InitializerParams, MonodepthParams, GaussianDecoderParams, DeltaFactor
)

# Configure predictor with custom parameters
params = PredictorParams(
    # Initializer settings
    initializer=InitializerParams(
        num_layers=2,  # Number of Gaussian layers
        stride=2,  # Spatial stride
        scale_factor=1.0,  # Initial scale multiplier
        first_layer_depth_option="surface_min",
        color_option="all_layers",  # Options: "none", "first_layer", "all_layers"
        normalize_depth=True
    ),
    # Monodepth network configuration
    monodepth=MonodepthParams(
        patch_encoder_preset="dinov2l16_384",
        image_encoder_preset="dinov2l16_384",
        checkpoint_uri=None,
        unfreeze_patch_encoder=False,
        grad_checkpointing=False,
        dims_decoder=(256, 256, 256, 256, 256)
    ),
    # Gaussian decoder settings
    gaussian_decoder=GaussianDecoderParams(
        dim_in=5,
        dim_out=32,
        norm_type="group_norm",
        norm_num_groups=8,
        stride=2,
        use_depth_input=True,
        upsampling_mode="transposed_conv",
        grad_checkpointing=False
    ),
    # Learning rate factors for different properties
    delta_factor=DeltaFactor(
        xy=0.001,  # Position learning rate
        z=0.001,
        color=0.1,  # 0.1 for linearRGB, 1.0 for sRGB
        opacity=1.0,
        scale=1.0,
        quaternion=1.0
    ),
    # Gaussian constraints
    max_scale=10.0,
    min_scale=0.0,
    # Color space and activations
    color_space="linearRGB",  # or "sRGB"
    color_activation_type="sigmoid",
    opacity_activation_type="sigmoid",
    # Advanced settings
    num_monodepth_layers=2,
    sorting_monodepth=False,
    base_scale_on_predicted_mean=True
)

# Create predictor with custom configuration
predictor = create_predictor(params)
print(f"Internal resolution: {predictor.internal_resolution()}")
print(f"Output resolution: {predictor.output_resolution}")

# Load weights and run inference
# state_dict = torch.load("checkpoint.pt", weights_only=True)
# predictor.load_state_dict(state_dict)
# predictor.eval()
```

--------------------------------

### Python API: Video Writing with Synchronized Depth Visualization

Source: https://context7.com/apple/ml-sharp/llms.txt

Creates video files with synchronized color and depth visualizations using the SHARP Python API. It initializes a VideoWriter, generates synthetic color and depth frames, and writes them to the writer. The output includes a color video and an optional depth visualization video.

```python
from sharp.utils.io import VideoWriter
import torch
from pathlib import Path

# Initialize video writer
output_path = Path("output.mp4")
writer = VideoWriter(
    output_path=output_path,
    fps=30.0,  # Frames per second
    render_depth=True  # Also create .depth.mp4 file
)

# Generate and write frames
num_frames = 120
for i in range(num_frames):
    # Example: synthetic color and depth frames
    height, width = 480, 640
    color = torch.randint(0, 255, (height, width, 3), dtype=torch.uint8)
    depth = torch.randn(height, width) * 5.0 + 10.0  # Depth in meters
    writer.add_frame(color, depth)

# Finalize video files
writer.close()

# Output files:
# - output.mp4: Color video
# - output.depth.mp4: Depth visualization with colormap
# Depth is automatically normalized and clamped to [0, 10] meters for visualization
```

--------------------------------

### CLI - Predict 3D Gaussians from Images with SHARP

Source: https://context7.com/apple/ml-sharp/llms.txt

Command-line interface for predicting 3D Gaussian splat representations from input images using the SHARP system.
Supports automatic model download, custom checkpoints, CUDA rendering, and verbose logging. Outputs standard .ply files compatible with public renderers.

```bash
# Basic prediction from single image or directory
sharp predict -i /path/to/input/images -o /path/to/output/gaussians

# Predict with custom checkpoint
sharp predict -i ./photos/image.jpg -o ./output -c sharp_2572gikvuh.pt

# Predict and render trajectory video (requires CUDA GPU)
sharp predict -i ./input -o ./output --render

# Predict on specific device with verbose logging
sharp predict -i ./input -o ./output --device cuda -v

# Expected output: Creates .ply files in output directory
# Example: input/photo.jpg -> output/photo.ply
# Each .ply file contains 3D Gaussians with position, color, opacity, scale, and rotation
```

--------------------------------

### Python API: Render Scene with Camera Trajectories

Source: https://context7.com/apple/ml-sharp/llms.txt

Generates video renderings of 3D Gaussian splat scenes using customizable camera motion trajectories. It loads a scene, configures a camera trajectory, initializes a renderer and video writer, and then renders each frame. The output is a video file (e.g., output.mp4) and optionally a depth visualization.

```python
from sharp.utils import camera, gsplat, io
from sharp.utils.gaussians import load_ply, Gaussians3D
import torch
from pathlib import Path

# Load scene
gaussians, metadata = load_ply(Path("scene.ply"))
device = torch.device("cuda")
f_px = metadata.focal_length_px
width, height = metadata.resolution_px

# Configure trajectory
params = camera.TrajectoryParams(
    type="rotate_forward",  # Options: "swipe", "shake", "rotate", "rotate_forward"
    lookat_mode="point",  # Options: "point", "ahead"
    max_disparity=0.08,
    max_zoom=0.15,
    distance_m=0.0,
    num_steps=60,
    num_repeats=1
)

# Create camera model
intrinsics = torch.tensor([
    [f_px, 0, (width - 1) / 2.0, 0],
    [0, f_px, (height - 1) / 2.0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1]
], device=device, dtype=torch.float32)

camera_model = camera.create_camera_model(
    gaussians, intrinsics, resolution_px=(width, height), lookat_mode="point"
)

# Generate trajectory
trajectory = camera.create_eye_trajectory(
    gaussians, params, resolution_px=(width, height), f_px=f_px
)

# Initialize renderer and video writer
renderer = gsplat.GSplatRenderer(color_space=metadata.color_space)
video_writer = io.VideoWriter(Path("output.mp4"), fps=30.0, render_depth=True)

# Render each frame
for eye_position in trajectory:
    camera_info = camera_model.compute(eye_position)
    output = renderer(
        gaussians.to(device),
        extrinsics=camera_info.extrinsics[None].to(device),
        intrinsics=camera_info.intrinsics[None].to(device),
        image_width=camera_info.width,
        image_height=camera_info.height
    )
    color = (output.color[0].permute(1, 2, 0) * 255.0).to(dtype=torch.uint8)
    depth = output.depth[0]
    video_writer.add_frame(color, depth)

video_writer.close()

# Output: output.mp4 (color) and output.depth.mp4 (depth visualization)
```

--------------------------------

### Python API: Save and Load PLY Files with Metadata

Source: https://context7.com/apple/ml-sharp/llms.txt

Handles 3D Gaussian splat .ply file I/O, preserving scene metadata such as focal length, resolution, and color space.
It takes a Gaussians3D object and optional scene metadata as input and outputs a .ply file. The loaded data includes Gaussians and metadata.

```python
from sharp.utils.gaussians import Gaussians3D, save_ply, load_ply, SceneMetaData
from pathlib import Path
import torch

# Create example Gaussians
num_gaussians = 10000
gaussians = Gaussians3D(
    mean_vectors=torch.randn(1, num_gaussians, 3),
    singular_values=torch.exp(torch.randn(1, num_gaussians, 3)),
    quaternions=torch.randn(1, num_gaussians, 4),
    colors=torch.rand(1, num_gaussians, 3),
    opacities=torch.sigmoid(torch.randn(1, num_gaussians))
)

# Save with camera metadata
f_px = 512.0
image_shape = (480, 640)  # (height, width)
save_ply(gaussians, f_px, image_shape, Path("scene.ply"))

# Load Gaussians and metadata
loaded_gaussians, metadata = load_ply(Path("scene.ply"))

# metadata contains:
# - focal_length_px: 512.0
# - resolution_px: (640, 480)  # Note: (width, height)
# - color_space: "linearRGB" or "sRGB"
print(f"Loaded {loaded_gaussians.mean_vectors.shape[1]} Gaussians")
print(f"Focal length: {metadata.focal_length_px}px")
print(f"Resolution: {metadata.resolution_px}")

# Move to device
device = torch.device("cuda")
gaussians_gpu = loaded_gaussians.to(device)
```

--------------------------------

### Python API - Load and Display RGB Images with SHARP utils

Source: https://context7.com/apple/ml-sharp/llms.txt

Python function to load RGB images using SHARP's utility functions. It handles EXIF extraction, automatic rotation based on orientation tags, and focal length estimation from EXIF data. Returns the image as a NumPy array, ICC profile bytes, and focal length in pixels.

```python
from sharp.utils import io
from pathlib import Path

# Load image with metadata extraction
image_path = Path("photo.jpg")
image, icc_profile, f_px = io.load_rgb(image_path)

# Parameters:
# - auto_rotate: Apply EXIF orientation (default: True)
# - remove_alpha: Strip alpha channel (default: True)

# Returns:
# - image: numpy array (H, W, 3) with RGB values 0-255
# - icc_profile: ICC color profile bytes or None
# - f_px: focal length in pixels, estimated from EXIF or default 30mm
print(f"Image shape: {image.shape}")
print(f"Focal length: {f_px:.2f}px")

# Example output:
# Image shape: (2048, 1536, 3)
# Focal length: 512.45px
```

--------------------------------

### Python API - Predict 3D Gaussians with SHARP

Source: https://context7.com/apple/ml-sharp/llms.txt

Python function to generate 3D Gaussian representations from an RGB image using the SHARP model. It handles model initialization, checkpoint loading, image preprocessing, inference, and conversion to metric space. The output can be saved to a standard .ply file.

```python
import torch
import torch.nn.functional as F
import numpy as np
from sharp.models import create_predictor, PredictorParams
from sharp.utils.gaussians import save_ply
from pathlib import Path

# Initialize model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
predictor = create_predictor(PredictorParams())

# Load checkpoint
checkpoint_url = "https://ml-site.cdn-apple.com/models/sharp/sharp_2572gikvuh.pt"
state_dict = torch.hub.load_state_dict_from_url(checkpoint_url, progress=True)
predictor.load_state_dict(state_dict)
predictor.eval()
predictor.to(device)

# Load and preprocess image
image = np.random.randint(0, 255, (1024, 768, 3), dtype=np.uint8)  # Example
f_px = 512.0
height, width = image.shape[:2]
image_pt = torch.from_numpy(image).float().to(device).permute(2, 0, 1) / 255.0
disparity_factor = torch.tensor([f_px / width]).float().to(device)

# Resize to internal resolution
internal_shape = (1536, 1536)
image_resized = F.interpolate(
    image_pt[None], size=internal_shape, mode="bilinear", align_corners=True
)

# Run inference
with torch.no_grad():
    gaussians_ndc = predictor(image_resized, disparity_factor)

# Convert to metric space
from sharp.utils.gaussians import unproject_gaussians

intrinsics = torch.tensor([
    [f_px, 0, width / 2, 0],
    [0, f_px, height / 2, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1]
], device=device, dtype=torch.float32)

gaussians = unproject_gaussians(
    gaussians_ndc, torch.eye(4).to(device), intrinsics, internal_shape
)

# Save to PLY format
save_ply(gaussians, f_px, (height, width), Path("output.ply"))

# gaussians contains:
# - mean_vectors: (1, N, 3) 3D positions
# - singular_values: (1, N, 3) scale parameters
# - quaternions: (1, N, 4) rotations
```

--------------------------------

### Python API: Apply Affine Transformations to Gaussians

Source: https://context7.com/apple/ml-sharp/llms.txt

Applies affine transformations to 3D Gaussian splat parameters including mean, scale, and rotation.
It takes a Gaussians3D object and a 3x4 affine transform matrix as input. This function is not differentiable due to its use of SVD for covariance transformation. Colors and opacities are preserved.

```python
from sharp.utils.gaussians import Gaussians3D, apply_transform, unproject_gaussians
import torch

# Create example Gaussians
gaussians = Gaussians3D(
    mean_vectors=torch.randn(1, 1000, 3),
    singular_values=torch.exp(torch.randn(1, 1000, 3)),
    quaternions=torch.randn(1, 1000, 4),
    colors=torch.rand(1, 1000, 3),
    opacities=torch.sigmoid(torch.randn(1, 1000))
)

# Define affine transform (3x4 matrix: rotation + translation)
transform = torch.tensor([
    [1.0, 0.0, 0.0, 2.0],  # Translate +2 in x
    [0.0, 0.707, -0.707, 0.0],  # Rotate 45° around x-axis
    [0.0, 0.707, 0.707, 1.0]  # Translate +1 in z
], dtype=torch.float32)

# Apply transformation
transformed = apply_transform(gaussians, transform)

# Unproject from NDC to world coordinates
extrinsics = torch.eye(4)
intrinsics = torch.tensor([
    [512, 0, 384, 0],
    [0, 512, 256, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1]
], dtype=torch.float32)
image_shape = (768, 512)  # (width, height)

world_gaussians = unproject_gaussians(
    gaussians, extrinsics, intrinsics, image_shape
)

# Note: apply_transform is NOT differentiable (uses SVD for covariance)
# Transforms affect position, scale, and rotation but preserve colors and opacity
```
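
As a rough illustration of the 3x4 `[R | t]` layout used in the affine transform example, the sketch below builds the same matrix from a rotation angle and a translation, then applies it to one point. It uses only the Python standard library and does not call the SHARP API; `rotation_x_affine` and `transform_point` are hypothetical helper names introduced here. It covers positions only, whereas the description above says `apply_transform` also updates scale and rotation.

```python
import math


def rotation_x_affine(angle_deg, translation):
    """Build a 3x4 [R | t] matrix: rotation about the x-axis plus a translation.

    Hypothetical helper for illustration; not part of the SHARP package.
    """
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    tx, ty, tz = translation
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, c, -s, ty],
        [0.0, s, c, tz],
    ]


def transform_point(affine, point):
    """Apply p' = R @ p + t to a single 3D point."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in affine)


# 45 degrees about x, shifted +2 in x and +1 in z, as in the example matrix.
affine = rotation_x_affine(45.0, (2.0, 0.0, 1.0))

# A point on the y-axis is rotated toward +z and then translated.
moved = transform_point(affine, (0.0, 1.0, 0.0))
print(affine[1])
print(moved)
```

The rounded entries `0.707` in the example matrix correspond to `cos(45°)` and `sin(45°)`; generating them from the angle avoids the small error that hand-rounded values add to a rotation matrix.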