### Install InternNav Package

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/onsite_competition/README.md

Install the InternNav package by navigating to the project directory and running the setup script.

```bash
cd /InternNav
pip install -e .
```

--------------------------------

### Install Flash-Attention 2

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

Downloads and installs the pre-built wheel for Flash-Attention 2. If installation fails, you can skip this and remove the `attn_implementation="flash_attention_2"` argument during model initialization.

```bash
!wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu12torch2.6cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
%pip install flash_attn-2.7.3+cu12torch2.6cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
```

--------------------------------

### Start Agent Server

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/onsite_competition/README.md

Launch the agent server using the provided utility script, specifying the configuration file.

```bash
python -m internnav.agent.utils.server --config path/to/cfg.py
```

--------------------------------

### Install Core Dependencies

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

Installs essential libraries including transformers, diffusers, accelerate, opencv-python, pillow, numpy, and gym.

```python
%pip install transformers==4.51.0 diffusers==0.31.0 accelerate==1.10.1 opencv-python==4.10.0.82 pillow==10.4.0 numpy==1.26.4 gym==0.23.1
%pip install imageio==2.37.0 imageio-ffmpeg==0.6.0 ftfy==6.3.1
%pip install scipy matplotlib
%pip install -e ../../. # install InternNav
```

--------------------------------

### Start Agent Server (Bash)

Source: https://context7.com/internrobotics/internnav/llms.txt

Starts a FastAPI-based HTTP agent server.
Useful for running agents in separate processes or on different nodes. Ensure the evaluator configuration matches the server port.

```bash
python scripts/eval/start_server.py \
    --host localhost \
    --config scripts/eval/configs/h1_cma_cfg.py
```

```bash
python scripts/eval/start_server.py \
    --config scripts/eval/configs/h1_internvla_n1_async_cfg.py \
    --reload
```

--------------------------------

### Agent Base Class and Custom Agent Example

Source: https://context7.com/internrobotics/internnav/llms.txt

Demonstrates how to define a custom agent by extending the Agent base class and how to instantiate agents using the registry.

```APIDOC
## Agent Base Class - `internnav.agent.base.Agent`

The `Agent` base class defines the common interface for all navigation policy agents and provides a class-level registry for instantiating agents by name from configuration objects.

```python
from internnav.agent.base import Agent
from internnav.configs.agent import AgentCfg

# --- Define a custom agent ---
@Agent.register('my_custom_agent')
class MyCustomAgent(Agent):
    def __init__(self, config: AgentCfg):
        super().__init__(config)
        # initialize your model here

    def reset(self, reset_index=None):
        # called at the start of every new episode
        pass

    def step(self, obs):
        # obs is a list of dicts: [{'rgb': np.ndarray, 'depth': np.ndarray, 'instruction': str}]
        # must return a list of action dicts
        return [{'action': [1], 'ideal_flag': True}]  # action 1 = FORWARD

# --- Instantiate via registry ---
cfg = AgentCfg(
    model_name='my_custom_agent',
    ckpt_path='checkpoints/my_model',
    model_settings={'device': 'cuda:0'},
)
agent = Agent.init(cfg)

obs = [{'rgb': rgb_array, 'depth': depth_array, 'instruction': 'Go to the kitchen.'}]
actions = agent.step(obs)
# actions -> [{'action': [1], 'ideal_flag': True}]

agent.reset(reset_index=[0])
```
```

--------------------------------

### Output Trajectory Example 1

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

Represents a sample output trajectory generated during inference.

```python
output_trajectory: [[0.0, 0.0], [0.10174560546875, -0.0021262168884277344], [0.2030487060546875, -0.008135795593261719], [0.3041839599609375, -0.015857279300689697], [0.405426025390625, -0.02214580774307251], [0.506927490234375, -0.023593485355377197], [0.6083984375, -0.017211496829986572], [0.70843505859375, -0.004626333713531494], [0.8078155517578125, 0.012145936489105225], [0.906158447265625, 0.03255265951156616], [1.0032196044921875, 0.05704110860824585], [1.098358154296875, 0.0870322585105896], [1.1915283203125, 0.12381595373153687], [1.28155517578125, 0.16996997594833374], [1.3683929443359375, 0.221796452999115], [1.4533538818359375, 0.277399480342865], [1.5370635986328125, 0.33464282751083374], [1.61981201171875, 0.39309924840927124], [1.7007598876953125, 0.45379871129989624], [1.7807464599609375, 0.51595538854599], [1.8596649169921875, 0.5800651907920837], [1.9368743896484375, 0.64583820104599], [2.0137100219726562, 0.7126182913780212], [2.0897674560546875, 0.780085027217865], [2.1649703979492188, 0.84804767370224], [2.2330245971679688, 0.9104179739952087], [2.2977821826934814, 0.97103451192379], [2.348536729812622, 1.0183265060186386], [2.3651299476623535, 1.0324728339910507], [2.3817588090896606, 1.0465948432683945], [2.3843406438827515, 1.0491845458745956], [2.383785128593445, 1.0496225208044052], [2.3835805654525757, 1.0501257628202438]]
```

--------------------------------

### Install PyTorch and Check Version

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

Installs PyTorch version 2.6.0 with CUDA 12.4 support and prints the installed version. Ensure your CUDA toolkit is compatible.

```python
%pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
import torch
print(torch.__version__)
```

--------------------------------

### Download InteriorNav Dataset

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Download the InteriorNav dataset, which includes scene USD files and navigation data splits (val_seen + val_unseen). Ensure git-lfs is installed and clone the datasets into the specified directory.

```bash
git lfs install
# At /root/InternNav/
mkdir interiornav_data
# InteriorNav scene usd
git clone https://huggingface.co/datasets/spatialverse/InteriorAgent interiornav_data/scene_data
# InteriorNav val dataset
git clone https://huggingface.co/datasets/spatialverse/InteriorAgent_Nav interiornav_data/raw_data
```

--------------------------------

### Output Trajectory Example 2

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

Represents an alternative sample output trajectory generated during inference.

```python
output_trajectory: [[0.0, 0.0], [0.09758758544921875, -0.011452913284301758], [0.19722747802734375, -0.021155595779418945], [0.29154205322265625, -0.02734684944152832], [0.38101673126220703, -0.0293729305267334], [0.469879150390625, -0.0261075496673584], [0.554107666015625, -0.016035795211791992], [0.6353373527526855, -0.0012137889862060547], [0.7102475166320801, 0.018787622451782227], [0.7814936637878418, 0.043318986892700195], [0.8519377708435059, 0.07297158241271973], [0.9173874855041504, 0.1057593822479248], [0.9814028739929199, 0.141371488571167], [1.045126736164093, 0.17730927467346191], [1.1082122921943665, 0.2132399082183838], [1.169586956501007, 0.2529715299606323], [1.2295807003974915, 0.29670655727386475], [1.2895076870918274, 0.3415853977203369], [1.3475502133369446, 0.38594746589660645], [1.4070499539375305, 0.4295980930328369], [1.468199074268341, 0.4694092273712158], [1.5301488041877747, 0.5065538883209229], [1.5891702771186829, 0.5427916049957275], [1.635184109210968, 0.5773718357086182], [1.6797216534614563, 0.6128342151641846], [1.7144139409065247, 0.6410515308380127], [1.7460413575172424, 0.6668822765350342], [1.7761572003364563, 0.6895382404327393], [1.8000407814979553, 0.707094669342041], [1.8237372040748596, 0.7243056297302246], [1.8280614018440247, 0.7274879217147827], [1.8279209733009338, 0.7279735803604126], [1.8280540108680725, 0.7284926176071167]]
```

--------------------------------

### Submission JSON Format

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Create a JSON file with your Docker image URL and team information for submission on EvalAI. Ensure the structure exactly matches the example provided.

```json
{
  "url": "your-registry/internnav-custom:v1",
  "team": {
    "name": "your-team-name",
    "members": [
      {
        "name": "John Doe",
        "affiliation": "University of Example",
        "email": "john.doe@example.com",
        "leader": true
      },
      {
        "name": "Jane Smith",
        "affiliation": "Example Research Lab",
        "email": "jane.smith@example.com",
        "leader": false
      }
    ]
  }
}
```

--------------------------------

### Output Trajectory Example 3

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

Represents a third sample output trajectory generated during inference.

```python
output_trajectory: [[0.0, 0.0], [0.0973052978515625, -0.010159492492675781], [0.1962432861328125, -0.021831512451171875], [0.2963714599609375, -0.032324790954589844], [0.397308349609375, -0.040537357330322266], [0.4984283447265625, -0.04426717758178711], [0.5986785888671875, -0.04123497009277344], [0.69769287109375, -0.03177833557128906], [0.7961883544921875, -0.01540517807006836], [0.8932037353515625, 0.006081104278564453], [0.98870849609375, 0.03310894966125488], [1.0823516845703125, 0.0654900074005127], [1.174591064453125, 0.10251164436340332], [1.2654876708984375, 0.14447903633117676], [1.3544769287109375, 0.1904308795928955], [1.44110107421875, 0.24196743965148926], [1.525360107421875, 0.29798054695129395], [1.607818603515625, 0.3560631275177002], [1.6896209716796875, 0.4150993824005127], [1.771026611328125, 0.4748260974884033], [1.850372314453125, 0.5357697010040283], [1.9269771575927734, 0.5971939563751221], [2.0023632049560547, 0.6593964099884033], [2.075179100036621, 0.7212479114532471], [2.147168457508087, 0.7842972278594971], [2.217978775501251, 0.8480370044708252], [2.288813889026642, 0.9125664234161377], [2.3539403080940247, 0.9707787036895752], [2.404560387134552, 1.015345811843872], [2.4554598927497864, 1.057633638381958], [2.4636533856391907, 1.0646393299102783], [2.4630263447761536, 1.0648670196533203], [2.463208019733429, 1.0650734901428223]]
```
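
The output trajectories shown in these examples are lists of two-element points. As a minimal sketch, assuming each entry is an `[x, y]` waypoint in a common planar frame (the source does not state the units or the frame), the total path length and the straight-line displacement can be computed as below. The helper names `path_length` and `net_displacement` are hypothetical and are not part of InternNav; the `sample` list is a made-up illustration, not model output.

```python
import math


def path_length(trajectory):
    """Sum of straight-line distances between consecutive [x, y] waypoints."""
    return sum(math.dist(a, b) for a, b in zip(trajectory, trajectory[1:]))


def net_displacement(trajectory):
    """Straight-line distance from the first waypoint to the last one."""
    return math.dist(trajectory[0], trajectory[-1])


# Made-up waypoints for illustration only (not taken from the notebook output).
sample = [[0.0, 0.0], [3.0, 4.0], [3.0, 5.0]]

print(f"path length: {path_length(sample):.3f}")
print(f"net displacement: {net_displacement(sample):.3f}")
```

Passing one of the `output_trajectory` lists in place of `sample` gives the same two quantities for a predicted path; a path length much larger than the net displacement would suggest a winding route.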

--------------------------------

### Train Baseline Model

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Use this script to train your model. Ensure you activate the correct conda environment and install dependencies.

```bash
conda activate internutopia
pip install -r requirements/train.txt --index-url https://pypi.org/simple
./scripts/train/start_train.sh --name train_rdp --model rdp
```

--------------------------------

### Define and Instantiate Custom Agent with InternNav

Source: https://context7.com/internrobotics/internnav/llms.txt

Demonstrates how to define a custom agent by inheriting from `internnav.agent.base.Agent` and registering it. Shows instantiation via the `Agent.init` registry using a configuration object.

```python
from internnav.agent.base import Agent
from internnav.configs.agent import AgentCfg

# --- Define a custom agent ---
@Agent.register('my_custom_agent')
class MyCustomAgent(Agent):
    def __init__(self, config: AgentCfg):
        super().__init__(config)
        # initialize your model here

    def reset(self, reset_index=None):
        # called at the start of every new episode
        pass

    def step(self, obs):
        # obs is a list of dicts: [{'rgb': np.ndarray, 'depth': np.ndarray, 'instruction': str}]
        # must return a list of action dicts
        return [{'action': [1], 'ideal_flag': True}]  # action 1 = FORWARD

# --- Instantiate via registry ---
cfg = AgentCfg(
    model_name='my_custom_agent',
    ckpt_path='checkpoints/my_model',
    model_settings={'device': 'cuda:0'},
)
agent = Agent.init(cfg)

obs = [{'rgb': rgb_array, 'depth': depth_array, 'instruction': 'Go to the kitchen.'}]
actions = agent.step(obs)
# actions -> [{'action': [1], 'ideal_flag': True}]

agent.reset(reset_index=[0])
```

--------------------------------

### Importing HabitatEnv

Source: https://github.com/internrobotics/internnav/blob/main/internnav/habitat_extensions/vln/README.md

Demonstrates how to import the Habitat environment wrapper from the package.

```python
from internnav.habitat_extensions import HabitatEnv
```

--------------------------------

### Train Policy Models (Bash)

Source: https://context7.com/internrobotics/internnav/llms.txt

Main training entry point for supported policy models. Initializes distributed training and runs Hugging Face training loops. Use `torchrun` for multi-GPU distributed training.

```bash
python scripts/train/base_train/train.py \
    --name cma_r2r_run1 \
    --model_name cma
```

```bash
python scripts/train/base_train/train.py \
    --name rdp_r2r_run1 \
    --model_name rdp
```

```bash
torchrun --nproc_per_node=4 scripts/train/base_train/train.py \
    --name navdp_run1 \
    --model_name navdp
```

--------------------------------

### Download ddppo-models Baseline

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Download the ddppo-models baseline weights. Ensure the checkpoints/ddppo-models directory exists.

```bash
# ddppo-models
$ mkdir -p checkpoints/ddppo-models
$ wget -P checkpoints/ddppo-models https://dl.fbaipublicfiles.com/habitat/data/baselines/v1/ddppo/ddppo-models/gibson-4plus-mp3d-train-val-test-resnet50.pth
```

--------------------------------

### Download R2R Finetuned Baseline Checkpoints

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Clone the VLN-PE repository and move the r2r checkpoints to the checkpoints directory.

```bash
# download r2r finetuned baseline checkpoints
$ git clone https://huggingface.co/InternRobotics/VLN-PE && mv VLN-PE/r2r checkpoints/
```

--------------------------------

### Initialize and Use Habitat Environment

Source: https://context7.com/internrobotics/internnav/llms.txt

Wrap the Habitat simulator for distributed evaluation. Episodes are sharded across workers, and progress can be resumed from checkpoints. Use `get_metrics()` for standard VLN measures.

```python
from internnav.configs.evaluator import EnvCfg, TaskCfg
from internnav.env.habitat_env import HabitatEnv

env_config = EnvCfg(
    env_type='habitat',
    env_settings={
        'habitat_config': habitat_cfg,  # OmegaConf/Habitat config object
        'rank': 0,
        'world_size': 4,  # 4-GPU distributed evaluation
        'output_path': './output/rank_0',
    },
)
env = HabitatEnv(env_config)
print(f"Episodes assigned to rank 0: {len(env.episodes)}")

while env.is_running:
    obs = env.reset()  # advances to next episode
    if obs is None:
        break  # all episodes done
    for _ in range(500):
        action = policy.act(obs)
        obs, reward, done, info = env.step(action)
        if done:
            metrics = env.get_metrics()
            # {'spl': 0.72, 'success': 1.0, ...}
            break

env.close()
```

--------------------------------

### Initialize and Use InternVLAN1Agent

Source: https://context7.com/internrobotics/internnav/llms.txt

Sets up the InternVLAN1Agent using a dual-system architecture for navigation. It configures model settings, including paths, dimensions, and inference modes. The agent executes navigation steps and breaks the loop if an episode reset is indicated.

```python
from internnav.agent.internvla_n1_agent import InternVLAN1Agent
from internnav.configs.agent import AgentCfg

cfg = AgentCfg(
    model_name='internvla_n1',
    ckpt_path='',  # weights loaded from model_settings.model_path
    model_settings={
        'model_path': 'checkpoints/InternVLA-N1-DualVLN',
        'width': 640,
        'height': 480,
        'hfov': 79,
        'resize_w': 384,
        'resize_h': 384,
        'max_new_tokens': 1024,
        'num_frames': 32,
        'num_history': 8,
        'num_future_steps': 4,
        'device': 'cuda:0',
        'predict_step_nums': 32,
        'continuous_traj': True,
        'infer_mode': 'partial_async',  # 'sync' or 'partial_async'
        'sys2_max_forward_step': 8,
        'vis_debug': False,
        'vis_debug_path': './logs/vis_debug',
    },
)
agent = InternVLAN1Agent(cfg)
agent.reset(reset_index=[0])  # start first episode

for step in range(1000):
    obs = [
        {
            'rgb': rgb_frame,  # np.ndarray (H, W, 3) uint8
            'depth': depth_frame,  # np.ndarray (H, W, 1) float32 in metres
            'instruction': 'Exit the bedroom and go to the living room.',
        }
    ]
    result = agent.step(obs)
    # result -> [{'action': [1], 'ideal_flag': True}]
    # action -1 means "episode boundary / reset needed"
    if result[0]['action'] == [-1]:
        break

agent.reset(reset_index=[0])  # start next episode; closes debug video writers
```

--------------------------------

### Initialize and Warm-up Agent

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

Initializes the InternVLA-N1 agent with the specified arguments and performs a warm-up step using dummy data. This ensures the model is ready for inference and helps catch potential issues early.

```python
print("Loading model...")
agent = InternVLAN1AsyncAgent(args)

# Warm up model
print("Warming up model...")
dummy_rgb = np.zeros((480, 640, 3), dtype=np.uint8)
dummy_depth = np.zeros((480, 640), dtype=np.float32)
dummy_pose = np.eye(4)
agent.reset()
agent.step(dummy_rgb, dummy_depth, dummy_pose, "hello", intrinsic=args.camera_intrinsic)
print("Model loaded successfully!")
```

--------------------------------

### Download SceneData-N1

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Download the SceneData-N1 dataset, which contains mp3d_pe data, and unzip it into the data/scene_data directory.

```bash
# Scene
$ wget https://huggingface.co/datasets/InternRobotics/Scene-N1/resolve/main/mp3d_pe.tar.gz
# unzip to data/scene_data
```

--------------------------------

### Build Submission Docker Image

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Build a Docker image for submission. Ensure your trained weights and model code are correctly packaged within the image at `/root/InternNav`.

```bash
# Navigate to the directory
$ cd PATH/TO/INTERNNAV/

# Build the new image
$ docker build -t my-internnav-custom:v1 .
```

```bash
$ docker commit internnav my-internnav-with-updates:v1
# Easier for managing a custom environment
# May capture every change and bloat the image; clear caches and other temporary files to reduce its size.
```

```bash
$ docker tag my-internnav-custom:v1 your-registry/internnav-custom:v1
$ docker push your-registry/internnav-custom:v1
```

--------------------------------

### Configure Test Data Path

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

Sets the directory for the test dataset, which is a pre-collected real-world dataset captured by a Unitree Go2 robot.
Users can change this to their own dataset, which should include both aligned depth and RGB images, along with an 'instruction.txt' file.

```python
# Configure data directory (single scene per folder)
scene_dir = '../../assets/realworld_sample_data1'
```

--------------------------------

### Registering HabitatVllnEnv

Source: https://github.com/internrobotics/internnav/blob/main/internnav/habitat_extensions/vlln/README.md

Demonstrates how HabitatVllnEnv is registered under the key "habitat_vlln" using the shared Env.register decorator.

```python
HabitatVllnEnv = Env.register("habitat_vlln")(HabitatVllnEnv)
```

--------------------------------

### Package Structure

Source: https://github.com/internrobotics/internnav/blob/main/internnav/habitat_extensions/vln/README.md

Illustrates the directory structure of the Habitat extensions within the InternNav project.

```tree
habitat_extensions/vln/
├── __init__.py
├── habitat_env.py
├── habitat_default_evaluator.py
├── habitat_vln_evaluator.py
└── measures.py
```

--------------------------------

### Initialize and Use RdpAgent

Source: https://context7.com/internrobotics/internnav/llms.txt

Initializes the RdpAgent with a given configuration and demonstrates its usage in a navigation loop. The agent caches waypoints and replans when the cache is empty. Resets the agent if an episode ends.

```python
import numpy as np

from internnav.agent.rdp_agent import RdpAgent
from internnav.configs.agent import AgentCfg

cfg = AgentCfg(
    model_name='rdp',
    ckpt_path='checkpoints/rdp/r2r_rdp',
    model_settings={
        'env_num': 1,
        'proc_num': 1,
    },
)
agent = RdpAgent(cfg)
agent.reset()

for step in range(500):
    obs_batch = [
        {
            'instruction': 'Walk past the sofa and stop at the door.',  # raw string
            'globalgps': np.array([1.2, 0.5, 0.0]),  # 3D world position
            'globalrotation': np.array([0, 0, 0, 1]),  # quaternion [x,y,z,w]
            'rgb': rgb_frame,  # np.ndarray (H, W, 3)
            'depth': depth_frame,  # np.ndarray (H, W, 1)
        }
    ]
    actions = agent.step(obs_batch)
    # actions -> [{'action': [1], 'ideal_flag': True}]
    # Internally, RdpAgent caches a trajectory of len_traj_act waypoints
    # and only calls the diffusion denoiser when the cache is empty.
    if actions[0]['action'] == [-1]:
        agent.reset(reset_ls=[0])  # -1 indicates episode just reset
```

--------------------------------

### Define a Custom Simulation Environment

Source: https://context7.com/internrobotics/internnav/llms.txt

Implement a custom simulation environment by inheriting from `Env` and registering it. Ensure all required methods (`reset`, `step`, `close`, `get_observation`, `get_info`) are defined. Instantiate via the `Env.init` registry.

```python
from internnav.env.base import Env
from internnav.configs.evaluator import EnvCfg, TaskCfg

@Env.register('my_sim')
class MySimEnv(Env):
    def __init__(self, env_config: EnvCfg, task_config: TaskCfg):
        super().__init__(env_config, task_config)
        # initialize your simulator here

    def reset(self):
        return {'rgb': ..., 'depth': ..., 'instruction': '...'}

    def step(self, action):
        obs = ...
        done = False
        info = {'reward': 0.0}
        return obs, 0.0, done, info

    def close(self):
        ...

    def render(self):
        ...

    def get_observation(self):
        return {'rgb': ...}

    def get_info(self):
        return {}

# Instantiate via registry
env = Env.init(
    env_config=EnvCfg(env_type='my_sim', env_settings={'headless': True}),
    task_config=TaskCfg(task_name='nav_task', task_settings={'max_step': 500}, scene=None),
)
obs = env.reset()
obs, reward, done, info = env.step(action=1)
env.close()
```

--------------------------------

### Launch Evaluation Script

Source: https://context7.com/internrobotics/internnav/llms.txt

Use the unified CLI entry point `scripts/eval/eval.py` to launch registered evaluators. Specify the configuration file using the `--config` argument. Supports distributed evaluation with `torchrun`.

```bash
# Evaluate CMA baseline on VLN-PE (R2R val_unseen) using Isaac Sim / InternUtopia
python scripts/eval/eval.py --config scripts/eval/configs/h1_cma_cfg.py

# Evaluate InternVLA-N1 (DualVLN, partial_async) on VLN-PE
python scripts/eval/eval.py --config scripts/eval/configs/h1_internvla_n1_async_cfg.py

# Evaluate InternVLA-N1 dual system on Habitat VLN-CE
python scripts/eval/eval.py --config scripts/eval/configs/habitat_dual_system_cfg.py

# Distributed evaluation with 4 GPUs (set use_distributed=True in config)
torchrun --nproc_per_node=4 scripts/eval/eval.py \
    --config scripts/eval/configs/h1_internvla_n1_async_cfg.py
```

--------------------------------

### Extract Sample Dataset

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

Extracts the real-world sample dataset archive to the specified directory. Ensure the archive file exists.

```bash
!tar -xvf ../../assets/realworld_sample_data.tar.gz -C ../../assets/
```

--------------------------------

### Check and Read Instruction File

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

Verifies the existence of 'instruction.txt' in a given scene directory and reads its content. It also lists debug images found in the directory.

```python
import glob
import os

instruction_path = os.path.join(scene_dir, 'instruction.txt')
if not os.path.exists(instruction_path):
    print(f"Error: instruction.txt not found in {scene_dir}")
else:
    print(f"Scene directory: {scene_dir}")

    # Read instruction
    with open(instruction_path, 'r') as f:
        instruction = f.read().strip()
    print(f"Instruction: {instruction}")

    # Get all debug_raw images
    rgb_paths = sorted(glob.glob(os.path.join(scene_dir, 'debug_raw_*.jpg')))
    print(f"\nFound {len(rgb_paths)} images")

    # Show first few image names
    print("\nFirst 5 images:")
    for i, path in enumerate(rgb_paths[:5]):
        print(f" {i+1}. {os.path.basename(path)}")
```

--------------------------------

### Configure Inference Parameters

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

Defines a class to hold configuration arguments for the agent, including device, model path, image dimensions, history length, camera intrinsics, and a step gap for inference planning. The `plan_step_gap` argument controls inference frequency to mitigate sim-to-real gaps.

```python
import numpy as np

class Args:
    def __init__(self):
        self.device = "cuda:0"
        self.model_path = "/home/pjlab/fengdelin/data/InternVLA-N1-DualVLN"
        self.resize_w = 384
        self.resize_h = 384
        self.num_history = 8
        self.camera_intrinsic = np.array([
            [386.5, 0.0, 328.9, 0.0],
            [0.0, 386.5, 244.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]
        ])
        self.plan_step_gap = 4

args = Args()
print(f"Model path: {args.model_path}")
print(f"Device: {args.device}")
print(f"Image size: {args.resize_w}x{args.resize_h}")
print(f"History frames: {args.num_history}")
```

--------------------------------

### Configure RDP Agent

Source: https://context7.com/internrobotics/internnav/llms.txt

Configuration for the RDP agent. Ensure the ckpt_path points to the correct checkpoint directory.

```python
from internnav.configs.agent import AgentCfg

# RDP agent config
rdp_cfg = AgentCfg(
    model_name='rdp',
    ckpt_path='checkpoints/r2r/rdp',
    model_settings={
        'env_num': 1,
        'proc_num': 1,
    },
)
```

--------------------------------

### Evaluate Baseline Model

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Use this script for quick evaluation checks. Evaluation logs are written to the 'logs/' directory.

```bash
./scripts/eval/start_eval.sh --config scripts/eval/configs/challenge_cfg.py
```

--------------------------------

### Clone InternNav Repository

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Clone the InternNav repository to your local machine. Ensure you use the --recursive flag to include submodules.

```bash
git clone git@github.com:InternRobotics/InternNav.git --recursive
```

--------------------------------

### Habitat Environment - internnav.env.habitat_env.HabitatEnv

Source: https://context7.com/internrobotics/internnav/llms.txt

Wraps the Habitat simulator, handling distributed workers, episode resuming from checkpoints, and exposing metrics for standard VLN measures.

```APIDOC
## Habitat Environment - `internnav.env.habitat_env.HabitatEnv`

`HabitatEnv` wraps the Habitat simulator, shards episodes across distributed workers via `rank/world_size`, resumes from a `progress.json` checkpoint to skip already-completed episodes, and exposes `get_metrics()` for standard VLN measures (NE, SR, SPL).

```python
from internnav.configs.evaluator import EnvCfg, TaskCfg
from internnav.env.habitat_env import HabitatEnv

env_config = EnvCfg(
    env_type='habitat',
    env_settings={
        'habitat_config': habitat_cfg,  # OmegaConf/Habitat config object
        'rank': 0,
        'world_size': 4,  # 4-GPU distributed evaluation
        'output_path': './output/rank_0',
    },
)
env = HabitatEnv(env_config)
print(f"Episodes assigned to rank 0: {len(env.episodes)}")

while env.is_running:
    obs = env.reset()  # advances to next episode
    if obs is None:
        break  # all episodes done
    for _ in range(500):
        action = policy.act(obs)
        obs, reward, done, info = env.step(action)
        if done:
            metrics = env.get_metrics()
            # {'spl': 0.72, 'success': 1.0, ...}
            break

env.close()
```
```

--------------------------------

### Run Local Benchmark

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Execute the evaluation benchmark locally on the validation set. This command mirrors the one used by EvalAI, serving as a pre-submission check.

```bash
# Run local benchmark on the validation set
$ bash challenge/start_eval_iros.sh --config scripts/eval/configs/challenge_cfg.py --split [val_seen/val_unseen]
```

--------------------------------

### Clone IROS-2025-Challenge-Nav Dataset

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Use this command to clone the IROS-2025-Challenge-Nav dataset, which contains the vln_pe data.

```bash
# InternData-N1 with vln-pe data only
$ git clone https://huggingface.co/datasets/InternRobotics/IROS-2025-Challenge-Nav data
```

--------------------------------

### Run InternNav Docker Container

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Execute the InternNav Docker container with all necessary configurations for GPU access, network, volume mounts, and environment variables. This command allows the container to access your display and mounts local directories for data and cache.

```bash
xhost +local:root # Allow the container to access the display

cd PATH/TO/INTERNNAV/

docker run --name internnav -it --rm --gpus all --network host \
  -e "ACCEPT_EULA=Y" \
  -e "PRIVACY_CONSENT=Y" \
  -e "DISPLAY=${DISPLAY}" \
  --entrypoint /bin/bash \
  -w /root/InternNav \
  -v /tmp/.X11-unix/:/tmp/.X11-unix \
  -v ${PWD}:/root/InternNav \
  -v ${HOME}/docker/isaac-sim/cache/kit:/isaac-sim/kit/cache:rw \
  -v ${HOME}/docker/isaac-sim/cache/ov:/root/.cache/ov:rw \
  -v ${HOME}/docker/isaac-sim/cache/pip:/root/.cache/pip:rw \
  -v ${HOME}/docker/isaac-sim/cache/glcache:/root/.cache/nvidia/GLCache:rw \
  -v ${HOME}/docker/isaac-sim/cache/computecache:/root/.nv/ComputeCache:rw \
  -v ${HOME}/docker/isaac-sim/logs:/root/.nvidia-omniverse/logs:rw \
  -v ${HOME}/docker/isaac-sim/data:/root/.local/share/ov/data:rw \
  -v ${HOME}/docker/isaac-sim/documents:/root/Documents:rw \
  -v ${PWD}/data/scene_data/mp3d_pe:/isaac-sim/Matterport3D/data/v1/scans:ro \
  crpi-mdum1jboc8276vb5.cn-beijing.personal.cr.aliyuncs.com/iros-challenge/internnav:v1.2
```

--------------------------------

### Download InternVLA-N1 Checkpoint

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

Creates a 'checkpoints' directory and clones the InternVLA-N1 model checkpoint from Hugging Face. `git lfs pull` is used to download large files. Because each `!` command runs in its own subshell, the `cd` in the first line does not carry over, so `git lfs pull` runs in the notebook's working directory rather than in the cloned repository.

```bash
!mkdir -p checkpoints && cd checkpoints && git clone https://huggingface.co/InternRobotics/InternVLA-N1-DualVLN
!git lfs pull
```

--------------------------------

### Initialize Git Submodules

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Update and initialize git submodules, which may be required for dependencies like longclip and diffusion policy.

```bash
# after pulling the code, fetch the longclip and diffusion policy submodules
$ git submodule update --init
```

--------------------------------

### Test Agent with Robot Captures

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/onsite_competition/README.md

Locally test your agent model using previously recorded robot observations. You may need to adjust the path to your agent.

```bash
python challenge/onsite_competition/sdk/test_agent.py # you may need to modify the path to your agent
```

--------------------------------

### Download longclip-B Baseline

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Download the longclip-B model weights using the huggingface-cli. This command ensures the model is downloaded to the specified local directory.

```bash
# longclip-B
$ huggingface-cli download --include 'longclip-B.pt' --local-dir-use-symlinks False --resume-download Beichenzhang/LongCLIP-B --local-dir checkpoints/clip-long
```

--------------------------------

### Configure Evaluation Settings (Python)

Source: https://context7.com/internrobotics/internnav/llms.txt

Top-level Pydantic configuration for evaluation. Binds agent, environment, task, dataset, and evaluation settings. Use `use_agent_server=True` to run the agent in a separate process.

```python
from internnav.configs.agent import AgentCfg
from internnav.configs.evaluator import (
    EvalCfg, EnvCfg, TaskCfg, SceneCfg, MetricCfg,
    EvalDatasetCfg, RobotCfg, SensorCfg, ControllerCfg,
)

eval_cfg = EvalCfg(
    eval_type='vln_distributed',  # registered evaluator key
    eval_settings={
        'save_to_json': True,
        'vis_output': True,
        'use_agent_server': True,  # True: agent runs in separate process
    },
    agent=AgentCfg(
        server_host='localhost',
        server_port=8023,
        model_name='internvla_n1',
        ckpt_path='',
        model_settings={
            'model_path': 'checkpoints/InternVLA-N1-DualVLN',
            'width': 640,
            'height': 480,
            'hfov': 79,
            'device': 'cuda:0',
            'infer_mode': 'partial_async',
            'vis_debug': False,
        },
    ),
    env=EnvCfg(
        env_type='internutopia',
        env_settings={'headless': True, 'use_fabric': False},
    ),
    task=TaskCfg(
        task_name='internvla_n1_eval',
        task_settings={'max_step': 1000, 'use_distributed': False, 'proc_num': 1, 'env_num': 1},
        scene=SceneCfg(scene_type='mp3d', scene_data_dir='data/scene_data/mp3d_pe'),
        robot_name='h1',
        robot_flash=True,
        flash_collision=False,
        robot_usd_path='data/Embodiments/vln-pe/h1/h1_internvla.usd',
        camera_resolution=[640, 480],
        camera_prim_path='torso_link/h1_1_25_down_30',
    ),
    dataset=EvalDatasetCfg(
        dataset_type='mp3d',
        dataset_settings={
            'base_data_dir': 'data/vln_pe/raw_data/r2r',
            'split_data_types': ['val_unseen'],
            'filter_stairs': True,
        },
    ),
)
```

--------------------------------

### Test Submission Docker Image Locally

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Quickly test your built Docker image locally with a mini split of the R2R dataset. Logging out first also checks that your image registry is publicly accessible.

```bash
$ docker logout
$ docker run --name internnav-test -it --gpus all --network host \
    -e "ACCEPT_EULA=Y" \
    -e "PRIVACY_CONSENT=Y" \
    -e "DISPLAY=${DISPLAY}" \
    --entrypoint /bin/bash \
    -w /root/InternNav \
    -v /tmp/.X11-unix/:/tmp/.X11-unix \
    -v ${PWD}/data:/root/InternNav/data \
    -v ${PWD}/interiornav_data:/root/InternNav/interiornav_data \
    your-registry/internnav-custom:v1 \
    -c "challenge/start_eval_iros.sh --config scripts/eval/configs/challenge_cfg.py --split mini; exec /bin/bash"
```

--------------------------------

### Clone Embodiments Dataset

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Clone the Embodiments dataset into the `data/Embodiments` directory.

```bash
# Embodiments
$ git clone https://huggingface.co/datasets/InternRobotics/Embodiments data/Embodiments
```

--------------------------------

### Initialize and Use DialogAgent

Source: https://context7.com/internrobotics/internnav/llms.txt

Configures and uses the DialogAgent for interactive instance-goal navigation in a Habitat environment. The agent uses a vision-language model to produce navigation actions, pixel goals, or dialog queries, and merges the conversation history with observations for multimodal prompting.

```python
from internnav.agent.dialog_agent import DialogAgent
from internnav.configs.agent import AgentCfg

# habitat_sensor_cfg, habitat_env and agent_state are placeholders that
# must be created elsewhere before this snippet runs.
cfg = AgentCfg(
    model_name='dialog',
    model_settings={
        'task_name': 'dialog_r2r',
        'task': 'dialog_r2r',
        'model_path': 'checkpoints/InternVLA-N1-DualVLN',
        'mode': 'system2',
        'dialog_enabled': True,
        'num_history': 4,
        'resize_h': 336,
        'resize_w': 336,
        'append_look_down': True,
        'max_new_tokens': 512,
        'local_rank': 0,
        'sim_sensors_config': habitat_sensor_cfg,  # habitat sensor config object
    },
)

agent = DialogAgent(cfg)
agent.reset(env=habitat_env)  # binds ShortestPathFollower and resets state

obs = habitat_env.reset()
info = {
    'step': 0,
    'episode_instruction': 'Find the red chair.',
    'output_path': './log.txt',
    'agent state': agent_state,
}

for step in range(200):
    info['step'] = step
    action = agent.step(obs, env=habitat_env, info=info)
    # action is an int: 0=STOP, 1=FWD, 2=LEFT, 3=RIGHT, 5=LOOK_DOWN, 6=DIALOG, 7=NO_OP
    obs, reward, done, info_env = habitat_env.step(action)
    if done:
        break
```

--------------------------------

### Configure CMA/Seq2Seq Agent

Source: https://context7.com/internrobotics/internnav/llms.txt

Configuration for the CMA/Seq2Seq baseline agent. `model_name` must match a registered agent key, and `ckpt_path` must point to the correct checkpoint directory.

```python
from internnav.configs.agent import AgentCfg

# CMA / Seq2Seq baseline agent config
cma_cfg = AgentCfg(
    server_host='localhost',
    server_port=8087,
    model_name='cma',  # must match @Agent.register() key
    ckpt_path='checkpoints/r2r/cma_plus',
    model_settings={
        'env_num': 1,
        'proc_num': 8,
    },
)
```

--------------------------------

### Enable Visualization in Evaluation

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Update the evaluation configuration to visualize trajectories. Set `eval_settings['vis_output'] = True` to save frames and video, and `env_settings['headless'] = False` to open the interactive Isaac Sim window.

```python
eval_settings['vis_output'] = True
env_settings['headless'] = False
```

--------------------------------

### Import Required Libraries

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

Imports the Python libraries the demo needs, including system modules, path handling, NumPy, PIL, and PyTorch. It also adds the project paths to `sys.path` so that project modules resolve, then imports the `InternVLAN1AsyncAgent` class.

```python
import sys
import os
import glob
from pathlib import Path

import numpy as np
from PIL import Image
import torch

# Add project path
project_root = Path('../../')
sys.path.insert(0, str(project_root))
sys.path.insert(0, str(project_root / 'src/diffusion-policy'))

from internnav.agent.internvla_n1_agent_realworld import InternVLAN1AsyncAgent
```

--------------------------------

### Convert Datasets (Bash)

Source: https://context7.com/internrobotics/internnav/llms.txt

Converts VLN-CE trajectory datasets to LeRobot parquet format. Supports multi-threaded episode processing for efficiency. Specify the data directory, repository name, and datasets to convert.

```bash
python scripts/dataset_converters/vlnce2lerobot.py \
  --data_dir /data/streamvln \
  --repo_name vln_ce_lerobot \
  --datasets RxR \
  --num_threads 10 \
  --start_index 0 \
  --end_index 5000
```

```bash
python scripts/dataset_converters/vlnce2lerobot.py \
  --data_dir /data/streamvln \
  --repo_name vln_ce_lerobot \
  --datasets R2R \
  --num_threads 16
```
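
When a conversion is too large for one run, the `--start_index` and `--end_index` flags shown above suggest it can be split into episode ranges. The following is a minimal sketch of that idea, not part of InternNav: `shard_ranges` and `build_commands` are hypothetical helpers that only build command lines from the flags already used above and execute nothing. It assumes `--end_index` is exclusive, which the source does not state, and it does not confirm that separate runs can safely write to the same `--repo_name`.

```python
import shlex

# Script path copied from the conversion commands above.
SCRIPT = "scripts/dataset_converters/vlnce2lerobot.py"


def shard_ranges(start, end, shard_size):
    """Split [start, end) into consecutive (lo, hi) chunks of at most shard_size.

    Assumption: the end index is exclusive.
    """
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    return [(lo, min(lo + shard_size, end)) for lo in range(start, end, shard_size)]


def build_commands(data_dir, repo_name, dataset, num_threads, start, end, shard_size):
    """Return one argv list per shard; nothing is executed here."""
    commands = []
    for lo, hi in shard_ranges(start, end, shard_size):
        commands.append([
            "python", SCRIPT,
            "--data_dir", data_dir,
            "--repo_name", repo_name,
            "--datasets", dataset,
            "--num_threads", str(num_threads),
            "--start_index", str(lo),
            "--end_index", str(hi),
        ])
    return commands


if __name__ == "__main__":
    # Print shell-ready commands for the RxR example, 2000 episodes per shard.
    for argv in build_commands("/data/streamvln", "vln_ce_lerobot", "RxR", 10, 0, 5000, 2000):
        print(shlex.join(argv))
```

Printing the commands instead of running them keeps the sketch safe to try without the repository; check each one against the converter's own `--help` output before launching it.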