### Install InternNav Package

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/onsite_competition/README.md

Install the InternNav package in editable mode by navigating to the project directory and running `pip install -e .`.

```bash
cd /InternNav
pip install -e .
```

--------------------------------

### Install Flash-Attention 2

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

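Downloads and installs a pre-built Flash-Attention 2 wheel (v2.7.3, built for CUDA 12, PyTorch 2.6, and CPython 3.9 on Linux x86_64). If your environment differs or installation fails, you can skip this step and remove the `attn_implementation="flash_attention_2"` argument during model initialization.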

```bash
!wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu12torch2.6cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
%pip install flash_attn-2.7.3+cu12torch2.6cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
```

--------------------------------

### Start Agent Server

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/onsite_competition/README.md

Launch the agent server by running the `internnav.agent.utils.server` module, passing the configuration file via `--config`.

```bash
python -m internnav.agent.utils.server --config path/to/cfg.py
```

--------------------------------

### Install Core Dependencies

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

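Installs pinned versions of transformers, diffusers, accelerate, opencv-python, pillow, numpy, gym, imageio, imageio-ffmpeg, and ftfy, plus unpinned scipy and matplotlib, then installs InternNav itself in editable mode from the repository root.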

```python
%pip install transformers==4.51.0 diffusers==0.31.0 accelerate==1.10.1 opencv-python==4.10.0.82 pillow==10.4.0 numpy==1.26.4 gym==0.23.1
%pip install imageio==2.37.0 imageio-ffmpeg==0.6.0 ftfy==6.3.1
%pip install scipy matplotlib
%pip install -e ../../. # install InternNav
```

--------------------------------

### Start Agent Server (Bash)

Source: https://context7.com/internrobotics/internnav/llms.txt

Starts a FastAPI-based HTTP agent server. Useful for running agents in separate processes or on different nodes. Ensure the evaluator configuration matches the server port.

```bash
python scripts/eval/start_server.py \
    --host localhost \
    --config scripts/eval/configs/h1_cma_cfg.py
```

```bash
python scripts/eval/start_server.py \
    --config scripts/eval/configs/h1_internvla_n1_async_cfg.py \
    --reload
```

--------------------------------

### Agent Base Class and Custom Agent Example

Source: https://context7.com/internrobotics/internnav/llms.txt

Demonstrates how to define a custom agent by extending the Agent base class and how to instantiate agents using the registry.

The `Agent` base class (`internnav.agent.base.Agent`) defines the common interface for all navigation policy agents and provides a class-level registry for instantiating agents by name from configuration objects.

```python
import numpy as np

from internnav.agent.base import Agent
from internnav.configs.agent import AgentCfg

# --- Define a custom agent ---
@Agent.register('my_custom_agent')
class MyCustomAgent(Agent):
    def __init__(self, config: AgentCfg):
        super().__init__(config)
        # initialize your model here

    def reset(self, reset_index=None):
        # called at the start of every new episode
        pass

    def step(self, obs):
        # obs is a list of dicts: [{'rgb': np.ndarray, 'depth': np.ndarray, 'instruction': str}]
        # must return a list of action dicts
        return [{'action': [1], 'ideal_flag': True}]  # action 1 = FORWARD

# --- Instantiate via registry ---
cfg = AgentCfg(
    model_name='my_custom_agent',
    ckpt_path='checkpoints/my_model',
    model_settings={'device': 'cuda:0'},
)
agent = Agent.init(cfg)

# Placeholder inputs for illustration; real observations come from the environment
rgb_array = np.zeros((480, 640, 3), dtype=np.uint8)
depth_array = np.zeros((480, 640, 1), dtype=np.float32)

obs = [{'rgb': rgb_array, 'depth': depth_array, 'instruction': 'Go to the kitchen.'}]
actions = agent.step(obs)
# actions -> [{'action': [1], 'ideal_flag': True}]
agent.reset(reset_index=[0])
```

--------------------------------

### Output Trajectory Example 1

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

A sample `output_trajectory` printed during inference: a list of 2D waypoints starting at `[0.0, 0.0]`.

```text
output_trajectory: [[0.0, 0.0], [0.10174560546875, -0.0021262168884277344], [0.2030487060546875, -0.008135795593261719], [0.3041839599609375, -0.015857279300689697], [0.405426025390625, -0.02214580774307251], [0.506927490234375, -0.023593485355377197], [0.6083984375, -0.017211496829986572], [0.70843505859375, -0.004626333713531494], [0.8078155517578125, 0.012145936489105225], [0.906158447265625, 0.03255265951156616], [1.0032196044921875, 0.05704110860824585], [1.098358154296875, 0.0870322585105896], [1.1915283203125, 0.12381595373153687], [1.28155517578125, 0.16996997594833374], [1.3683929443359375, 0.221796452999115], [1.4533538818359375, 0.277399480342865], [1.5370635986328125, 0.33464282751083374], [1.61981201171875, 0.39309924840927124], [1.7007598876953125, 0.45379871129989624], [1.7807464599609375, 0.51595538854599], [1.8596649169921875, 0.5800651907920837], [1.9368743896484375, 0.64583820104599], [2.0137100219726562, 0.7126182913780212], [2.0897674560546875, 0.780085027217865], [2.1649703979492188, 0.84804767370224], [2.2330245971679688, 0.9104179739952087], [2.2977821826934814, 0.97103451192379], [2.348536729812622, 1.0183265060186386], [2.3651299476623535, 1.0324728339910507], [2.3817588090896606, 1.0465948432683945], [2.3843406438827515, 1.0491845458745956], [2.383785128593445, 1.0496225208044052], [2.3835805654525757, 1.0501257628202438]]
```
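
To sanity-check a trajectory like the one above, it can help to compute its path length and net displacement. The sketch below assumes each entry is a 2D point in a single planar frame (the source does not state the axes or units) and uses only the first four points of the printed trajectory. `summarize_trajectory` is a hypothetical helper, not part of InternNav.

```python
import math


def summarize_trajectory(points):
    """Return (path_length, net_displacement) for a list of 2D points."""
    # Sum the straight-line distance between each consecutive pair of points.
    path_length = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    # Distance from the first point to the last one.
    net_displacement = math.dist(points[0], points[-1]) if points else 0.0
    return path_length, net_displacement


# First four points of the trajectory printed above (truncated for brevity).
trajectory = [
    [0.0, 0.0],
    [0.10174560546875, -0.0021262168884277344],
    [0.2030487060546875, -0.008135795593261719],
    [0.3041839599609375, -0.015857279300689697],
]

length, displacement = summarize_trajectory(trajectory)
print(f"path length: {length:.3f}, net displacement: {displacement:.3f}")
```

A path length much larger than the net displacement would indicate a winding or oscillating trajectory.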

--------------------------------

### Install PyTorch and Check Version

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

Installs PyTorch 2.6.0 and torchvision 0.21.0 from the CUDA 12.4 wheel index, then prints the installed PyTorch version. Ensure your NVIDIA driver supports CUDA 12.4.

```python
%pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
import torch
print(torch.__version__)
```
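
Beyond printing the version, it can be useful to confirm that the installed build actually sees a GPU. The following is a minimal sketch, not part of the original notebook. `describe_torch` is a hypothetical helper that degrades gracefully when PyTorch is not installed.

```python
import importlib.util


def describe_torch():
    """Report whether PyTorch is importable and, if so, its CUDA status."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        # CUDA version the wheel was built against (None for CPU-only builds).
        "cuda_build": torch.version.cuda,
        # True only if a usable GPU and a compatible driver are present.
        "cuda_available": torch.cuda.is_available(),
    }


print(describe_torch())
```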

--------------------------------

### Download InteriorNav Dataset

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Download the InteriorNav dataset, which includes scene USD files and navigation data splits (val_seen + val_unseen). Ensure git-lfs is installed and clone the datasets into the specified directory.

```bash
git lfs install
# At /root/InternNav/ 
mkdir interiornav_data

# InteriorNav scene usd
git clone https://huggingface.co/datasets/spatialverse/InteriorAgent interiornav_data/scene_data

# InteriorNav val dataset
git clone https://huggingface.co/datasets/spatialverse/InteriorAgent_Nav interiornav_data/raw_data
```

--------------------------------

### Output Trajectory Example 2

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

Represents an alternative sample output trajectory generated during inference.

```text
output_trajectory: [[0.0, 0.0], [0.09758758544921875, -0.011452913284301758], [0.19722747802734375, -0.021155595779418945], [0.29154205322265625, -0.02734684944152832], [0.38101673126220703, -0.0293729305267334], [0.469879150390625, -0.0261075496673584], [0.554107666015625, -0.016035795211791992], [0.6353373527526855, -0.0012137889862060547], [0.7102475166320801, 0.018787622451782227], [0.7814936637878418, 0.043318986892700195], [0.8519377708435059, 0.07297158241271973], [0.9173874855041504, 0.1057593822479248], [0.9814028739929199, 0.141371488571167], [1.045126736164093, 0.17730927467346191], [1.1082122921943665, 0.2132399082183838], [1.169586956501007, 0.2529715299606323], [1.2295807003974915, 0.29670655727386475], [1.2895076870918274, 0.3415853977203369], [1.3475502133369446, 0.38594746589660645], [1.4070499539375305, 0.4295980930328369], [1.468199074268341, 0.4694092273712158], [1.5301488041877747, 0.5065538883209229], [1.5891702771186829, 0.5427916049957275], [1.635184109210968, 0.5773718357086182], [1.6797216534614563, 0.6128342151641846], [1.7144139409065247, 0.6410515308380127], [1.7460413575172424, 0.6668822765350342], [1.7761572003364563, 0.6895382404327393], [1.8000407814979553, 0.707094669342041], [1.8237372040748596, 0.7243056297302246], [1.8280614018440247, 0.7274879217147827], [1.8279209733009338, 0.7279735803604126], [1.8280540108680725, 0.7284926176071167]]
```

--------------------------------

### Submission JSON Format

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Create a JSON file with your Docker image URL and team information for submission on EvalAI. Ensure the structure exactly matches the example provided.

```json
{
    "url": "your-registry/internnav-custom:v1",
    "team": {
        "name": "your-team-name",
        "members": [
            {
                "name": "John Doe",
                "affiliation": "University of Example",
                "email": "john.doe@example.com",
                "leader": true
            },
            {
                "name": "Jane Smith",
                "affiliation": "Example Research Lab",
                "email": "jane.smith@example.com",
                "leader": false
            }
        ]
    }
}
```
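
Since the structure must match the example, a quick local check before uploading may catch typos. This is a sketch that only mirrors the fields visible in the example above. The actual validation performed by EvalAI is not described in the source, and `validate_submission` is a hypothetical helper.

```python
import json

# Field names and types taken from the members in the example above.
MEMBER_FIELDS = {"name": str, "affiliation": str, "email": str, "leader": bool}


def validate_submission(data):
    """Return a list of problems; an empty list means the shape matches."""
    problems = []
    url = data.get("url")
    if not isinstance(url, str) or not url:
        problems.append("'url' must be a non-empty string")
    team = data.get("team")
    if not isinstance(team, dict):
        problems.append("'team' must be an object")
        return problems
    if not isinstance(team.get("name"), str):
        problems.append("'team.name' must be a string")
    members = team.get("members")
    if not isinstance(members, list) or not members:
        problems.append("'team.members' must be a non-empty list")
        return problems
    for i, member in enumerate(members):
        if not isinstance(member, dict):
            problems.append(f"member {i} must be an object")
            continue
        for field, expected in MEMBER_FIELDS.items():
            if not isinstance(member.get(field), expected):
                problems.append(
                    f"member {i}: '{field}' must be of type {expected.__name__}"
                )
    return problems


example = json.loads("""
{
    "url": "your-registry/internnav-custom:v1",
    "team": {
        "name": "your-team-name",
        "members": [
            {"name": "John Doe", "affiliation": "University of Example",
             "email": "john.doe@example.com", "leader": true}
        ]
    }
}
""")
print(validate_submission(example))
```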

--------------------------------

### Output Trajectory Example 3

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

Represents a third sample output trajectory generated during inference.

```text
output_trajectory: [[0.0, 0.0], [0.0973052978515625, -0.010159492492675781], [0.1962432861328125, -0.021831512451171875], [0.2963714599609375, -0.032324790954589844], [0.397308349609375, -0.040537357330322266], [0.4984283447265625, -0.04426717758178711], [0.5986785888671875, -0.04123497009277344], [0.69769287109375, -0.03177833557128906], [0.7961883544921875, -0.01540517807006836], [0.8932037353515625, 0.006081104278564453], [0.98870849609375, 0.03310894966125488], [1.0823516845703125, 0.0654900074005127], [1.174591064453125, 0.10251164436340332], [1.2654876708984375, 0.14447903633117676], [1.3544769287109375, 0.1904308795928955], [1.44110107421875, 0.24196743965148926], [1.525360107421875, 0.29798054695129395], [1.607818603515625, 0.3560631275177002], [1.6896209716796875, 0.4150993824005127], [1.771026611328125, 0.4748260974884033], [1.850372314453125, 0.5357697010040283], [1.9269771575927734, 0.5971939563751221], [2.0023632049560547, 0.6593964099884033], [2.075179100036621, 0.7212479114532471], [2.147168457508087, 0.7842972278594971], [2.217978775501251, 0.8480370044708252], [2.288813889026642, 0.9125664234161377], [2.3539403080940247, 0.9707787036895752], [2.404560387134552, 1.015345811843872], [2.4554598927497864, 1.057633638381958], [2.4636533856391907, 1.0646393299102783], [2.4630263447761536, 1.0648670196533203], [2.463208019733429, 1.0650734901428223]]
```

--------------------------------

### Train Baseline Model

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Activate the `internutopia` conda environment, install the training requirements, then launch training of the RDP baseline with `start_train.sh`.

```bash
conda activate internutopia
pip install -r requirements/train.txt --index-url https://pypi.org/simple

./scripts/train/start_train.sh --name train_rdp --model rdp
```

--------------------------------

### Importing HabitatEnv

Source: https://github.com/internrobotics/internnav/blob/main/internnav/habitat_extensions/vln/README.md

Demonstrates how to import the Habitat environment wrapper from the package.

```python
from internnav.habitat_extensions import HabitatEnv
```

--------------------------------

### Train Policy Models (Bash)

Source: https://context7.com/internrobotics/internnav/llms.txt

Main training entry point for supported policy models. Initializes distributed training and runs Hugging Face training loops. Use `torchrun` for multi-GPU distributed training.

```bash
python scripts/train/base_train/train.py \
    --name cma_r2r_run1 \
    --model_name cma
```

```bash
python scripts/train/base_train/train.py \
    --name rdp_r2r_run1 \
    --model_name rdp
```

```bash
torchrun --nproc_per_node=4 scripts/train/base_train/train.py \
    --name navdp_run1 \
    --model_name navdp
```

--------------------------------

### Download ddppo-models Baseline

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Download the ddppo-models baseline weights into `checkpoints/ddppo-models`. The `mkdir -p` command creates the directory if it does not already exist.

```bash
# ddppo-models
$ mkdir -p checkpoints/ddppo-models
$ wget -P checkpoints/ddppo-models https://dl.fbaipublicfiles.com/habitat/data/baselines/v1/ddppo/ddppo-models/gibson-4plus-mp3d-train-val-test-resnet50.pth
```

--------------------------------

### Download R2R Finetuned Baseline Checkpoints

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Clone the VLN-PE repository and move the r2r checkpoints to the checkpoints directory.

```bash
# download r2r finetuned baseline checkpoints
$ git clone https://huggingface.co/InternRobotics/VLN-PE && mv VLN-PE/r2r checkpoints/
```

--------------------------------

### Initialize and Use Habitat Environment

Source: https://context7.com/internrobotics/internnav/llms.txt

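`HabitatEnv` wraps the Habitat simulator for distributed evaluation. Episodes are sharded across workers via `rank`/`world_size`, already-completed episodes are skipped by resuming from a `progress.json` checkpoint, and `get_metrics()` exposes the standard VLN measures (NE, SR, SPL).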

```python
from internnav.configs.evaluator import EnvCfg, TaskCfg
from internnav.env.habitat_env import HabitatEnv

env_config = EnvCfg(
    env_type='habitat',
    env_settings={
        'habitat_config': habitat_cfg,   # OmegaConf/Habitat config object
        'rank': 0,
        'world_size': 4,                 # 4-GPU distributed evaluation
        'output_path': './output/rank_0',
    },
)
env = HabitatEnv(env_config)
print(f"Episodes assigned to rank 0: {len(env.episodes)}")

while env.is_running:
    obs = env.reset()                      # advances to next episode
    if obs is None:
        break                              # all episodes done
    for _ in range(500):
        action = policy.act(obs)
        obs, reward, done, info = env.step(action)
        if done:
            metrics = env.get_metrics()    # {'spl': 0.72, 'success': 1.0, ...}
            break

env.close()
```
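
The source does not show how `HabitatEnv` splits episodes between workers or how it records progress. As one plausible illustration, the sketch below uses a strided split by `rank` and `world_size` and skips episodes listed in a set of finished ids. `shard_episodes` and its arguments are hypothetical names, not InternNav API.

```python
def shard_episodes(episode_ids, rank, world_size, finished=()):
    """Pick this worker's episodes and drop those already completed."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    done = set(finished)
    # Strided split: worker r gets items r, r + world_size, r + 2 * world_size, ...
    mine = episode_ids[rank::world_size]
    return [ep for ep in mine if ep not in done]


episodes = list(range(10))
for rank in range(4):
    remaining = shard_episodes(episodes, rank, world_size=4, finished={0, 5})
    print(f"rank {rank}: {remaining}")
```

With this scheme every episode is assigned to exactly one worker, so per-rank results can be merged without deduplication.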

--------------------------------

### Initialize and Use InternVLAN1Agent

Source: https://context7.com/internrobotics/internnav/llms.txt

Sets up the InternVLAN1Agent using a dual-system architecture for navigation. It configures model settings, including paths, dimensions, and inference modes. The agent executes navigation steps and breaks the loop if an episode reset is indicated.

```python
from internnav.agent.internvla_n1_agent import InternVLAN1Agent
from internnav.configs.agent import AgentCfg

cfg = AgentCfg(
    model_name='internvla_n1',
    ckpt_path='',  # weights loaded from model_settings.model_path
    model_settings={
        'model_path': 'checkpoints/InternVLA-N1-DualVLN',
        'width': 640,
        'height': 480,
        'hfov': 79,
        'resize_w': 384,
        'resize_h': 384,
        'max_new_tokens': 1024,
        'num_frames': 32,
        'num_history': 8,
        'num_future_steps': 4,
        'device': 'cuda:0',
        'predict_step_nums': 32,
        'continuous_traj': True,
        'infer_mode': 'partial_async',  # 'sync' or 'partial_async'
        'sys2_max_forward_step': 8,
        'vis_debug': False,
        'vis_debug_path': './logs/vis_debug',
    },
)
agent = InternVLAN1Agent(cfg)
agent.reset(reset_index=[0])  # start first episode

for step in range(1000):
    obs = [
        {
            'rgb': rgb_frame,       # np.ndarray (H, W, 3) uint8
            'depth': depth_frame,   # np.ndarray (H, W, 1) float32 in metres
            'instruction': 'Exit the bedroom and go to the living room.',
        }
    ]
    result = agent.step(obs)
    # result -> [{'action': [1], 'ideal_flag': True}]
    # action -1 means "episode boundary / reset needed"
    if result[0]['action'] == [-1]:
        break
agent.reset(reset_index=[0])  # start next episode; closes debug video writers
```

--------------------------------

### Initialize and Warm-up Agent

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

Initializes the InternVLA-N1 agent with the specified arguments and performs a warm-up step using dummy data. This ensures the model is ready for inference and helps catch potential issues early.

```python
# Assumes `np`, `InternVLAN1AsyncAgent`, and `args` are defined in earlier notebook cells
print("Loading model...")
agent = InternVLAN1AsyncAgent(args)

# Warm up model
print("Warming up model...")
dummy_rgb = np.zeros((480, 640, 3), dtype=np.uint8)
dummy_depth = np.zeros((480, 640), dtype=np.float32)
dummy_pose = np.eye(4)
agent.reset()
agent.step(dummy_rgb, dummy_depth, dummy_pose, "hello", intrinsic=args.camera_intrinsic)
print("Model loaded successfully!")
```

--------------------------------

### Download SceneData-N1

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

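Download the SceneData-N1 archive (`mp3d_pe.tar.gz`), which contains the mp3d_pe data, and extract it into the `data/scene_data` directory. The command below only downloads the archive; extraction is a separate step.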

```bash
# Scene
$ wget https://huggingface.co/datasets/InternRobotics/Scene-N1/resolve/main/mp3d_pe.tar.gz    # unzip to data/scene_data
```

--------------------------------

### Build Submission Docker Image

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

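Build a Docker image for submission, either with `docker build` or by committing a running container with `docker commit`, then tag the image and push it to your registry. Ensure your trained weights and model code are correctly packaged within the image at `/root/InternNav`.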

```bash
# Navigate to the directory
$ cd PATH/TO/INTERNNAV/

# Build the new image
$ docker build -t my-internnav-custom:v1 .

```

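```bash
# Alternative: snapshot the running container instead of building from the Dockerfile
$ docker commit internnav my-internnav-with-updates:v1
# Pro: easier to manage a custom environment
# Con: captures every change made in the container, which can bloat the image;
#      clear caches and other temporary files first to reduce the image size
```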

```bash
$ docker tag my-internnav-custom:v1 your-registry/internnav-custom:v1
$ docker push your-registry/internnav-custom:v1

```

--------------------------------

### Configure Test Data Path

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

Sets the directory for the test dataset, which is a pre-collected real-world dataset captured by a Unitree Go2 robot. Users can change this to their own dataset, which should include both aligned depth and RGB images, along with an 'instruction.txt' file.

```python
# Configure data directory (single scene per folder)
scene_dir = '../../assets/realworld_sample_data1'
```

--------------------------------

### Registering HabitatVllnEnv

Source: https://github.com/internrobotics/internnav/blob/main/internnav/habitat_extensions/vlln/README.md

Demonstrates how HabitatVllnEnv is registered under the key "habitat_vlln" using the shared Env.register decorator.

```python
HabitatVllnEnv = Env.register("habitat_vlln")(HabitatVllnEnv)
```
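
The line above calls the decorator by hand rather than using `@` syntax. The two forms are equivalent. The toy registry below illustrates this under stated assumptions: `ToyEnv`, `register`, and `create` are illustrative stand-ins written for this example, not InternNav's actual `Env` implementation.

```python
class ToyEnv:
    """Toy base class holding a name-to-class registry."""

    _registry = {}

    @classmethod
    def register(cls, name):
        # Return a decorator that records the class under `name`.
        def decorator(env_cls):
            cls._registry[name] = env_cls
            return env_cls

        return decorator

    @classmethod
    def create(cls, name, *args, **kwargs):
        # Look up the registered class and instantiate it.
        return cls._registry[name](*args, **kwargs)


# Decorator syntax ...
@ToyEnv.register("decorated_env")
class DecoratedEnv(ToyEnv):
    pass


# ... is equivalent to calling the decorator explicitly.
class CalledEnv(ToyEnv):
    pass


CalledEnv = ToyEnv.register("called_env")(CalledEnv)

print(sorted(ToyEnv._registry))
print(type(ToyEnv.create("called_env")).__name__)
```

The explicit call form is handy when the class is defined elsewhere (for example, imported) and only needs to be registered under a key.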

--------------------------------

### Package Structure

Source: https://github.com/internrobotics/internnav/blob/main/internnav/habitat_extensions/vln/README.md

Illustrates the directory structure of the Habitat extensions within the InternNav project.

```tree
habitat_extensions/vln/
├── __init__.py
├── habitat_env.py
├── habitat_default_evaluator.py
├── habitat_vln_evaluator.py
└── measures.py
```

--------------------------------

### Initialize and Use RdpAgent

Source: https://context7.com/internrobotics/internnav/llms.txt

Initializes the RdpAgent with a given configuration and demonstrates its usage in a navigation loop. The agent caches waypoints and replans when the cache is empty. Resets the agent if an episode ends.

```python
import numpy as np

from internnav.agent.rdp_agent import RdpAgent
from internnav.configs.agent import AgentCfg

cfg = AgentCfg(
    model_name='rdp',
    ckpt_path='checkpoints/rdp/r2r_rdp',
    model_settings={
        'env_num': 1,
        'proc_num': 1,
    },
)
agent = RdpAgent(cfg)
agent.reset()

for step in range(500):
    obs_batch = [
        {
            'instruction': 'Walk past the sofa and stop at the door.',  # raw string
            'globalgps': np.array([1.2, 0.5, 0.0]),   # 3D world position
            'globalrotation': np.array([0, 0, 0, 1]), # quaternion [x,y,z,w]
            'rgb': rgb_frame,     # np.ndarray (H, W, 3)
            'depth': depth_frame, # np.ndarray (H, W, 1)
        }
    ]
    actions = agent.step(obs_batch)
    # actions -> [{'action': [1], 'ideal_flag': True}]
    # Internally, RdpAgent caches a trajectory of len_traj_act waypoints
    # and only calls the diffusion denoiser when the cache is empty.

    if actions[0]['action'] == [-1]:
        agent.reset(reset_ls=[0])  # -1 indicates episode just reset
```
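
The comments in the loop above describe a cache-then-replan pattern: actions are served from a cached trajectory, and the expensive model call happens only when the cache is empty. The following is a generic sketch of that pattern, not RdpAgent's actual code. `CachedPlanner` and `dummy_plan` are hypothetical, and `dummy_plan` stands in for the diffusion denoiser.

```python
from collections import deque


class CachedPlanner:
    """Serve actions from a cache and replan only when it runs dry."""

    def __init__(self, plan_fn, len_traj_act):
        self.plan_fn = plan_fn            # stand-in for an expensive model call
        self.len_traj_act = len_traj_act  # number of actions produced per plan
        self.cache = deque()
        self.plan_calls = 0

    def reset(self):
        # Drop stale actions at an episode boundary.
        self.cache.clear()

    def step(self, obs):
        if not self.cache:
            # Cache is empty: plan a fresh batch of actions.
            self.cache.extend(self.plan_fn(obs, self.len_traj_act))
            self.plan_calls += 1
        return self.cache.popleft()


def dummy_plan(obs, n):
    # Placeholder planner: always proposes n copies of action 1.
    return [1] * n


planner = CachedPlanner(dummy_plan, len_traj_act=4)
actions = [planner.step(obs=None) for _ in range(10)]
print(actions, planner.plan_calls)
```

Ten steps with a cache of four actions trigger the planner only three times, which is the saving the caching is meant to provide.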

--------------------------------

### Define a Custom Simulation Environment

Source: https://context7.com/internrobotics/internnav/llms.txt

Implement a custom simulation environment by inheriting from `Env` and registering it. Ensure all required methods (`reset`, `step`, `close`, `get_observation`, `get_info`) are defined. Instantiate via the `Env.init` registry.

```python
from internnav.env.base import Env
from internnav.configs.evaluator import EnvCfg, TaskCfg

@Env.register('my_sim')
class MySimEnv(Env):
    def __init__(self, env_config: EnvCfg, task_config: TaskCfg):
        super().__init__(env_config, task_config)
        # initialize your simulator here

    def reset(self):
        return {'rgb': ..., 'depth': ..., 'instruction': '...'}

    def step(self, action):
        obs = ...
        done = False
        info = {'reward': 0.0}
        return obs, 0.0, done, info

    def close(self): ...
    def render(self): ...
    def get_observation(self): return {'rgb': ...}
    def get_info(self): return {}

# Instantiate via registry
env = Env.init(
    env_config=EnvCfg(env_type='my_sim', env_settings={'headless': True}),
    task_config=TaskCfg(task_name='nav_task', task_settings={'max_step': 500}, scene=None),
)
obs = env.reset()
obs, reward, done, info = env.step(action=1)
env.close()
```
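
The `(obs, reward, done, info)` contract returned by `step` can be exercised without any simulator. The sketch below is a self-contained toy, not InternNav code: `CountdownEnv` and `rollout` are hypothetical names, and the step cap plays the role of the `max_step` task setting shown above.

```python
class CountdownEnv:
    """Stand-in environment that ends an episode after a fixed number of steps."""

    def __init__(self, episode_length):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return {"t": self.t}

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_length
        return {"t": self.t}, 0.0, done, {}


def rollout(env, policy, max_step):
    """Run one episode and return the number of steps actually taken."""
    obs = env.reset()
    for step in range(1, max_step + 1):
        obs, reward, done, info = env.step(policy(obs))
        if done:
            return step
    # The step cap was reached before the environment signalled `done`.
    return max_step


print(rollout(CountdownEnv(episode_length=3), policy=lambda obs: 1, max_step=500))
```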

--------------------------------

### Launch Evaluation Script

Source: https://context7.com/internrobotics/internnav/llms.txt

Use the unified CLI entry point `scripts/eval/eval.py` to launch registered evaluators. Specify the configuration file using the `--config` argument. Supports distributed evaluation with `torchrun`.

```bash
# Evaluate CMA baseline on VLN-PE (R2R val_unseen) using Isaac Sim / InternUtopia
python scripts/eval/eval.py --config scripts/eval/configs/h1_cma_cfg.py

# Evaluate InternVLA-N1 (DualVLN, partial_async) on VLN-PE
python scripts/eval/eval.py --config scripts/eval/configs/h1_internvla_n1_async_cfg.py

# Evaluate InternVLA-N1 dual system on Habitat VLN-CE
python scripts/eval/eval.py --config scripts/eval/configs/habitat_dual_system_cfg.py

# Distributed evaluation with 4 GPUs (set use_distributed=True in config)
torchrun --nproc_per_node=4 scripts/eval/eval.py \
    --config scripts/eval/configs/h1_internvla_n1_async_cfg.py
```

--------------------------------

### Extract Sample Dataset

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

Extracts the real-world sample dataset archive to the specified directory. Ensure the archive file exists.

```bash
!tar -xvf ../../assets/realworld_sample_data.tar.gz -C ../../assets/
```

--------------------------------

### Check and Read Instruction File

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

Checks that `instruction.txt` exists in the scene directory and reads its content, then lists the `debug_raw_*.jpg` images found there and prints the first five file names.

```python
import glob
import os

instruction_path = os.path.join(scene_dir, 'instruction.txt')
if not os.path.exists(instruction_path):
    print(f"Error: instruction.txt not found in {scene_dir}")
else:
    print(f"Scene directory: {scene_dir}")
    
    # Read instruction
    with open(instruction_path, 'r') as f:
        instruction = f.read().strip()
    print(f"Instruction: {instruction}")
    
    # Get all debug_raw images
    rgb_paths = sorted(glob.glob(os.path.join(scene_dir, 'debug_raw_*.jpg')))
    print(f"\nFound {len(rgb_paths)} images")
    # Show first few image names
    print("\nFirst 5 images:")
    for i, path in enumerate(rgb_paths[:5]):
        print(f"  {i+1}. {os.path.basename(path)}")
```

--------------------------------

### Configure Inference Parameters

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

Defines a class to hold configuration arguments for the agent, including device, model path, image dimensions, history length, camera intrinsics, and a step gap for inference planning. The `plan_step_gap` argument controls inference frequency to mitigate sim-to-real gaps.

```python
import numpy as np


class Args:
    def __init__(self):
        self.device = "cuda:0"
        # Example path from the original notebook; point this at your local checkpoint
        self.model_path = "/home/pjlab/fengdelin/data/InternVLA-N1-DualVLN"
        self.resize_w = 384
        self.resize_h = 384
        self.num_history = 8
        self.camera_intrinsic = np.array([
            [386.5, 0.0, 328.9, 0.0],
            [0.0, 386.5, 244.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]
        ])
        self.plan_step_gap = 4

args = Args()
print(f"Model path: {args.model_path}")
print(f"Device: {args.device}")
print(f"Image size: {args.resize_w}x{args.resize_h}")
print(f"History frames: {args.num_history}")
```
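
The top-left 3×3 block of `camera_intrinsic` has the usual pinhole layout: focal lengths on the diagonal and the principal point in the third column. The sketch below shows how such a matrix maps a 3D point to pixel coordinates. It assumes the standard pinhole model with points expressed in the camera frame (x right, y down, z forward), which the source does not spell out, and `project_point` is a hypothetical helper.

```python
def project_point(intrinsic, point_cam):
    """Project a 3D camera-frame point to pixel coordinates (u, v)."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    fx, fy = intrinsic[0][0], intrinsic[1][1]  # focal lengths in pixels
    cx, cy = intrinsic[0][2], intrinsic[1][2]  # principal point in pixels
    return fx * x / z + cx, fy * y / z + cy


# Same values as Args.camera_intrinsic above, written as nested lists.
camera_intrinsic = [
    [386.5, 0.0, 328.9, 0.0],
    [0.0, 386.5, 244.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# A point on the optical axis projects onto the principal point.
u, v = project_point(camera_intrinsic, (0.0, 0.0, 2.0))
print(u, v)
```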

--------------------------------

### Configure RDP Agent

Source: https://context7.com/internrobotics/internnav/llms.txt

Configuration for the RDP agent. Ensure the ckpt_path points to the correct checkpoint directory.

```python
# RDP agent config
rdp_cfg = AgentCfg(
    model_name='rdp',
    ckpt_path='checkpoints/r2r/rdp',
    model_settings={
        'env_num': 1,
        'proc_num': 1,
    },
)
```

--------------------------------

### Evaluate Baseline Model

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Use this script for quick evaluation checks. The evaluation process logs can be viewed in the 'logs/' directory.

```bash
./scripts/eval/start_eval.sh --config scripts/eval/configs/challenge_cfg.py
```

--------------------------------

### Clone InternNav Repository

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Clone the InternNav repository to your local machine. Ensure you use the --recursive flag to include submodules.

```bash
git clone git@github.com:InternRobotics/InternNav.git --recursive
```

--------------------------------

### Run Local Benchmark

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Execute the evaluation benchmark locally on the validation set. This command mirrors the one used by EvalAI, serving as a pre-submission check.

```bash
# Run local benchmark on the validation set
$ bash challenge/start_eval_iros.sh --config scripts/eval/configs/challenge_cfg.py --split [val_seen/val_unseen]

```

--------------------------------

### Clone IROS-2025-Challenge-Nav Dataset

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Use this command to clone the IROS-2025-Challenge-Nav dataset, which contains the vln_pe data.

```bash
# InternData-N1 with vln-pe data only
$ git clone https://huggingface.co/datasets/InternRobotics/IROS-2025-Challenge-Nav data
```

--------------------------------

### Run InternNav Docker Container

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Execute the InternNav Docker container with all necessary configurations for GPU access, network, volume mounts, and environment variables. This command allows the container to access your display and mounts local directories for data and cache.

```bash
xhost +local:root # Allow the container to access the display

cd PATH/TO/INTERNNAV/ 

docker run --name internnav -it --rm --gpus all --network host \
  -e "ACCEPT_EULA=Y" \
  -e "PRIVACY_CONSENT=Y" \
  -e "DISPLAY=${DISPLAY}" \
  --entrypoint /bin/bash \
  -w /root/InternNav \
  -v /tmp/.X11-unix/:/tmp/.X11-unix \
  -v ${PWD}:/root/InternNav \
  -v ${HOME}/docker/isaac-sim/cache/kit:/isaac-sim/kit/cache:rw \
  -v ${HOME}/docker/isaac-sim/cache/ov:/root/.cache/ov:rw \
  -v ${HOME}/docker/isaac-sim/cache/pip:/root/.cache/pip:rw \
  -v ${HOME}/docker/isaac-sim/cache/glcache:/root/.cache/nvidia/GLCache:rw \
  -v ${HOME}/docker/isaac-sim/cache/computecache:/root/.nv/ComputeCache:rw \
  -v ${HOME}/docker/isaac-sim/logs:/root/.nvidia-omniverse/logs:rw \
  -v ${HOME}/docker/isaac-sim/data:/root/.local/share/ov/data:rw \
  -v ${HOME}/docker/isaac-sim/documents:/root/Documents:rw \
  -v ${PWD}/data/scene_data/mp3d_pe:/isaac-sim/Matterport3D/data/v1/scans:ro \
  crpi-mdum1jboc8276vb5.cn-beijing.personal.cr.aliyuncs.com/iros-challenge/internnav:v1.2
```
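
Docker creates any host directory named in a `-v` bind mount that does not exist yet, and the new directory is typically owned by root, which can leave the Isaac Sim caches unwritable for your own user. A minimal sketch, assuming you want to pre-create the `${HOME}/docker/isaac-sim/...` directories listed in the command above; the helper name `prepare_cache_dirs` is hypothetical and not part of InternNav.

```python
from pathlib import Path

# Host-side sub-directories taken from the -v flags in the docker run command above.
CACHE_SUBDIRS = [
    "cache/kit", "cache/ov", "cache/pip", "cache/glcache",
    "cache/computecache", "logs", "data", "documents",
]

def prepare_cache_dirs(base: Path) -> list[Path]:
    """Create each bind-mount source directory under `base` if it is missing."""
    created = []
    for sub in CACHE_SUBDIRS:
        path = base / sub
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```

Calling `prepare_cache_dirs(Path.home() / "docker" / "isaac-sim")` before `docker run` keeps the directories owned by the calling user.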

--------------------------------

### Download InternVLA-N1 Checkpoint

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

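Creates a `checkpoints` directory and clones the InternVLA-N1 model checkpoint from Hugging Face, then runs `git lfs pull` inside the cloned repository to fetch the large weight files. Each `!` line in a notebook runs in its own subshell, so the `cd` must be repeated on the second line.

```bash
!mkdir -p checkpoints && cd checkpoints && git clone https://huggingface.co/InternRobotics/InternVLA-N1-DualVLN
!cd checkpoints/InternVLA-N1-DualVLN && git lfs pull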
```

--------------------------------

### Initialize Git Submodules

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Update and initialize git submodules, which may be required for dependencies like longclip and diffusion policy.

```bash
# Freshly pulled code also needs the longclip and diffusion policy submodules
$ git submodule update --init
```

--------------------------------

### Test Agent with Robot Captures

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/onsite_competition/README.md

Locally test your agent model using previously recorded robot observations. You may need to adjust the path to your agent.

```bash
python challenge/onsite_competition/sdk/test_agent.py  # you may need to modify the path to your agent
```

--------------------------------

### Download longclip-B Baseline

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

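Download the longclip-B weights (`longclip-B.pt`) with `huggingface-cli`. The `--local-dir` flag places the file in `checkpoints/clip-long`, and `--local-dir-use-symlinks False` stores a real copy there rather than a symlink into the Hugging Face cache.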

```bash
# longclip-B
$ huggingface-cli download --include 'longclip-B.pt' --local-dir-use-symlinks False --resume-download Beichenzhang/LongCLIP-B --local-dir checkpoints/clip-long
```

--------------------------------

### Configure Evaluation Settings (Python)

Source: https://context7.com/internrobotics/internnav/llms.txt

Top-level Pydantic configuration for evaluation. Binds agent, environment, task, dataset, and evaluation settings. Use `use_agent_server=True` to run the agent in a separate process.

```python
from internnav.configs.agent import AgentCfg
from internnav.configs.evaluator import (
    EvalCfg, EnvCfg, TaskCfg, SceneCfg, MetricCfg, EvalDatasetCfg,
    RobotCfg, SensorCfg, ControllerCfg,
)

eval_cfg = EvalCfg(
    eval_type='vln_distributed',             # registered evaluator key
    eval_settings={
        'save_to_json': True,
        'vis_output': True,
        'use_agent_server': True,            # True: agent runs in separate process
    },
    agent=AgentCfg(
        server_host='localhost',
        server_port=8023,
        model_name='internvla_n1',
        ckpt_path='',
        model_settings={
            'model_path': 'checkpoints/InternVLA-N1-DualVLN',
            'width': 640, 'height': 480, 'hfov': 79,
            'device': 'cuda:0',
            'infer_mode': 'partial_async',
            'vis_debug': False,
        },
    ),
    env=EnvCfg(
        env_type='internutopia',
        env_settings={'headless': True, 'use_fabric': False},
    ),
    task=TaskCfg(
        task_name='internvla_n1_eval',
        task_settings={'max_step': 1000, 'use_distributed': False, 'proc_num': 1, 'env_num': 1},
        scene=SceneCfg(scene_type='mp3d', scene_data_dir='data/scene_data/mp3d_pe'),
        robot_name='h1',
        robot_flash=True,
        flash_collision=False,
        robot_usd_path='data/Embodiments/vln-pe/h1/h1_internvla.usd',
        camera_resolution=[640, 480],
        camera_prim_path='torso_link/h1_1_25_down_30',
    ),
    dataset=EvalDatasetCfg(
        dataset_type='mp3d',
        dataset_settings={
            'base_data_dir': 'data/vln_pe/raw_data/r2r',
            'split_data_types': ['val_unseen'],
            'filter_stairs': True,
        },
    ),
)
```
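
The configuration above refers to several checkpoint and data paths, and a missing one usually only surfaces after the simulator has started. A minimal pre-flight sketch, assuming the paths are resolved relative to the project root; the helper `missing_paths` is hypothetical and not part of InternNav.

```python
from pathlib import Path

# Paths copied from the evaluation config above; adjust them if your config differs.
REQUIRED_PATHS = [
    "checkpoints/InternVLA-N1-DualVLN",
    "data/scene_data/mp3d_pe",
    "data/Embodiments/vln-pe/h1/h1_internvla.usd",
    "data/vln_pe/raw_data/r2r",
]

def missing_paths(project_root: Path, required=REQUIRED_PATHS) -> list[str]:
    """Return the required paths that do not exist under `project_root`."""
    return [rel for rel in required if not (project_root / rel).exists()]
```

Running `missing_paths(Path("."))` from the project root before launching the evaluation returns an empty list when everything is in place.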

--------------------------------

### Test Submission Docker Image Locally

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Quickly test your built Docker image locally on the mini split of the R2R dataset. Running `docker logout` first means the image is pulled without credentials, which also verifies that your image registry allows public access.

```bash
$ docker logout
$ docker run --name internnav-test -it --gpus all --network host \
  -e "ACCEPT_EULA=Y" \
  -e "PRIVACY_CONSENT=Y" \
  -e "DISPLAY=${DISPLAY}" \
  --entrypoint /bin/bash \
  -w /root/InternNav \
  -v /tmp/.X11-unix/:/tmp/.X11-unix \
  -v ${PWD}/data:/root/InternNav/data \
  -v ${PWD}/interiornav_data:/root/InternNav/interiornav_data \
  your-registry/internnav-custom:v1 \
  -c "challenge/start_eval_iros.sh --config scripts/eval/configs/challenge_cfg.py --split mini; exec /bin/bash"
```

--------------------------------

### Clone Embodiments Dataset

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Clone the Embodiments dataset into the data/Embodiments directory.

```bash
# Embodiments
$ git clone https://huggingface.co/datasets/InternRobotics/Embodiments data/Embodiments
```

--------------------------------

### Initialize and Use DialogAgent

Source: https://context7.com/internrobotics/internnav/llms.txt

Configures and runs the `DialogAgent` for interactive instance-goal navigation in a Habitat environment. At each step the agent uses a vision-language model to produce a navigation action, a pixel goal, or a dialog query, merging the conversation history with the current observations into a multimodal prompt. The snippet assumes `habitat_env`, `habitat_sensor_cfg`, and `agent_state` are already defined by the caller.

```python
from internnav.agent.dialog_agent import DialogAgent
from internnav.configs.agent import AgentCfg

cfg = AgentCfg(
    model_name='dialog',
    model_settings={
        'task_name': 'dialog_r2r',
        'task': 'dialog_r2r',
        'model_path': 'checkpoints/InternVLA-N1-DualVLN',
        'mode': 'system2',
        'dialog_enabled': True,
        'num_history': 4,
        'resize_h': 336,
        'resize_w': 336,
        'append_look_down': True,
        'max_new_tokens': 512,
        'local_rank': 0,
        'sim_sensors_config': habitat_sensor_cfg,   # habitat sensor config object
    },
)
agent = DialogAgent(cfg)
agent.reset(env=habitat_env)   # binds ShortestPathFollower and resets state

obs = habitat_env.reset()
info = {
    'step': 0,
    'episode_instruction': 'Find the red chair.',
    'output_path': './log.txt',
    'agent state': agent_state,
}
for step in range(200):
    info['step'] = step
    action = agent.step(obs, env=habitat_env, info=info)
    # action: int — 0=STOP, 1=FWD, 2=LEFT, 3=RIGHT, 5=LOOK_DOWN, 6=DIALOG, 7=NO_OP
    obs, reward, done, info_env = habitat_env.step(action)
    if done:
        break
```
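
The integer action IDs in the loop above are easier to log and debug when given names. A minimal sketch, assuming only the IDs listed in that comment; the `DialogAction` enum and `describe_action` helper are hypothetical names and are not part of InternNav. ID 4 is not listed in the source, so it is deliberately left unmapped.

```python
from enum import IntEnum

class DialogAction(IntEnum):
    """Action IDs as listed in the comment in the loop above (4 is not listed there)."""
    STOP = 0
    FWD = 1
    LEFT = 2
    RIGHT = 3
    LOOK_DOWN = 5
    DIALOG = 6
    NO_OP = 7

def describe_action(action: int) -> str:
    """Return the action name, or a placeholder for IDs this sketch does not cover."""
    try:
        return DialogAction(action).name
    except ValueError:
        return f"UNKNOWN({action})"
```

Inside the loop, a line such as `print(step, describe_action(action))` gives a readable trace of what the agent decided.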

--------------------------------

### Configure CMA/Seq2Seq Agent

Source: https://context7.com/internrobotics/internnav/llms.txt

Configuration for the CMA/Seq2Seq baseline agent. `model_name` must match a key registered with `@Agent.register()`, and `ckpt_path` must point to the checkpoint directory for that model.

```python
from internnav.configs.agent import AgentCfg

# CMA / Seq2Seq baseline agent config
cma_cfg = AgentCfg(
    server_host='localhost',
    server_port=8087,
    model_name='cma',                          # must match @Agent.register() key
    ckpt_path='checkpoints/r2r/cma_plus',
    model_settings={
        'env_num': 1,
        'proc_num': 8,
    },
)
```

--------------------------------

### Enable Visualization in Evaluation

Source: https://github.com/internrobotics/internnav/blob/main/scripts/iros_challenge/README.md

Update the evaluation configuration to visualize trajectories. Set `eval_settings['vis_output'] = True` to save frames and video, and `env_settings['headless'] = False` to open the interactive Isaac Sim window.

```python
eval_settings['vis_output']=True
env_settings['headless']=False
```
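
The two flags are independent: one controls saved output, the other controls the simulator window. A minimal sketch, assuming `eval_settings` and `env_settings` are plain dictionaries as in the evaluation config; the helper `with_visualization` is hypothetical and not part of InternNav.

```python
from copy import deepcopy

def with_visualization(eval_settings: dict, env_settings: dict, interactive: bool = False):
    """Return copies of both settings dicts with visualization turned on."""
    eval_out = deepcopy(eval_settings)
    env_out = deepcopy(env_settings)
    eval_out["vis_output"] = True          # save frames and video
    env_out["headless"] = not interactive  # False opens the Isaac Sim window
    return eval_out, env_out
```

Returning copies leaves the original dictionaries untouched, so the same base config can be reused for headless batch runs.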

--------------------------------

### Import Required Libraries

Source: https://github.com/internrobotics/internnav/blob/main/scripts/notebooks/inference_only_demo.ipynb

Imports the Python libraries the notebook needs: system and path modules, NumPy, PIL, and PyTorch. It also adds the project paths to `sys.path` so that the `InternVLAN1AsyncAgent` class can be imported.

```python
import sys
import os
import glob
from pathlib import Path

import numpy as np
from PIL import Image
import torch

# Add project path
project_root = Path('../../')
sys.path.insert(0, str(project_root))
sys.path.insert(0, str(project_root / 'src/diffusion-policy'))

from internnav.agent.internvla_n1_agent_realworld import InternVLAN1AsyncAgent
```

--------------------------------

### Convert Datasets (Bash)

Source: https://context7.com/internrobotics/internnav/llms.txt

Converts VLN-CE trajectory datasets to LeRobot parquet format. Supports multi-threaded episode processing for efficiency. Specify the data directory, repository name, and datasets to convert.

```bash
python scripts/dataset_converters/vlnce2lerobot.py \
    --data_dir /data/streamvln \
    --repo_name vln_ce_lerobot \
    --datasets RxR \
    --num_threads 10 \
    --start_index 0 \
    --end_index 5000
```

```bash
python scripts/dataset_converters/vlnce2lerobot.py \
    --data_dir /data/streamvln \
    --repo_name vln_ce_lerobot \
    --datasets R2R \
    --num_threads 16
```
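
For a large dataset it can be convenient to convert episodes in slices using the `--start_index` and `--end_index` flags shown in the first example. A minimal sketch that only builds the command strings, assuming the end index is exclusive; the source does not state whether it is inclusive, nor whether separate runs may share one `--repo_name`, so check the script before relying on this. The helper `chunked_commands` is hypothetical.

```python
import shlex

def chunked_commands(total: int, chunk: int, dataset: str = "RxR",
                     data_dir: str = "/data/streamvln",
                     repo_name: str = "vln_ce_lerobot",
                     num_threads: int = 10) -> list[str]:
    """Build one converter command per episode slice of at most `chunk` episodes."""
    commands = []
    for start in range(0, total, chunk):
        end = min(start + chunk, total)  # assumed exclusive upper bound
        argv = ["python", "scripts/dataset_converters/vlnce2lerobot.py",
                "--data_dir", data_dir, "--repo_name", repo_name,
                "--datasets", dataset, "--num_threads", str(num_threads),
                "--start_index", str(start), "--end_index", str(end)]
        commands.append(shlex.join(argv))
    return commands
```

Each returned string can be pasted into a shell or passed to a job scheduler, one slice per job.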