### Start Workspace Capability

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/capabilities.mdx

This snippet starts a workspace, which provides a sandboxed shell, in a single line. When the environment serves, the workspace is published automatically as an SSH capability named `shell`.

```python
from hud.environment import Environment

env = Environment(name="coder")
env.workspace("workspace")   # publishes "shell" (ssh/2) when the env serves
```

--------------------------------

### Start Virtual Display and VNC Server

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/capabilities.mdx

Sets up a virtual framebuffer (`Xvfb`) and a VNC server (`x11vnc`) to provide remote desktop access. Requires `xvfb` and `x11vnc` to be installed (`apt install xvfb x11vnc`). The VNC server listens on port 5900 + display number.

```python
import asyncio

from hud.capabilities import Capability
from hud.environment import Environment

env = Environment(name="desktop")
_procs: tuple | None = None

@env.initialize
async def _up():
    global _procs
    if _procs is None:
        xvfb = await asyncio.create_subprocess_exec(
            "Xvfb", ":0", "-screen", "0", "1280x1024x24",
        )
        await asyncio.sleep(0.5)               # let the X server come up first
        vnc = await asyncio.create_subprocess_exec(
            "x11vnc", "-display", ":0", "-rfbport", "5900",
            "-localhost", "-forever", "-nopw",
        )
        await asyncio.sleep(1.0)               # wait until VNC is ready
        _procs = (xvfb, vnc)
    env.add_capability(Capability.rfb(name="screen", url="rfb://127.0.0.1", display=0))

@env.shutdown
async def _down():
    global _procs
    if _procs:
        for p in reversed(_procs):
            p.terminate()
            await p.wait()
        _procs = None
```
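The Xvfb display number and the VNC port have to agree, and the usual VNC convention maps display `:N` to port `5900 + N`. Below is a small sketch for deriving the `-rfbport` value from the display instead of hard-coding both. `parse_display`, `vnc_port`, and `desktop_argv` are hypothetical helpers and are not part of HUD:

```python
def parse_display(display: str) -> int:
    """Return N from a local X display string such as ':N' or ':N.S'."""
    if not display.startswith(":"):
        raise ValueError(f"expected a local display like ':0', got {display!r}")
    return int(display[1:].split(".", 1)[0])   # drop an optional '.screen' suffix


def vnc_port(display: str, base: int = 5900) -> int:
    """VNC convention: display :N is served on TCP port base + N."""
    return base + parse_display(display)


def desktop_argv(display: str = ":0", geometry: str = "1280x1024x24") -> tuple[list[str], list[str]]:
    """Build matching argv lists for Xvfb and x11vnc (same flags as the example above)."""
    xvfb = ["Xvfb", display, "-screen", "0", geometry]
    vnc = [
        "x11vnc", "-display", display, "-rfbport", str(vnc_port(display)),
        "-localhost", "-forever", "-nopw",
    ]
    return xvfb, vnc
```

If you pass `*desktop_argv(":1")[0]` and `*desktop_argv(":1")[1]` to `asyncio.create_subprocess_exec`, the pair starts on display `:1` with VNC on port 5901. You would then also need to change the capability's `display=` argument to match.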

--------------------------------

### Install Cookbook Environment

Source: https://github.com/hud-evals/hud-python/blob/main/cookbooks/fireworks-rl-training/README.md

Install the necessary dependencies for the isolated cookbook environment using uv sync.

```bash
uv sync --pre
```

--------------------------------

### Manually Manage Workspace Lifecycle

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/capabilities.mdx

This example demonstrates how to manually control the lifecycle of a Workspace, including starting it and publishing its 'shell' capability. This approach offers more control than the automatic publishing method.

```python
from hud.environment import Environment, Workspace

env = Environment(name="coder")
ws = Workspace("workspace", host="127.0.0.1", port=0)   # port 0 → ephemeral

@env.initialize
async def _up():
    await ws.start()                          # binds, generates keys; idempotent
    env.add_capability(ws.capability("shell"))

@env.shutdown
async def _down():
    await ws.stop()
```

--------------------------------

### List, Start, and Grade Tasks

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/cli.mdx

Use `hud task list` to see the available tasks. `hud task start <task>` prints the task's prompt to stdout. `hud task grade <task> --answer "..."` grades an answer and prints the reward to stdout.

```bash
hud task list                          # what tasks are exposed
hud task start fix_bug                 # -> the prompt (stdout)
hud task grade fix_bug --answer "..."  # -> the reward (stdout)
```
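You can also script the same start/grade loop. The sketch below uses `subprocess` and assumes that `hud task grade` prints only the reward, as a number, on stdout (as the comments above suggest). `hud_task` and `run_once` are hypothetical helpers. The `cmd` prefix lets you target a container instead, for example `("docker", "exec", "run1", "hud", "task")`:

```python
import subprocess
from typing import Callable, Sequence

HUD_TASK = ("hud", "task")   # e.g. ("docker", "exec", "run1", "hud", "task") inside a container


def hud_task(*args: str, cmd: Sequence[str] = HUD_TASK) -> str:
    """Run one `hud task ...` subcommand and return its stripped stdout."""
    done = subprocess.run([*cmd, *args], check=True, capture_output=True, text=True)
    return done.stdout.strip()


def run_once(task: str, solve: Callable[[str], str], cmd: Sequence[str] = HUD_TASK) -> float:
    """Start a task, hand its prompt to `solve`, then grade the answer."""
    prompt = hud_task("start", task, cmd=cmd)                            # prompt on stdout
    answer = solve(prompt)                                               # your agent, or you
    return float(hud_task("grade", task, "--answer", answer, cmd=cmd))   # reward on stdout (assumed numeric)
```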

--------------------------------

### Install Development Dependencies with uv

Source: https://github.com/hud-evals/hud-python/blob/main/CONTRIBUTING.md

Installs development dependencies using uv after cloning the repository. Ensure you have uv installed.

```bash
cd hud-python
uv sync --extra dev
```

--------------------------------

### Install Robot Capability with UV

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/advanced/robots.mdx

Install the hud-python package with the robot extra using uv.

```bash
uv add 'hud-python[robot]'
```

--------------------------------

### Initiate Training

Source: https://github.com/hud-evals/hud-python/blob/main/cookbooks/fireworks-rl-training/README.md

Start the training process after successful calibration. This command uses the direct Training API managed service path.

```bash
uv run train.py --steps 5 --groups-per-step 8 --rollouts-per-prompt 8 --parallelism 32
```

--------------------------------

### Setup Environment Variables

Source: https://github.com/hud-evals/hud-python/blob/main/cookbooks/fireworks-rl-training/README.md

Ensure the Fireworks API key and account ID are set in the .env file for authentication.

```bash
FIREWORKS_API_KEY=...
FIREWORKS_ACCOUNT_ID=...
```

--------------------------------

### Install HUD Python CLI

Source: https://github.com/hud-evals/hud-python/blob/main/README.md

Installs the HUD Python command-line interface using uv. This is the recommended installation method.

```bash
# Install the CLI (recommended)
uv tool install hud-python --python 3.12
```

--------------------------------

### Wrap BrowserUseAgent with Configuration

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/advanced/extending.mdx

Instantiate the pre-built `BrowserUseAgent` with a `BrowserUseConfig`, then run a browser-automation task with it.

```python
from hud.agents.browser_use import BrowserUseAgent
from hud.agents.types import BrowserUseConfig

agent = BrowserUseAgent(BrowserUseConfig(model="claude-sonnet-4-5", max_steps=25))
job = await my_browser_task().run(agent)
```

--------------------------------

### Install Robot Capability with Pip

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/advanced/robots.mdx

Install the hud-python package with the robot extra using pip.

```bash
pip install 'hud-python[robot]'
```

--------------------------------

### Environment Setup and Task Definition

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/cookbooks/ops-diagnostics.mdx

Sets up the workspace, seeds log files, and defines the 'diagnose' task. This snippet includes the agent's prompt and the LLM grader configuration for evaluating the diagnosis.

```python
from pathlib import Path

from hud.environment import Environment
from hud.graders import LLMJudgeGrader

ROOT = Path("/workspace/incident")
env = Environment(name="ops-diagnostics")
env.workspace("/workspace")

@env.initialize
async def _seed():
    ROOT.mkdir(parents=True, exist_ok=True)
    (ROOT / "api.log").write_text(
        "12:01 INFO  request /checkout ok 120ms\n"
        "12:02 WARN  db pool wait 1400ms\n"
        "12:03 ERROR /checkout 503 upstream timeout\n"
    )
    (ROOT / "db.log").write_text(
        "12:02 connections=100/100 saturated\n"
        "12:02 slow query: SELECT * FROM carts (no index on user_id)\n"
    )
    (ROOT / "deploy.log").write_text("11:58 deployed v412: 'remove cart index migration'\n")

@env.template()
async def diagnose():
    answer = yield (
        "Checkout started returning 503s at 12:03. The logs and deploy history are "
        "in the incident/ directory of your workspace. What is the root cause, and "
        "what's the evidence?"
    )
    result = await LLMJudgeGrader.grade(
        weight=1.0,
        answer=answer,
        question="Root cause of the checkout 503s",
        criteria=[
            "Identifies the removed cart index (deploy v412) as the root cause",
            "Connects DB pool saturation and the slow cart query to the 503s",
            ("Cites specific log evidence rather than guessing", 2.0),
        ],
    )
    yield result.value

tasks = [diagnose()]
```
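The template body is an async generator. Its first `yield` hands out the prompt, and the value sent back becomes `answer`; its second `yield` is the reward. The sketch below shows how a harness could drive that protocol with plain `asyncio`. It assumes the runner resumes the generator with `asend`. `toy_template` and `drive` are hypothetical stand-ins for a real template and for HUD's runner:

```python
import asyncio
from typing import AsyncGenerator, Awaitable, Callable


async def toy_template(word: str = "strawberry") -> AsyncGenerator:
    # Hypothetical template with the same shape as `diagnose`: prompt out, answer in, reward out.
    answer = yield f"How many times does 'r' appear in {word!r}? Reply with a number."
    yield 1.0 if answer is not None and answer.strip() == str(word.count("r")) else 0.0


async def drive(gen: AsyncGenerator, solve: Callable[[str], Awaitable[str]]) -> float:
    prompt = await gen.__anext__()     # run the body up to the first yield: the prompt
    answer = await solve(prompt)       # the agent works on the prompt
    reward = await gen.asend(answer)   # resume; `answer = yield ...` receives this value
    await gen.aclose()                 # nothing left to run after the reward
    return float(reward)


async def main() -> None:
    async def correct(prompt: str) -> str:
        return "3"

    print(await drive(toy_template(), correct))   # prints 1.0


if __name__ == "__main__":
    asyncio.run(main())
```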

--------------------------------

### Run Environment Locally with Docker

Source: https://github.com/hud-evals/hud-python/blob/main/README.md

Starts a Docker container for the built environment and allows interaction with tasks.

```bash
docker run -d --name run1 my-env
docker exec run1 hud task start fix_bug
docker exec run1 hud task grade fix_bug --answer "…"
docker rm -f run1
```

--------------------------------

### Install hud-python using uv

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/start/quickstart.mdx

Install the hud-python package using the uv tool for Python 3.12.

```bash
uv tool install hud-python --python 3.12
```

--------------------------------

### v5 Environment Setup with Tools

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/more/migrate-v6.mdx

In v5, you explicitly added tools like BashTool and EditTool to the environment. Scenarios were defined using the @env.scenario decorator.

```python
from hud import Environment
from hud.tools import BashTool, EditTool
from hud.native import BashGrader

env = Environment("coder")
env.add_tool(BashTool())
env.add_tool(EditTool())

@env.scenario("fix-tests")
async def fix_tests(target: str = "tests/"):
    answer = yield f"Make the tests in {target} pass."
    yield await BashGrader.grade(command=f"pytest {target} -q")
```

--------------------------------

### Install HUD Docs Skill (CLI)

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/start/index.mdx

If an AI agent writes your HUD code, install the HUD docs skill with `npx` so the agent works from the current v6 API. The skill helps it catch mistakes and stay aligned with the latest docs.

```bash
npx skills add https://docs.hud.ai
```

--------------------------------

### Run Loaded Harbor Tasks

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/advanced/harbor-convert.mdx

Once Harbor tasks are loaded into a Taskset, they can be run by supplying an agent and a runtime. This example shows how to initiate a task run using a specified runtime.

```python
from hud import Runtime

job = await taskset.run(agent, runtime=Runtime("tcp://127.0.0.1:8765"))
```

--------------------------------

### Synchronize Dependencies and Run Tests

Source: https://github.com/hud-evals/hud-python/blob/main/AGENTS.md

Use `uv sync` to install development dependencies and `uv run pytest` to execute tests. Ensure you are in the repository root.

```bash
uv sync --extra dev
uv run pytest -q
```

--------------------------------

### Serve Custom Tools with FastMCP

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/capabilities.mdx

This snippet illustrates how to serve custom tools using FastMCP, exposing them via an HTTP transport under the '/mcp' path. It includes starting the server, defining a tool, and publishing the MCP capability.

```python
import asyncio

from fastmcp import FastMCP

from hud.capabilities import Capability
from hud.environment import Environment

server = FastMCP(name="tools")

@server.tool
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

env = Environment(name="calc")
_task: asyncio.Task | None = None

@env.initialize
async def _up():
    global _task
    if _task is None:                          # idempotent
        _task = asyncio.create_task(
            server.run_async(transport="http", host="127.0.0.1", port=8040)
        )
        await asyncio.sleep(1.0)               # wait until the server is ready
    env.add_capability(Capability.mcp(name="tools", url="http://127.0.0.1:8040/mcp"))

@env.shutdown
async def _down():
    global _task
    if _task is not None:
        _task.cancel()
        try:
            await _task                            # let the server task unwind
        except asyncio.CancelledError:
            pass
        _task = None
```

--------------------------------

### Synchronize Python Dependencies

Source: https://github.com/hud-evals/hud-python/blob/main/CLAUDE.md

Use `uv sync` to install development dependencies. Ensure you are running commands from the repository root.

```bash
uv sync --extra dev
```

--------------------------------

### Environment and Workspace Setup

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/capabilities.mdx

Sets up a HUD environment whose workspace has network access enabled and `/data` mounted read-only (`"ro"`) at `/data`. The workspace is a directory served by a bwrap-isolated SSH server.

```python
from hud.environment import Environment, Mount

env = Environment(name="coder")
env.workspace(
    "/workspace",
    network=True,
    mounts=[Mount("ro", src="/data", dst="/data")],
)
```

--------------------------------

### Install Dependencies in Dockerfile

Source: https://github.com/hud-evals/hud-python/blob/main/docs/skill.md

Explicitly declare all binary dependencies required by your `@env.initialize` hook in the Dockerfile. This prevents runtime errors due to missing tools.

```dockerfile
RUN apt-get update && apt-get install -y --no-install-recommends \
        git curl ca-certificates bubblewrap \
    && rm -rf /var/lib/apt/lists/*
RUN pip install uv   # if your initialize hook calls uv
```

--------------------------------

### Run A2A Chat Server

Source: https://github.com/hud-evals/hud-python/blob/main/cookbooks/a2a-chat/README.md

Starts the A2A server to serve the bundled chat task. This command should be run in the first terminal.

```bash
uv run server.py
```

--------------------------------

### Build Robot Environment Docker Image

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/cookbooks/robot-benchmark.mdx

Builds the Docker image for the robot environment. Ensure you are in the parent directory of both 'demos/' and 'hud-python/' for local SDK installation.

```bash
docker build -f demos/inventory/envs/libero/Dockerfile -t hud-libero-env .
```

--------------------------------

### Sim Process Side: Serving a Robot Endpoint

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/advanced/robots.mdx

This snippet demonstrates the simulation process side setup. It wraps a bridge with RobotEndpoint and serves it, ensuring all simulation interactions occur on the main thread using MainThreadSimRunner. This is essential for simulators like Isaac Sim.

```python
import asyncio
from hud.environment.robot import RobotEndpoint, MainThreadSimRunner

async def main():
    bridge = MySimBridge(sim_runner=MainThreadSimRunner())   # MySimBridge: your own sim bridge; sim calls run on main
    server = await RobotEndpoint(bridge).serve("127.0.0.1", 9100)
    await server.wait_closed()

asyncio.run(main())   # launched on the main thread the sim owns
```

--------------------------------

### Start a New Job for Multiple Runs

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/tasks.mdx

Initiate a new job using `Job.start` to group multiple task runs under a single identifier, such as for training sessions or multi-turn conversations. All subsequent runs passed with the `job=` argument will be associated with this job.

```python
from hud import Job, LocalRuntime

job = await Job.start("grpo-session", group=8)   # group=8: 8 rollouts per task
for _ in range(epochs):                           # ts, agent, and epochs are defined elsewhere
    await ts.run(agent, runtime=LocalRuntime("env.py"), job=job)   # all runs accumulate here
```

--------------------------------

### Environment Setup with Buggy Code and Tests

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/cookbooks/coding-agent.mdx

Initializes a HUD environment with a workspace for the agent and a separate directory for authoritative checks. It seeds a buggy Python module and its test in the agent's workspace. It also writes an identical copy of the test to a grader-only directory outside the workspace. The grader runs that copy, so editing the workspace test cannot change the grade.

```python
from pathlib import Path

from hud.environment import Environment
from hud.graders import BashGrader

ROOT = Path("workspace").resolve()     # the agent's directory
CHECKS = Path("checks").resolve()      # grader-only, outside the workspace

TEST = "from calc import add\n\ndef test_add():\n    assert add(2, 3) == 5\n"

env = Environment(name="coder")
env.workspace(ROOT)

@env.initialize
async def _seed():
    ROOT.mkdir(parents=True, exist_ok=True)                            # in case the workspace dir doesn't exist yet
    (ROOT / "calc.py").write_text("def add(a, b):\n    return a - b\n")   # bug
    (ROOT / "test_calc.py").write_text(TEST)          # the agent's copy
    CHECKS.mkdir(exist_ok=True)
    (CHECKS / "test_calc.py").write_text(TEST)        # the authoritative copy

@env.template()
async def fix_add(target: str = "test_calc.py"):
    yield f"There's a failing test in {target} in your workspace. Find and fix the bug so the test passes."
    result = await BashGrader.grade(
        weight=1.0,
        command=f"python -m pytest {CHECKS / target} -q",
        cwd=str(ROOT),
    )
    yield result.value

tasks = [fix_add()]
```

--------------------------------

### Run LLM-Fronted A2A Chat Client

Source: https://github.com/hud-evals/hud-python/blob/main/cookbooks/a2a-chat/README.md

Starts an LLM-fronted A2A client that uses an OpenAI model to decide when to call the A2A agent. This command should be run in the second terminal.

```bash
uv run llm_client.py
```

--------------------------------

### Initialize a New Environment Package

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/cli.mdx

Scaffolds a new environment package. Pass `--preset` to download a starter environment from GitHub. Without it, `hud init` lets you pick a template or create a minimal local scaffold.

```bash
hud init                          # pick a template → ./<template>
hud init my-env                   # pick a template (or minimal scaffold) → ./my-env
hud init my-env --preset browser  # download the "browser" starter into ./my-env
hud init --preset cua             # download the "cua" starter into ./cua-template
hud init my-env --dir envs        # create ./envs/my-env
```

--------------------------------

### Initialize New Environment

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/start/quickstart.mdx

Scaffold a new, complete, and runnable environment for your project.

```bash
hud init my-env
cd my-env
```

--------------------------------

### Launch and Publish a Daemon

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/capabilities.mdx

This snippet shows the four steps for running a daemon alongside an environment: launch it, wait until it accepts connections, publish its capability, and tear it down on shutdown. Publish only after the daemon is listening. Otherwise a client can connect to the advertised address before anything is serving there.

```python
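from hud.capabilities import Capability
from hud.environment import Environment

env = Environment(name="my-env")
PORT = 8000

# start_daemon, wait_until_listening, and stop_daemon are placeholders for your own helpers.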
@env.initialize
async def _up():
    start_daemon(host="127.0.0.1", port=PORT)            # 1. launch it (subprocess / task)
    await wait_until_listening("127.0.0.1", PORT)         # 2. block until it accepts connections
    env.add_capability(Capability.mcp(name="tools",      # 3. publish its address
                                      url=f"http://127.0.0.1:{PORT}/mcp"))

@env.shutdown
async def _down():
    stop_daemon()                                        # 4. tear it down with the env
```
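The example leaves `wait_until_listening` as a placeholder. Below is a minimal sketch of one way to write it with `asyncio`: poll TCP connects until one succeeds or a deadline passes. The name matches the placeholder, but the signature, timeout, and polling interval are assumptions:

```python
import asyncio
import time


async def wait_until_listening(host: str, port: int, timeout: float = 10.0,
                               interval: float = 0.1) -> None:
    """Poll until a TCP connection to (host, port) succeeds; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            _reader, writer = await asyncio.wait_for(
                asyncio.open_connection(host, port), timeout=1.0   # cap each attempt
            )
        except (OSError, asyncio.TimeoutError):
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{host}:{port} not listening after {timeout}s") from None
            await asyncio.sleep(interval)   # nothing accepting yet; retry shortly
        else:
            writer.close()                  # the handshake was all we needed
            await writer.wait_closed()
            return
```

The same call could replace the fixed `await asyncio.sleep(1.0)` in the FastMCP example above. An accepting port means only that the socket is listening. An HTTP app may still be finishing startup, so treat a successful connect as a lower bound on readiness.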

--------------------------------

### Install HUD Python in Development Mode

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/more/contributing.mdx

Install the package in editable mode with its development dependencies. Then install the `hud` CLI from your local checkout as a uv tool: `--force` replaces an existing install, and `--refresh` ignores cached data.

```bash
uv pip install -e ".[dev]"
uv tool install --force --from "." hud-python --refresh
```

--------------------------------

### Deploy and Sync Environment

Source: https://github.com/hud-evals/hud-python/blob/main/docs/skill.md

Commands to package and deploy a local environment to the platform, and then upload tasks.

```bash
hud deploy .
hud sync tasks my-taskset env.py
```

--------------------------------

### Install HUD Python as a Library

Source: https://github.com/hud-evals/hud-python/blob/main/README.md

Installs the hud-python package using pip. Use this if you need to integrate HUD into your existing Python projects.

```bash
# …or as a library
pip install hud-python
```

--------------------------------

### Get Final Agent Citations from Trace

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/types.mdx

Use `trace.final` to retrieve the `citations` from the most recent `AgentStep` in a trace. This is helpful for getting the final answer's metadata.

```python
citations = trace.final(lambda s: s.citations if isinstance(s, AgentStep) else None)
```
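Conceptually, `final(fn)` applies `fn` to the trace's steps from newest to oldest and returns the first non-`None` result. This scan order is an assumption based on the description above, not confirmed HUD behavior. The sketch uses hypothetical `Trace`, `AgentStep`, and `ToolStep` dataclasses in place of HUD's real types:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional, TypeVar

T = TypeVar("T")


@dataclass
class AgentStep:                        # hypothetical stand-in for HUD's AgentStep
    text: str
    citations: Optional[list] = None


@dataclass
class ToolStep:                         # hypothetical non-agent step
    name: str


@dataclass
class Trace:                            # hypothetical stand-in; real traces hold more
    steps: list = field(default_factory=list)

    def final(self, fn: Callable[[Any], Optional[T]]) -> Optional[T]:
        """Assumed semantics: newest-first scan, first non-None result of fn(step)."""
        for step in reversed(self.steps):
            value = fn(step)
            if value is not None:
                return value
        return None


trace = Trace([
    AgentStep("draft", citations=["a.md"]),
    ToolStep("search"),
    AgentStep("answer", citations=["b.md", "c.md"]),
    ToolStep("log"),
])
citations = trace.final(lambda s: s.citations if isinstance(s, AgentStep) else None)
print(citations)   # ['b.md', 'c.md']
```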

--------------------------------

### Creating and Loading Tasksets

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/tasks.mdx

Demonstrates various ways to create and load Tasksets, including defining them in code, loading from Python files, JSON/JSONL files, and fetching from the HUD platform.

```python
from hud import Taskset

# in code - the authoring case
ts = Taskset("letters", [count_letter(word="strawberry"), count_letter(word="raspberry")])

# from a Python source (.py file or directory) - scans it for Task / Taskset objects
ts = Taskset.from_file("tasks.py")

# from a data file (.json / .jsonl) - portable rows, no source needed
ts = Taskset.from_file("tasks.jsonl")

# from the platform - by taskset name or id (uses HUD_API_KEY)
ts = Taskset.from_api("SheetBench-50")
```

--------------------------------

### Launch Chromium with DevTools Port

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/capabilities.mdx

Launches a headless Chromium instance with the DevTools (CDP) port enabled for remote debugging. Requires the Playwright Python package and its Chromium build (`playwright install chromium`). Playwright is used here only to locate the Chromium binary. A temporary user data directory isolates the browser profile.

```python
import asyncio
import tempfile

from playwright.async_api import async_playwright

from hud.capabilities import Capability
from hud.environment import Environment

env = Environment(name="browser")
_proc: asyncio.subprocess.Process | None = None

@env.initialize
async def _up():
    global _proc
    if _proc is None:
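        pw = await async_playwright().start()
        chromium = pw.chromium.executable_path   # only the binary path is needed
        await pw.stop()                           # don't leave the Playwright driver running
        _proc = await asyncio.create_subprocess_exec(
            chromium,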
            "--headless=new",
            "--remote-debugging-port=9222",
            "--remote-debugging-address=127.0.0.1",
            "--no-first-run",
            "--user-data-dir=" + tempfile.mkdtemp(prefix="cdp_"),
        )
        await asyncio.sleep(1.0)               # wait until Chromium is ready
    env.add_capability(Capability.cdp(name="browser", url="http://127.0.0.1:9222"))

@env.shutdown
async def _down():
    global _proc
    if _proc is not None:
        _proc.terminate()
        await _proc.wait()
        _proc = None
```

--------------------------------

### DaytonaRuntime Constructor

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/runtime.mdx

Initialize a DaytonaRuntime to boot from a specified Daytona snapshot or build an image if the snapshot is missing. Configure working directory, serving port, and SSH tunnel settings for accessing the runtime.

```python
DaytonaRuntime(snapshot_name=None, *, image=None, command=None, workdir="/app", port=8765, ssh_host="ssh.app.daytona.io", ssh_expires_minutes=1440, runtime_config=None)
```

--------------------------------

### Set up MCP Capability with Tools

Source: https://github.com/hud-evals/hud-python/blob/main/docs/skill.md

This snippet sets up an MCP (Model Context Protocol) capability backed by in-process tools. It defines a tool (`do_thing`), serves it from a FastMCP server on a free local port, and adds that server to the environment as a capability. Use this when the agent needs to call functions or services that live inside the environment.

```python
import asyncio, contextlib, socket
from fastmcp import FastMCP
from hud.capabilities import Capability
from hud.environment import Environment

server = FastMCP(name="my-env")
env = Environment(name="my-env")
_task: asyncio.Task | None = None

@server.tool
async def do_thing(x: int) -> str:
    return f"result: {x}"

@env.initialize
async def _start() -> None:
    global _task
    if _task is None:
        with socket.socket() as s:                # ask the OS for a free local port
            s.bind(("127.0.0.1", 0))
            port = s.getsockname()[1]
        _task = asyncio.create_task(
            server.run_async(transport="http", host="127.0.0.1", port=port, show_banner=False)
        )
        await asyncio.sleep(0.3)
        env.add_capability(Capability.mcp(name="tools", url=f"http://127.0.0.1:{port}/mcp"))

@env.shutdown
async def _stop() -> None:
    global _task
    if _task is not None:
        _task.cancel()
        with contextlib.suppress(asyncio.CancelledError):   # CancelledError is a BaseException
            await _task
        _task = None

@env.template()
async def my_task(param: str = "default"):
    answer = yield f"Use the do_thing tool with x=42. Param hint: {param}"
    yield 1.0 if answer and "result: 42" in answer else 0.0
```

--------------------------------

### Serve Environment Locally

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/environment.mdx

Use `hud serve` to serve the environment locally. By default it listens on `tcp://127.0.0.1:8765`.

```bash
hud serve env.py     # serve locally on tcp://127.0.0.1:8765 while you iterate
```

--------------------------------

### Run Plain A2A Chat Client

Source: https://github.com/hud-evals/hud-python/blob/main/cookbooks/a2a-chat/README.md

Starts a minimal A2A client to communicate with the server. This command should be run in the second terminal.

```bash
uv run client.py
```

--------------------------------

### Initialize TrainingClient

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/training.mdx

Instantiate the TrainingClient with the model slug or ID and optional API credentials and URLs.

```python
TrainingClient(model, *, api_key=None, base_url=None, api_url=None)
```

--------------------------------

### List Recent Jobs

Source: https://github.com/hud-evals/hud-python/blob/main/docs/skill.md

Lists recent jobs on HUD, giving an overview of running and completed evaluations.

```bash
hud jobs                    # list recent jobs
```

--------------------------------

### Get Active Checkpoint

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/training.mdx

Retrieve the currently active checkpoint, which represents the weights served by the gateway. Returns `None` if only base weights are available.

```python
head()
```

--------------------------------

### Run the Training Loop

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/training.mdx

This Python script demonstrates the core training loop. It involves rolling out batches of data using an agent, collecting rewards, and then using the TrainingClient to nudge the model weights based on these rewards. Ensure 'return_token_ids' is enabled for the agent to provide necessary training data.

```python
from hud import TrainingClient
from hud.agents import create_agent
from hud.eval import Job

# return_token_ids tells the gateway to send back the token ids + logprobs training needs
agent = create_agent("arith-rl", completion_kwargs={"extra_body": {"return_token_ids": True}})
trainer = TrainingClient("arith-rl")
taskset, runtime = ...   # your taskset + where rollouts run (see Tasks / Run & deploy)

session = await Job.start("arith-rl", group=8)   # 8 rollouts per task
for _ in range(10):
    start = len(session.runs)
    await taskset.run(agent, runtime=runtime, job=session)   # roll out a batch
    batch = session.runs[start:]
    await trainer.step(batch, learning_rate=1e-5, group_size=8)   # nudge the weights
```
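`group=8` on the job and `group_size=8` in `trainer.step` both describe eight rollouts per task. How `TrainingClient.step` uses the group internally is not specified here. As a sketch of the common GRPO-style idea only, each rollout's reward can be normalized against its own group as `(r - mean) / (std + eps)`, so the update favors rollouts that beat their siblings. The sketch assumes the batch is laid out as contiguous groups. `group_advantages` is a hypothetical helper:

```python
from statistics import fmean, pstdev


def group_advantages(rewards: list, group_size: int, eps: float = 1e-6) -> list:
    """Score each reward against the mean and std of its own group (GRPO-style)."""
    if group_size <= 0 or len(rewards) % group_size:
        raise ValueError("rewards must split evenly into groups of group_size")
    advantages = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]      # rollouts of one task (assumed contiguous)
        mean, std = fmean(group), pstdev(group)
        advantages.extend((r - mean) / (std + eps) for r in group)
    return advantages
```

A group in which every rollout earns the same reward yields all-zero advantages, so it contributes no learning signal under this scheme.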

--------------------------------

### Train with Managed Calibration Backend

Source: https://github.com/hud-evals/hud-python/blob/main/cookbooks/fireworks-rl-training/README.md

Run training with the calibration backend set to 'managed'. This provisions the same resources for calibration as for training.

```bash
uv run train.py --calibration-backend managed --steps 5 --groups-per-step 8 --rollouts-per-prompt 8 --parallelism 32
```

--------------------------------

### Retrieve Checkpoint Tree

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/training.mdx

Get the checkpoint tree, where each node contains rewards, loss, counts, and a metrics blob. This is useful for inspecting training history.

```python
checkpoints()
```

--------------------------------

### Deploy Environment to HUD Platform

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/build/overview.mdx

`hud deploy` builds your environment image and registers it on HUD. `hud sync tasks` then uploads your tasks to a taskset (here `my-taskset`) on the platform.

```bash
hud deploy
hud sync tasks my-taskset
```

--------------------------------

### RuntimeConfig Constructor

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/runtime.mdx

Demonstrates how to create a RuntimeConfig object to specify image, resources, and limits for a container-based runtime.

`RuntimeConfig` carries the construction hints a container-based runtime needs: which image, how much hardware, and what timeouts. Set it on the runtime (`runtime_config=`) or per row on [`Task.runtime_config`](/v6/core/tasks#the-task-row). The runtime merges the two and applies what it supports.

```python
from hud.eval import RuntimeConfig, RuntimeResources, RuntimeGPU, RuntimeLimits

RuntimeConfig(
    image="my-env",
    resources=RuntimeResources(cpu=4, memory_mb=8192, gpu=RuntimeGPU(type="A100", count=1)),
    limits=RuntimeLimits(startup_timeout_s=300, run_timeout_s=1800),
)
```

| Field | Description |
|-------|-------------|
| `image` | Image to run. |
| `resources` | `RuntimeResources(cpu, memory_mb, gpu=RuntimeGPU(type, count))`. |
| `limits` | `RuntimeLimits(startup_timeout_s, run_timeout_s)`. |

Support differs per runtime. `DockerRuntime`, `ModalRuntime`, and `DaytonaRuntime` accept it, with caveats: Docker ignores `limits`, and Daytona ignores `run_timeout_s` and resource overrides when booting from a snapshot. `LocalRuntime` and `HUDRuntime` reject a per-task `runtime_config`.
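The runtime merges the runtime-level and per-task configs, but the exact rule is not documented here. One plausible rule, stated as an assumption: fields set on the task row win, unset (`None`) fields fall back to the runtime's, and nested groups such as resources merge field by field. The sketch uses hypothetical frozen dataclasses in place of HUD's `RuntimeConfig` types:

```python
from dataclasses import dataclass, fields, is_dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Resources:                 # stand-in for RuntimeResources (gpu omitted)
    cpu: Optional[int] = None
    memory_mb: Optional[int] = None


@dataclass(frozen=True)
class Limits:                    # stand-in for RuntimeLimits
    startup_timeout_s: Optional[int] = None
    run_timeout_s: Optional[int] = None


@dataclass(frozen=True)
class Config:                    # stand-in for RuntimeConfig
    image: Optional[str] = None
    resources: Optional[Resources] = None
    limits: Optional[Limits] = None


def merge(base, override):
    """Field-wise merge: non-None override fields win; nested dataclasses merge recursively."""
    if base is None or override is None:
        return override if base is None else base
    updates = {}
    for f in fields(override):
        b, o = getattr(base, f.name), getattr(override, f.name)
        if is_dataclass(b) and is_dataclass(o):
            updates[f.name] = merge(b, o)
        elif o is not None:
            updates[f.name] = o
    return replace(base, **updates)


runtime_level = Config(image="my-env", resources=Resources(cpu=4, memory_mb=8192),
                       limits=Limits(startup_timeout_s=300))
task_row = Config(resources=Resources(memory_mb=16384), limits=Limits(run_timeout_s=1800))
print(merge(runtime_level, task_row))
```

Whatever the real precedence, each runtime still drops what it does not support. For example, `DockerRuntime` ignores `limits` even after the merge.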

--------------------------------

### Running Tasks and Tasksets

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/tasks.mdx

Shows how to execute tasks or entire tasksets using a specified agent and runtime. It covers running a single task and running a taskset with options for grouping and concurrency.

```python
from hud import LocalRuntime

# one task
job = await count_letter(word="strawberry").run(agent, runtime=LocalRuntime("env.py"))

# a whole taskset: 8 rollouts per task, capped concurrency
job = await ts.run(agent, runtime=LocalRuntime("env.py"), group=8, max_concurrent=10)
print(job.reward)
```

--------------------------------

### Deploy Environment to HUD

Source: https://github.com/hud-evals/hud-python/blob/main/README.md

Builds and registers your environment on HUD. This is the recommended first step for remote deployment.

```bash
hud deploy
```

--------------------------------

### Target Sidebar Structure

Source: https://github.com/hud-evals/hud-python/blob/main/docs/docs-restructure.md

This outlines the planned navigation structure for the reorganized documentation, separating content into 'Start here', 'Build', 'Reference', 'Advanced', 'Cookbooks', and 'More' sections.

```markdown
Start here   start/index, start/quickstart
Build        build/index (spine), protocol, environments, tasks, run, train, advice
Reference    environment, tasks, capabilities, graders, agents, runtime, robots, training, types, cli
Advanced     integrations, subagents, chat, patterns, harbor-convert
Cookbooks    coding-agent, ops-diagnostics, a2a-chat, robot-benchmark
More         faq, migrate-v6, contributing
```

--------------------------------

### Initialize HUD Environment

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/environment.mdx

Instantiate the Environment object with a name, version, and optional capabilities.

```python
from hud import Environment

env = Environment(name="environment", version="0.0.1", capabilities=None)
```

--------------------------------

### Install hud-python in a project venv

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/more/faq.mdx

When your environment imports packages not found by the global CLI, add hud-python to your project's virtual environment and run commands from within it.

```bash
uv add hud-python
uv run hud eval tasks.py claude
```

--------------------------------

### Run Robot Benchmark with Persistent Container

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/cookbooks/robot-benchmark.mdx

Starts a long-lived Docker container for the robot environment. This is useful for heavy simulations or sweeps to avoid per-episode container boot times.

```bash
docker run -d --name libero-env -p 8765:8765 hud-libero-env
```

--------------------------------

### ModalRuntime Constructor

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/runtime.mdx

Create a ModalRuntime to run environments on Modal. Pass either `image_name` (an already published image) or `image` (an `Image` object to build from), then set the app name, serving port, command, and environment variables as needed.

```python
ModalRuntime(image_name=None, *, image=None, command=None, app_name="hud-envs", port=8765, runtime_config=None, env_vars=None)
```

--------------------------------

### Declare Environment with Workspace Capability

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/environment.mdx

Initialize an Environment and automatically set up a workspace directory accessible via SSH.

```python
from hud import Environment

env = Environment(name="coder")
env.workspace("workspace")
```

--------------------------------

### Get Raw Event List for Trace

Source: https://github.com/hud-evals/hud-python/blob/main/docs/skill.md

Retrieves the raw event list for a trace in JSON format. Useful for programmatic filtering with tools like jq. Requires a trace ID.

```bash
hud trace <trace-id> --json # raw event list (pipe to jq for filtering)
```
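`jq` is the suggested filter, but the same JSON can be filtered in Python. This sketch assumes `hud trace <trace-id> --json` prints a single JSON array of event objects. The field names `type` and `name` in the sample are placeholders, since the event schema is not described here.

```python
import io
import json

def load_events(stream) -> list[dict]:
    """Parse the raw event list; expects a JSON array of objects."""
    data = json.load(stream)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of events")
    return data

def filter_events(events: list[dict], **match) -> list[dict]:
    """Keep events whose fields equal every key=value pair in `match`."""
    return [e for e in events if all(e.get(k) == v for k, v in match.items())]

# Placeholder sample standing in for real trace output.
sample = io.StringIO('[{"type": "tool_call", "name": "shell"}, {"type": "message"}]')
events = load_events(sample)
calls = filter_events(events, type="tool_call")
print(calls)
```

For real output, pass `sys.stdin` to `load_events` and pipe the command into the script.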

--------------------------------

### Minimal Robot Contract Example

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/advanced/robots.mdx

This JSON defines a contract with one camera observation, a state vector observation, and a single action. It specifies the control rate and the roles and names for each feature.

```json
{
  "control_rate": 10,
  "features": {
    "observation/image": {
      "role": "observation",
      "type": "rgb",
      "names": ["height", "width", "channel"]
    },
    "observation/state": {
      "role": "observation",
      "names": ["eef_x", "eef_y", "eef_z", "axis_x", "axis_y", "axis_z", "grip_l", "grip_r"]
    },
    "action": {
      "role": "action",
      "names": ["dx", "dy", "dz", "drx", "dry", "drz", "gripper"]
    }
  }
}
```
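Once the contract is loaded as a dict, it can be used to check shapes before anything reaches the simulator. The sketch below labels a flat action vector with the names from the `action` feature and rejects vectors of the wrong length. Two readings are assumptions, not documented here: that each entry in `names` labels one element of a non-image feature, and that `control_rate` is in Hz. The helper names are hypothetical.

```python
# The contract above, as a Python dict (e.g. the result of json.load).
CONTRACT = {
    "control_rate": 10,
    "features": {
        "observation/image": {"role": "observation", "type": "rgb",
                              "names": ["height", "width", "channel"]},
        "observation/state": {"role": "observation",
                              "names": ["eef_x", "eef_y", "eef_z", "axis_x",
                                        "axis_y", "axis_z", "grip_l", "grip_r"]},
        "action": {"role": "action",
                   "names": ["dx", "dy", "dz", "drx", "dry", "drz", "gripper"]},
    },
}

def features_with_role(contract: dict, role: str) -> dict:
    """Return the features whose "role" matches, e.g. "observation" or "action"."""
    return {key: f for key, f in contract["features"].items() if f["role"] == role}

def action_names(contract: dict) -> list[str]:
    # Assumption: each name labels one element of the flat action vector.
    return [n for f in features_with_role(contract, "action").values() for n in f["names"]]

def label_action(contract: dict, action: list[float]) -> dict[str, float]:
    names = action_names(contract)
    if len(action) != len(names):
        raise ValueError(f"expected {len(names)} action values, got {len(action)}")
    return dict(zip(names, action))

state_dim = len(CONTRACT["features"]["observation/state"]["names"])
period_s = 1.0 / CONTRACT["control_rate"]   # assumption: control_rate is in Hz
step = label_action(CONTRACT, [0.01, 0.0, -0.02, 0.0, 0.0, 0.0, 1.0])
print(state_dim, period_s, step["gripper"])
```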

--------------------------------

### Stateful Environment with Initialization and Shutdown Hooks

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/advanced/extending.mdx

Manage resources that tasks depend on, such as databases or services, with the `@env.initialize` and `@env.shutdown` decorators. The initialize hook runs once before the environment starts serving and the shutdown hook runs once when it stops, so every task sees the same shared state.

```python
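import asyncpg

# assumes `env` is an Environment created earlier in the same module
db: asyncpg.Connection | None = None

@env.initialize
async def _start():
    global db
    db = await asyncpg.connect("postgresql://localhost/app")   # open once, before serving

@env.shutdown
async def _stop():
    global db
    if db is not None:
        await db.close()   # release the connection when the env stops
        db = None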

--------------------------------

### Environment Side: Connecting to a Remote Robot Endpoint

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/advanced/robots.mdx

This snippet shows the environment side setup for a remote simulation. It connects to a RobotEndpoint serving in a separate process and adds its capabilities to the environment. Use this when your simulator needs its own process.

```python
from hud import Environment
from hud.environment.robot import RobotEndpoint

env = Environment(name="isaac-sim")
endpoint = RobotEndpoint.remote("127.0.0.1", 9100)   # a handle on the bridge in the sim process

@env.initialize
async def _up():
    await endpoint.connect()    # retries until the sim process is serving
    await endpoint.start()
    env.add_capability(await endpoint.capability(contract=CONTRACT))   # CONTRACT: your robot contract as a dict

@env.shutdown
async def _down():
    await endpoint.close()      # drops the link; does not stop the sim

@env.template()
async def pick_and_place(task_id: str, seed: int = 0):
    yield {"prompt": await endpoint.reset(task_id=task_id, seed=seed)}   # send the prompt; resumes once the agent is done
    yield await endpoint.result()
```
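`endpoint.connect()` is described as retrying until the sim process is serving. The sketch below shows that pattern in plain asyncio: try `asyncio.open_connection`, back off exponentially on `OSError`, and give up after a fixed number of attempts. `connect_with_retry` and its defaults are illustrative assumptions, not the `RobotEndpoint` implementation. The demo starts a throwaway TCP server 0.3 s late to simulate a slow-booting simulator.

```python
import asyncio
import socket

async def connect_with_retry(host: str, port: int, *, attempts: int = 20,
                             delay: float = 0.1, max_delay: float = 2.0):
    """Open a TCP connection, retrying with exponential backoff until the peer is up."""
    last_exc = None
    for _ in range(attempts):
        try:
            return await asyncio.open_connection(host, port)
        except OSError as exc:           # e.g. ConnectionRefusedError while the sim boots
            last_exc = exc
            await asyncio.sleep(delay)
            delay = min(delay * 2, max_delay)
    raise ConnectionError(f"{host}:{port} not reachable after {attempts} attempts") from last_exc

def free_port() -> int:
    # Ask the OS for an unused local port for the demo server.
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

async def demo() -> bool:
    port = free_port()

    async def handle(reader, writer):
        writer.close()

    async def start_late():
        await asyncio.sleep(0.3)         # simulate a sim process that is still booting
        return await asyncio.start_server(handle, "127.0.0.1", port)

    server_task = asyncio.create_task(start_late())
    reader, writer = await connect_with_retry("127.0.0.1", port)
    writer.close()
    await writer.wait_closed()
    server = await server_task
    server.close()
    return True

ok = asyncio.run(demo())
print(ok)
```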

--------------------------------

### Initialize and Control Robot Endpoint

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/advanced/robots.mdx

Set up the environment and robot endpoint. The endpoint is used to start and stop the bridge, publish capabilities, reset episodes, and retrieve results. The initialize and shutdown decorators manage the endpoint's lifecycle.

```python
from hud import Environment
from hud.environment.robot import RobotEndpoint

env = Environment(name="my-sim")
endpoint = RobotEndpoint(MySimBridge())  # the env drives the bridge only through the endpoint

@env.initialize
async def _up():
    await endpoint.start()
    env.add_capability(await endpoint.capability(contract=CONTRACT))   # CONTRACT: your robot contract as a dict

@env.shutdown
async def _down():
    await endpoint.stop()

@env.template()
async def pick_and_place(task_id: str, seed: int = 0):
    yield {"prompt": await endpoint.reset(task_id=task_id, seed=seed)}   # send the prompt; resumes once the agent is done
    yield await endpoint.result()  # {"score", "success", "total_reward"}
```
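Each episode ends with a result dict carrying `score`, `success`, and `total_reward`. A benchmark usually reports these aggregated over many episodes; the sketch below shows one straightforward way to do that. `summarize` is a hypothetical helper, and the four results are made-up values used only to exercise it.

```python
from statistics import mean

def summarize(results: list[dict]) -> dict:
    """Aggregate per-episode dicts with "score", "success", and "total_reward" keys."""
    if not results:
        raise ValueError("no results to summarize")
    n = len(results)
    return {
        "episodes": n,
        "mean_score": mean(r["score"] for r in results),
        "success_rate": sum(1 for r in results if r["success"]) / n,
        "mean_total_reward": mean(r["total_reward"] for r in results),
    }

# Made-up results for four episodes, only to exercise the function.
results = [
    {"score": 1.0, "success": True, "total_reward": 10.0},
    {"score": 0.0, "success": False, "total_reward": 2.0},
    {"score": 0.5, "success": False, "total_reward": 4.0},
    {"score": 0.5, "success": True, "total_reward": 8.0},
]
summary = summarize(results)
print(summary)
```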

--------------------------------

### Serve Chat via FastAPI

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/advanced/extending.mdx

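Integrate HUD's Chat runner into a FastAPI application to serve chat over HTTP. This example exposes a basic POST endpoint that sends a message and returns the response. Because `message` is declared as a plain `str` parameter, FastAPI reads it from the query string (`POST /api/chat?message=...`); use a Pydantic model instead if you want a JSON request body.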

```python
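from fastapi import FastAPI

# assumes `Chat`, `create_agent`, and the `assistant` template are already imported from hud / your env
app = FastAPI()
chat = Chat(assistant(messages=[]), create_agent("claude-sonnet-4-5"))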

@app.post("/api/chat")
async def chat_endpoint(message: str):
    result = await chat.send(message)
    return {"response": result.content}
```

--------------------------------

### Example Task Returning an Integer

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/types.mdx

This snippet demonstrates a task that returns an integer. The `Answer[int]` type is used, and the task yields a grading score based on whether the parsed integer answer matches the expected length of the word.

```python
from hud.environment import Answer

@env.template(returns=int)
async def count(word: str = "strawberry"):
    answer: Answer[int] = yield f"How many letters in '{word}'?"
    yield 1.0 if answer.content == len(word) else 0.0
```
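The template is a two-step async generator: the first `yield` hands out the prompt, and the value sent back into it is the answer, which the second `yield` turns into a reward. The sketch below drives that protocol by hand with `asend` to make the control flow visible. It drops the hud decorator and the `Answer[int]` wrapper, so the answer is a bare `int` rather than something with `.content`, and the `int(...)` parse in `drive` is a hypothetical stand-in for HUD's own parsing.

```python
import asyncio

async def count(word: str = "strawberry"):
    # Same shape as the template above, without the hud decorator or Answer wrapper.
    answer = yield f"How many letters in '{word}'?"
    yield 1.0 if answer == len(word) else 0.0

async def drive(gen, reply_text: str) -> tuple[str, float]:
    prompt = await gen.asend(None)       # run to the first yield: the prompt
    answer = int(reply_text.strip())     # hypothetical parse step standing in for Answer[int]
    reward = await gen.asend(answer)     # resume with the answer; the second yield is the reward
    await gen.aclose()
    return prompt, reward

prompt, good = asyncio.run(drive(count(), "10"))
_, bad = asyncio.run(drive(count(), " 9 "))
print(prompt, good, bad)
```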

--------------------------------

### DockerRuntime Constructor

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/runtime.mdx

Create a DockerRuntime to run a specified Docker image. Configure the image name, the internal port, optional Docker run arguments, and a RuntimeConfig for finer control over resources and timeouts.

```python
DockerRuntime(image=None, *, port=8765, run_args=(), runtime_config=None)
```

--------------------------------

### Instantiate a Task

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/tasks.mdx

Creates a concrete Task instance by calling a defined template with specific arguments. This task is ready to be run but has not been executed yet.

```python
task = count_letter(word="raspberry")   # a Task row, not yet run
```

--------------------------------

### Generic Runtime Constructor

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/runtime.mdx

Initialize a generic Runtime by providing the control-channel URL of an already running substrate. Additional parameters like authentication tokens or sandbox IDs can be passed via the `params` argument.

```python
Runtime(url, params=..., config=...)
```

--------------------------------

### Define a Custom Runtime

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/runtime.mdx

This snippet shows how to write a custom runtime as an async context manager: it starts a sandbox on your own infrastructure, yields a generic `Runtime` pointing at the sandbox's control-channel URL, and terminates the sandbox afterward, even if the run fails. Use this to run tasks on your own infrastructure.

```python
from contextlib import asynccontextmanager
from hud import Runtime

@asynccontextmanager
async def my_runtime(task):
    sandbox = await start_my_sandbox(image="my-env")   # your infra brings it up
    try:
        yield Runtime(f"tcp://{sandbox.host}:{sandbox.port}")
    finally:
        await sandbox.terminate()                       # ...and tears it down

await TASKS.run(agent, runtime=my_runtime)
```
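Because `my_runtime` tears the sandbox down in a `finally` block, cleanup happens even when an episode raises. The runnable sketch below checks that with hypothetical stand-ins: `FakeSandbox` and `start_my_sandbox` replace your infrastructure, a plain URL string replaces `hud.Runtime`, and `run_one` plays the harness, assuming it enters the factory once per task.

```python
import asyncio
from contextlib import asynccontextmanager

class FakeSandbox:
    # Hypothetical stand-in for your infra's sandbox handle.
    def __init__(self):
        self.host, self.port, self.terminated = "127.0.0.1", 8765, False

    async def terminate(self):
        self.terminated = True

created: list[FakeSandbox] = []

async def start_my_sandbox(image: str) -> FakeSandbox:
    sandbox = FakeSandbox()
    created.append(sandbox)
    return sandbox

@asynccontextmanager
async def my_runtime(task):
    sandbox = await start_my_sandbox(image="my-env")
    try:
        yield f"tcp://{sandbox.host}:{sandbox.port}"   # stand-in for Runtime(...)
    finally:
        await sandbox.terminate()                       # runs on success and on error

async def run_one(task, fail: bool = False) -> str:
    # Hypothetical harness: one runtime per task.
    async with my_runtime(task) as url:
        if fail:
            raise RuntimeError("episode crashed")
        return url

async def main() -> str:
    url = await run_one("t1")
    try:
        await run_one("t2", fail=True)
    except RuntimeError:
        pass
    return url

url = asyncio.run(main())
print(url, [s.terminated for s in created])
```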

--------------------------------

### RL Training Loop

Source: https://github.com/hud-evals/hud-python/blob/main/cookbooks/rl-training/README.md

This snippet shows the core training loop. It loads tasksets and runtimes, starts a job session, and iterates through steps. In each step, it rolls out the current agent weights, collects the runs from the session, and then trains the agent using the collected batch.

```python
taskset, runtime = load_taskset_and_runtime()   # deployed+remote, or local
session = await Job.start("rl", group=8)         # one job spans the session
for step in range(steps):
    start = len(session.runs)
    await taskset.run(agent, runtime=runtime, job=session)   # roll out current weights
    batch = session.runs[start:]                             # this step's runs
    await trainer.step(batch, learning_rate=1e-5, group_size=8)   # train + promote
```