### Start Workspace Capability

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/capabilities.mdx

This snippet shows how to easily start a workspace, which provides a sandboxed shell, and have it automatically published as an 'ssh' capability when the environment serves. This is a one-line setup for a common use case.

```python
from hud.environment import Environment

env = Environment(name="coder")
env.workspace("workspace")  # publishes "shell" (ssh/2) when the env serves
```

--------------------------------

### Start Virtual Display and VNC Server

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/capabilities.mdx

Sets up a virtual framebuffer (`Xvfb`) and a VNC server (`x11vnc`) to provide remote desktop access. Requires `xvfb` and `x11vnc` to be installed (`apt install xvfb x11vnc`). The VNC server listens on port 5900 + display number.

```python
import asyncio

from hud.capabilities import Capability
from hud.environment import Environment

env = Environment(name="desktop")
_procs: tuple | None = None


@env.initialize
async def _up():
    global _procs
    if _procs is None:
        xvfb = await asyncio.create_subprocess_exec(
            "Xvfb", ":0", "-screen", "0", "1280x1024x24",
        )
        await asyncio.sleep(0.5)  # let the X server come up first
        vnc = await asyncio.create_subprocess_exec(
            "x11vnc", "-display", ":0", "-rfbport", "5900",
            "-localhost", "-forever", "-nopw",
        )
        await asyncio.sleep(1.0)  # wait until VNC is ready
        _procs = (xvfb, vnc)
    env.add_capability(Capability.rfb(name="screen", url="rfb://127.0.0.1", display=0))


@env.shutdown
async def _down():
    global _procs
    if _procs:
        for p in reversed(_procs):
            p.terminate()
            await p.wait()
        _procs = None
```

--------------------------------

### Install Cookbook Environment

Source: https://github.com/hud-evals/hud-python/blob/main/cookbooks/fireworks-rl-training/README.md

Install the necessary dependencies for the isolated cookbook environment using uv sync.
```bash
uv sync --pre
```

--------------------------------

### Manually Manage Workspace Lifecycle

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/capabilities.mdx

This example demonstrates how to manually control the lifecycle of a Workspace, including starting it and publishing its 'shell' capability. This approach offers more control than the automatic publishing method.

```python
from hud.environment import Environment, Workspace

env = Environment(name="coder")
ws = Workspace("workspace", host="127.0.0.1", port=0)  # port 0 → ephemeral


@env.initialize
async def _up():
    await ws.start()  # binds, generates keys; idempotent
    env.add_capability(ws.capability("shell"))


@env.shutdown
async def _down():
    await ws.stop()
```

--------------------------------

### List, Start, and Grade Tasks

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/cli.mdx

Use `hud task list` to see available tasks. Start a task with `hud task start `, which outputs a prompt. Grade a task with `hud task grade ` by providing an answer.

```bash
hud task list                            # what tasks are exposed
hud task start fix_bug                   # -> the prompt (stdout)
hud task grade fix_bug --answer "..."    # -> the reward (stdout)
```

--------------------------------

### Install Development Dependencies with uv

Source: https://github.com/hud-evals/hud-python/blob/main/CONTRIBUTING.md

Installs development dependencies using uv after cloning the repository. Ensure you have uv installed.

```bash
cd hud-python
uv sync --extra dev
```

--------------------------------

### Install Robot Capability with UV

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/advanced/robots.mdx

Install the hud-python package with the robot extra using uv.

```bash
uv add 'hud-python[robot]'
```

--------------------------------

### Initiate Training

Source: https://github.com/hud-evals/hud-python/blob/main/cookbooks/fireworks-rl-training/README.md

Start the training process after successful calibration.
This command uses the direct Training API managed service path.

```bash
uv run train.py --steps 5 --groups-per-step 8 --rollouts-per-prompt 8 --parallelism 32
```

--------------------------------

### Setup Environment Variables

Source: https://github.com/hud-evals/hud-python/blob/main/cookbooks/fireworks-rl-training/README.md

Ensure the Fireworks API key and account ID are set in the .env file for authentication.

```bash
FIREWORKS_API_KEY=...
FIREWORKS_ACCOUNT_ID=...
```

--------------------------------

### Install HUD Python CLI

Source: https://github.com/hud-evals/hud-python/blob/main/README.md

Installs the HUD Python command-line interface using uv. This is the recommended installation method.

```bash
# Install the CLI (recommended)
uv tool install hud-python --python 3.12
```

--------------------------------

### Wrap BrowserUseAgent with Configuration

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/advanced/extending.mdx

Instantiate and run the BrowserUseAgent, configuring it with BrowserUseConfig. This example demonstrates how to use a pre-built agent for browser automation tasks.

```python
from hud.agents.browser_use import BrowserUseAgent
from hud.agents.types import BrowserUseConfig

agent = BrowserUseAgent(BrowserUseConfig(model="claude-sonnet-4-5", max_steps=25))
job = await my_browser_task().run(agent)
```

--------------------------------

### Install Robot Capability with Pip

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/advanced/robots.mdx

Install the hud-python package with the robot extra using pip.

```bash
pip install 'hud-python[robot]'
```

--------------------------------

### Environment Setup and Task Definition

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/cookbooks/ops-diagnostics.mdx

Sets up the workspace, seeds log files, and defines the 'diagnose' task. This snippet includes the agent's prompt and the LLM grader configuration for evaluating the diagnosis.
```python
from pathlib import Path

from hud.environment import Environment
from hud.graders import LLMJudgeGrader

ROOT = Path("/workspace/incident")

env = Environment(name="ops-diagnostics")
env.workspace("/workspace")


@env.initialize
async def _seed():
    ROOT.mkdir(parents=True, exist_ok=True)
    (ROOT / "api.log").write_text(
        "12:01 INFO request /checkout ok 120ms\n"
        "12:02 WARN db pool wait 1400ms\n"
        "12:03 ERROR /checkout 503 upstream timeout\n"
    )
    (ROOT / "db.log").write_text(
        "12:02 connections=100/100 saturated\n"
        "12:02 slow query: SELECT * FROM carts (no index on user_id)\n"
    )
    (ROOT / "deploy.log").write_text("11:58 deployed v412: 'remove cart index migration'\n")


@env.template()
async def diagnose():
    answer = yield (
        "Checkout started returning 503s at 12:03. The logs and deploy history are "
        "in the incident/ directory of your workspace. What is the root cause, and "
        "what's the evidence?"
    )
    result = await LLMJudgeGrader.grade(
        weight=1.0,
        answer=answer,
        question="Root cause of the checkout 503s",
        criteria=[
            "Identifies the removed cart index (deploy v412) as the root cause",
            "Connects DB pool saturation and the slow cart query to the 503s",
            ("Cites specific log evidence rather than guessing", 2.0),
        ],
    )
    yield result.value


tasks = [diagnose()]
```

--------------------------------

### Run Environment Locally with Docker

Source: https://github.com/hud-evals/hud-python/blob/main/README.md

Starts a Docker container for the built environment and allows interaction with tasks.

```bash
docker run -d --name run1 my-env
docker exec run1 hud task start fix_bug
docker exec run1 hud task grade fix_bug --answer "…"
docker rm -f run1
```

--------------------------------

### Install hud-python using uv

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/start/quickstart.mdx

Install the hud-python package using the uv tool for Python 3.12.
```bash
uv tool install hud-python --python 3.12
```

--------------------------------

### v5 Environment Setup with Tools

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/more/migrate-v6.mdx

In v5, you explicitly added tools like BashTool and EditTool to the environment. Scenarios were defined using the @env.scenario decorator.

```python
from hud import Environment
from hud.tools import BashTool, EditTool
from hud.native import BashGrader

env = Environment("coder")
env.add_tool(BashTool())
env.add_tool(EditTool())


@env.scenario("fix-tests")
async def fix_tests(target: str = "tests/"):
    answer = yield f"Make the tests in {target} pass."
    yield await BashGrader.grade(command=f"pytest {target} -q")
```

--------------------------------

### Install HUD Docs Skill (CLI)

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/start/index.mdx

For AI agents, install the HUD docs skill using npx to ensure you are using the current v6 API. This skill helps catch potential issues and keeps your development aligned with the latest specifications.

```bash
npx skills add https://docs.hud.ai
```

--------------------------------

### Run Loaded Harbor Tasks

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/advanced/harbor-convert.mdx

Once Harbor tasks are loaded into a Taskset, they can be run by supplying an agent and a runtime. This example shows how to initiate a task run using a specified runtime.

```python
from hud import Runtime

job = await taskset.run(agent, runtime=Runtime("tcp://127.0.0.1:8765"))
```

--------------------------------

### Synchronize Dependencies and Run Tests

Source: https://github.com/hud-evals/hud-python/blob/main/AGENTS.md

Use `uv sync` to install development dependencies and `uv run pytest` to execute tests. Ensure you are in the repository root.
```bash
uv sync --extra dev
uv run pytest -q
```

--------------------------------

### Serve Custom Tools with FastMCP

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/capabilities.mdx

This snippet illustrates how to serve custom tools using FastMCP, exposing them via an HTTP transport under the '/mcp' path. It includes starting the server, defining a tool, and publishing the MCP capability.

```python
import asyncio

from fastmcp import FastMCP

from hud.capabilities import Capability
from hud.environment import Environment

server = FastMCP(name="tools")


@server.tool
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b


env = Environment(name="calc")
_task: asyncio.Task | None = None


@env.initialize
async def _up():
    global _task
    if _task is None:  # idempotent
        _task = asyncio.create_task(
            server.run_async(transport="http", host="127.0.0.1", port=8040)
        )
        await asyncio.sleep(1.0)  # wait until the server is ready
    env.add_capability(Capability.mcp(name="tools", url="http://127.0.0.1:8040/mcp"))


@env.shutdown
async def _down():
    global _task
    if _task is not None:
        _task.cancel()
        _task = None
```

--------------------------------

### Synchronize Python Dependencies

Source: https://github.com/hud-evals/hud-python/blob/main/CLAUDE.md

Use `uv sync` to install development dependencies. Ensure you are running commands from the repository root.

```bash
uv sync --extra dev
```

--------------------------------

### Environment and Workspace Setup

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/capabilities.mdx

Sets up a HUD environment and configures a workspace with specific network and mount settings. The workspace is a directory served by a bwrap-isolated SSH server.
```python
from hud.environment import Environment, Mount

env = Environment(name="coder")
env.workspace(
    "/workspace",
    network=True,
    mounts=[Mount("ro", src="/data", dst="/data")],
)
```

--------------------------------

### Install Dependencies in Dockerfile

Source: https://github.com/hud-evals/hud-python/blob/main/docs/skill.md

Explicitly declare all binary dependencies required by your `@env.initialize` hook in the Dockerfile. This prevents runtime errors due to missing tools.

```dockerfile
RUN apt-get update && apt-get install -y --no-install-recommends \
    git curl ca-certificates bubblewrap \
    && rm -rf /var/lib/apt/lists/*
RUN pip install uv  # if your initialize hook calls uv
```

--------------------------------

### Run A2A Chat Server

Source: https://github.com/hud-evals/hud-python/blob/main/cookbooks/a2a-chat/README.md

Starts the A2A server to serve the bundled chat task. This command should be run in the first terminal.

```bash
uv run server.py
```

--------------------------------

### Build Robot Environment Docker Image

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/cookbooks/robot-benchmark.mdx

Builds the Docker image for the robot environment. Ensure you are in the parent directory of both 'demos/' and 'hud-python/' for local SDK installation.

```bash
docker build -f demos/inventory/envs/libero/Dockerfile -t hud-libero-env .
```

--------------------------------

### Sim Process Side: Serving a Robot Endpoint

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/advanced/robots.mdx

This snippet demonstrates the simulation process side setup. It wraps a bridge with RobotEndpoint and serves it, ensuring all simulation interactions occur on the main thread using MainThreadSimRunner. This is essential for simulators like Isaac Sim.
```python
import asyncio

from hud.environment.robot import RobotEndpoint, MainThreadSimRunner


async def main():
    bridge = MySimBridge(sim_runner=MainThreadSimRunner())  # sim touches run on main
    server = await RobotEndpoint(bridge).serve("127.0.0.1", 9100)
    await server.wait_closed()


asyncio.run(main())  # launched on the main thread the sim owns
```

--------------------------------

### Start a New Job for Multiple Runs

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/tasks.mdx

Initiate a new job using `Job.start` to group multiple task runs under a single identifier, such as for training sessions or multi-turn conversations. All subsequent runs passed with the `job=` argument will be associated with this job.

```python
from hud import Job

job = await Job.start("grpo-session", group=8)
for step in range(epochs):
    await ts.run(agent, runtime=LocalRuntime("env.py"), job=job)  # all runs accumulate here
```

--------------------------------

### Environment Setup with Buggy Code and Tests

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/cookbooks/coding-agent.mdx

Initializes a HUD environment with a workspace for the agent and a separate directory for authoritative checks. It seeds a buggy Python module and a test file in the agent's workspace, and also places an identical copy of the test file in the grader's accessible directory.
```python
from pathlib import Path

from hud.environment import Environment
from hud.graders import BashGrader

ROOT = Path("workspace").resolve()  # the agent's directory
CHECKS = Path("checks").resolve()  # grader-only, outside the workspace
TEST = "from calc import add\n\ndef test_add():\n    assert add(2, 3) == 5\n"

env = Environment(name="coder")
env.workspace(ROOT)


@env.initialize
async def _seed():
    (ROOT / "calc.py").write_text("def add(a, b):\n    return a - b\n")  # bug
    (ROOT / "test_calc.py").write_text(TEST)  # the agent's copy
    CHECKS.mkdir(exist_ok=True)
    (CHECKS / "test_calc.py").write_text(TEST)  # the authoritative copy


@env.template()
async def fix_add(target: str = "test_calc.py"):
    yield f"There's a failing test in {target} in your workspace. Find and fix the bug so the test passes."
    result = await BashGrader.grade(
        weight=1.0,
        command=f"python -m pytest {CHECKS / target} -q",
        cwd=str(ROOT),
    )
    yield result.value


tasks = [fix_add()]
```

--------------------------------

### Run LLM-Fronted A2A Chat Client

Source: https://github.com/hud-evals/hud-python/blob/main/cookbooks/a2a-chat/README.md

Starts an LLM-fronted A2A client that uses an OpenAI model to decide when to call the A2A agent. This command should be run in the second terminal.

```bash
uv run llm_client.py
```

--------------------------------

### Initialize a New Environment Package

Source: https://github.com/hud-evals/hud-python/blob/main/docs/v6/core/cli.mdx

Scaffolds a new environment package. Use presets to download starter environments from GitHub or omit for a minimal local scaffold.

```bash
hud init # pick a template → ./