### Nexa SDK Local LLM Integration Setup Source: https://context7.com/apireno/domshell/llms.txt Provides instructions for setting up the Nexa SDK for local LLM integration, including installing dependencies, pulling models, and starting the Nexa serve process. This enables on-device browser automation. ```bash # Install dependencies cd integrations/nexa pip install -r requirements.txt # Pull a local model and start nexa serve nexa pull NexaAI/Qwen3-4B-4bit-MLX # Apple Silicon nexa pull NexaAI/Granite-4-Micro-GGUF # Cross-platform nexa serve # Starts OpenAI-compatible API on :18181 # Start DOMShell MCP server in another terminal cd mcp-server && npx tsx index.ts --no-confirm --allow-all ``` -------------------------------- ### Install DOMShell from Source Source: https://github.com/apireno/domshell/blob/main/README.md Clone the repository, install dependencies, and build the project to load DOMShell into Chrome manually. ```bash git clone https://github.com/apireno/DOMShell.git cd DOMShell npm install npm run build ``` -------------------------------- ### Start Nexa Serve Source: https://github.com/apireno/domshell/blob/main/experiments/nexa_ollama/README.md Starts the Nexa serve process. This is the first step in setting up the environment for Nexa-related trials. ```bash nexa serve ``` -------------------------------- ### Install and Run DOMShell Server Source: https://context7.com/apireno/domshell/llms.txt Install the DOMShell npm package globally or run it directly using npx. The init wizard helps configure installed MCP clients. ```bash npm install -g @apireno/domshell ``` ```bash npx @apireno/domshell --allow-write --no-confirm --token my-secret-token ``` ```bash npx @apireno/domshell init ``` ```bash npx @apireno/domshell init --yes --token my-secret-token ``` ```bash npx @apireno/domshell init --client cursor --token my-secret-token ``` -------------------------------- ### Install Python Dependencies Source: https://github.com/apireno/domshell/blob/main/integrations/nexa/README.md Install the necessary Python packages for the Nexa integration by running this command in the integrations/nexa directory. ```bash cd integrations/nexa pip install -r requirements.txt ``` -------------------------------- ### Start Ollama Server and Pull Model Source: https://github.com/apireno/domshell/blob/main/experiments/nexa_ollama/README.md Starts the Ollama inference server and pulls the specified Qwen3-4B model. This is required for Ollama backend trials. ```bash ollama serve ollama pull qwen3:4b ``` -------------------------------- ### Install @apireno/domshell Source: https://github.com/apireno/domshell/blob/main/mcp-server/README.md Install the DOMShell package globally using npm or run it directly with npx. ```bash npm install -g @apireno/domshell ``` ```bash npx @apireno/domshell ``` -------------------------------- ### Start Ollama Server Source: https://github.com/apireno/domshell/blob/main/experiments/model_shootout/README.md Initiates the Ollama service, which is required for running the AI models. ```bash ollama serve ``` -------------------------------- ### Start DOMShell MCP Server Source: https://github.com/apireno/domshell/blob/main/experiments/claude_domshell_vs_cic/README.md Run the DOMShell MCP server locally. Ensure you replace YOUR_TOKEN with your actual token. ```bash cd ~/repos/DOMShell/mcp-server npx tsx index.ts --allow-write --no-confirm --token YOUR_TOKEN ``` -------------------------------- ### Start DOMShell MCP Server Source: https://github.com/apireno/domshell/blob/main/experiments/nexa_ollama/README.md Starts the DOMShell MCP server, which is necessary for trials utilizing the DOMShell interface. Ensure you are in the correct directory. ```bash cd mcp-server && npx tsx index.ts --no-confirm --allow-all ``` -------------------------------- ### Full Workflow Example with DOMShell in Python Source: https://github.com/apireno/domshell/blob/main/HARNESS.md Demonstrates a complete workflow using DOMShell tools in Python, including opening a page, orienting, scoping, discovering links, extracting text, clicking a link, and detecting changes. ```python async with ClientSession(read, write) as session: await session.initialize() # Open a page await session.call_tool("domshell_open", {"url": "https://en.wikipedia.org/wiki/Model_Context_Protocol"}) # Orient: what's on this page? result = await session.call_tool("domshell_tree", {"depth": 2}) # Scope: navigate to the article body await session.call_tool("domshell_cd", {"path": "main/article"}) # Discover: find all links with URLs links = await session.call_tool("domshell_find", { "pattern": "", "type": "link", "meta": True }) # Extract: get the full article text with inline link URLs content = await session.call_tool("domshell_text", {"links": True}) # Act: click a specific link await session.call_tool("domshell_click", {"target": "References_link"}) # Detect changes after navigation changes = await session.call_tool("domshell_diff", {}) ``` -------------------------------- ### Start DOMShell MCP server manually Source: https://github.com/apireno/domshell/blob/main/HARNESS.md Manually start the DOMShell MCP server for SSE/Streamable HTTP transport. Specify the port for the server to listen on. ```bash npx @apireno/domshell --allow-write --no-confirm --mcp-port 3001 ``` -------------------------------- ### MCP Server HTTP API Subsequent Requests Source: https://context7.com/apireno/domshell/llms.txt Shows how to make subsequent requests to the MCP server after initialization, using the mcp-session-id obtained from the initialization response. This example demonstrates calling the tools/call method. ```bash # Subsequent requests use the session ID curl -X POST http://localhost:3001/mcp \ -H "Authorization: Bearer my-secret-token" \ -H "mcp-session-id: " \ -H "Content-Type: application/json" \ -d '{ "jsonrpc": "2.0", "method": "tools/call", "params": { "name": "domshell_tabs", "arguments": {} }, "id": 2 }' ``` -------------------------------- ### Getting Help with DOMShell Commands Source: https://github.com/apireno/domshell/blob/main/README.md Access help for any command by appending '--help'. This displays usage information and available options. ```bash dom@shell:> ls --help ls — List children of the current node Usage: ls [options] Options: -l, --long Long format: type prefix, role, and name -r, --recursive Show nested children (one level deep) -n N Limit output to first N entries --offset N Skip first N entries (for pagination) --type ROLE Filter by AX role (e.g. --type button) --count Show count of children only ... ``` -------------------------------- ### Install DOMShell Globally Source: https://github.com/apireno/domshell/blob/main/README.md Install the DOMShell package globally on your system using npm. This makes the `domshell` command available in your terminal. ```bash npm install -g @apireno/domshell ``` -------------------------------- ### Pull and Serve Nexa Models Source: https://github.com/apireno/domshell/blob/main/integrations/nexa/README.md Download a local LLM model and start the Nexa inference server. The server exposes an OpenAI-compatible API at http://127.0.0.1:18181/v1. ```bash # Pull a model (examples — use whatever is available for your platform) nexa pull NexaAI/Qwen3-1.7B-4bit-MLX # Apple Silicon (MLX) nexa pull NexaAI/Qwen3-4B-4bit-MLX # Apple Silicon, larger nexa pull NexaAI/Granite-4-Micro-GGUF # Cross-platform (GGUF) # Start the local inference server (default: 127.0.0.1:18181) nexa serve ``` -------------------------------- ### Initialize DOMShell Non-Interactively Source: https://github.com/apireno/domshell/blob/main/README.md Use this command to run the DOMShell initialization wizard in non-interactive mode with default settings. This is useful for automated setups. ```bash npx @apireno/domshell init --yes ``` -------------------------------- ### Configure DOMShell for Claude Desktop Source: https://context7.com/apireno/domshell/llms.txt Add DOMShell to Claude Desktop's configuration file. Supports direct server start or stdio proxy mode. ```json // ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) // Direct mode — Claude Desktop spawns and manages the server process { "mcpServers": { "domshell": { "command": "npx", "args": ["-y", "@apireno/domshell", "--allow-write", "--no-confirm", "--token", "my-secret-token"] } } } ``` ```json // Stdio proxy mode — forwards to a running server on port 3001 // Use when you want the server to outlive any single Claude session { "mcpServers": { "domshell": { "command": "npx", "args": ["-y", "-p", "@apireno/domshell", "domshell-proxy", "--port", "3001", "--token", "my-secret-token"] } } } ``` -------------------------------- ### Nexa SDK Agent - Read-Only Tasks Source: https://context7.com/apireno/domshell/llms.txt Example of running the Nexa SDK agent for read-only tasks, such as extracting information from a webpage. Requires the --task argument and optionally --verbose for detailed output. ```bash # Run the agent — read-only tasks python agent.py \ --task "Open wikipedia.org/wiki/Artificial_intelligence and extract the first paragraph" \ --verbose ``` -------------------------------- ### Start DOMShell Server for E2E Testing Source: https://github.com/apireno/domshell/blob/main/HARNESS.md Launch the DOMShell MCP server with flags enabled for end-to-end testing. This configuration allows all tiers and skips interactive confirmation for write commands, simplifying automated testing. ```bash npx @apireno/domshell --allow-all --no-confirm ``` -------------------------------- ### Run DOMShell Directly with Options Source: https://github.com/apireno/domshell/blob/main/README.md Execute DOMShell directly using npx without global installation. This command allows write operations, bypasses confirmation prompts, and uses a specified token. ```bash npx @apireno/domshell --allow-write --no-confirm --token my-secret-token ``` -------------------------------- ### Display DOM Tree View Source: https://github.com/apireno/domshell/blob/main/README.md Show a hierarchical tree view of the DOM structure starting from the current node. The default depth is 2. Use `tree ` for deeper views. ```bash # Get a tree view (default depth: 2) dom@shell:~ $tree navigation/ ├── [x] home_link ├── [x] about_link ├── [x] products_link └── [x] contact_link ``` ```bash # Deeper tree dom@shell:~ $tree 4 ``` -------------------------------- ### Initialize DOMShell Source: https://github.com/apireno/domshell/blob/main/mcp-server/README.md Run the DOMShell initialization wizard to set up MCP clients, generate a token, and configure settings. Use --yes for non-interactive mode. ```bash npx @apireno/domshell init ``` -------------------------------- ### Perform One-Time Production Build Source: https://github.com/apireno/domshell/blob/main/README.md Execute this command to create a production-ready build of the project. This is typically done before deployment. ```bash npm run build ``` -------------------------------- ### Run Agent with Model Hint Source: https://github.com/apireno/domshell/blob/main/integrations/nexa/README.md Execute the agent script and specify a model hint to match against loaded models in `nexa serve`. This helps the agent select the appropriate model. ```python python agent.py --task "Summarize this page" --model qwen3-4b --verbose ``` -------------------------------- ### Nexa SDK Agent - All CLI Options Source: https://context7.com/apireno/domshell/llms.txt Lists all available command-line options for the `agent.py` script, including task definition, backend configuration, operational modes, and verbosity. ```APIDOC ## agent.py CLI Options ### Description Provides a comprehensive list of all command-line arguments available for the `agent.py` script, enabling detailed control over task execution, LLM integration, and agent behavior. ### Options - **--task TEXT** - Required - Task for the agent (required). - **--nexa-endpoint URL** - Optional - OpenAI-compatible API endpoint (default: http://127.0.0.1:18181/v1). - **--model NAME** - Optional - Model name hint (auto-discovers from nexa serve if omitted). - **--port N** - Optional - DOMShell MCP server port (default: 3001). - **--token TOKEN** - Optional - DOMShell auth token. - **--mode full|compact** - Optional - `full` uses 38 tools, `compact` uses a single `execute` tool (default: full). - **--allow-write** - Optional - Include write-tier tools. - **--max-turns N** - Optional - Max agent loop iterations (default: 20). - **--verbose** - Optional - Print each turn's tool calls and results. ``` -------------------------------- ### Verify Evaluation Files Exist Source: https://github.com/apireno/domshell/blob/main/CLAUDE.md List the evaluation files to confirm they have been generated. This command checks for the product review and test evaluation files. ```bash ls -la docs/sprints/sprint-XX/product-review.md docs/sprints/sprint-XX/test-eval.md ``` -------------------------------- ### Paginate Large Directory Listings Source: https://github.com/apireno/domshell/blob/main/README.md Control the number of items displayed in directory listings using `-n` for count and `--offset` for starting position. ```bash # Paginate large directories dom@shell:~ $ ls -n 10 # First 10 items dom@shell:~ $ ls -n 10 --offset 10 # Items 11-20 ``` -------------------------------- ### Verify Review Files Exist Source: https://github.com/apireno/domshell/blob/main/CLAUDE.md Lists the review files to confirm they were generated successfully. This step is crucial before proceeding with plan revisions or CEO approval. ```bash ls -la docs/sprints/sprint-XX/product-review.md docs/sprints/sprint-XX/vp-eng-review.md ``` -------------------------------- ### Configure Claude Desktop MCP Settings Source: https://github.com/apireno/domshell/blob/main/mcp-server/README.md Add DOMShell to your Claude Desktop MCP settings. This example shows the configuration for the standard MCP server. ```json { "mcpServers": { "domshell": { "command": "npx", "args": ["-y", "@apireno/domshell", "--allow-write"] } } } ``` -------------------------------- ### Execute VP Review Script for Engineering Source: https://github.com/apireno/domshell/blob/main/CLAUDE.md Run the VP review script for engineering documentation. Ensure the sprint number and file paths are correctly specified. ```bash ./scripts/agentic/vp-review.sh vp-eng docs/sprints/sprint-XX/dev-report.md docs/sprints/sprint-XX/test-eval.md ``` -------------------------------- ### Navigate to a URL Source: https://github.com/apireno/domshell/blob/main/README.md Use the `navigate` command to load a new URL in the current browser tab. ```bash # Navigate to a URL (current tab) dom@shell:~$navigate https://example.com ✓ Navigated to https://example.com ``` -------------------------------- ### Get Current Directory Source: https://github.com/apireno/domshell/blob/main/README.md The 'pwd' command shows the current location within the DOMShell's virtual filesystem, typically within a tab's context. ```bash dom@shell:~$ pwd ~/tabs/125 ``` -------------------------------- ### domshell_for Source: https://context7.com/apireno/domshell/llms.txt Runs a source command, splits output into lines, and executes an action template for each line. Capped at 50 items and 120 seconds. ```APIDOC ## domshell_for — Iterate Over Command Output Runs a source command, splits output into lines, and executes an action template for each line. `{}` is replaced with each line. Capped at 50 items and 120 seconds. ### Usage Examples: ``` // Extract text from each of the first 5 headings domshell_for { source: "find --type heading -n 5", template: "text {}" } // Open a list of URLs in new tabs domshell_for { source: "eval [...document.querySelectorAll('.result a')].map(a=>a.href).join('\n')", template: "open {}" } // cat each link found in the navigation domshell_for { source: "find --type link -n 10", template: "cat {}" } ``` ``` -------------------------------- ### Filter find results with grep Source: https://github.com/apireno/domshell/blob/main/README.md Use the pipe operator (`|`) with `grep` to filter the output of the `find` command, similar to bash. This example filters for GitHub links. ```bash # Filter find results to only GitHub links dom@shell:~$ find --type link --meta | grep github [x] /main/repo_link (link) href=https://github.com/example ``` -------------------------------- ### Execute VP Review Script Source: https://github.com/apireno/domshell/blob/main/CLAUDE.md Run the VP review script for production documentation. Ensure the sprint number and file paths are correctly specified. ```bash ./scripts/agentic/vp-review.sh vp-prod docs/sprints/sprint-XX/dev-report.md docs/sprints/sprint-XX/product-review.md ``` -------------------------------- ### Run Nexa AI Agent Locally Source: https://github.com/apireno/domshell/blob/main/README.md Execute the Nexa AI agent script to perform tasks using local LLMs. Ensure you have the nexa-sdk installed and configured. ```bash python integrations/nexa/agent.py --task "Open wikipedia.org/wiki/AI and extract the first paragraph" --verbose ``` -------------------------------- ### Navigate Up One Directory Source: https://github.com/apireno/domshell/blob/main/README.md Move to the parent directory in the DOM hierarchy. ```bash # Go back up dom@shell:~ $ cd .. ``` -------------------------------- ### Nexa SDK Agent CLI Options Source: https://context7.com/apireno/domshell/llms.txt Lists all available command-line interface options for the Nexa SDK agent, detailing their purpose and default values. This includes task specification, API endpoints, model selection, and operational modes. ```bash # All CLI options: # --task TEXT Task for the agent (required) # --nexa-endpoint URL OpenAI-compatible API endpoint (default: http://127.0.0.1:18181/v1) # --model NAME Model name hint (auto-discovers from nexa serve if omitted) # --port N DOMShell MCP server port (default: 3001) # --token TOKEN DOMShell auth token # --mode full|compact full=38 tools, compact=single execute tool (default: full) # --allow-write Include write-tier tools # --max-turns N Max agent loop iterations (default: 20) # --verbose Print each turn's tool calls and results ``` -------------------------------- ### Read text from a nested element directly Source: https://github.com/apireno/domshell/blob/main/README.md Commands like `text` can accept relative paths, eliminating the need to `cd` into the directory first. This example reads text from a nested paragraph. ```bash # Read text from a nested element directly dom@shell:~$ text main/article/paragraph_2971 ``` -------------------------------- ### Nexa SDK Agent - Write-Access Tasks Source: https://context7.com/apireno/domshell/llms.txt Demonstrates running the Nexa SDK agent for tasks that require write access, such as interacting with forms or clicking elements. Use the --allow-write flag to enable these capabilities. ```bash # Write-access tasks (clicking, typing, form submission) python agent.py \ --task "Go to google.com, search for nexa ai, and list the first 3 results" \ --allow-write \ --verbose ``` -------------------------------- ### Configure DOMShell MCP Server in Claude Desktop Source: https://github.com/apireno/domshell/blob/main/experiments/claude_domshell_vs_cic/README.md Add the DOMShell server configuration to Claude Desktop's MCP settings. Replace PATH_TO/DOMShell with the actual path to your DOMShell installation and YOUR_TOKEN with your token. ```json { "mcpServers": { "domshell": { "command": "npx", "args": ["tsx", "PATH_TO/DOMShell/mcp-server/proxy.ts", "--port", "3001", "--token", "YOUR_TOKEN"] } } } ``` -------------------------------- ### Enter a Directory Source: https://github.com/apireno/domshell/blob/main/README.md Change the current directory to a subdirectory, which represents a container element in the DOM. ```bash # Enter a directory (container element) dom@shell:~ $ cd navigation ``` -------------------------------- ### MCP Server HTTP API Initialization Source: https://context7.com/apireno/domshell/llms.txt Demonstrates how to initialize a new MCP session using a POST request to the /mcp endpoint. Authentication can be done via the Authorization header or a ?token= query parameter. The response includes an mcp-session-id header. ```bash # Initialize a new MCP session (must be an initialize request) curl -X POST http://localhost:3001/mcp \ -H "Authorization: Bearer my-secret-token" \ -H "Content-Type: application/json" \ -d '{ "jsonrpc": "2.0", "method": "initialize", "params": { "protocolVersion": "2024-11-05", "clientInfo": { "name": "my-client", "version": "1.0" } }, "id": 1 }' ``` -------------------------------- ### DOMShell CLI Flags Reference Source: https://context7.com/apireno/domshell/llms.txt Configure security, networking, and permissions using CLI flags when starting the DOMShell server. Flags control write access, sensitive data access, domain restrictions, and logging. ```bash # Full flag reference npx @apireno/domshell \ --allow-write \ --allow-sensitive \ --allow-all \ --no-confirm \ --token my-token \ --mcp-port 3001 \ --port 9876 \ --domains "github.com,docs.google.com" \ --log-file audit.log \ --expose-cookies ``` ```bash # Read-only server (safest — agents can browse but not interact) npx @apireno/domshell --token read-only-token ``` ```bash # Full-access server for trusted local use npx @apireno/domshell --allow-all --no-confirm --token dev-token ``` ```bash # Domain-restricted write server npx @apireno/domshell --allow-write --no-confirm --domains "github.com" --token gh-token ``` -------------------------------- ### Navigate to Parent to Find Properties Source: https://github.com/apireno/domshell/blob/main/README.md If an element is not directly accessible or lacks desired properties, navigate to its parent directory using `cd ..` and inspect the parent. ```bash # Navigate to parent to find its properties (e.g. span inside a link) dom@shell:~ $ cd .. dom@shell:~ $ cat parent_link ``` -------------------------------- ### Navigate Back to Browser Root Source: https://github.com/apireno/domshell/blob/main/README.md Use 'cd ~' to return to the root of the DOMShell's virtual filesystem from within a tab. ```bash dom@shell:~$ cd ~ dom@shell:~$ ``` -------------------------------- ### Running Agent with Ollama Backend Source: https://github.com/apireno/domshell/blob/main/integrations/nexa/README.md Configure the agent to use Ollama as an alternative backend by specifying the endpoint and model. Requires Ollama to be running and a model pulled. ```bash # Start Ollama and pull a model ollama serve ollama pull qwen3:4b # Run agent with Ollama endpoint python agent.py --task "..." --nexa-endpoint http://127.0.0.1:11434/v1 --model qwen3 --allow-write --verbose ``` -------------------------------- ### Nexa SDK Agent - Run Read-Only Tasks Source: https://context7.com/apireno/domshell/llms.txt Executes read-only tasks using the Nexa SDK and a local LLM. This involves setting up the environment, pulling a model, starting the Nexa server, and then running the agent script. ```APIDOC ## python agent.py --task TEXT ### Description Runs read-only tasks using the agent. Requires Nexa SDK setup and a running DOMShell MCP server. ### Command ```bash python agent.py --task "[Your task description]" ``` ### Parameters - **--task** (TEXT) - Required - The task for the agent to perform. - **--verbose** - Optional - Prints each turn's tool calls and results. ``` -------------------------------- ### Iterate Over Command Output with domshell_for Source: https://context7.com/apireno/domshell/llms.txt Runs a source command, splits its output into lines, and executes an action template for each line. Capped at 50 items and 120 seconds. The placeholder `{}` is replaced with each line of output. ```javascript domshell_for { source: "find --type heading -n 5", template: "text {}" } ``` ```javascript domshell_for { source: "eval [...document.querySelectorAll('.result a')].map(a=>a.href).join('\n')", template: "open {}" } ``` ```javascript domshell_for { source: "find --type link -n 10", template: "cat {}" } ``` -------------------------------- ### Workflow 1: Find and read article content Source: https://github.com/apireno/domshell/blob/main/README.md This workflow demonstrates finding an article section using `grep -r` and then reading its full content using `text`. ```bash # Workflow 1: Find and read an article section dom@shell:~$ grep -r article [d] article (article) → ./main/article/ dom@shell:~$ cd main/article dom@shell:~/main/article$ text [full article content in one call] ``` -------------------------------- ### AXNode and VFSNode TypeScript Interfaces Source: https://context7.com/apireno/domshell/llms.txt Defines the core TypeScript interfaces for Accessibility Tree nodes (AXNode) and virtual filesystem nodes (VFSNode). Includes examples of their structure and usage, along with constants for role aliases and shell state representation. ```typescript import type { AXNode, VFSNode, ShellState } from "./src/shared/types.ts"; import { CONTAINER_ROLES, INTERACTIVE_ROLES, ROLE_ALIASES } from "./src/shared/types.ts"; // AXNode — raw CDP Accessibility Tree node const node: AXNode = { nodeId: "123", backendDOMNodeId: 456, role: { type: "role", value: "button" }, name: { type: "computedString", value: "Submit" }, childIds: [], ignored: false, }; // VFSNode — mapped virtual filesystem node const vfs: VFSNode = { axNodeId: "123", backendDOMNodeId: 456, name: "submit_btn", // generated by generateNodeName() role: "button", value: "Submit", isDirectory: false, // leaf node }; // ROLE_ALIASES — natural-language aliases accepted by find --type ROLE_ALIASES["input"] // → ["textbox", "searchbox", "combobox", "spinbutton"] ROLE_ALIASES["dropdown"] // → ["combobox", "listbox"] ROLE_ALIASES["nav"] // → ["navigation"] ROLE_ALIASES["toggle"] // → ["switch", "checkbox"] ROLE_ALIASES["modal"] // → ["dialog", "alertdialog"] // ShellState — persisted shell context (path, tabs, env vars) const state: ShellState = { path: ["tabs", "125", "main", "article"], // current directory segments axNodeIds: ["rootId", "mainId", "articleId"], // parallel AX node IDs activeTabId: 125, // currently attached tab env: { "MY_VAR": "hello" }, // exported env variables }; ``` -------------------------------- ### DOMShell Workflow: Form Interaction and Change Detection Source: https://context7.com/apireno/domshell/llms.txt Automates form submission and detects changes on the page. It uses `domshell_here` to get the current location, `domshell_find` for input elements, `domshell_submit` to fill and submit, `domshell_wait` for results, and `domshell_diff` to show changes. ```shell // ── PATTERN 4: Form Interaction + Change Detection ── domshell_here {} domshell_find { type: "input", meta: true } domshell_submit { input: "search_input", value: "machine learning" } domshell_wait { pattern: "results" } domshell_diff {} // → Shows exactly what appeared after search ``` -------------------------------- ### Navigate Multi-Level Paths Source: https://github.com/apireno/domshell/blob/main/README.md Use `cd` with multiple directory names to navigate deep into the DOM structure in a single command. ```bash # Multi-level paths work too dom@shell:~ $ cd main/article/form ``` -------------------------------- ### Execute Ideation Sprint Script Source: https://github.com/apireno/domshell/blob/main/CLAUDE.md Initiate an ideation sprint by running the agentic script. Provide the path to the goal file and the output directory. ```bash ./scripts/agentic/ideo-sprint.sh docs/ideation/YYYY-MM-DD-{slug}/goal.md docs/ideation/YYYY-MM-DD-{slug}/ ``` -------------------------------- ### DOMShell Security Tier Definitions Source: https://context7.com/apireno/domshell/llms.txt Defines sets of commands for different security tiers: navigate, write, and sensitive. Server start flags control access levels, and write actions may require user confirmation unless --no-confirm is used. ```typescript // Tier definitions (from mcp-server/index.ts) const NAVIGATE_COMMANDS = new Set(["navigate", "goto", "open", "back", "forward"]); const WRITE_COMMANDS = new Set(["click", "focus", "type", "scroll", "js", "select", "close", "call"]); const SENSITIVE_COMMANDS = new Set(["whoami"]); // All other commands: Read tier (always allowed) // Security flags at server start: // --allow-write enables Navigate + Write tiers // --allow-sensitive enables Sensitive tier // --allow-all enables all three // --no-confirm skips the y/n prompt for Write actions // --domains a.com restricts all commands to matching tab URLs // Confirmation flow for Write tier (when --no-confirm is NOT set): // [DOMShell] Claude wants to: click submit_btn // Allow? (y/n): // Timeout after 60s → deny // NOTE: Always use --no-confirm with Claude Desktop (no /dev/tty available) // Audit log format (every command logged): // [2026-02-07T12:00:00.000Z] EXECUTE: ls -l // [2026-02-07T12:00:01.000Z] RESULT: 12 items // [2026-02-07T12:00:05.000Z] [WRITE] EXECUTE: click submit_btn // [2026-02-07T12:00:05.500Z] [WRITE] RESULT: ✓ Clicked: submit_btn (button) // [2026-02-07T12:00:10.000Z] [SENSITIVE] EXECUTE: whoami ``` -------------------------------- ### Generate Filesystem-Safe Names with VFS Mapper Source: https://context7.com/apireno/domshell/llms.txt The VFS mapper functions convert Accessibility Tree nodes into human-readable filesystem names. Node roles get semantic suffixes, and duplicate names are auto-incremented. Use these functions to create consistent naming conventions for UI elements. ```typescript import { generateNodeName, isContainerNode, getChildVFSNodes, resolveByPath } from "./src/background/vfs_mapper.ts"; ``` ```typescript // generateNodeName — produces filesystem-safe names from AX nodes generateNodeName({ role: { value: "button" }, name: { value: "Submit" } }) // → "submit_btn" ``` ```typescript generateNodeName({ role: { value: "link" }, name: { value: "Contact Us" } }) // → "contact_us_link" ``` ```typescript generateNodeName({ role: { value: "textbox" }, name: { value: "Email" } }) // → "email_input" ``` ```typescript generateNodeName({ role: { value: "checkbox" }, name: { value: "I agree" } }) // → "i_agree_chk" ``` ```typescript // isContainerNode — decides if a node becomes a directory isContainerNode({ role: { value: "navigation" }, childIds: ["1", "2"] }) // → true ``` ```typescript isContainerNode({ role: { value: "button" }, childIds: [] }) // → false ``` ```typescript // resolveByPath — resolves multi-segment paths (e.g. "main/article/form") const vfsNode = resolveByPath(rootNodeId, "main/article/form", nodeMap); // → { axNodeId: "456", name: "form", role: "form", isDirectory: true } ``` ```typescript // getChildVFSNodes — returns deduplicated, flattened children of a node // (skips ignored nodes, role=none, unnamed generic wrappers) const children = getChildVFSNodes(parentId, nodeMap); // → [{ name: "navigation", role: "navigation", isDirectory: true }, ...] ``` -------------------------------- ### Claude in Chrome Prompt for Search and Navigation Source: https://github.com/apireno/domshell/blob/main/experiments/claude_domshell_vs_cic/prompts.md This prompt guides Claude in Chrome to perform a search, navigate to a result, and extract specific content. It outlines rules for using browser tools, handling navigation, and formatting the output, including limitations on tool calls and error handling. ```text RULES — read these first: - You MUST use your browser tools (navigate, read_page, find, get_page_text, form_input, etc.) to complete this task. Do NOT use domshell or any external MCP tools. - You MUST actually navigate to each page and read its content using your tools. Do NOT use prior knowledge or training data to answer. Every fact in your response must come from what you read on the page. - If you cannot find an element after 3 attempts, skip it and note it as "[not found]". - Do not explore the page beyond what is needed for the task. - Be fast and direct. Minimize unnecessary tool calls. - If you are still working after 20 tool calls, wrap up immediately with whatever you have. - Return partial results rather than nothing. - When done, close all tabs you opened during this task to keep the browser clean for the next task. TASK: Go to https://en.wikipedia.org. Search for "machine learning" using the search box. On the results page, click the first result. Then extract the first paragraph of the article and list all items in the "See also" section. OUTPUT FORMAT: ## First Paragraph (paste the paragraph text) ``` -------------------------------- ### Print Current Working Directory Source: https://github.com/apireno/domshell/blob/main/README.md Display the current path within the DOM's filesystem-like structure. ```bash # See where you are dom@shell:~ $ pwd ~/tabs/125/navigation ``` -------------------------------- ### domshell_find — Deep Recursive Search Source: https://context7.com/apireno/domshell/llms.txt Performs a deep recursive search starting from the current directory to find elements. It supports searching by type, pattern matching, metadata inclusion, visible text content, and natural-language type aliases. Results include full paths to matching elements. ```APIDOC ## domshell_find — Deep Recursive Search Deep recursive search from the current directory. Returns full paths to matching elements. Scope with `domshell_cd` first for targeted results. ``` // Find all links with their URLs (most common pattern) domshell_find { type: "link", meta: true } // [x] /nav/home_link (link) href=https://example.com/ // [x] /main/read_more_link (link) href=https://example.com/article // Find by pattern + show visible text domshell_find { pattern: "login", meta: true, text: true } // Natural-language type aliases — no need to know exact AX role names domshell_find { type: "input" } // matches textbox, searchbox, combobox, spinbutton domshell_find { type: "dropdown" } // matches combobox, listbox domshell_find { type: "nav" } // matches navigation domshell_find { type: "toggle" } // matches switch, checkbox domshell_find { type: "modal" } // matches dialog, alertdialog domshell_find { type: "btn" } // matches button domshell_find { type: "sidebar" } // matches complementary // Find by visible text content (not just element name) domshell_find { type: "button", content: true, pattern: "sign up" } // [x] /main/hero/get_started_btn (button) — displayed text: "Sign Up Free" // Limit results and pipe through grep for filtering // (via domshell_execute: "find --type link --meta | grep github") domshell_execute { command: "find --type link --meta | grep github.com" } ``` ``` -------------------------------- ### List Open Windows and Tabs Source: https://github.com/apireno/domshell/blob/main/README.md The 'windows' command displays all open browser windows and lists the tabs contained within each, indicating the focused window. ```bash dom@shell:~$ windows Window 1 (focused) ├── *123 Google google.com ├── 124 GitHub - apireno github.com/apireno └── 125 Wikipedia en.wikipedia.org Window 2 ├── *126 Stack Overflow stackoverflow.com └── 127 MDN Web Docs developer.mozilla.org ``` -------------------------------- ### Connect via stdio transport with Python Source: https://github.com/apireno/domshell/blob/main/HARNESS.md Use this method for CLI wrappers. It establishes a client session using the MCP protocol over stdio, allowing interaction with DOMShell tools. ```python from mcp import ClientSession, StdioServerParameters from mcp.client.stdio import stdio_client server_params = StdioServerParameters( command="npx", args=["@apireno/domshell", "--allow-write", "--no-confirm"] ) async with stdio_client(server_params) as (read, write): async with ClientSession(read, write) as session: await session.initialize() # List available tools tools = await session.list_tools() # Execute a command result = await session.call_tool("domshell_ls", {"path": "/"}) ``` -------------------------------- ### domshell_help Source: https://github.com/apireno/domshell/blob/main/HARNESS.md Displays help information for DOMShell commands. ```APIDOC ## domshell_help ### Description Show help for commands. ### Category Core ``` -------------------------------- ### Pull AI Models for Ollama Source: https://github.com/apireno/domshell/blob/main/experiments/model_shootout/README.md Downloads the specified AI models to be used with Ollama. Ensure you have sufficient disk space. ```bash ollama pull qwen3:4b ollama pull hermes3:3b ollama pull ibm/granite4:tiny-h-q4_K_M ollama pull llama3.2:3b ``` -------------------------------- ### Create Permissions Asked Marker Source: https://github.com/apireno/domshell/blob/main/CLAUDE.md Creates a marker file to indicate that permissions have been asked. Use this when the user opts not to enable permissive settings. ```bash touch .claude/.permissions-asked ``` -------------------------------- ### Configure Claude Desktop MCP Settings for stdio proxy Source: https://github.com/apireno/domshell/blob/main/mcp-server/README.md Configure DOMShell for the stdio proxy, which is required if your client needs command/args format. Replace YOUR_TOKEN with your actual token. ```json { "mcpServers": { "domshell": { "command": "npx", "args": ["-y", "-p", "@apireno/domshell", "domshell-proxy", "--port", "3001", "--token", "YOUR_TOKEN"] } } } ``` -------------------------------- ### Open a URL in a new tab Source: https://github.com/apireno/domshell/blob/main/README.md Use the `open` command to open a specified URL in a new browser tab. ```bash # Open a URL in a new tab dom@shell:~$open https://github.com ✓ Opened new tab → https://github.com ``` -------------------------------- ### Workflow 2: Find and extract links from a section Source: https://github.com/apireno/domshell/blob/main/README.md This workflow shows how to locate a section using `grep -r`, navigate into it with `cd`, and then extract all links with their metadata using `find --type link --meta`. ```bash # Workflow 2: Find a section and extract its links dom@shell:~$ grep -r references [d] references (region) → ./main/article/references/ dom@shell:~$ cd main/article/references dom@shell:~/main/article/references$ find --type link --meta [x] /wiki_link (link) href=https://en.wikipedia.org/... [x] /paper_link (link) href=https://arxiv.org/... ``` -------------------------------- ### Workflow 5: Find elements by visible text Source: https://github.com/apireno/domshell/blob/main/README.md This workflow demonstrates finding elements not just by name, but by their visible text content using `grep -r --content`. It then shows how to interact with the found element using `click`. ```bash # Workflow 5: Find elements by visible text (not just name) dom@shell:~$ grep -r --content "sign up" [x] get_started_btn (button) → ./main/hero/get_started_btn # The button's NAME is "get_started_btn" but its displayed text says "Sign Up Free" dom@shell:~$ click get_started_btn ``` -------------------------------- ### domshell_windows Source: https://github.com/apireno/domshell/blob/main/HARNESS.md Lists all windows, with tabs grouped by window. ```APIDOC ## domshell_windows ### Description List windows with grouped tabs. ### Category Navigation ``` -------------------------------- ### Navigate Using Substring Matching Source: https://github.com/apireno/domshell/blob/main/README.md DOMShell allows navigating to tabs using partial matches of their titles or URLs with the 'cd' command. ```bash dom@shell:~$ cd tabs/github ✓ Entered tab 124 (GitHub - apireno) ``` -------------------------------- ### System Commands in DOMShell Source: https://github.com/apireno/domshell/blob/main/README.md Execute system-level commands like 'whoami' to check authentication, 'env' to view environment variables, or 'export' to set them. ```bash # Check if you're authenticated (reads cookies) dom@shell:~>whoami URL: https://example.com Status: Authenticated Via: session_id Expires: 2025-12-31T00:00:00.000Z Total cookies: 12 ``` ```bash # Environment variables dom@shell:~>env SHELL=/bin/domshell TERM=xterm-256color ``` ```bash # Set a variable dom@shell:~>export API_KEY=sk-abc123 ``` ```bash # Debug the raw AX tree dom@shell:~>debug stats --- Debug Stats --- Total AX nodes: 247 Ignored nodes: 83 Generic nodes: 41 With children: 62 Iframes: 2 ``` -------------------------------- ### Run DOMShell in Watch Mode Source: https://github.com/apireno/domshell/blob/main/README.md Use this command to continuously rebuild the project on file changes during development. After building, reload the extension and reopen the side panel to see updates. ```bash npm run dev ``` -------------------------------- ### Click a button directly Source: https://github.com/apireno/domshell/blob/main/README.md Use the `click` command with a relative path to interact with elements like buttons without needing to `cd` first. ```bash # Click a button inside a form without cd'ing dom@shell:~$ click main/form/submit_btn ``` -------------------------------- ### Workflow 4: Discover and read section text Source: https://github.com/apireno/domshell/blob/main/README.md This workflow shows how to discover sections using `grep -r` and then navigate into a specific section to read its text content using `cd` and `text`. ```bash # Workflow 4: Discover sections, then drill into one dom@shell:~$ grep -r heading [−] Introduction_heading (heading) → ./main/article/Introduction_heading [−] Methods_heading (heading) → ./main/article/Methods_heading [−] Results_heading (heading) → ./main/article/Results_heading dom@shell:~$ cd main/article/Results_heading dom@shell:~/main/article/Results_heading$ text [text content of the Results section] ``` -------------------------------- ### Python Client for DOMShell Integration Testing Source: https://github.com/apireno/domshell/blob/main/HARNESS.md Use the Python SDK to establish a client session with the DOMShell MCP server and verify connectivity. This snippet initializes a session and calls the 'domshell_here' tool, expecting tab information as a response. ```python async with ClientSession(read, write) as session: await session.initialize() # This should return tab info if the extension is connected result = await session.call_tool("domshell_here", {}) assert "Title:" in str(result) # Extension is connected and a tab is active ``` -------------------------------- ### Reproduce Trials with Nexa and DOMShell Source: https://github.com/apireno/domshell/blob/main/experiments/nexa_claude/README.md Execute all trials for the Nexa and DOMShell experiment. Ensure nexa serve and the DOMShell MCP server are running in separate terminals before executing. ```bash # 1. Start nexa serve nexa serve # 2. Start DOMShell MCP server (separate terminal) cd mcp-server && npx tsx index.ts --no-confirm --allow-all # 3. Run all trials bash experiments/nexa_claude/run_trials.sh ``` -------------------------------- ### List Children with Type Prefixes and Roles Source: https://github.com/apireno/domshell/blob/main/README.md Perform a long listing of children, showing type prefixes (like `[d]` for directory, `[x]` for interactive) and ARIA roles. ```bash # Long format shows type prefixes and roles dom@shell:~ $ ls -l [d] navigation navigation/ [d] main main/ [x] link skip_to_content_link [x] link logo_link ``` -------------------------------- ### Enable Write Commands with Domain Restriction Source: https://github.com/apireno/domshell/blob/main/README.md Use the `--allow-write` flag to enable write commands and `--domains` to restrict execution to specified domains. This is useful for targeted automation. ```bash npx tsx index.ts --allow-write --domains "github.com,docs.google.com" ``` -------------------------------- ### Navigation with domshell_navigate, domshell_open, domshell_back, and domshell_forward Source: https://context7.com/apireno/domshell/llms.txt Commands for navigating between web pages. `domshell_navigate` changes the URL in the current tab, `domshell_open` opens a URL in a new tab, and `domshell_back`/`domshell_forward` move through browser history. ```bash // Navigate current tab to a URL (requires being inside a tab) domshell_navigate { url: "https://en.wikipedia.org/wiki/Machine_learning" } ``` ```bash // Open in a new tab and enter it automatically domshell_open { url: "https://github.com/apireno/DOMShell" } ``` ```bash // Use back instead of navigate to return to previous pages (browser cache = faster) domshell_back {} // go back ``` ```bash domshell_forward {} // go forward ``` -------------------------------- ### Copy Permissive Settings Source: https://github.com/apireno/domshell/blob/main/CLAUDE.md Copies the permissive settings file to the active settings file and creates a marker file to indicate permissions have been asked. Use this when the user opts for broader tool access. ```bash cp .claude/settings.permissive.json .claude/settings.json && touch .claude/.permissions-asked ``` -------------------------------- ### Navigate to a Specific Window Source: https://github.com/apireno/domshell/blob/main/README.md Change the current directory to a specific window using its ID. ```bash dom@shell:~ $ cd windows/2 ```