### Start LLM Server with 9B Model (64K Context)

Source: https://github.com/walter-grace/mac-code/blob/main/CLAUDE.md

Starts the llama.cpp server using the 9B model, configured for a large 64K context window. This setup is beneficial for tasks requiring extensive context retention and tool usage.

```bash
llama-server \
    --model ~/models/Qwen3.5-9B-Q4_K_M.gguf \
    --port 8000 --host 127.0.0.1 \
    --flash-attn on --ctx-size 65536 \
    --cache-type-k q4_0 --cache-type-v q4_0 \
    --n-gpu-layers 99 --reasoning off -t 4

```

--------------------------------

### Build PicoClaw Agent Framework

Source: https://github.com/walter-grace/mac-code/blob/main/CLAUDE.md

Clones the PicoClaw repository, navigates into it, and builds the agent framework. This involves installing dependencies and compiling the necessary binaries.

```bash
cd <this-repo-directory>
git clone https://github.com/sipeed/picoclaw.git
cd picoclaw && make deps && make build && cd ..

```

--------------------------------

### Configure PicoClaw

Source: https://github.com/walter-grace/mac-code/blob/main/CLAUDE.md

Sets up the PicoClaw configuration by creating the necessary directory and copying an example configuration file to the user's configuration path.

```bash
mkdir -p ~/.picoclaw/workspace
cp config.example.json ~/.picoclaw/config.json

```

--------------------------------

### Install Go Programming Language

Source: https://github.com/walter-grace/mac-code/blob/main/CLAUDE.md

Installs the Go programming language (version 1.25+) using Homebrew, which is a prerequisite for the mac code project.

```bash
brew install go

```

--------------------------------

### Homebrew Installation Script

Source: https://github.com/walter-grace/mac-code/blob/main/CLAUDE.md

A one-command script to install Homebrew, the package manager for macOS, which is a prerequisite for installing other tools like llama.cpp.

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

```

--------------------------------

### Install llama.cpp using Homebrew

Source: https://github.com/walter-grace/mac-code/blob/main/CLAUDE.md

Installs the llama.cpp library, a core component for running local LLMs, using the Homebrew package manager.

```bash
brew install llama.cpp
```

--------------------------------

### Start LLM Server with 35B MoE Model

Source: https://github.com/walter-grace/mac-code/blob/main/CLAUDE.md

Starts the llama.cpp server using the 35B MoE model. This configuration is optimized for speed with SSD paging and a 12K context size, suitable for general use.

```bash
llama-server \
    --model ~/models/Qwen3.5-35B-A3B-UD-IQ2_M.gguf \
    --port 8000 --host 127.0.0.1 \
    --flash-attn on --ctx-size 12288 \
    --cache-type-k q4_0 --cache-type-v q4_0 \
    --n-gpu-layers 99 --reasoning off -np 1 -t 4

```

--------------------------------

### Setup and Run Agent with llama.cpp

Source: https://github.com/walter-grace/mac-code/blob/main/README.md

Instructions for installing dependencies, downloading models via Hugging Face, and launching the llama.cpp server for both 35B and 9B model configurations.

```bash
brew install llama.cpp
pip3 install rich ddgs huggingface-hub --break-system-packages

# Download 35B model
mkdir -p ~/models
python3 -c "from huggingface_hub import hf_hub_download; hf_hub_download('unsloth/Qwen3.5-35B-A3B-GGUF', 'Qwen3.5-35B-A3B-UD-IQ2_M.gguf', local_dir='$HOME/models/')"

# Start server
llama-server --model ~/models/Qwen3.5-35B-A3B-UD-IQ2_M.gguf --port 8000 --host 127.0.0.1 --flash-attn on --ctx-size 12288 --cache-type-k q4_0 --cache-type-v q4_0 --n-gpu-layers 99 --reasoning off -np 1 -t 4

# Run agent
python3 agent.py
```

```bash
# Download 9B model
python3 -c "from huggingface_hub import hf_hub_download; hf_hub_download('unsloth/Qwen3.5-9B-GGUF', 'Qwen3.5-9B-Q4_K_M.gguf', local_dir='$HOME/models/')"

# Start server
llama-server --model ~/models/Qwen3.5-9B-Q4_K_M.gguf --port 8000 --host 127.0.0.1 --flash-attn on --ctx-size 65536 --cache-type-k q4_0 --cache-type-v q4_0 --n-gpu-layers 99 --reasoning off -t 4

# Run agent
python3 agent.py
```

--------------------------------

### Setup and Run Agent with MLX

Source: https://github.com/walter-grace/mac-code/blob/main/README.md

Instructions for setting up the MLX backend, which supports persistent KV cache and faster generation speeds.

```bash
pip3 install mlx-lm rich ddgs --break-system-packages

# Start MLX engine
python3 mlx/mlx_engine.py

# Run agent
python3 agent.py
```

--------------------------------

### Download 9B Model with 64K Context

Source: https://github.com/walter-grace/mac-code/blob/main/CLAUDE.md

Downloads the 9B model (Qwen3.5-9B-Q4_K_M.gguf) from Hugging Face Hub to the local '~/models/' directory. This model supports a 64K context window and utilizes MLX for persistent KV cache.

```python
from huggingface_hub import hf_hub_download
hf_hub_download('unsloth/Qwen3.5-9B-GGUF',
    'Qwen3.5-9B-Q4_K_M.gguf', local_dir='$HOME/models/')

```

--------------------------------

### mac-code Quick Start - Option A: llama.cpp + 35B MoE

Source: https://github.com/walter-grace/mac-code/blob/main/README.md

This section details the steps to set up and run mac-code using the llama.cpp backend with a 35B MoE model, leveraging SSD paging for larger models on limited RAM.

```APIDOC
## Quick Start: llama.cpp + 35B MoE

### Description
This option sets up mac-code using the `llama.cpp` backend with a 35B MoE model. It's the default and offers approximately 30 tokens/second via SSD paging, making it suitable for Macs with 16GB RAM.

### Setup Steps

1.  **Install llama.cpp:**
    ```bash
    brew install llama.cpp
    ```

2.  **Install Python dependencies:**
    ```bash
    pip3 install rich ddgs huggingface-hub --break-system-packages
    ```

3.  **Download the 35B model:**
    ```bash
    mkdir -p ~/models
    python3 -c "
    from huggingface_hub import hf_hub_download
    hf_hub_download('unsloth/Qwen3.5-35B-A3B-GGUF',
        'Qwen3.5-35B-A3B-UD-IQ2_M.gguf', local_dir='$HOME/models/')
    "
    ```
    *(Note: This downloads a 10.6 GB model.)*

4.  **Start the llama-server:**
    ```bash
    llama-server \
        --model ~/models/Qwen3.5-35B-A3B-UD-IQ2_M.gguf \
        --port 8000 --host 127.0.0.1 \
        --flash-attn on --ctx-size 12288 \
        --cache-type-k q4_0 --cache-type-v q4_0 \
        --n-gpu-layers 99 --reasoning off -np 1 -t 4
    ```

5.  **Run the agent:**
    ```bash
    python3 agent.py
    ```

### Backend Details
*   **Model:** Qwen3.5-35B-A3B (10.6 GB)
*   **Speed:** 16-30 tokens/sec (via SSD paging on 16GB Macs)
*   **Context Size:** 12K (35B) / 64K (9B)
*   **Persistent Memory:** No
```

--------------------------------

### mac-code Quick Start - Option B: MLX + 9B

Source: https://github.com/walter-grace/mac-code/blob/main/README.md

This section outlines the setup for mac-code using the MLX backend with a 9B model, highlighting its speed and persistent context capabilities.

```APIDOC
## Quick Start: MLX + 9B

### Description
This option configures mac-code to use the `MLX` backend with a 9B model. It offers approximately 20 tokens/second and features persistent context, allowing you to save and load conversation states.

### Setup Steps

1.  **Install Python dependencies:**
    ```bash
    pip3 install mlx-lm rich ddgs --break-system-packages
    ```

2.  **Start the MLX engine:**
    ```bash
    python3 mlx/mlx_engine.py
    ```
    *(Note: The 9B model will be downloaded automatically on the first run.)*

3.  **Run the agent:**
    ```bash
    python3 agent.py
    ```

### Backend Details
*   **Model:** 9B (downloaded automatically)
*   **Speed:** ~20 tokens/sec (+25% faster than llama.cpp for generation)
*   **Context Size:** 64K
*   **Persistent Memory:** Yes (save/load/R2 sync)
```

--------------------------------

### Install Python Dependencies

Source: https://github.com/walter-grace/mac-code/blob/main/CLAUDE.md

Installs necessary Python packages, including huggingface-hub for model downloads and rich for enhanced terminal output. The --break-system-packages flag is used to allow installation in system-wide Python environments.

```bash
pip3 install huggingface-hub rich --break-system-packages
```

--------------------------------

### mac-code Quick Start - Option C: llama.cpp + 9B (64K Context)

Source: https://github.com/walter-grace/mac-code/blob/main/README.md

This option details how to run mac-code with llama.cpp and a 9B model, specifically configured for a 64K context window without using the MLX backend.

```APIDOC
## Quick Start: llama.cpp + 9B (64K Context)

### Description
This option allows you to use the `llama.cpp` backend with a 9B model, configured to achieve a 64K context window. This is an alternative if you prefer `llama.cpp` but need a larger context than typically available with its default settings.

### Setup Steps

1.  **Install Python dependencies (if not already installed):**
    ```bash
    pip3 install rich ddgs huggingface-hub --break-system-packages
    ```

2.  **Download the 9B model:**
    ```bash
    python3 -c "
    from huggingface_hub import hf_hub_download
    hf_hub_download('unsloth/Qwen3.5-9B-GGUF',
        'Qwen3.5-9B-Q4_K_M.gguf', local_dir='$HOME/models/')
    "
    ```

3.  **Start the llama-server with 64K context:**
    ```bash
    llama-server \
        --model ~/models/Qwen3.5-9B-Q4_K_M.gguf \
        --port 8000 --host 127.0.0.1 \
        --flash-attn on --ctx-size 65536 \
        --cache-type-k q4_0 --cache-type-v q4_0 \
        --n-gpu-layers 99 --reasoning off -t 4
    ```
    *(Note: `--ctx-size 65536` enables the 64K context. The quantized KV cache flags reduce memory usage.)*

4.  **Run the agent:**
    ```bash
    python3 agent.py
    ```

### Backend Details
*   **Model:** Qwen3.5-9B-Q4_K_M
*   **Speed:** Comparable to other llama.cpp setups, but optimized for context.
*   **Context Size:** 64K
*   **Persistent Memory:** No (inherent limitation of this `llama.cpp` configuration)
```

--------------------------------

### Download 35B MoE Model

Source: https://github.com/walter-grace/mac-code/blob/main/CLAUDE.md

Downloads the 35B MoE model (Qwen3.5-35B-A3B-UD-IQ2_M.gguf) from Hugging Face Hub to the local '~/models/' directory. This model is optimized for SSD paging and offers approximately 30 tokens/second.

```python
from huggingface_hub import hf_hub_download
hf_hub_download('unsloth/Qwen3.5-35B-GGUF',
    'Qwen3.5-35B-A3B-UD-IQ2_M.gguf', local_dir='$HOME/models/')

```

--------------------------------

### Run the mac code Agent

Source: https://github.com/walter-grace/mac-code/blob/main/CLAUDE.md

Executes the main agent script, which provides a TUI for interacting with the LLM, including features like auto-routing, slash commands, web search, and tool integration.

```python
python3 agent.py

```

--------------------------------

### Ask LLM Directly (JavaScript)

Source: https://github.com/walter-grace/mac-code/blob/main/web/index.html

Sends a POST request to an LLM endpoint to get a direct response. It supports backend or direct LLM URLs, sending model configuration and messages. Returns the LLM's response as JSON. Requires 'fetch' API.

```javascript
async function askLLM(messages){
  const url = useBackend ? LLM_URL : DIRECT_LLM;
  const res=await fetch(url,{
    method:'POST',
    headers:{'Content-Type':'application/json'},
    body:JSON.stringify({model:'local',messages,max_tokens:2000,temperature:0.7})
  });
  return await res.json();
}
```

--------------------------------

### Handle Input and Slash Menu UI (JavaScript)

Source: https://github.com/walter-grace/mac-code/blob/main/web/index.html

This JavaScript code handles user input for a command-line-like interface. It checks if the input starts with a specific prefix, updates the input value, and toggles the visibility of a slash menu. It also includes functionality to close the slash menu when the Escape key is pressed or when the screen is clicked.

```javascript
c.startsWith(input.value));if(m.length===1){input.value=m[0][0]+' ';slashMenu.classList.remove('visible');}} if(e.key==='Escape')slashMenu.classList.remove('visible'); }); document.querySelector('.screen').addEventListener('click',()=>input.focus()); boot();
```

--------------------------------

### Displaying Welcome Banner (JavaScript)

Source: https://github.com/walter-grace/mac-code/blob/main/web/index.html

Displays a welcome banner with project information, model details, available tools, and server status. This function is called during the initial boot sequence.

```javascript
function showBanner(){ addLine(''); addHTML(' <span style="color:#ffcc00">🍎 mac code</span>'); addLine(' open-source AI on your Mac','line-dim'); addLine(''); addHTML(' <span style="color:#227722">model</span> <span style="color:#ccddcc;font-weight:bold">Qwen3.5-9B</span> <span style="color:#227722">Q4_K_M · 32K ctx</span>'); addHTML(' <span style="color:#227722">tools</span> <span style="color:#ccddcc">search · fetch · exec · files</span>'); addHTML(' <span style="color:#227722">cost </span> <span style="color:#33ff33;font-weight:bold">$0.00/hr</span> <span style="color:#227722">Metal GPU · localhost:8000</span>'); addLine(''); addLine(' ──────────────────────────────────────','line-dim'); addLine(' type / for commands','line-dim'); if(useBackend){ addLine(' web search enabled via PicoClaw','line-dim'); } else { addLine(' no web search — run server.py to enable','line-dim'); } addLine(''); }
```

--------------------------------

### System Initialization (JavaScript)

Source: https://github.com/walter-grace/mac-code/blob/main/web/index.html

Initializes the mac-code system by displaying boot messages, checking for backend agent availability, and showing the welcome banner. It includes timed delays for a smoother startup experience.

```javascript
async function boot(){ addLine(''); addLine(' Macintosh System 2026.1','line-dim'); await new Promise(r=>setTimeout(r,300)); addLine(' Loading mac code v1.0...','line-dim'); await new Promise(r=>setTimeout(r,250)); addLine(' Metal GPU: Apple M4 ✓','line-dim'); await new Promise(r=>setTimeout(r,200)); addLine(' Model: Qwen3.5-9B (5.28 GB) ✓','line-dim'); await new Promise(r=>setTimeout(r,200)); // Detect backend (quick health check, not a full agent call) try{ const res=await fetch(BACKEND_URL+'/',{method:'GET',signal:AbortSignal.timeout(2000)}); if(res.ok){useBackend=true;addLine(' Agent: PicoClaw ✓ (web search enabled)','line-dim');} else{addLine(' Agent: direct LLM only (run server.py for search)','line-dim');} }catch(e){addLine(' Agent: direct LLM only (run server.py for search)','line-dim');} await new Promise(r=>setTimeout(r,200)); addLine(' Server: localhost:8000 ✓','line-dim'); await new Promise(r=>setTimeout(r,400)); terminal.innerHTML=''; showBanner(); input.focus(); }
```

--------------------------------

### Ask Agent (JavaScript)

Source: https://github.com/walter-grace/mac-code/blob/main/web/index.html

Sends a POST request to an agent endpoint with a user message and session identifier. Returns the agent's response as JSON. Requires 'fetch' API.

```javascript
async function askAgent(message, session='web'){
  const res = await fetch(AGENT_URL, {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({message, session}),
  });
  return await res.json();
}
```

--------------------------------

### Manage Persistent Context via MLX API

Source: https://github.com/walter-grace/mac-code/blob/main/README.md

API endpoints for saving, loading, and downloading persistent context states to enable cross-device resume functionality.

```bash
# Save context
curl -X POST localhost:8000/v1/context/save -d '{"name":"my-project","prompt":"your codebase here"}'

# Load context
curl -X POST localhost:8000/v1/context/load -d '{"name":"my-project"}'

# Download from R2
curl -X POST localhost:8000/v1/context/download -d '{"name":"my-project"}'
```

--------------------------------

### mac-code Commands

Source: https://github.com/walter-grace/mac-code/blob/main/README.md

A list of available commands for interacting with the mac-code agent, including mode switching, utility functions, and session management.

```APIDOC
## mac-code Commands

### Description
mac-code provides a set of slash commands (`/`) to control its behavior, switch modes, access information, and manage the session. Type `/` in the agent's interface to see a list of available commands.

### Available Commands

| Command | Action |
|---|---|
| `/agent` | Switches to the default agent mode, utilizing tools and LLM reasoning. |
| `/raw` | Enters raw mode for direct streaming of LLM output without tool usage. |
| `/model 9b` | Switches the active model to the 9B version (typically with 64K context). |
| `/model 35b` | Switches the active model to the 35B MoE version (requires `llama.cpp` backend). |
| `/search <q>` | Performs a quick web search for the provided query `<q>`. |
| `/bench` | Initiates a speed benchmark to measure agent performance. |
| `/stats` | Displays current session statistics, such as token usage and response times. |
| `/cost` | Shows estimated cost savings compared to using cloud-based AI services. |
| `/good` / `/bad` | Grades the last response, used for self-improvement logging and feedback. |
| `/improve` | Views statistics related to response grading and agent improvement. |
| `/clear` | Resets the current conversation history and context. |
| `/quit` | Exits the mac-code agent application.
```

--------------------------------

### MLX Backend - Persistent Context API

Source: https://github.com/walter-grace/mac-code/blob/main/README.md

This section details the API endpoints for managing persistent context with the MLX backend, allowing you to save, load, and sync conversation states.

```APIDOC
## MLX Backend: Persistent Context API

### Description
The MLX backend in mac-code offers persistent context capabilities, allowing you to save your current conversation state locally or sync it to Cloudflare R2 for cross-device resumption. This feature significantly speeds up resuming work compared to reprocessing context.

### API Endpoints

All context management endpoints are accessed via `localhost:8000`.

1.  **Save Context**
    *   **Method:** `POST`
    *   **Endpoint:** `/v1/context/save`
    *   **Description:** Saves the current conversation context to a named file locally.
    *   **Request Body:**
        ```json
        {
          "name": "string",
          "prompt": "string" 
        }
        ```
        *   `name` (string) - Required - The name to save the context under.
        *   `prompt` (string) - Optional - The current prompt or context to save.
    *   **Example Request:**
        ```json
        {
          "name": "my-project",
          "prompt": "your codebase here"
        }
        ```

2.  **Load Context**
    *   **Method:** `POST`
    *   **Endpoint:** `/v1/context/load`
    *   **Description:** Loads a previously saved conversation context by name. This is extremely fast (e.g., 0.0003s).
    *   **Request Body:**
        ```json
        {
          "name": "string"
        }
        ```
        *   `name` (string) - Required - The name of the context to load.
    *   **Example Request:**
        ```json
        {
          "name": "my-project"
        }
        ```

3.  **Download Context from R2**
    *   **Method:** `POST`
    *   **Endpoint:** `/v1/context/download`
    *   **Description:** Downloads a saved context from Cloudflare R2, enabling cross-device resumption. Takes approximately 1.5 seconds.
    *   **Request Body:**
        ```json
        {
          "name": "string"
        }
        ```
        *   `name` (string) - Required - The name of the context to download.
    *   **Example Request:**
        ```json
        {
          "name": "my-project"
        }
        ```

### Performance & Compression
*   **Save Time:** ~0.04s
*   **Load Time:** ~0.0003s
*   **R2 Upload/Download:** ~1.5s
*   **TurboQuant Compression:** Reduces storage size by 4x (e.g., 26.6 MB to 6.7 MB) with minimal quality loss (0.993 cosine similarity).
```

--------------------------------

### Stream LLM Responses (JavaScript)

Source: https://github.com/walter-grace/mac-code/blob/main/web/index.html

Initiates a POST request to an LLM endpoint to stream responses. It handles both backend and direct LLM URLs, sending model configuration and messages. Returns a ReadableStream reader for processing the response chunk by chunk. Requires 'fetch' API.

```javascript
async function streamLLM(messages){
  const url = useBackend ? LLM_URL : DIRECT_LLM;
  const res=await fetch(url,{
    method:'POST',
    headers:{'Content-Type':'application/json'},
    body:JSON.stringify({model:'local',messages,max_tokens:2000,temperature:0.7,stream:true})
  });
  if(!res.ok)throw new Error(`Server ${res.status}`);
  return res.body.getReader();
}
```

--------------------------------

### Terminal Interaction and State Management (JavaScript)

Source: https://github.com/walter-grace/mac-code/blob/main/web/index.html

This JavaScript code initializes and manages the terminal interface. It sets up event listeners, defines constants for API endpoints and tool keywords, and implements functions for updating the clock, adding lines to the terminal, showing/hiding loading indicators, and displaying slash commands. It also manages chat and command history, session statistics, and model switching.

```javascript
const terminal = document.getElementById('terminal');
const input = document.getElementById('input');
const slashMenu = document.getElementById('slashMenu');
const clock = document.getElementById('clock');
const promptTag = document.getElementById('promptTag');

// Use backend server if available (enables web search), fallback to direct llama-server
const BACKEND_URL = 'http://localhost:8080';
const LLM_URL = BACKEND_URL + '/api/chat';
const AGENT_URL = BACKEND_URL + '/api/agent';
const DIRECT_LLM = 'http://localhost:8000/v1/chat/completions';

let useBackend = false; // detected on boot

const TOOL_KEYWORDS = [
  'search','find','look up','google','weather','news','latest','when do','when is','when does','who do','who is playing','who plays','who won','what happened','score','price','stock','schedule','what time','tonight','today','tomorrow','next game','play next','play','playing','explore','repo','repository','github','tell me more','more about','bitcoin','crypto','where is','recipe','results','standings','they play','next?'
];

function needsTools(msg) {
  return TOOL_KEYWORDS.some(k => msg.toLowerCase().includes(k));
}

// State
let chatHistory = [];
let cmdHistory = [];
let cmdIdx = -1;
let sessionTurns = 0;
let sessionTokens = 0;
let currentModel = '9b';
let autoRoute = true; // Toggle auto-routing

// Clock
function updateClock() {
  clock.textContent = new Date().toLocaleTimeString('en-US', {hour:'numeric',minute:'2-digit',hour12:true});
}
updateClock();
setInterval(updateClock, 60000);

// Commands
const CMDS = [
  ["/model","/stats","/tools","/bench","/cost","/clear","/auto","/quit"],
];

const SLASH = [
  ["/agent", "Agent mode (tools + web search)"],
  ["/raw", "Raw mode (direct streaming)"],
  ["/model", "Show/switch model — /model 9b or 35b"],
  ["/auto", "Toggle auto-routing"],
  ["/bench", "Speed benchmark"],
  ["/search", "Web search — /search <query>"],
  ["/stats", "Session stats"],
  ["/tools", "List tools"],
  ["/cost", "Cost savings"],
  ["/clear", "Clear screen"],
  ["/compact", "Toggle markdown"],
  ["/quit", "Exit"],
];

function addLine(t, c = '') {
  const d = document.createElement('div');
  d.className = `line ${c}`;
  d.textContent = t;
  terminal.appendChild(d);
  terminal.scrollTop = terminal.scrollHeight;
}

function addHTML(h) {
  const d = document.createElement('div');
  d.className = 'line';
  d.innerHTML = h;
  terminal.appendChild(d);
  terminal.scrollTop = terminal.scrollHeight;
}

function showLoading() {
  const d = document.createElement('div');
  d.className = 'line line-cyan';
  d.id = 'loading';
  terminal.appendChild(d);
  terminal.scrollTop = terminal.scrollHeight;
  let f = 0;
  const dots = ['⠋','⠙','⠹','⠸','⠼','⠴','⠦','⠧','⠇','⠏'];
  const iv = setInterval(() => {
    f++;
    const el = document.getElementById('loading');
    if (!el) {
      clearInterval(iv);
      return;
    }
    const s = Math.floor(f * 0.15);
    el.innerHTML = ` ${dots[f % dots.length]} thinking ${s}s`;
  }, 150);
  return iv;
}

function hideLoading(iv) {
  clearInterval(iv);
  const el = document.getElementById('loading');
  if (el) el.remove();
}

function showSlash(filter = '/') {
  const m = filter === '/' ? SLASH : SLASH.filter(([c]) => c.startsWith(filter));
  if (!m.length) {
    slashMenu.classList.remove('visible');
    return;
  }
  slashMenu.innerHTML = m.map(([c, d]) => `<div class="slash-item" data-cmd="${c}"><span class="slash-cmd">${c}</span><span class="slash-desc">${d}</span></div>`).join('');
  slashMenu.classList.add('visible');
  slashMenu.querySelectorAll('.slash-item').forEach(el => el.onclick = () => {
    input.value = el.dataset.cmd + ' ';
    slashMenu.classList.remove('visible');
    input.focus();
  });
}

async function
```

--------------------------------

### Handle Terminal Input Commands (JavaScript)

Source: https://github.com/walter-grace/mac-code/blob/main/web/index.html

Processes user input strings, interpreting them as commands for a terminal interface. It handles commands like '/', '/clear', '/stats', '/tools', '/bench', '/cost', '/auto', '/model', and '/quit'. Some commands interact with LLMs or display information.

```javascript
async function handleInput(text){
  const lo=text.toLowerCase().trim();
  if(lo==='/'){showSlash('/');return;}
  if(lo==='/clear'){terminal.innerHTML='';showBanner();return;}
  if(lo==='/stats'){ addLine(` turns ${sessionTurns}`,'line-cyan'); addLine(` tokens ${sessionTokens.toLocaleString()}`,'line-cyan'); addLine(` model ${currentModel==='9b'?'Qwen3.5-9B':'Qwen3.5-35B-A3B'}`,'line-cyan'); addLine('');return; }
  if(lo==='/tools'){ 
    [['web_search','DuckDuckGo'],['web_fetch','URLs'],['exec','shell'],['read_file','files'],['write_file','create']].forEach(([n,d])=>addHTML(` <span style="color:#00dddd">▸ ${n}</span> <span style="color:#227722">${d}</span>`));
    addLine('');return;
  }
  if(lo==='/bench'){ 
    addLine(' benchmarking...','line-dim'); 
    try{
      const d=await askLLM([{role:'user',content:'Count 1 to 30'}]);
      const t=d.timings||{};
      addHTML(` <span style="color:#33ff33;font-weight:bold">${(t.predicted_per_second||0).toFixed(1)} tok/s</span> gen`);
      addHTML(` <span style="color:#33ff33">${(t.prompt_per_second||0).toFixed(1)} tok/s</span> prompt`);
    }catch(e){addLine(` error: ${e.message}`,'line-error');}
    addLine('');return;
  }
  if(lo==='/cost'){addHTML(` <span style="color:#33ff33;font-weight:bold">$0.00</span> spent`);addLine(' ~$0.34/hr on cloud GPU','line-dim');addLine('');return;}
  if(lo==='/auto'){autoRoute=!autoRoute;addLine(` auto-routing ${autoRoute?'on':'off'}`,'line-dim');addLine('');return;}
  if(lo==='/model 9b'||lo==='/model 35b'){ 
    const target=lo.includes('9b')?'9b':'35b';
    if(!useBackend){addLine(' swap requires server.py backend','line-error');addLine('');return;}
    addLine(` swapping to ${target==='9b'?'Qwen3.5-9B':'Qwen3.5-35B-A3B'}...`,'line-dim');
    const iv=showLoading();
    try{
      const res=await fetch(BACKEND_URL+'/api/swap',{method:'POST',headers:{'Content-Type':'application/json'},body:JSON.stringify({model:target})});
      const d=await res.json();
      hideLoading(iv);
      if(d.ok){currentModel=target;addLine(` ${d.message}`,'line-cyan');}
      else{addLine(` ${d.message||d.error}`,'line-error');}
    }catch(e){hideLoading(iv);addLine(` swap failed: ${e.message}`,'line-error');}
    addLine('');return;
  }
  if(lo==='/model'){addLine(` ${currentModel==='9b'?'Qwen3.5-9B (32K ctx)':'Qwen3.5-35B-A3B (8K ctx)'}`,'line-white line-bold');addLine(` /model 9b or /model 35b to switch`,'line-dim');addLine('');return;}
  if(lo==='/quit'){addLine(`\n mac code ${sessionTurns} turns · ${sessionTokens} tokens`,'line-yellow');addLine('\n Goodbye!','line-dim');input.disabled=true;return;}
  
  slashMenu.classList.remove('visible');

  if(useBackend && currentModel==='9b'){ 
    const t0=performance.now();
    const logDiv = document.createElement('div');
    logDiv.className = 'line line-dim';
    logDiv.id = 'agent-log';
    logDiv.innerHTML = ' <span style="color:#00dddd">▸</span> <span style="color:#227722">starting agent...</span>';
    terminal.appendChild(logDiv);
    terminal.scrollTop = terminal.scrollHeight;
    
    let frame = 0;
    const dots = ['⠋','⠙','⠹','⠸','⠼','⠴','⠦','⠧','⠇','⠏']
    const spinDiv = document.createElement('div');
    spinDiv.className = 'line line-cyan';
    spinDiv.id = 'agent-spin';
    terminal.insertBefore(spinDiv, logDiv);
    const spinIv = setInterval(() => {
      frame++;
      const el = document.getElementById('agent-spin');
      if (!el) { clearInterval(spinIv); return; }
      const s = Math.floor(frame * 0.15);
      el.innerHTML = ` ${dots[frame % dots.length]} working ${s}s`;
    }, 150);

    try{
      const d = await askAgent(text);
      clearInterval(spinIv);
      const spin = document.getElementById('agent-spin');
      if (spin) spin.remove();
      const log = document.getElementById('agent-log');
      if (log) log.remove();
      const elapsed = ((performance.now()-t0)/1000).toFixed(1);
      const resp = d.response || d.error || 'no response';
      addLine('');
      resp.split('\n').forEach(line => addLine(' '+line,'line-white'));
      addLine('');
      const tokens = resp.split(/\s+/).length;
      const method = d.method === 'fast' ? 'search' :

```

--------------------------------

### Agent Response Handling and Fallback (JavaScript)

Source: https://github.com/walter-grace/mac-code/blob/main/web/index.html

Handles responses from an agent, including displaying speed and timing information. If an error occurs, it falls back to direct LLM streaming and logs the error.

```javascript
function handleAgentResponse(method, elapsed, tokens, text) { const speed = d.speed ? ` · ${d.speed.toFixed(1)} tok/s` : ''; addHTML(` <span style="color:#00dddd">▸ ${method}</span> <span style="color:#227722">${elapsed}s${speed}</span>`); sessionTurns++; sessionTokens += tokens; }catch(e){ clearInterval(spinIv); const spin = document.getElementById('agent-spin'); if (spin) spin.remove(); const log = document.getElementById('agent-log'); if (log) log.remove(); addLine(`\n agent error: ${e.message}`,'line-error'); addLine(' falling back to direct LLM...','line-dim'); chatHistory.push({role:'user',content:text}); await streamDirect(); } addLine(''); return; }
```

--------------------------------

### Direct LLM Streaming (JavaScript)

Source: https://github.com/walter-grace/mac-code/blob/main/web/index.html

Streams responses directly from a Large Language Model (LLM) when the backend agent is unavailable. It decodes streamed data, displays token count and elapsed time, and handles potential errors.

```javascript
async function streamDirect(){ const iv=showLoading(); const t0=performance.now(); let full='',tokens=0; try{ const reader=await streamLLM(chatHistory); hideLoading(iv); const div=document.createElement('div'); div.className='line line-white'; div.textContent=' '; terminal.appendChild(div); const dec=new TextDecoder(); let buf=''; while(true){ const{done,value}=await reader.read(); if(done)break; buf+=dec.decode(value,{stream:true}); while(buf.includes('\n')){ const i=buf.indexOf('\n'); const line=buf.slice(0,i).trim(); buf=buf.slice(i+1); if(!line.startsWith('data: '))continue; const data=line.slice(6); if(data==='[DONE]')break; try{ const c=JSON.parse(data).choices?.[0]?.delta?.content||''; if(c){full+=c;tokens++;div.textContent+=c;terminal.scrollTop=terminal.scrollHeight;} }catch(e){} } } const elapsed=((performance.now()-t0)/1000).toFixed(1); const speed=tokens>0?(tokens/parseFloat(elapsed)).toFixed(1):'0'; addLine(''); addHTML(` <span style="color:#33ff33">${speed} tok/s</span> <span style="color:#227722">· ${tokens} tokens · ${elapsed}s</span>`); chatHistory.push({role:'assistant',content:full}); sessionTurns++;sessionTokens+=tokens; }catch(e){ hideLoading(iv); addLine(`\n error: ${e.message}`,'line-error'); addLine(' is llama-server running on localhost:8000?','line-dim'); chatHistory.pop(); } addLine(''); } // end streamDirect await streamDirect(); }
```

--------------------------------

### macOS Terminal Styling (CSS)

Source: https://github.com/walter-grace/mac-code/blob/main/web/index.html

This CSS code styles various elements to create a macOS terminal appearance. It includes styles for the main Mac container, screen, terminal text, chin, model name, base, shadow, loading dots, and power LED. It also includes a media query for responsiveness on smaller screens.

```css
.mac-model { position: absolute; bottom: 10px; left: 50%; transform: translateX(-50%); font-size: 9px; letter-spacing: 2px; color: #b09868; text-transform: uppercase; text-shadow: 0 1px 0 rgba(255,255,255,0.3); }
.mac-base { background: linear-gradient(180deg, #d0bc98 0%, #c4b088 50%, #b8a478 100%); height: 14px; margin: 0 10px; border-radius: 0 0 8px 8px; box-shadow: 0 4px 12px rgba(0,0,0,0.3), 0 8px 24px rgba(0,0,0,0.2); }
.mac-shadow { width: 380px; height: 20px; margin: 0 auto; background: radial-gradient(ellipse at center, rgba(0,0,0,0.35) 0%, transparent 70%); margin-top: -4px; }
.loading-dots span { animation: loadDot 1.4s infinite both; }
.loading-dots span:nth-child(2) { animation-delay: 0.2s; }
.loading-dots span:nth-child(3) { animation-delay: 0.4s; }
@keyframes loadDot { 0%, 80%, 100% { opacity: 0.2; } 40% { opacity: 1; } }
.power-led { position: absolute; left: 28px; top: 32px; width: 6px; height: 6px; background: #44bb44; border-radius: 50%; box-shadow: 0 0 4px rgba(68,187,68,0.6), 0 0 8px rgba(68,187,68,0.3); animation: ledPulse 3s ease-in-out infinite; }
@keyframes ledPulse { 0%, 100% { opacity: 0.8; } 50% { opacity: 1; } }
@media (max-width: 480px) {
  .mac { width: 95vw; }
  .screen { height: 280px; }
  .terminal { font-size: 12px; }
  .mac-chin { height: 60px; }
}
```

--------------------------------

### Input Event Handling (JavaScript)

Source: https://github.com/walter-grace/mac-code/blob/main/web/index.html

Handles user input events for the command line interface. It manages command history, tab completion for slash commands, and triggers input processing on Enter key press.

```javascript
input.addEventListener('input',()=>{ if(input.value.startsWith('/'))showSlash(input.value); else slashMenu.classList.remove('visible'); }); input.addEventListener('keydown',async e=>{ if(e.key==='Enter'){ const t=input.value.trim();if(!t)return; addHTML(` <span style="color:#ffcc00">🍎</span> <span style="color:#ffcc00;font-weight:bold">></span> <span style="color:#33ff33">${t}</span>`); addLine(''); cmdHistory.unshift(t);cmdIdx=-1;input.value='';slashMenu.classList.remove('visible'); input.disabled=true; await handleInput(t); input.disabled=false;input.focus(); promptTag.textContent='🍎'; } if(e.key==='ArrowUp'&&cmdHistory.length){cmdIdx=Math.min(cmdIdx+1,cmdHistory.length-1);input.value=cmdHistory[cmdIdx];e.preventDefault();} if(e.key==='ArrowDown'){cmdIdx=Math.max(cmdIdx-1,-1);input.value=cmdIdx>=0?cmdHistory[cmdIdx]:'';e.preventDefault();} if(e.key==='Tab'&&input.value.startsWith('/')){e.preventDefault();const m=SLASH.filter(([c])=>)
```

--------------------------------

### Style Classic Macintosh UI Components

Source: https://github.com/walter-grace/mac-code/blob/main/web/index.html

This CSS defines the structural and visual elements of a retro Macintosh interface. It utilizes gradients, box-shadows, and pseudo-elements to simulate hardware features like the CRT screen curvature, scanlines, and the beige plastic case.

```css
* { margin:0; padding:0; box-sizing:border-box; }
body { background: linear-gradient(180deg, #2c2c3a 0%, #1a1a28 100%); min-height: 100vh; display: flex; align-items: center; justify-content: center; font-family: "VT323", "Monaco", "Courier New", monospace; }
.mac-container { perspective: 800px; }
.mac { width: 440px; transform: rotateX(1deg); position: relative; }
.mac-top { background: linear-gradient(180deg, #f2e6cc 0%, #e8d8b8 40%, #dcc8a0 100%); border-radius: 16px 16px 2px 2px; padding: 16px 18px 10px 18px; position: relative; }
.screen { background: radial-gradient(ellipse at center, #0a1a0a 0%, #050f05 60%, #020802 100%); border-radius: 6px; height: 340px; overflow: hidden; position: relative; }
.screen::after { content: ''; position: absolute; top: 0; left: 0; right: 0; bottom: 0; background: repeating-linear-gradient( 0deg, transparent, transparent 1px, rgba(0,0,0,0.06) 1px, rgba(0,0,0,0.06) 2px ); pointer-events: none; z-index: 10; border-radius: 6px; }
.terminal { flex: 1; padding: 8px 10px; overflow-y: auto; font-size: 14px; line-height: 1.4; color: #33ff33; }
```

=== COMPLETE CONTENT === This response contains all available snippets from this library. No additional content exists. Do not make further requests.