### Command-Line Interface - Starting the Proxy

Source: https://context7.com/matdev83/llm-interactive-proxy/llms.txt

Instructions and examples for launching the proxy server using the command-line interface with various configuration options.

```APIDOC
## Command-Line Interface - Starting the Proxy

### Description
Launch the proxy with the `python -m src.core.cli` command, configuring backends, authentication, and optional features through command-line options.

### Usage
`python -m src.core.cli [OPTIONS]`

### Options
- `--default-backend BACKEND`: Specifies the default backend to use (e.g., `openai`, `gemini`).
- `--host HOST`: The host address to bind the server to (e.g., `0.0.0.0`).
- `--port PORT`: The port number to listen on (e.g., `8000`).
- `--force-model MODEL`: Forces all requests to a specific model.
- `--disable-auth`: Disables authentication.
- `--enable-planning-phase`: Enables the planning phase for complex requests.
- `--planning-phase-strong-model BACKEND:MODEL`: Specifies the model for the planning phase.
- `--planning-phase-max-turns N`: Sets the maximum number of turns for the planning phase.
- `--planning-phase-temperature FLOAT`: Sets the temperature for the planning phase model.
- `--planning-phase-reasoning-effort EFFORT`: Sets the reasoning effort level for the planning phase (e.g., `high`).
- `--model-alias "PATTERN=REPLACEMENT"`: Defines a pattern for rewriting model names.
- `--enable-edit-precision`: Enables edit-precision tuning.
- `--edit-precision-temperature FLOAT`: Sets the temperature for edit-precision tuning.
- `--edit-precision-min-top-p FLOAT`: Sets the minimum top-p value for edit-precision tuning.
- `--edit-precision-override-top-p`: Overrides the top-p value for edit-precision tuning.
- `--static-route BACKEND:MODEL`: Bypasses backend selection and routes directly to a specific model on a backend.
- `--config PATH`: Path to a configuration file.
- `--log PATH`: Path to a log file.
- `--capture-file PATH`: Path to a file for capturing network traffic.
- `--log-level LEVEL`: Sets the logging level (e.g., `DEBUG`).
- `--trusted-ip IP_OR_CIDR`: Specifies trusted IP addresses or CIDR ranges.
- `--enable-brute-force-protection`: Enables protection against brute-force attacks.
- `--auth-max-failed-attempts N`: Sets the maximum number of failed authentication attempts.

### Examples
**Basic startup with OpenAI backend:**
```bash
python -m src.core.cli --default-backend openai
```

**Startup with custom host/port and Gemini backend:**
```bash
python -m src.core.cli \
  --default-backend gemini-cli-oauth-personal \
  --host 0.0.0.0 \
  --port 8000
```

**Force all requests to a specific model and disable authentication:**
```bash
python -m src.core.cli \
  --default-backend gemini-cli-oauth-personal \
  --force-model gemini-2.5-pro \
  --disable-auth
```

**Enable wire capture for debugging:**
```bash
python -m src.core.cli \
  --default-backend openai \
  --capture-file logs/wire_capture.log \
  --log-level DEBUG
```

**Configure planning phase:**
```bash
python -m src.core.cli \
  --default-backend openai \
  --enable-planning-phase \
  --planning-phase-strong-model openai:gpt-4o \
  --planning-phase-max-turns 8 \
  --planning-phase-temperature 0.2 \
  --planning-phase-reasoning-effort high
```

**Model name rewrites for routing:**
```bash
python -m src.core.cli \
  --default-backend openrouter \
  --model-alias '^gpt-(.*)=openrouter:openai/gpt-\1' \
  --model-alias '^claude-(.*)=anthropic:claude-\1'
```

**Edit-precision tuning configuration:**
```bash
python -m src.core.cli \
  --enable-edit-precision \
  --edit-precision-temperature 0.08 \
  --edit-precision-min-top-p 0.25 \
  --edit-precision-override-top-p
```

**Static route configuration:**
```bash
python -m src.core.cli \
  --static-route gemini-cli-oauth-personal:gemini-2.5-pro
```

**Complete production setup:**
```bash
python -m src.core.cli \
  --config config/production.yaml \
  --default-backend openai \
  --host 0.0.0.0 \
  --port 8000 \
  --log logs/proxy.log \
  --capture-file logs/wire.log \
  --trusted-ip 192.168.1.0/24 \
  --enable-brute-force-protection \
  --auth-max-failed-attempts 5
```
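
**Illustrative sketch of `--model-alias` rewriting (Python):**

The `--model-alias` examples above pair a regular expression with a replacement that may reference capture groups (`\1`). The following is a minimal sketch of that rewriting idea, not the proxy's actual implementation: the `ALIASES` table and the `rewrite_model` helper are hypothetical names, and the sketch assumes that rules are tried in order, that the first match wins, and that Python `re` semantics apply.

```python
import re

# Hypothetical alias table mirroring the two --model-alias values shown above.
# Assumption: rules are tried in order and the first matching pattern wins.
ALIASES = [
    (r"^gpt-(.*)", r"openrouter:openai/gpt-\1"),
    (r"^claude-(.*)", r"anthropic:claude-\1"),
]


def rewrite_model(name: str, aliases=ALIASES) -> str:
    """Return the rewritten model name, or the name unchanged if no rule matches."""
    for pattern, replacement in aliases:
        if re.search(pattern, name):
            # \1 in the replacement is filled with the first captured group.
            return re.sub(pattern, replacement, name, count=1)
    return name


if __name__ == "__main__":
    for requested in ("gpt-4o", "claude-3-opus", "llama-3"):
        print(requested, "->", rewrite_model(requested))
```

Running the module prints the rewritten name for each sample request; a name that matches no rule passes through unchanged.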
```

--------------------------------

### Installing and Logging in Gemini CLI (Bash)

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

Commands to install the Gemini CLI globally and log in to a Google account. This is a prerequisite for using Gemini-related backends with the proxy.

```bash
# Install gemini-cli (one-time)
npm install -g @google/gemini-cli
gemini login
```

--------------------------------

### Start Proxy with OpenAI Backend

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

This command starts the proxy service using OpenAI as the default backend. Ensure the OPENAI_API_KEY environment variable is set.

```bash
python -m src.core.cli --default-backend openai
```

--------------------------------

### Example Usage of Planning Phase CLI Flags

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

This is a full command-line example demonstrating how to run the llm-proxy with the planning phase enabled. It specifies the default backend, enables the planning phase, sets the strong model to openai:gpt-4o, and configures maximum turns, file writes, temperature, top_p, reasoning effort, and thinking budget.

```bash
python -m src.core.cli \
  --default-backend openai \
  --enable-planning-phase \
  --planning-phase-strong-model openai:gpt-4o \
  --planning-phase-max-turns 8 \
  --planning-phase-max-file-writes 1 \
  --planning-phase-temperature 0.2 \
  --planning-phase-top-p 0.9 \
  --planning-phase-reasoning-effort high \
  --planning-phase-thinking-budget 8000
```

--------------------------------

### Setup Gemini CLI Agent with ACP (Node.js/Python)

Source: https://context7.com/matdev83/llm-interactive-proxy/llms.txt

Installs the Gemini CLI globally, logs in, and sets a workspace directory for project-aware agent configurations. This enables the Gemini CLI to operate within a specific project context.
```bash
npm install -g @google/gemini-cli
gemini login
export GEMINI_CLI_WORKSPACE="/path/to/project"
python -m src.core.cli --default-backend gemini-cli-acp
```

--------------------------------

### Start Proxy with Gemini CLI Agent Control Protocol (ACP)

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

This command enables the proxy to run with the Gemini CLI using the Agent Control Protocol. It involves installing and logging into the Gemini CLI, optionally setting a workspace, and then starting the proxy.

```bash
# Install and authenticate with Google Gemini CLI (one-time)
npm install -g @google/gemini-cli
gemini login

# Set project directory (optional - defaults to current directory)
export GEMINI_CLI_WORKSPACE="/path/to/your/project"

# Start the proxy using gemini-cli as an agent
python -m src.core.cli --default-backend gemini-cli-acp

# Change project directory during conversation with slash command
# (sent as a chat message, not run in the shell):
#   !/project-dir(/path/to/another/project)
```

--------------------------------

### Example Conversation with Backend/Model Switching (Bash)

Source: https://context7.com/matdev83/llm-interactive-proxy/llms.txt

Demonstrates a conversational flow where users switch between AI backends and models using in-chat commands, including a one-off request example. This illustrates dynamic runtime control.

```bash
# Example conversation with switching
User: !/backend(openai)
Assistant: [Switched to openai backend]
User: !/model(gpt-4)
Assistant: [Using model: gpt-4]
User: What is 2+2?
Assistant: 4
User: !/oneoff(gemini-cli-oauth-personal:gemini-2.5-pro)
User: Explain quantum physics
Assistant: [Uses Gemini 2.5 Pro for this request only]
User: What is 3+3?
Assistant: [Back to gpt-4] 6
```

--------------------------------

### Start Proxy with Custom Host/Port and Gemini Backend

Source: https://context7.com/matdev83/llm-interactive-proxy/llms.txt

This command starts the proxy with a specified backend ('gemini-cli-oauth-personal') and custom host and port settings ('0.0.0.0' and '8000'). This is useful for network accessibility and specific service binding.

```bash
python -m src.core.cli \
  --default-backend gemini-cli-oauth-personal \
  --host 0.0.0.0 \
  --port 8000
```

--------------------------------

### Main Configuration File Example (YAML)

Source: https://context7.com/matdev83/llm-interactive-proxy/llms.txt

Provides a sample of the main configuration file for the LLM Interactive Proxy. This file typically defines all features, backends, and global settings for the application.

```yaml
...
```

--------------------------------

### Example Project Directory Change (Bash)

Source: https://context7.com/matdev83/llm-interactive-proxy/llms.txt

Demonstrates changing the project directory using the in-chat command, showing the assistant's confirmation and an example of an agent interaction within the specified project context.

```bash
# Example with Gemini CLI ACP
User: !/project-dir(/home/user/webapp)
Assistant: [Project directory changed to /home/user/webapp]
User: Show me all Python files in this project
Assistant: [gemini-cli agent lists files from /home/user/webapp]
```

--------------------------------

### Command-Line Interface - Backend Configuration

Source: https://context7.com/matdev83/llm-interactive-proxy/llms.txt

Instructions and examples for configuring different LLM backends using environment variables and command-line arguments.

```APIDOC
## Command-Line Interface - Backend Configuration

### Description
Configure multiple backends with API keys, OAuth, and provider-specific settings using environment variables and the CLI.

### Configuration Methods
Sensitive values such as API keys are typically supplied through environment variables; the backend itself is then selected with `--default-backend` or other CLI arguments.

### Examples
**OpenAI with API key:**
1. Set the environment variable:
   ```bash
   export OPENAI_API_KEY="sk-..."
   ```
2. Launch the proxy with OpenAI as the default backend:
   ```bash
   python -m src.core.cli --default-backend openai
   ```

**Anthropic with API key:**
1. Set the environment variable:
   ```bash
   export ANTHROPIC_API_KEY="sk-ant-..."
   ```
2. Launch the proxy with Anthropic as the default backend:
   ```bash
   python -m src.core.cli --default-backend anthropic
   ```

**Gemini with API key (metered):**
1. Set the environment variable:
   ```bash
   export GEMINI_API_KEY="AIza..."
   ```
2. Launch the proxy with Gemini as the default backend:
   ```bash
   python -m src.core.cli --default-backend gemini
   ```
```

--------------------------------

### Starting the Proxy with a Configuration File (Bash)

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

Command to run the LLM Interactive Proxy using a specified configuration file. This is the standard method for launching the proxy with custom settings.

```bash
python -m src.core.cli --config config.yaml
```

--------------------------------

### Start Proxy with Gemini CLI Cloud Project Backend (GCP)

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

This command starts the proxy using the Gemini CLI with a GCP-billed cloud project backend. It requires setting the GOOGLE_CLOUD_PROJECT environment variable and authenticating with Google Cloud.
```bash
export GOOGLE_CLOUD_PROJECT="your-project-id"

# Provide Application Default Credentials via one of the following:
# Option A: User credentials (interactive)
gcloud auth application-default login
# Option B: Service account file
export GOOGLE_APPLICATION_CREDENTIALS="/absolute/path/to/service-account.json"

python -m src.core.cli --default-backend gemini-cli-cloud-project
```

--------------------------------

### Example Reasoning Mode Usage (Bash)

Source: https://context7.com/matdev83/llm-interactive-proxy/llms.txt

Illustrates how to use the in-chat commands to adjust reasoning levels for AI models, showing examples for maximum reasoning and disabling reasoning for faster responses.

```bash
# Example usage
User: !/max
User: Solve this complex math problem: ...
Assistant: [Uses high reasoning effort]
User: !/no-think
User: What's 2+2?
Assistant: [Fast response without reasoning] 4
```

--------------------------------

### Starting Proxy with Gemini CLI ACP Backend (Bash)

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

Command to launch the LLM Interactive Proxy with the `gemini-cli-acp` backend, enabling it to function as a Gemini CLI agent.

```bash
python -m src.core.cli --default-backend gemini-cli-acp
```

--------------------------------

### Install Development Dependencies

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/CONTRIBUTING.md

Installs the project's development dependencies using pip, including optional dependencies denoted by '[dev]'.

```bash
pip install -e .[dev]
```

--------------------------------

### Install Project with Development Extras (Bash)

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/docs/testing.md

Installs the project in editable mode along with the 'dev' optional dependencies, which include necessary pytest plugins. This command ensures that all testing tools are available.
```bash
python -m pip install -e .[dev]
```

--------------------------------

### Install Development Dependencies and Run Tests (Bash)

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

This command installs the necessary development dependencies, including pytest plugins for async and parallel execution, and then runs the project's test suite. It assumes you are executing these commands within the project's virtual environment.

```bash
python -m pip install -e .[dev]
python -m pytest
```

--------------------------------

### Start Proxy with Gemini CLI OAuth Personal Backend

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

This command initiates the proxy service using the Gemini CLI with personal OAuth authentication. Ensure you have the necessary Gemini CLI and OAuth configurations in place.

```bash
python -m src.core.cli --default-backend gemini-cli-oauth-personal
```

--------------------------------

### Install Pre-Commit Hooks (Windows)

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/CONTRIBUTING.md

Installs the necessary pre-commit hooks for the repository on a Windows environment, typically within a virtual environment. This ensures that code quality and security checks are performed before each commit.

```shell
./.venv/Scripts/python.exe scripts/install-hooks.py
```

--------------------------------

### Complete Production Setup Command

Source: https://context7.com/matdev83/llm-interactive-proxy/llms.txt

This comprehensive command sets up the proxy for production use. It loads configuration from a file, specifies the default backend, host, port, log file locations, enables brute-force protection with a defined number of failed attempts, and allows trusted IP addresses.
```bash
python -m src.core.cli \
  --config config/production.yaml \
  --default-backend openai \
  --host 0.0.0.0 \
  --port 8000 \
  --log logs/proxy.log \
  --capture-file logs/wire.log \
  --trusted-ip 192.168.1.0/24 \
  --enable-brute-force-protection \
  --auth-max-failed-attempts 5
```

--------------------------------

### Process Wire Capture Files with jq

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

Provides command-line examples using `jq` to query and analyze wire capture log files. These examples demonstrate filtering by direction, extracting specific data like user messages, identifying errors, and calculating token usage.

```bash
# Count requests by backend
jq -r 'select(.direction=="outbound_request") | .backend' logs/wire_capture.log | sort | uniq -c

# Extract all user messages
jq -r 'select(.direction=="outbound_request") | .payload.messages[]? | select(.role=="user") | .content' logs/wire_capture.log

# Find failed requests (look for error responses)
jq 'select(.direction=="inbound_response" and (.payload.error or .payload.choices == null))' logs/wire_capture.log

# Calculate token usage by model
jq -r 'select(.direction=="inbound_response" and .payload.usage) | "\(.model) \(.payload.usage.total_tokens // (.payload.usage.prompt_tokens + .payload.usage.completion_tokens))"' logs/wire_capture.log
```

--------------------------------

### Authenticate with Google Gemini CLI (Bash)

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

This command is used for the 'gemini-cli-oauth-personal' backend, which utilizes free-tier personal OAuth. It assumes the Google Gemini CLI is installed and guides the user through a one-time authentication process.
```bash
# Install and authenticate with the Google Gemini CLI (one-time):
gemini auth
```

--------------------------------

### Minimal Proxy Configuration (YAML)

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

A basic YAML configuration file for the LLM Interactive Proxy, setting up a default OpenAI backend and proxy host/port. This serves as a starting point for proxy deployment.

```yaml
# config.yaml
backends:
  openai:
    type: openai
default_backend: openai
proxy:
  host: 0.0.0.0
  port: 8000
auth:
  # Set LLM_INTERACTIVE_PROXY_API_KEY env var to enable
  disable_auth: false
```

--------------------------------

### Configure OpenAI Backend with API Key

Source: https://context7.com/matdev83/llm-interactive-proxy/llms.txt

This command starts the proxy with the OpenAI backend enabled. An environment variable `OPENAI_API_KEY` must be set to provide the necessary API authentication credentials. This is a standard way to configure the proxy for OpenAI services.

```bash
export OPENAI_API_KEY="sk-..."
python -m src.core.cli --default-backend openai
```

--------------------------------

### Configure Edit-Precision Tuning via Environment Variables

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

This example demonstrates configuring the edit-precision tuning feature using environment variables. It covers enabling/disabling the feature and setting parameters like temperature and top_p.

```shell
EDIT_PRECISION_ENABLED=true
EDIT_PRECISION_TEMPERATURE=0.1
EDIT_PRECISION_MIN_TOP_P=0.3
EDIT_PRECISION_OVERRIDE_TOP_P=false
EDIT_PRECISION_EXCLUDE_AGENTS_REGEX=""
```

--------------------------------

### Bash: Ensure Starting from 'dev' Branch

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/dev/slash-commands/git-review-merge-dev.md

Checks out the 'dev' branch and verifies that the operation successfully reset to 'dev'.
It then performs a fast-forward pull to ensure the local 'dev' branch is up-to-date with the remote 'origin/dev'. This is crucial for maintaining a consistent starting point.

```bash
git checkout dev 2>/dev/null || git switch dev
git rev-parse --abbrev-ref HEAD | grep -qx "dev" || { echo "Failed to reset to dev; aborting."; exit 1; }
git pull --ff-only origin dev || { echo "Unable to fast-forward local dev"; exit 1; }
```

--------------------------------

### Export Provider API Keys

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

This command exports environment variables for various AI service provider API keys. This step is necessary before starting the proxy if you plan to use these backends.

```bash
export OPENAI_API_KEY=...
export ANTHROPIC_API_KEY=...
export GEMINI_API_KEY=...
export OPENROUTER_API_KEY=...
export ZAI_API_KEY=...
# GCP-based Gemini back-end
export GOOGLE_CLOUD_PROJECT=your-project-id
```

--------------------------------

### Environment-Specific Routing with Environment Variables

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

This example illustrates environment-specific model routing using environment variables. It shows how to set the `MODEL_ALIASES` variable differently for development (using free models) and production (using premium models), enabling flexible deployment strategies.

```bash
# Development environment - use free models
export MODEL_ALIASES='[
  {"pattern": "^.*$", "replacement": "gemini-cli-oauth-personal:gemini-1.5-flash"}
]'

# Production environment - use premium models
# (the backreference is written as \\1 so the value stays valid JSON)
export MODEL_ALIASES='[
  {"pattern": "^gpt-(.*)", "replacement": "openai:gpt-\\1"},
  {"pattern": "^claude-(.*)", "replacement": "anthropic:claude-\\1"}
]'
```

--------------------------------

### Run Pytest Suite (Bash)

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/docs/testing.md

Executes the project's test suite using pytest.
This command should be run after installing the project with its development dependencies.

```bash
python -m pytest
```

--------------------------------

### Start Proxy with Wire Capture Enabled

Source: https://context7.com/matdev83/llm-interactive-proxy/llms.txt

This command initiates the proxy with wire capture enabled, logging all network traffic to a specified file ('logs/wire_capture.log'). It also sets the log level to DEBUG for detailed output, useful for debugging network-related issues.

```bash
python -m src.core.cli \
  --default-backend openai \
  --capture-file logs/wire_capture.log \
  --log-level DEBUG
```

--------------------------------

### Gemini Streaming Generate Content Request

Source: https://context7.com/matdev83/llm-interactive-proxy/llms.txt

This example demonstrates how to initiate a streaming content generation request with the Gemini API. It uses a different endpoint for streaming and includes a user prompt. The response is delivered in chunks, each prefixed with 'data:'.

```bash
# Streaming Gemini request
curl -X POST http://localhost:8000/v1beta/models/gemini-2.5-pro:streamGenerateContent \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-proxy-key" \
  -d '{
    "contents": [{
      "parts": [{"text": "Count to 3"}],
      "role": "user"
    }]
  }'
```

--------------------------------

### Pytest Output Compression Example

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

Demonstrates the transformation of verbose pytest output to a more concise format after compression. The compressed output removes timing details and 'PASSED' test results, focusing on retaining 'FAILED' tests and associated error messages.
```text
# Before compression (verbose):
test_example.py::test_function PASSED [ 50%]
0.001s setup
0.002s call
0.001s teardown
test_example.py::test_failure FAILED [100%]
0.001s setup
0.003s call
0.001s teardown

# After compression (concise):
test_example.py::test_failure FAILED [100%]
```

--------------------------------

### Start Proxy to Force a Specific Model

Source: https://context7.com/matdev83/llm-interactive-proxy/llms.txt

This command launches the proxy, forcing all requests to use a specific model ('gemini-2.5-pro') regardless of the client's request. Authentication is also disabled for this configuration. This is useful for testing or enforcing a particular model's behavior.

```bash
python -m src.core.cli \
  --default-backend gemini-cli-oauth-personal \
  --force-model gemini-2.5-pro \
  --disable-auth
```

--------------------------------

### Test Stage Integration with ValidatedTestStage

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/src/core/testing/README.md

Provides an example of integrating the testing framework into test stages by inheriting from `ValidatedTestStage`. It reiterates the use of safe mock creation and registration methods.

```python
from src.core.testing.base_stage import ValidatedTestStage


class MyMockStage(ValidatedTestStage):
    # Inherit from ValidatedTestStage instead of InitializationStage
    # Use create_safe_*_mock() methods
    # Use safe_register_instance() method
    # Automatic validation happens in execute()
    pass  # placeholder so the class body is valid Python
```

--------------------------------

### Buffered JSON Lines Wire Capture Format Example

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

Illustrates the structure of a single log entry in the current Buffered JSON Lines format for wire capture. This format is optimized for performance and provides structured data for each HTTP request or response.
```json
{
  "timestamp_iso": "2025-01-10T15:58:41.039145+00:00",
  "timestamp_unix": 1736524721.039145,
  "direction": "outbound_request",
  "source": "127.0.0.1(Cline/1.0)",
  "destination": "qwen-oauth",
  "session_id": "session-123",
  "backend": "qwen-oauth",
  "model": "qwen3-coder-plus",
  "key_name": "primary",
  "content_type": "json",
  "content_length": 1247,
  "payload": {
    "messages": [{"role": "user", "content": "..."}],
    "model": "qwen3-coder-plus",
    "temperature": 0.7
  },
  "metadata": {
    "client_host": "127.0.0.1",
    "user_agent": "Cline/1.0",
    "request_id": "req_abc123"
  }
}
```

--------------------------------

### Legacy Human-Readable Wire Capture Format

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

Provides an example of the legacy human-readable format for wire capture logs. This format is less structured and intended primarily for quick visual inspection of individual requests and responses.

```text
----- REQUEST 2025-01-10T15:58:41Z -----
client=127.0.0.1 agent=Cline/1.0 session=session-123 -> backend=qwen-oauth model=qwen3-coder-plus
{
  "messages": [...],
  "model": "qwen3-coder-plus"
}
----- REPLY 2025-01-10T15:58:42Z -----
client=127.0.0.1 agent=Cline/1.0 session=session-123 -> backend=qwen-oauth model=qwen3-coder-plus
{
  "choices": [...]
}
```

--------------------------------

### Start Proxy with Strict Command Detection (CLI)

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

This command enables strict command detection mode via the command-line interface. In this mode, commands are only processed if they appear on the last non-blank line of a message.

```bash
python -m src.core.cli --strict-command-detection
```

--------------------------------

### ApplyDiff Handler Example Logic in Python

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/CONTRIBUTING.md

Showcases the functionality of the built-in `ApplyDiffHandler`.
It monitors for `apply_diff` tool calls and steers the LLM to prefer `patch_file` instead, citing its superiority and QA features. This handler implements per-session rate limiting and allows for a configurable steering message.

```python
# The handler automatically steers LLMs from:
# tool_call: apply_diff(...)
# To a custom response:
# "You tried to use apply_diff tool. Please prefer to use patch_file tool instead,
# as it is superior to apply_diff and provides automated Python QA checks."
```

--------------------------------

### Python Migrating Test Class Mock Creation

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/src/core/testing/README.md

Guides users on migrating existing tests by adding a GuardedMockCreationMixin to test classes and replacing direct AsyncMock creation with the framework's safe mock creation methods, such as 'create_async_mock'.

```python
# Before
mock = AsyncMock(spec=IService)

# After
mock = self.create_async_mock(spec=IService)
```

--------------------------------

### Configure Planning Phase with CLI Flags

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

This list presents command-line interface flags for enabling and configuring the planning phase. It allows specifying the strong model, maximum turns and file writes, and fine-tuning parameters like temperature, top_p, reasoning effort, and thinking budget directly from the command line.

```bash
--enable-planning-phase
--planning-phase-strong-model BACKEND:MODEL
--planning-phase-max-turns N
--planning-phase-max-file-writes N
--planning-phase-temperature FLOAT
--planning-phase-top-p FLOAT
--planning-phase-reasoning-effort EFFORT
--planning-phase-thinking-budget TOKENS
```

--------------------------------

### Clone LLM Interactive Proxy Repository

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/CONTRIBUTING.md

Clones the LLM Interactive Proxy project repository and navigates into the project directory.
This is the first step in setting up the development environment.

```bash
git clone https://github.com/matdev83/llm-interactive-proxy.git
cd llm-interactive-proxy
```

--------------------------------

### Configure Planning Phase with Environment Variables

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

This list shows environment variable settings for configuring the planning phase. Options include enabling the phase, specifying the strong model backend and name, setting limits for turns and file writes, and adjusting parameters like temperature, top_p, reasoning effort, and thinking budget.

```bash
PLANNING_PHASE_ENABLED=true|false
PLANNING_PHASE_STRONG_MODEL=backend:model (e.g., openai:gpt-4o)
PLANNING_PHASE_MAX_TURNS=10
PLANNING_PHASE_MAX_FILE_WRITES=1
PLANNING_PHASE_TEMPERATURE=0.2
PLANNING_PHASE_TOP_P=0.9
PLANNING_PHASE_REASONING_EFFORT=high
PLANNING_PHASE_THINKING_BUDGET=8000
```

--------------------------------

### Configure Planning Phase with YAML

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

This YAML configuration snippet shows how to enable the planning phase, specify a strong model (openai:gpt-4o), set the maximum number of planning turns to 10, and the maximum number of file writes to 1. It also includes overrides for temperature, top_p, reasoning_effort, and thinking_budget.

```yaml
session:
  planning_phase:
    enabled: true
    strong_model: "openai:gpt-4o"
    max_turns: 10
    max_file_writes: 1
    overrides:
      temperature: 0.2
      top_p: 0.9
      reasoning_effort: "high"
      thinking_budget: 8000
```

--------------------------------

### Check Ruff Linting

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/dev/slash-commands/git-review-merge-dev.md

Bash script to check if the 'ruff' linter is installed and then run it to check for linting errors. If 'ruff' is not found, it prints a message and skips the check.

```bash
if "$PYTHON_CMD" -m ruff --version >/dev/null 2>&1; then
  "$PYTHON_CMD" -m ruff check .
else
  echo "ruff not installed; skipping"
fi
```

--------------------------------

### Check Black Formatting

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/dev/slash-commands/git-review-merge-dev.md

Bash script to check if the 'black' code formatter is installed and then run it in check mode. If 'black' is not found, it prints a message and skips the check.

```bash
if "$PYTHON_CMD" -m black --version >/dev/null 2>&1; then
  "$PYTHON_CMD" -m black --check .
else
  echo "black not installed; skipping"
fi
```

--------------------------------

### Configure Gemini GCP Project Billing (Bash)

Source: https://context7.com/matdev83/llm-interactive-proxy/llms.txt

Sets up the Google Cloud Project for Gemini CLI usage by configuring the billing project and logging in with application default credentials. This is necessary for using Gemini with GCP resources.

```bash
export GOOGLE_CLOUD_PROJECT="your-project-id"
gcloud auth application-default login
python -m src.core.cli --default-backend gemini-cli-cloud-project
```

--------------------------------

### Configure ZAI Coding Plan Backend (Bash)

Source: https://context7.com/matdev83/llm-interactive-proxy/llms.txt

Sets the ZAI API key for specialized use with the ZAI Coding Plan, optimized for agent-based coding tasks. This backend leverages Zhipu AI's coding-specific capabilities.

```bash
export ZAI_API_KEY="..."
python -m src.core.cli --default-backend zai-coding-plan
```

--------------------------------

### Configure Anthropic Backend with API Key

Source: https://context7.com/matdev83/llm-interactive-proxy/llms.txt

This command initializes the proxy to use the Anthropic backend. It requires the `ANTHROPIC_API_KEY` environment variable to be set with your Anthropic API key for authentication.

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
python -m src.core.cli --default-backend anthropic
```

--------------------------------

### Configure Gemini API Key and Default Backend (Bash)

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

This snippet shows how to set the GEMINI_API_KEY environment variable and then run the CLI with 'gemini' as the default backend. This is useful for production environments or high-volume usage requiring a metered API key.

```bash
export GEMINI_API_KEY="AIza..."
python -m src.core.cli --default-backend gemini
```

--------------------------------

### Configure Proxy for Planning Phase

Source: https://context7.com/matdev83/llm-interactive-proxy/llms.txt

This command enables the planning phase for the proxy, specifying a strong model ('openai:gpt-4o') for planning, setting the maximum number of turns to 8, and configuring the temperature and reasoning effort. This is for advanced use cases involving multi-step reasoning.

```bash
python -m src.core.cli \
  --default-backend openai \
  --enable-planning-phase \
  --planning-phase-strong-model openai:gpt-4o \
  --planning-phase-max-turns 8 \
  --planning-phase-temperature 0.2 \
  --planning-phase-reasoning-effort high
```

--------------------------------

### Lint and Format Code with Ruff, Black, and Mypy

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/CONTRIBUTING.md

Runs linting and formatting tools on the project's source code. Ruff is used for linting, Black for formatting, and Mypy for static type checking.

```bash
# Run ruff
python -m ruff check src
# Run black
python -m black src
# Run mypy
python -m mypy src
```

--------------------------------

### Bash: Verify Clean Working Tree

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/dev/slash-commands/git-review-merge-dev.md

Checks if the Git working tree has any uncommitted changes. If modifications are found, it prints an error message and exits, ensuring operations start from a clean state.
This is a precondition for safe Git operations.

```bash
test -z "$(git status --porcelain)" || { echo "Uncommitted changes found. Stash or commit first."; exit 1; }
```

--------------------------------

### Run LLM Interactive Proxy Application

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/CONTRIBUTING.md

Executes the LLM Interactive Proxy application using the Python module system. Supports running with default settings, a custom configuration file, or different backend providers.

```bash
# Run with default settings
python -m src.core.cli

# Run with custom configuration
python -m src.core.cli --config path/to/config.yaml

# Run with different backends
python -m src.core.cli --default-backend openrouter
python -m src.core.cli --default-backend gemini
python -m src.core.cli --default-backend gemini-cli-oauth-personal
python -m src.core.cli --default-backend anthropic
```

--------------------------------

### YAML Configuration for Context Window Limits

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

Defines how to configure context window, max input tokens, and max output tokens for models within the proxy's backend-specific or model default settings. This allows for fine-grained control over request limits.
```yaml
# Backend-specific configuration (e.g., config/backends/custom/backend.yaml)
models:
  "your-model-name":
    limits:
      context_window: 262144       # Total context window size (tokens)
      max_input_tokens: 200000     # Input token limit (tokens)
      max_output_tokens: 62144     # Output token limit (tokens)
      requests_per_minute: 60      # Rate limits
      tokens_per_minute: 1000000

# Or in main config via model_defaults
model_defaults:
  "your-model-name":
    limits:
      context_window: 128000       # 128K context window
      max_input_tokens: 100000     # 100K input limit
```

--------------------------------

### Configure ZAI Backend (Bash)

Source: https://context7.com/matdev83/llm-interactive-proxy/llms.txt

Sets the ZAI API key as an environment variable to enable the proxy to use Zhipu AI models. This is for general access to ZAI services.

```bash
export ZAI_API_KEY="..."
python -m src.core.cli --default-backend zai
```

--------------------------------

### Configure Model Name Rewrites for Routing

Source: https://context7.com/matdev83/llm-interactive-proxy/llms.txt

This command sets up model name aliases to route requests to different providers based on the requested model name. For example, a name matching '^gpt-(.*)' is rewritten to 'openrouter:openai/gpt-\1', and a name matching '^claude-(.*)' to 'anthropic:claude-\1', where '\1' stands for the text captured by the group. This allows flexible backend routing.

```bash
python -m src.core.cli \
  --default-backend openrouter \
  --model-alias '^gpt-(.*)=openrouter:openai/gpt-\1' \
  --model-alias '^claude-(.*)=anthropic:claude-\1'
```

--------------------------------

### Wire Capture Configuration Options

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

Details various YAML configuration parameters for tuning wire capture behavior, including buffer size, flush intervals, maximum entries per flush, and file rotation settings.
```yaml
logging:
  capture_file: "logs/wire_capture.log"

  # Performance tuning
  capture_buffer_size: 65536             # 64KB buffer (default)
  capture_flush_interval: 1.0            # Flush every 1 second
  capture_max_entries_per_flush: 100     # Max entries per flush

  # Rotation
  capture_max_bytes: 104857600           # 100MB per file
  capture_max_files: 5                   # Keep 5 rotated files
  capture_total_max_bytes: 524288000     # 500MB total cap
```

--------------------------------

### Structured JSON Response with OpenAI API

Source: https://context7.com/matdev83/llm-interactive-proxy/llms.txt

This example demonstrates how to make a POST request to the proxy's OpenAI-compatible `/v1/responses` endpoint for structured JSON responses. It specifies the desired model, conversation messages, and a JSON schema for the output. The response is expected to be a validated JSON object matching the provided schema.

```shell
curl -X POST http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-proxy-key" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Extract person info: John Doe, age 30, works at Acme Corp"}
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "person_info",
        "strict": true,
        "schema": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
            "employer": {"type": "string"}
          },
          "required": ["name", "age", "employer"],
          "additionalProperties": false
        }
      }
    }
  }'
```

--------------------------------

### Force Model and Context Window with CLI Arguments

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

This snippet demonstrates how to use command-line arguments to force a specific model and set a fixed context window size for all requests. It's useful for testing, enforcing model usage, and controlling token costs.
```bash
python -m src.core.cli \
  --default-backend gemini-cli-oauth-personal \
  --force-model gemini-2.5-pro \
  --disable-auth \
  --port 8000
```

```bash
python -m src.core.cli \
  --default-backend openai \
  --force-context-window 8000 \
  --disable-auth \
  --port 8000
```

--------------------------------

### Python Migrating Test Stage Service Registration

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/src/core/testing/README.md

Illustrates how to migrate service registration within test stages. It demonstrates moving the service registration logic from the 'execute' method to the '_register_services' method and using safe registration patterns provided by the framework.

```python
# Before
async def execute(self, services, config):
    mock = AsyncMock(spec=IService)
    services.add_instance(IService, mock)

# After
async def _register_services(self, services, config):
    mock = self.create_safe_session_service_mock()
    self.safe_register_instance(services, IService, mock)
```

--------------------------------

### YAML Configuration for Model Name Rewrites

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/dev/features/model-name-rewrites/PRD.md

Example configuration snippet for the `model_aliases` feature in `config.yaml`. This demonstrates how to define rules for rewriting model names using regular expressions and capture groups. The rules are processed sequentially, with the first match determining the rewrite.
```yaml
model_aliases:
  # Statically replace a specific model
  - pattern: "^claude-3-sonnet-20240229$"
    replacement: "gemini-cli-oauth-personal:gemini-1.5-flash"

  # Dynamically replace any GPT model, keeping the version
  # (single-quoted so YAML keeps the backslash in \1 literal;
  # "\1" is not a valid escape in a double-quoted YAML string)
  - pattern: "^gpt-(.*)"
    replacement: 'openrouter:openai/gpt-\1'

  # Catch-all for any other model
  - pattern: ".*"
    replacement: "gemini-cli-oauth-personal:gemini-1.5-pro"
```

--------------------------------

### Setting Environment Variables for Claude Integration (Bash)

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

Environment variables to set up the proxy to work with Claude models. This includes specifying the Anthropic API URL and an API key.

```bash
export ANTHROPIC_API_URL=http://localhost:8001
export ANTHROPIC_API_KEY=
```

--------------------------------

### Create and Activate Python Virtual Environment

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/CONTRIBUTING.md

Creates a Python virtual environment named '.venv' and then activates it. This isolates project dependencies.

```bash
python -m venv .venv

# Windows:
# .\.venv\Scripts\activate

# Unix:
source .venv/bin/activate
```

--------------------------------

### Fix DI Violation: Manual Service Instantiation

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/CONTRIBUTING.md

Demonstrates the incorrect way of manually instantiating a service ('CommandProcessor') within a method and the correct approach using dependency injection. The corrected version injects the service via the constructor, adhering to SOLID principles.

```python
def handle_request(self, request):
    processor = CommandProcessor(self.config)  # VIOLATION!
    return processor.process(request)
```

```python
def __init__(self, command_processor: ICommandProcessor):
    self.command_processor = command_processor

def handle_request(self, request):
    return self.command_processor.process(request)  # CORRECT
```

--------------------------------

### Configure Custom Model Backends with Limits (YAML)

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

This YAML configuration defines custom backend models with specific limits for context window, input tokens, and requests per minute. It helps manage costs, ensure agent compatibility, and tune performance by setting appropriate thresholds for different models.

```yaml
backend_type: "custom"
models:
  "large-context-model":
    limits:
      context_window: 262144      # 256K total context window
      max_input_tokens: 200000    # 200K input limit (leaves room for response)
      requests_per_minute: 30     # Conservative rate limits
  "small-fast-model":
    limits:
      context_window: 8192        # 8K context window
      max_input_tokens: 6000      # 6K input limit
      requests_per_minute: 120    # Higher rate for smaller model
```

--------------------------------

### Use EnforcedMockFactory for Mock Creation

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/src/core/testing/README.md

Shows the recommended way to create mocks using EnforcedMockFactory. These mocks are guaranteed to be properly configured, helping to prevent potential coroutine warnings.

```python
from src.core.testing.interfaces import EnforcedMockFactory

# These are guaranteed to be properly configured
session_service = EnforcedMockFactory.create_session_service_mock()
backend_service = EnforcedMockFactory.create_backend_service_mock()
```

--------------------------------

### Generate Review Summary File

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/dev/slash-commands/git-review-merge-dev.md

Bash script to create a temporary file for documenting review notes.
It uses a heredoc to pre-fill the file with standard review checklist items and prompts the user to fill in the details.

```bash
REVIEW_SUMMARY_FILE=$(mktemp)
cat <<'EOF' >"$REVIEW_SUMMARY_FILE"
## Review Notes
- Scope alignment (PR description vs diff):
- Tests run & coverage decision:
- Residual risks / follow-ups:
- Checklist verdict:
EOF
echo "Edit $REVIEW_SUMMARY_FILE and replace each token with your notes before continuing."
```

--------------------------------

### Configure Planning Phase Strong Model Overrides (Environment Variables)

Source: https://github.com/matdev83/llm-interactive-proxy/blob/dev/README.md

Environment variables for configuring the planning phase feature. These allow setting the enablement status, the strong model to use, maximum turns and file writes, and specific model parameters like temperature and top_p. These settings are overridden by CLI flags.

```bash
export PLANNING_PHASE_ENABLED=true
export PLANNING_PHASE_STRONG_MODEL=backend:model
export PLANNING_PHASE_MAX_TURNS=10
export PLANNING_PHASE_MAX_FILE_WRITES=1
export PLANNING_PHASE_TEMPERATURE=0.2
export PLANNING_PHASE_TOP_P=0.9
export PLANNING_PHASE_REASONING_EFFORT=high
export PLANNING_PHASE_THINKING_BUDGET=8000
```
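
The statement that CLI flags override these environment variables can be illustrated with a small precedence sketch. This is not the proxy's actual implementation: the function name `resolve_max_turns` and its `fallback` parameter are hypothetical, introduced only to show the ordering (CLI flag first, then `PLANNING_PHASE_MAX_TURNS`, then a fallback).

```python
import os
from typing import Mapping, Optional


def resolve_max_turns(
    cli_value: Optional[int],
    environ: Mapping[str, str] = os.environ,
    fallback: int = 10,  # hypothetical fallback, not the proxy's documented default
) -> int:
    """Return the effective planning-phase turn limit.

    Precedence, highest first: CLI flag, environment variable, fallback.
    """
    # A value passed via --planning-phase-max-turns always wins.
    if cli_value is not None:
        return cli_value
    # Otherwise use the environment variable if it is set and non-empty.
    raw = environ.get("PLANNING_PHASE_MAX_TURNS")
    if raw is not None and raw.strip():
        return int(raw)
    return fallback


if __name__ == "__main__":
    env = {"PLANNING_PHASE_MAX_TURNS": "10"}
    print(resolve_max_turns(8, env))     # CLI flag wins: prints 8
    print(resolve_max_turns(None, env))  # environment variable used: prints 10
```

Passing the environment as a mapping rather than reading `os.environ` directly keeps the sketch easy to exercise in tests without mutating the real process environment.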