### Install and Set Up OptiLLM

Source: https://github.com/algorithmicsuperintelligence/optillm/blob/main/CLAUDE.md

Commands for setting up the development environment and installing the package.

```bash
# Development setup
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Package installation
pip install optillm
```

--------------------------------

### Install and Run OptiLLM Server

Source: https://github.com/algorithmicsuperintelligence/optillm/blob/main/README.md

Install OptiLLM using pip and start the server. Ensure your OpenAI API key is set in the `OPENAI_API_KEY` environment variable.

```bash
pip install optillm
optillm
```

--------------------------------

### Set Up Development Environment

Source: https://github.com/algorithmicsuperintelligence/optillm/blob/main/README.md

Commands to clone the repository, create a virtual environment, install dependencies, and run the tests.

```bash
git clone https://github.com/algorithmicsuperintelligence/optillm.git
cd optillm
python -m venv .venv
source .venv/bin/activate  # or `.venv\Scripts\activate` on Windows
pip install -r requirements.txt
pip install -r tests/requirements.txt

# Run tests
python -m pytest tests/
```

--------------------------------

### Install OptiLLM and Verify

Source: https://github.com/algorithmicsuperintelligence/optillm/blob/main/optillm/plugins/proxy/README.md

Install the OptiLLM package using pip and verify the installation by checking the version.

```bash
pip install optillm
```

```bash
optillm --version
```

--------------------------------

### Start OptiLLM Server with Proxy

Source: https://github.com/algorithmicsuperintelligence/optillm/blob/main/optillm/plugins/proxy/README.md

Commands to start the OptiLLM server. Option A starts the server with the proxy as the default approach for all requests. Option B starts the server normally, so each request must opt in to the proxy, for example with the `proxy-` model prefix. A custom port can also be specified with `--port`.

```bash
optillm --approach proxy
```

```bash
optillm
```

```bash
optillm --approach proxy --port 8000
```

--------------------------------

### Run the OptiLLM Server

Source: https://github.com/algorithmicsuperintelligence/optillm/blob/main/CLAUDE.md

Commands to start the server with various configurations or via Docker.

```bash
# Basic server (auto approach detection)
python optillm.py

# With specific approach
python optillm.py --approach moa --model gpt-4o-mini

# With external endpoint
python optillm.py --base_url http://localhost:8080/v1

# Docker
docker compose up -d
```

--------------------------------

### Run OptiLLM Proxy

Source: https://github.com/algorithmicsuperintelligence/optillm/wiki/Patchwork

Start the OptiLLM proxy server.

```bash
python optillm.py
```

--------------------------------

### Production Example with Corporate CA

Source: https://github.com/algorithmicsuperintelligence/optillm/blob/main/SSL_CONFIGURATION.md

This example demonstrates using a corporate certificate bundle in a production environment. Ensure the path to the bundle is correct.

```bash
# Use corporate certificate bundle
python optillm.py --ssl-cert-path /etc/ssl/certs/corporate-ca-bundle.crt
```

--------------------------------

### Start the OptiLLM Proxy Server

Source: https://github.com/algorithmicsuperintelligence/optillm/blob/main/optillm/plugins/longcepo/README.md

Launch the proxy server by specifying the base URL, port, and the directory containing the plugins.

```bash
python optillm.py --base-url https://api.cerebras.ai/v1 --port --plugins-dir ./optillm/plugins
```

--------------------------------

### Start OptiLLM Service

Source: https://github.com/algorithmicsuperintelligence/optillm/blob/main/README.md

Use Docker Compose to build and start the service in detached mode.

```bash
docker compose up -d
```

--------------------------------

### Install Test Dependencies

Source: https://github.com/algorithmicsuperintelligence/optillm/blob/main/tests/README.md

Install the required packages for running the test suite.

```bash
pip install -r tests/requirements.txt
```

--------------------------------

### View LiteLLM Integration Logs

Source: https://github.com/algorithmicsuperintelligence/optillm/blob/main/README.md

Example log output showing OptiLLM using LiteLLM to interface with external providers such as Gemini.

```bash
19:43:21 - LiteLLM:INFO: utils.py:2952 - LiteLLM completion() model= gemini-1.5-flash-002; provider = gemini
2024-09-29 19:43:21,011 - INFO - LiteLLM completion() model= gemini-1.5-flash-002; provider = gemini
2024-09-29 19:43:21,481 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-002:generateContent?key=[redacted] "HTTP/1.1 200 OK"
19:43:21 - LiteLLM:INFO: utils.py:988 - Wrapper: Completed Call, calling success_handler
2024-09-29 19:43:21,483 - INFO - Wrapper: Completed Call, calling success_handler
19:43:21 - LiteLLM:INFO: utils.py:2952 - LiteLLM completion() model= gemini-1.5-flash-002; provider = gemini
```

--------------------------------

### Create OptiLLM Proxy Configuration

Source: https://github.com/algorithmicsuperintelligence/optillm/blob/main/optillm/plugins/proxy/README.md

Example YAML configuration file for the OptiLLM proxy plugin. It defines providers, routing strategies, timeouts, and queue settings. Ensure sensitive information such as API keys is handled securely, for example by using environment variables.

```yaml
providers:
  - name: primary
    base_url: https://api.openai.com/v1
    api_key: ${OPENAI_API_KEY}
    weight: 2
    max_concurrent: 5  # Optional: limit this provider to 5 concurrent requests
    model_map:
      gpt-4: gpt-4-turbo-preview  # Optional: map model names

  - name: backup
    base_url: https://api.openai.com/v1
    api_key: ${OPENAI_API_KEY_BACKUP}
    weight: 1
    max_concurrent: 2  # Optional: limit this provider to 2 concurrent requests

routing:
  strategy: weighted  # Options: weighted, round_robin, failover

timeouts:
  request: 30  # Maximum seconds to wait for a provider response
  connect: 5   # Maximum seconds to wait for connection

queue:
  max_concurrent: 100  # Maximum concurrent requests to prevent overload
  timeout: 60          # Maximum seconds a request can wait in queue
```

--------------------------------

### View Proxy Logs

Source: https://github.com/algorithmicsuperintelligence/optillm/blob/main/README.md

Example output from the OptiLLM proxy logs indicating the optimization approach and base model in use.

```bash
2024-09-06 08:35:32,597 - INFO - Using approach moa, with gpt-4o-mini
2024-09-06 08:35:35,358 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-09-06 08:35:39,553 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-09-06 08:35:44,795 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-09-06 08:35:44,797 - INFO - 127.0.0.1 - - [06/Sep/2024 08:35:44] "POST /v1/chat/completions HTTP/1.1" 200 -
```

--------------------------------

### Activate Proxy with Model Prefix

Source: https://github.com/algorithmicsuperintelligence/optillm/blob/main/optillm/plugins/proxy/README.md

Example using curl to send a request in which the model name is prefixed with `proxy-` to activate the proxy plugin. Use this method when the OptiLLM server was not started with the `--approach proxy` flag.

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "proxy-gpt-4",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

--------------------------------

### Install Dependencies on NVIDIA Jetson

Source: https://github.com/algorithmicsuperintelligence/optillm/wiki/Home

Use these commands to install specific versions of z3-solver and spacy before running the main requirements installation on Jetson hardware.

```bash
pip install 'z3-solver<4.12'
pip install spacy --no-binary blis
```

--------------------------------

### Install Patchwork CLI

Source: https://github.com/algorithmicsuperintelligence/optillm/wiki/Patchwork

Install the Patchwork CLI package with all dependencies.

```bash
pip install 'patchwork-cli[all]' --upgrade
```

--------------------------------

### Initialize OpenAI Client with OptiLLM

Source: https://github.com/algorithmicsuperintelligence/optillm/blob/main/README.md

Configure the OpenAI client to point to the local OptiLLM proxy server.

```python
import os
from openai import OpenAI

OPENAI_KEY = os.environ.get("OPENAI_API_KEY")
OPENAI_BASE_URL = "http://localhost:8000/v1"
client = OpenAI(api_key=OPENAI_KEY, base_url=OPENAI_BASE_URL)

response = client.chat.completions.create(
    model="moa-gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Write a Python program to build an RL model to recite text from any position that the user provides, using only numpy."
        }
    ],
    temperature=0.2
)

print(response)
```

--------------------------------

### Run OptiLLM with CePO Method

Source: https://github.com/algorithmicsuperintelligence/optillm/blob/main/optillm/cepo/README.md

Example command to run OptiLLM using the CePO method for Qwen3 deployed with vLLM. Ensure `OPENAI_API_KEY` is set and the configuration file path is correct.

```bash
OPENAI_API_KEY=serving-on-vllm \
python optillm.py \
  --base-url http://localhost:8001/v1 \
  --approach cepo \
  --port 8000 \
  --cepo_config_file ./optillm/cepo/cepo_configs/cepo_qwen3.yaml
```

--------------------------------

### Development Example with Self-Signed Certificate

Source: https://github.com/algorithmicsuperintelligence/optillm/blob/main/SSL_CONFIGURATION.md

This example shows how to disable SSL verification for development when connecting to a local HTTPS endpoint with a self-signed certificate.

```bash
# Disable SSL verification temporarily
python optillm.py --no-ssl-verify --base-url https://localhost:8443/v1
```

--------------------------------

### Python SDK Integration with OptiLLM Proxy

Source: https://github.com/algorithmicsuperintelligence/optillm/blob/main/optillm/plugins/proxy/README.md

Initialize the OpenAI client with the proxy's base URL and a dummy API key. Requests are handled by the proxy automatically when the server is started with `--approach proxy`; otherwise, activate the proxy per request with the model prefix or `extra_body`.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="dummy"  # Can be any string when using proxy
)

# Method 1: Server started with --approach proxy (recommended)
# Just make normal requests - proxy handles everything!
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
```

```python
# Method 2: Use proxy with model prefix
response = client.chat.completions.create(
    model="proxy-gpt-4",  # Use "proxy-" prefix
    messages=[{"role": "user", "content": "Hello"}]
)
```

```python
# Method 3: Use extra_body
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "optillm_approach": "proxy"
    }
)
```

```python
# Method 4: Proxy wrapping another approach
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "optillm_approach": "proxy",
        "proxy_wrap": "moa"
    }
)
```
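
The four methods above differ only in the arguments passed to `client.chat.completions.create()`. The sketch below gathers them in one place. `build_proxy_request` and its `method` values are hypothetical names introduced for illustration and are not part of OptiLLM; the `proxy-` prefix and the `optillm_approach` and `proxy_wrap` keys are taken from the examples above. It only builds the keyword arguments and makes no network call.

```python
from typing import Any, Optional


def build_proxy_request(
    model: str,
    messages: list[dict[str, str]],
    method: str = "server",
    wrap: Optional[str] = None,
) -> dict[str, Any]:
    """Build keyword arguments for client.chat.completions.create().

    Hypothetical helper, not part of OptiLLM.
    method: "server" (Method 1), "prefix" (Method 2),
            or "extra_body" (Methods 3 and 4).
    wrap:   approach for the proxy to wrap, e.g. "moa";
            only valid with method="extra_body".
    """
    if method not in ("server", "prefix", "extra_body"):
        raise ValueError(f"unknown method: {method!r}")
    if wrap is not None and method != "extra_body":
        raise ValueError("wrap requires method='extra_body'")

    kwargs: dict[str, Any] = {"model": model, "messages": messages}
    if method == "prefix":
        # Method 2: the "proxy-" model prefix activates the plugin.
        kwargs["model"] = f"proxy-{model}"
    elif method == "extra_body":
        # Method 3: select the proxy through extra_body.
        extra_body = {"optillm_approach": "proxy"}
        if wrap is not None:
            # Method 4: ask the proxy to wrap another approach.
            extra_body["proxy_wrap"] = wrap
        kwargs["extra_body"] = extra_body
    # Method 1 ("server"): nothing to add, because the server was
    # started with --approach proxy.
    return kwargs
```

With the `client` from the first block, a call would then look like `client.chat.completions.create(**build_proxy_request("gpt-4", messages, method="extra_body", wrap="moa"))`.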