### List Models API

Source: https://github.com/stima-tech/docs/blob/main/docs/getting-started/quick-start.md

Retrieve a list of available models.

```APIDOC
## GET /v1/models

### Description
This endpoint retrieves a list of all available models that can be used with the API.

### Method
GET

### Endpoint
/v1/models

### Parameters
None

### Request Example
(No request body or specific parameters needed for this GET request)

### Response
#### Success Response (200)
- **data** (array) - A list of model objects.
  - **id** (string) - The ID of the model.
  - **object** (string) - The type of object, usually 'model'.
  - **created** (integer) - Unix timestamp of when the model was created.
  - **owned_by** (string) - The entity that owns the model.

#### Response Example
```json
{
  "data": [
    {
      "id": "gpt-4o",
      "object": "model",
      "created": 1677652288,
      "owned_by": "openai"
    },
    {
      "id": "text-embedding-3-small",
      "object": "model",
      "created": 1677652288,
      "owned_by": "openai"
    }
  ],
  "object": "list"
}
```
```
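
As a rough illustration of consuming this response shape, the sketch below parses a payload like the one in the response example and collects the model IDs. The `model_ids` helper is hypothetical and not part of any SDK; the sample string is the example above written as a literal.

```python
import json

# Sample payload copied from the response example above.
SAMPLE_RESPONSE = """
{
  "data": [
    {"id": "gpt-4o", "object": "model", "created": 1677652288, "owned_by": "openai"},
    {"id": "text-embedding-3-small", "object": "model", "created": 1677652288, "owned_by": "openai"}
  ],
  "object": "list"
}
"""

def model_ids(payload: dict) -> list:
    """Return the `id` of every entry in the `data` array (hypothetical helper)."""
    return [model["id"] for model in payload.get("data", [])]

ids = model_ids(json.loads(SAMPLE_RESPONSE))
print(ids)
```

Checking a model name against this list before sending a request is one way to catch typos in the `model` parameter early.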

--------------------------------

### Prompt Optimization Example (Python)

Source: https://github.com/stima-tech/docs/blob/main/docs/billing/payg.md

Demonstrates how to optimize prompts for cost reduction in Python. The example contrasts a long, verbose prompt with a concise one, highlighting the potential savings by reducing input token usage.

```python
# Expensive: Long, verbose prompt
prompt = """
I would like you to help me with the following task.
Please read the text below carefully and provide
a detailed summary of the main points...
"""

# Cheaper: Concise prompt
prompt = "Summarize the key points:"
```

--------------------------------

### Generating Responses with System Instructions (Python)

Source: https://github.com/stima-tech/docs/blob/main/docs/references/responses.md

This Python example shows how to provide system-level instructions to the model when generating a response. The `instructions` parameter guides the AI's behavior, ensuring it acts as a helpful coding assistant and includes code examples.

```python
response = client.responses.create(
    model="gpt-4o",
    instructions="You are a helpful coding assistant. Always provide code examples.",
    input="How do I read a file in Python?"
)
```

--------------------------------

### Image Generation API

Source: https://github.com/stima-tech/docs/blob/main/docs/getting-started/quick-start.md

Generate images from text prompts.

```APIDOC
## POST /v1/images/generations

### Description
This endpoint generates images based on a provided text description (prompt).

### Method
POST

### Endpoint
/v1/images/generations

### Parameters
#### Request Body
- **prompt** (string) - Required - A description of the desired image.
- **n** (integer) - Optional - The number of images to generate (default: 1).
- **size** (string) - Optional - The size of the generated images (e.g., '1024x1024', '512x512', '256x256').

### Request Example
```json
{
  "prompt": "A photorealistic cat wearing a tiny hat",
  "n": 1,
  "size": "1024x1024"
}
```

### Response
#### Success Response (200)
- **data** (array) - A list of image objects.
  - **url** (string) - The URL of the generated image.
- **created** (integer) - Unix timestamp of when the response was created.

#### Response Example
```json
{
  "data": [
    {
      "url": "https://api.apertis.ai/v1/images/generations/image1.png"
    }
  ],
  "created": 1677652288
}
```
```
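
The request described above could be assembled with the Python standard library roughly as follows. This is a sketch only: `build_image_request` is a hypothetical helper, the base URL is taken from the other examples in this document, and the `Content-Type: application/json` header is an assumption based on the JSON request body. The request is built but not sent.

```python
import json
import urllib.request

# Base URL as used by the other examples in this document.
BASE_URL = "https://api.apertis.ai/v1"

def build_image_request(prompt, api_key, n=1, size="1024x1024"):
    """Build (but do not send) a POST request for /v1/images/generations."""
    body = json.dumps({"prompt": prompt, "n": n, "size": size}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/images/generations",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed, since the body is JSON
        },
    )

request = build_image_request("A photorealistic cat wearing a tiny hat", "sk-your-api-key")
print(request.get_method(), request.full_url)
# To send it, pass `request` to urllib.request.urlopen and read data[0]["url"] from the JSON reply.
```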

--------------------------------

### Optimize Prompt Token Usage (Python)

Source: https://github.com/stima-tech/docs/blob/main/docs/billing/quota-management.md

Illustrates how to reduce token consumption by creating more concise prompts. The example shows an inefficient, verbose prompt compared to an efficient, direct prompt for a summarization task.

```python
# Inefficient (high token usage)
prompt = f"""
I would like you to please help me with the following task.
I need you to summarize the following text for me.
Please make sure the summary is comprehensive and detailed.
Here is the text:
{long_text}
"""

# Efficient (lower token usage)
prompt = f"Summarize:\n{long_text}"
```

--------------------------------

### Concise System Message Example (Python)

Source: https://github.com/stima-tech/docs/blob/main/docs/billing/quota-management.md

Demonstrates the effective use of concise system messages in conversational AI. It contrasts a brief, focused system message with an overly verbose one, highlighting the importance of brevity for managing token usage.

```python
# Good - concise system message
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "..."}
]

# Avoid - overly detailed system message
messages = [
    {"role": "system", "content": "You are an extremely helpful assistant..."},  # 500+ tokens
    {"role": "user", "content": "..."}
]
```

--------------------------------

### Text to Speech API

Source: https://github.com/stima-tech/docs/blob/main/docs/getting-started/quick-start.md

Convert text into spoken audio.

```APIDOC
## POST /v1/audio/speech

### Description
This endpoint converts input text into spoken audio in various languages and voices.

### Method
POST

### Endpoint
/v1/audio/speech

### Parameters
#### Request Body
- **model** (string) - Required - The audio model to use (e.g., 'tts-1').
- **input** (string) - Required - The text to synthesize into speech.
- **voice** (string) - Optional - The voice to use for the synthesis (e.g., 'alloy', 'echo', 'fable').
- **response_format** (string) - Optional - The format of the audio output (e.g., 'mp3', 'opus', 'aac').

### Request Example
```json
{
  "model": "tts-1",
  "input": "Hello world, this is a test.",
  "voice": "alloy"
}
```

### Response
#### Success Response (200)
- **audio_content** (binary) - The generated audio content in the specified format.

#### Response Example
(Binary audio data would be returned here, not a JSON object)
```
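
Because the response body is raw audio rather than JSON, client code usually writes the bytes straight to a file. A minimal sketch, assuming the bytes have already been received; `save_speech` is a hypothetical helper, and the `"mp3"` default is an assumption of this sketch rather than a documented API default.

```python
import tempfile
from pathlib import Path

def save_speech(audio_bytes: bytes, directory: str, response_format: str = "mp3") -> Path:
    """Write raw audio bytes to `speech.<format>` and return the path (hypothetical helper)."""
    path = Path(directory) / f"speech.{response_format}"
    path.write_bytes(audio_bytes)
    return path

# Stand-in bytes; a real call would pass the HTTP response body instead.
fake_audio = b"\x00\x01\x02"
with tempfile.TemporaryDirectory() as tmp:
    saved = save_speech(fake_audio, tmp, "mp3")
    saved_name = saved.name
    saved_size = saved.stat().st_size
print(saved_name, saved_size)
```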

--------------------------------

### Make API Call with Node.js

Source: https://github.com/stima-tech/docs/blob/main/docs/getting-started/quick-start.md

This Node.js snippet shows how to make a chat completion API call using the OpenAI SDK. First, install the SDK using 'npm install openai'. Configure the client with your API key and the Apertis API base URL. The code sends a user message and logs the assistant's reply.

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-your-api-key',
  baseURL: 'https://api.apertis.ai/v1'
});

async function main() {
  const response = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'user', content: 'Hello! What can you do?' }
    ]
  });

  console.log(response.choices[0].message.content);
}

main();
```

--------------------------------

### Speech to Text API

Source: https://github.com/stima-tech/docs/blob/main/docs/getting-started/quick-start.md

Transcribe spoken audio into text.

```APIDOC
## POST /v1/audio/transcriptions

### Description
This endpoint transcribes spoken audio into written text.

### Method
POST

### Endpoint
/v1/audio/transcriptions

### Parameters
#### Request Body
- **file** (file) - Required - The audio file to transcribe.
- **model** (string) - Required - The audio model to use (e.g., 'whisper-1').
- **language** (string) - Optional - The language of the input audio (ISO-639-1 format).

### Request Example
(This endpoint typically uses multipart/form-data for file uploads)
```
--boundary
Content-Disposition: form-data; name="file"; filename="audio.mp3"
Content-Type: audio/mpeg

<binary audio data>
--boundary
Content-Disposition: form-data; name="model"

whisper-1
--boundary--
```

### Response
#### Success Response (200)
- **text** (string) - The transcribed text.

#### Response Example
```json
{
  "text": "Hello world, this is a transcription test."
}
```
```
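
The multipart layout in the request example can be reproduced with the standard library, which may help when debugging uploads without an SDK. The sketch below is illustrative: `encode_multipart` is a hypothetical helper, and the random boundary from `uuid` is just one reasonable choice.

```python
import uuid

def encode_multipart(fields: dict, filename: str, file_bytes: bytes, content_type: str):
    """Encode text fields plus one `file` part as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    lines = []
    # File part, named "file" as in the parameter list above.
    lines.append(f"--{boundary}".encode())
    lines.append(f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode())
    lines.append(f"Content-Type: {content_type}".encode())
    lines.append(b"")
    lines.append(file_bytes)
    # Plain text parts such as `model` and `language`.
    for name, value in fields.items():
        lines.append(f"--{boundary}".encode())
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        lines.append(b"")
        lines.append(str(value).encode())
    # Closing boundary, followed by a final CRLF.
    lines.append(f"--{boundary}--".encode())
    lines.append(b"")
    body = b"\r\n".join(lines)
    return body, f"multipart/form-data; boundary={boundary}"

body, header = encode_multipart(
    {"model": "whisper-1"}, "audio.mp3", b"<binary audio data>", "audio/mpeg"
)
print(header)
```

The returned header value goes in the request's `Content-Type` header so the server can find the part boundaries.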

--------------------------------

### Image Analysis with Python

Source: https://github.com/stima-tech/docs/blob/main/docs/getting-started/quick-start.md

This Python example demonstrates image analysis using the Apertis API. The 'messages' payload includes both text and an image URL, allowing the model to interpret the content of the image and respond to questions about it.

```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
            ]
        }
    ]
)
```

--------------------------------

### Make API Call with Python

Source: https://github.com/stima-tech/docs/blob/main/docs/getting-started/quick-start.md

This Python snippet demonstrates how to make a chat completion API call using the OpenAI SDK. Ensure you have the 'openai' package installed. You need to provide your API key and the base URL for the Apertis API. The code sends a user message and prints the assistant's response.

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.apertis.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Hello! What can you do?"}
    ]
)

print(response.choices[0].message.content)
```

--------------------------------

### Chat Completions API

Source: https://github.com/stima-tech/docs/blob/main/docs/getting-started/quick-start.md

Engage in conversational interactions using various models.

```APIDOC
## POST /v1/chat/completions

### Description
This endpoint provides chat-based interactions, allowing you to send messages and receive responses from AI models.

### Method
POST

### Endpoint
/v1/chat/completions

### Parameters
#### Request Body
- **model** (string) - Required - The ID of the model to use for chat completion.
- **messages** (array) - Required - A list of message objects representing the conversation history.
  - **role** (string) - Required - The role of the author of the message ('system', 'user', or 'assistant').
  - **content** (string) - Required - The content of the message.
- **temperature** (number) - Optional - Controls randomness. Lower values make output more deterministic.
- **max_tokens** (integer) - Optional - The maximum number of tokens to generate in the response.

### Request Example
```json
{
  "model": "gpt-4o",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ]
}
```

### Response
#### Success Response (200)
- **choices** (array) - A list of completion choices.
  - **message** (object)
    - **role** (string) - The role of the author ('assistant').
    - **content** (string) - The generated message content.
  - **finish_reason** (string) - The reason the model stopped generating tokens.
- **created** (integer) - Unix timestamp of when the response was created.
- **model** (string) - The model used for the completion.
- **usage** (object) - Usage statistics.
  - **prompt_tokens** (integer)
  - **completion_tokens** (integer)
  - **total_tokens** (integer)

#### Response Example
```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Hello there! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "created": 1677652288,
  "model": "gpt-4o",
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 15,
    "total_tokens": 25
  }
}
```
```
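
To show how the documented fields fit together, the following sketch pulls the reply text, finish reason, and token total out of a payload shaped like the response example. `summarize_completion` is a hypothetical helper; the sample string mirrors the example above.

```python
import json

# Sample payload mirroring the response example above.
SAMPLE_COMPLETION = """
{
  "choices": [
    {
      "message": {"role": "assistant", "content": "Hello there! How can I help you today?"},
      "finish_reason": "stop"
    }
  ],
  "created": 1677652288,
  "model": "gpt-4o",
  "usage": {"prompt_tokens": 10, "completion_tokens": 15, "total_tokens": 25}
}
"""

def summarize_completion(payload: dict) -> dict:
    """Extract the first choice's text plus basic usage info (hypothetical helper)."""
    choice = payload["choices"][0]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": payload["usage"]["total_tokens"],
    }

summary = summarize_completion(json.loads(SAMPLE_COMPLETION))
print(summary)
```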

--------------------------------

### Implementing Exponential Backoff (Python)

Source: https://github.com/stima-tech/docs/blob/main/docs/billing/rate-limits.md

Python example demonstrating an exponential backoff strategy with jitter for handling rate limit errors.

```APIDOC
### Retry Strategy

Implement exponential backoff for rate limit errors:

```python
import time
import random
from openai import OpenAI, RateLimitError

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.apertis.ai/v1"
)

def make_request_with_retry(messages, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=messages
            )
        except RateLimitError:
            if attempt == max_retries - 1:
                raise

            # Exponential backoff with jitter
            wait_time = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Waiting {wait_time:.2f}s...")
            time.sleep(wait_time)
```
```
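
To get a feel for how long the retry loop above may block, the sketch below computes the same wait formula without making any requests. `backoff_delays` is a hypothetical helper; it takes the random source as a parameter only so the output can be made repeatable.

```python
import random

def backoff_delays(max_retries=5, rng=random):
    """Return the waits (in seconds) the retry loop would sleep between attempts.

    Mirrors `(2 ** attempt) + random.uniform(0, 1)`. The final attempt re-raises
    instead of sleeping, so there are max_retries - 1 waits.
    """
    return [(2 ** attempt) + rng.uniform(0, 1) for attempt in range(max_retries - 1)]

delays = backoff_delays(5, random.Random(0))
for attempt, delay in enumerate(delays):
    print(f"after failed attempt {attempt + 1}: wait {delay:.2f}s")

# Upper bound: each wait is at most 2 ** attempt + 1 seconds.
worst_case = sum(2 ** attempt + 1 for attempt in range(4))
print(f"total wait is at most {worst_case}s")
```

With the default of five attempts, the waits fall in the ranges 1 to 2, 2 to 3, 4 to 5, and 8 to 9 seconds, so the caller may block for up to 19 seconds before the final error is raised.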

--------------------------------

### Best Practice: Request Queuing

Source: https://github.com/stima-tech/docs/blob/main/docs/billing/rate-limits.md

Example of implementing a request queuing system to manage API call rates effectively.

```APIDOC
### 2. Implement Request Queuing

Use a queue to manage request rate:

```python
import asyncio
import time
from collections import deque

class RateLimitedQueue:
    def __init__(self, max_requests_per_minute=50):
        self.max_rpm = max_requests_per_minute
        self.queue = deque()
        self.request_times = deque()

    async def add_request(self, request_func):
        # Wait if at rate limit
        while len(self.request_times) >= self.max_rpm:
            oldest = self.request_times[0]
            wait_time = 60 - (time.time() - oldest)
            if wait_time > 0:
                await asyncio.sleep(wait_time)
            self.request_times.popleft()

        # Execute request
        self.request_times.append(time.time())
        return await request_func()
```
```
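
The core of the queue above is a sliding 60-second window over recent request timestamps. That calculation can be sketched on its own, with the clock passed in so it is easy to test; `seconds_until_slot` is a hypothetical helper and not part of the class above.

```python
from collections import deque

def seconds_until_slot(request_times, now, max_rpm=50, window=60.0):
    """Return how long to wait before another request fits in the window.

    `request_times` holds timestamps of recent requests, oldest first.
    Timestamps older than `window` seconds are dropped in place.
    """
    while request_times and now - request_times[0] >= window:
        request_times.popleft()
    if len(request_times) < max_rpm:
        return 0.0
    # Window is full: wait until the oldest request ages out.
    return window - (now - request_times[0])

times = deque([0.0, 10.0, 20.0])
print(seconds_until_slot(times, now=30.0, max_rpm=3))  # window full, oldest request is 30s old
print(seconds_until_slot(times, now=61.0, max_rpm=3))  # oldest request has expired
```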

--------------------------------

### Multi-turn Conversations with Python

Source: https://github.com/stima-tech/docs/blob/main/docs/getting-started/quick-start.md

This Python example illustrates how to manage multi-turn conversations using the Apertis API. A list of messages, including system, user, and assistant roles, is passed to the API to maintain context across multiple interactions.

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What's the population?"}
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages
)
```
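
Since the full history is sent on every call, continuing a conversation means appending each assistant reply and the next user turn before the following request. A minimal sketch, with `continue_conversation` as a hypothetical helper and the reply text hardcoded in place of a real API response:

```python
def continue_conversation(messages, assistant_reply, next_user_message):
    """Return a new history with the assistant's reply and the next user turn appended."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_message},
    ]

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the capital of France?"},
]
# The reply text would normally come from response.choices[0].message.content.
history = continue_conversation(
    history, "The capital of France is Paris.", "What's the population?"
)
print([message["role"] for message in history])
```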

--------------------------------

### Implementing Exponential Backoff (Node.js)

Source: https://github.com/stima-tech/docs/blob/main/docs/billing/rate-limits.md

Node.js example demonstrating an exponential backoff strategy with jitter for handling rate limit errors.

```APIDOC
### Node.js Example

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-your-api-key',
  baseURL: 'https://api.apertis.ai/v1'
});

async function makeRequestWithRetry(messages, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.chat.completions.create({
        model: 'gpt-4o',
        messages
      });
    } catch (error) {
      if (error.status !== 429 || attempt === maxRetries - 1) {
        throw error;
      }

      // Exponential backoff with jitter
      const waitTime = Math.pow(2, attempt) + Math.random();
      console.log(`Rate limited. Waiting ${waitTime.toFixed(2)}s...`);
      await new Promise(r => setTimeout(r, waitTime * 1000));
    }
  }
}
```
```

--------------------------------

### Implement Response Caching with Hashing (Python)

Source: https://github.com/stima-tech/docs/blob/main/docs/billing/quota-management.md

This Python example shows how to implement a caching mechanism for API responses using `functools.lru_cache` and `hashlib`. It generates a hash of the prompt to use as a cache key, reducing redundant API calls for identical inputs.

```python
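import hashlib
from functools import lru_cache

# Maps each prompt hash back to its text so the cached function can rebuild the request.
# `client` and `prompt` are assumed to be defined earlier.
prompts_by_hash = {}

@lru_cache(maxsize=1000)
def get_cached_response(prompt_hash, model):
    # lru_cache keys on (prompt_hash, model), so identical prompts reuse the stored response
    return client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompts_by_hash[prompt_hash]}]
    )

# Create hash for caching
prompt_hash = hashlib.md5(prompt.encode()).hexdigest()
prompts_by_hash[prompt_hash] = prompt
response = get_cached_response(prompt_hash, "gpt-4o")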
```

--------------------------------

### Adapt System Prompts from Anthropic to Apertis Format

Source: https://github.com/stima-tech/docs/blob/main/docs/help/migration-guides.md

This example shows the difference in how system prompts are handled when migrating from Anthropic's API to Apertis, which uses the OpenAI format. Anthropic uses a dedicated `system` parameter, while Apertis incorporates the system prompt as a message with the role `system`.

```python
# Anthropic format:
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Apertis format:
response = client.chat.completions.create(
    model="claude-3-5-sonnet-20241022",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)
```
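
When many call sites need migrating, the conversion can be wrapped in a small function. The sketch below is an illustration under narrow assumptions: `to_chat_messages` is a hypothetical helper, and it only handles a plain string `system` prompt and messages whose `content` is already a string.

```python
def to_chat_messages(system, messages):
    """Prepend an Anthropic-style `system` string as a `system` role message."""
    converted = []
    if system:
        converted.append({"role": "system", "content": system})
    converted.extend(messages)
    return converted

anthropic_kwargs = {
    "system": "You are a helpful assistant.",
    "messages": [{"role": "user", "content": "Hello!"}],
}
chat_messages = to_chat_messages(anthropic_kwargs["system"], anthropic_kwargs["messages"])
print(chat_messages)
```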

--------------------------------

### Install LangChain and Apertis Integration

Source: https://github.com/stima-tech/docs/blob/main/i18n/en/docusaurus-plugin-content-docs/current/installation/langchain.md

Installs LangChain and the `langchain-openai` integration package, which works with OpenAI-compatible APIs such as Apertis.

```bash
pip install langchain
pip install langchain-openai
```

--------------------------------

### Migrate LangChain ChatOpenAI to Apertis

Source: https://github.com/stima-tech/docs/blob/main/docs/help/migration-guides.md

This example shows how to update LangChain configurations to use Apertis instead of a standard OpenAI model. The `ChatOpenAI` class parameters `openai_api_key` and `openai_api_base` are modified to point to Apertis.

```python
from langchain_openai import ChatOpenAI

# Before
llm = ChatOpenAI(model="gpt-4")

# After
llm = ChatOpenAI(
    model="gpt-4o",
    openai_api_key="sk-apertis-key",
    openai_api_base="https://api.apertis.ai/v1"
)
```

--------------------------------

### Migrate from Google AI Studio to Apertis Python Client

Source: https://github.com/stima-tech/docs/blob/main/docs/help/migration-guides.md

This Python example illustrates migrating from Google AI Studio's SDK to the Apertis client. Key changes include using the OpenAI client, providing an API key and base URL, and adjusting how the response text is accessed.

```python
import google.generativeai as genai

genai.configure(api_key="google-api-key")
model = genai.GenerativeModel("gemini-1.5-pro")

response = model.generate_content("Hello!")
text = response.text
```

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-apertis-key",
    base_url="https://api.apertis.ai/v1"
)

response = client.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[{"role": "user", "content": "Hello!"}]
)
text = response.choices[0].message.content
```

--------------------------------

### Initialize OpenAI Client with Environment Variable

Source: https://github.com/stima-tech/docs/blob/main/docs/getting-started/quick-start.md

This Python code initializes the OpenAI client using an API key stored in an environment variable. It ensures that the API key is not hardcoded, enhancing security. The `base_url` is set to the Apertis AI API endpoint.

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("APERTIS_API_KEY"),
    base_url="https://api.apertis.ai/v1"
)
```
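
`os.environ.get` returns `None` when the variable is unset, which can surface later as a less obvious error. One option is to fail fast with a clear message, sketched below; `load_api_key` is a hypothetical helper, and the environment mapping is a parameter only so the example runs without touching the real environment.

```python
import os

def load_api_key(env=os.environ, name="APERTIS_API_KEY"):
    """Return the API key from the environment, or raise a clear error if it is unset."""
    key = env.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the client")
    return key

# A plain dict stands in for os.environ here.
print(load_api_key({"APERTIS_API_KEY": "sk-your-api-key"}))
```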

--------------------------------

### Authentication Header Example (Bash)

Source: https://github.com/stima-tech/docs/blob/main/docs/help/error-codes.md

Example of how to include the API key in the Authorization header for API requests. This is essential for authenticating requests.

```bash
curl https://api.apertis.ai/v1/models \
  -H "Authorization: Bearer sk-your-api-key"
```

--------------------------------

### Install and Launch Crush Terminal

Source: https://context7.com/stima-tech/docs/llms.txt

Provides installation commands for the Crush terminal AI agent using both Homebrew (macOS) and NPM (Cross-Platform). After installation, the 'crush' command launches the terminal interface.

```bash
# Homebrew (macOS)
brew install charmbracelet/tap/crush

# NPM (Cross-Platform)
npm install -g @charmland/crush

# Launch
crush
```

--------------------------------

### Test Apertis Integration with Python SDK

Source: https://github.com/stima-tech/docs/blob/main/docs/help/migration-guides.md

This Python code snippet demonstrates how to initialize and test an integration with the Apertis API using the OpenAI SDK. It retrieves API credentials from environment variables and makes a sample chat completion request. This verifies the setup and basic functionality.

```python
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("APERTIS_API_KEY"),
    base_url=os.environ.get("APERTIS_BASE_URL")
)

# Test with a simple request
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
```

--------------------------------

### Install Crush using NPM (Cross-Platform)

Source: https://github.com/stima-tech/docs/blob/main/docs/installation/crush.md

Installs the Crush AI coding agent globally across different operating systems using NPM. This method requires Node.js and npm to be installed.

```bash
npm install -g @charmland/crush
```

--------------------------------

### Try Different Models with Python

Source: https://github.com/stima-tech/docs/blob/main/docs/getting-started/quick-start.md

This Python code demonstrates how to use the Apertis API client to make requests to different AI models such as GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro. Only the `model` parameter in the `client.chat.completions.create` call changes.

```python
# OpenAI GPT-4o
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Anthropic Claude 3.5 Sonnet
response = client.chat.completions.create(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Google Gemini 1.5 Pro
response = client.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)
```

--------------------------------

### Install Crush using Yay (Arch Linux)

Source: https://github.com/stima-tech/docs/blob/main/docs/installation/crush.md

Installs the Crush AI coding agent on Arch Linux using the `yay` AUR helper. This command fetches and installs the package from the Arch User Repository.

```bash
yay -S crush-bin
```

--------------------------------

### Python: Use Subscription Token for API Requests

Source: https://github.com/stima-tech/docs/blob/main/docs/billing/subscription-plans.md

Demonstrates how to initialize the OpenAI client with a dedicated subscription token and make an API call. This token is linked to your subscription and its quota automatically syncs and resets with your billing cycle. It is managed separately from regular API tokens.

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-sub-your-subscription-key",
    base_url="https://api.apertis.ai/v1"
)

# Quota is tracked against your subscription
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

--------------------------------

### Configure Continue Dev with LLM Models and Custom Commands (JSON)

Source: https://github.com/stima-tech/docs/blob/main/i18n/en/docusaurus-plugin-content-docs/current/installation/continue.md

This JSON configuration file sets up Continue Dev with various Large Language Models (LLMs) including Claude, GPT, and Gemini. It specifies API endpoints, model identifiers, and provider details. Additionally, it defines custom commands, such as 'test' for generating unit tests, and configures tab autocomplete and telemetry settings. Users need to replace placeholder API keys with their actual keys.

```json
{
  "models": [
    {
      "model": "claude-3-5-sonnet-20241022",
      "apiBase": "https://api.apertis.ai/v1",
      "title": "Claude 3.5",
      "apiKey": "sk-xxxxxxxxxxxxxxxxxxxxx",
      "provider": "openai",
      "description": "Explain in details"
    },
    {
      "model": "claude-3-5-haiku-20241022",
      "apiBase": "https://api.apertis.ai/v1",
      "title": "Claude 3.5",
      "apiKey": "sk-xxxxxxxxxxxxxxxxxxxxx",
      "provider": "openai",
      "description": "Explain in details"
    },
    {
      "model": "claude-3-5-sonnet-20240620",
      "apiBase": "https://api.apertis.ai/v1",
      "title": "Claude 3.5",
      "apiKey": "sk-xxxxxxxxxxxxxxxxxxxxx",
      "provider": "openai",
      "description": "Explain in details"
    },
    {
      "model": "gpt-4o",
      "apiBase": "https://api.apertis.ai/v1",
      "title": "GPT-4o",
      "apiKey": "sk-xxxxxxxxxxxxxxxxxxxxx",
      "provider": "openai",
      "description": "Explain in details"
    },
    {
      "model": "gpt-4-turbo",
      "apiBase": "https://api.apertis.ai/v1",
      "title": "GPT-4-Turbo",
      "apiKey": "sk-xxxxxxxxxxxxxxxxxxxxx",
      "provider": "openai",
      "description": "Explain in details"
    },
    {
      "model": "gpt-3.5-turbo",
      "apiBase": "https://api.apertis.ai/v1",
      "title": "GPT-3.5-Turbo",
      "apiKey": "sk-xxxxxxxxxxxxxxxxxxxxx",
      "provider": "openai",
      "description": "Explain in details"
    },
    {
      "model": "gemini-1.5-pro-latest",
      "apiBase": "https://api.apertis.ai/v1",
      "title": "gemini-1.5-pro-latest",
      "apiKey": "sk-xxxxxxxxxxxxxxxxxxxxx",
      "provider": "openai",
      "description": "Explain in details"
    },
    {
      "model": "gemini-1.5-flash-latest",
      "apiBase": "https://api.apertis.ai/v1",
      "title": "gemini-1.5-flash-latest",
      "apiKey": "sk-xxxxxxxxxxxxxxxxxxxxx",
      "provider": "openai",
      "description": "Explain in details"
    }
  ],
  "customCommands": [
    {
      "name": "test",
      "prompt": "{{{ input }}}\n\nWrite a comprehensive set of unit tests for the selected code. It should setup, run tests that check for correctness including important edge cases, and teardown. Ensure that the tests are complete and sophisticated. Give the tests just as chat output, don't edit any file.",
      "description": "Write unit tests for highlighted code"
    }
  ],
  "allowAnonymousTelemetry": true,
  "embeddingsProvider": {
    "provider": "free-trial"
  },
  "tabAutocompleteModel": {
    "model": "gpt-4o",
    "apiBase": "https://api.apertis.ai/v1",
    "title": "GPT-4o",
    "apiKey": "sk-xxxxxxxxxxxxxxxxxxxxx",
    "provider": "openai"
  },
  "tabAutocompleteOptions": {
    "useCopyBuffer": false,
    "maxPromptTokens": 400,
    "prefixPercentage": 0.5
  },
  "reranker": {
    "name": "free-trial"
  }
}

```

--------------------------------

### Python Example for Apertis API Client

Source: https://github.com/stima-tech/docs/blob/main/docs/authentication/api-keys.md

Shows how to initialize and use the OpenAI Python client to interact with the Apertis API. This example requires the 'openai' library and uses your API key and base URL for configuration. The response from the chat completion is printed to the console.

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.apertis.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
```

--------------------------------

### Install OpenCode CLI with Apertis API Support

Source: https://context7.com/stima-tech/docs/llms.txt

Set up the OpenCode CLI tool for terminal-based AI coding assistance using Apertis API models. Installation can be done via a curl script or by using npm.

```bash
# Installation
curl -fsSL https://opencode.ai/install | bash
# or
npm install -g opencode-ai
```

--------------------------------

### Text Embeddings API

Source: https://github.com/stima-tech/docs/blob/main/docs/getting-started/quick-start.md

Generate text embeddings using the specified model.

```APIDOC
## POST /v1/embeddings

### Description
This endpoint generates vector representations (embeddings) for a given piece of text using a specified model.

### Method
POST

### Endpoint
/v1/embeddings

### Parameters
#### Request Body
- **model** (string) - Required - The ID of the model to use for embedding.
- **input** (string or array of strings) - Required - The input text(s) to embed.

### Request Example
```json
{
  "model": "text-embedding-3-small",
  "input": "Hello, world!"
}
```

### Response
#### Success Response (200)
- **data** (array) - A list of embedding objects.
  - **embedding** (array of floats) - The generated embedding vector.
  - **index** (integer) - The index of the input text in the request.
- **model** (string) - The model used for embedding.
- **object** (string) - The type of object returned, usually 'list'.
- **usage** (object) - Usage statistics for the request.
  - **prompt_tokens** (integer) - The number of tokens in the prompt.
  - **total_tokens** (integer) - The total number of tokens processed.

#### Response Example
```json
{
  "data": [
    {
      "embedding": [
        0.0023123,
        -0.0045678,
        // ... other dimensions
      ],
      "index": 0
    }
  ],
  "model": "text-embedding-3-small",
  "object": "list",
  "usage": {
    "prompt_tokens": 1,
    "total_tokens": 1
  }
}
```
```
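
When `input` is an array, each embedding object carries the `index` of the text it belongs to. The sketch below lines the vectors up with the inputs by sorting on that field. `embeddings_in_input_order` is a hypothetical helper, and the two-dimensional vectors are made-up stand-ins for real embeddings.

```python
def embeddings_in_input_order(payload: dict) -> list:
    """Return embedding vectors sorted by their `index` field (hypothetical helper)."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]

# Shortened, made-up payload in the documented response shape.
sample = {
    "data": [
        {"embedding": [0.3, 0.4], "index": 1},
        {"embedding": [0.1, 0.2], "index": 0},
    ],
    "model": "text-embedding-3-small",
    "object": "list",
}
vectors = embeddings_in_input_order(sample)
print(vectors)
```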

--------------------------------

### Node.js Example for Apertis API Client

Source: https://github.com/stima-tech/docs/blob/main/docs/authentication/api-keys.md

Provides a Node.js example using the 'openai' package to communicate with the Apertis API. It demonstrates setting up the client with your API key and base URL, making a chat completion request, and logging the response.

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-your-api-key',
  baseURL: 'https://api.apertis.ai/v1'
});

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }]
});

console.log(response.choices[0].message.content);
```

--------------------------------

### Handling HTTP 429 Rate Limit Errors

Source: https://github.com/stima-tech/docs/blob/main/docs/billing/rate-limits.md

Explains how the API responds when rate limits are exceeded and provides an example error JSON.

```APIDOC
## Handling Rate Limits

### HTTP 429 Response

When you exceed the rate limit, the API returns:

```json
{
  "error": {
    "message": "Rate limit exceeded. Please wait before making more requests.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
```
```
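
Client code that does not use an SDK can recognize this condition from the status code or the error body. A minimal sketch, assuming the body follows the shape shown above; `is_rate_limited` is a hypothetical helper.

```python
import json

def is_rate_limited(status_code: int, body: str) -> bool:
    """Return True for HTTP 429 or an error body whose code is `rate_limit_exceeded`."""
    if status_code == 429:
        return True
    try:
        return json.loads(body)["error"]["code"] == "rate_limit_exceeded"
    except (ValueError, KeyError, TypeError):
        # Body is not JSON or does not have the expected error shape.
        return False

sample_body = (
    '{"error": {"message": "Rate limit exceeded. Please wait before making more requests.", '
    '"type": "rate_limit_error", "code": "rate_limit_exceeded"}}'
)
print(is_rate_limited(429, sample_body))
```

A check like this can decide whether a failed request is worth retrying after a delay or should be reported immediately.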

--------------------------------

### Import OpenAI SDK in Node.js (ES Modules vs CommonJS)

Source: https://github.com/stima-tech/docs/blob/main/docs/help/troubleshooting.md

Provides examples of how to import the OpenAI SDK in Node.js, demonstrating both the ES Modules syntax (`import`) and the CommonJS syntax (`require`). This is important for ensuring correct module resolution in different Node.js project configurations.

```javascript
// ES Modules
import OpenAI from 'openai';

// CommonJS
const OpenAI = require('openai');
```

--------------------------------

### List Available Models Request (HTTP)

Source: https://github.com/stima-tech/docs/blob/main/docs/help/error-codes.md

An example HTTP GET request to list available models. This can be used to resolve 'model_not_found' errors.

```http
GET /v1/models
```

--------------------------------

### Set Environment Variable for API Key

Source: https://github.com/stima-tech/docs/blob/main/docs/getting-started/quick-start.md

This snippet shows how to set the Apertis API key as an environment variable using bash. This is the recommended practice for production environments to avoid hardcoding sensitive credentials.

```bash
# Set environment variable
export APERTIS_API_KEY="sk-your-api-key"
```
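Once the variable is exported, application code can read it instead of embedding the key. A minimal Python sketch; the helper name `load_api_key` is hypothetical, and failing fast on a missing key is a design choice rather than something the docs prescribe.

```python
import os


def load_api_key(env_var="APERTIS_API_KEY"):
    """Read the API key from the environment, failing fast if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before running.")
    return key
```

The returned value can then be passed as `api_key` when constructing the client.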

--------------------------------

### Chat Completions API Endpoint

Source: https://github.com/stima-tech/docs/blob/main/docs/getting-started/quick-start.md

This endpoint allows you to interact with various AI models to generate text completions for chat-based applications. You can specify the model, messages, and other parameters to tailor the AI's response.

```APIDOC
## POST /v1/chat/completions

### Description
Generates chat completions for a given set of messages using a specified AI model.

### Method
POST

### Endpoint
`/v1/chat/completions`

### Parameters
#### Request Body
- **model** (string) - Required - The AI model to use for completion (e.g., `gpt-4o`, `claude-3-5-sonnet-20241022`).
- **messages** (array) - Required - A list of message objects representing the conversation history. Each object should have `role` (e.g., `system`, `user`, `assistant`) and `content`.
- **stream** (boolean) - Optional - If set to `true`, the response will be streamed in chunks.

### Request Example
```json
{
  "model": "gpt-4o",
  "messages": [
    {"role": "user", "content": "Hello! What can you do?"}
  ]
}
```

### Response
#### Success Response (200)
- **id** (string) - Unique identifier for this completion.
- **object** (string) - The type of object, typically `chat.completion`.
- **created** (integer) - Unix timestamp of when the completion was created.
- **model** (string) - The model used for the completion.
- **choices** (array) - A list of completion choices.
  - **index** (integer) - The index of the choice.
  - **message** (object) - The message from the assistant.
    - **role** (string) - The role of the message sender, typically `assistant`.
    - **content** (string) - The AI's response content.
  - **finish_reason** (string) - The reason the model stopped generating tokens (e.g., `stop`, `length`).
- **usage** (object) - Information about token usage.
  - **prompt_tokens** (integer) - Number of tokens in the prompt.
  - **completion_tokens** (integer) - Number of tokens in the completion.
  - **total_tokens** (integer) - Total tokens used.

#### Response Example
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1703894400,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm an AI assistant. I can help you with..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 45,
    "total_tokens": 57
  }
}
```
```
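To show how the response fields fit together, the following Python sketch parses the example response above with the standard library. The function name `summarize_completion` is hypothetical, and treating a `finish_reason` of `length` as a truncated reply assumes OpenAI-compatible semantics.

```python
import json

# The response example from above, reproduced as a string
RESPONSE_JSON = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1703894400,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm an AI assistant. I can help you with..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 45, "total_tokens": 57}
}
"""


def summarize_completion(payload):
    """Pull the reply text, finish reason, and token count from a response."""
    choice = payload["choices"][0]
    usage = payload.get("usage", {})
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        # Assumption: "length" means the reply was cut off by a token limit
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": usage.get("total_tokens"),
    }


summary = summarize_completion(json.loads(RESPONSE_JSON))
print(summary["content"])
```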

--------------------------------

### Python Example: Interact with Apertis LLM using LangChain

Source: https://github.com/stima-tech/docs/blob/main/i18n/en/docusaurus-plugin-content-docs/current/installation/langchain.md

Demonstrates how to configure and use LangChain with Apertis's API for LLM interactions. It includes functions for single responses and streaming output, along with examples of different prompt types and model configurations.

```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

CONFIG = {
    "api_key": "APERTIS_API_KEY",  # placeholder: replace with your Apertis API key
    "base_url": "https://api.apertis.ai/v1",
    "model": "gpt-4o-mini",
    "temperature": 0.7,
    "request_timeout": 30,
}

def get_llm(**kwargs):
    config = CONFIG.copy()
    config.update(kwargs)
    return ChatOpenAI(**config)

def ask(message, **kwargs):
    llm = get_llm(**kwargs)
    
    try:
        response = llm.invoke(message)
        return response.content
    except Exception:
        # Fall back to streaming if the non-streaming call fails
        response = ""
        for chunk in llm.stream(message):
            response += chunk.content
        return response

def ask_stream(message, **kwargs):
    llm = get_llm(**kwargs)
    for chunk in llm.stream(message):
        print(chunk.content, end="", flush=True)
    print()

if __name__ == "__main__":
    response = ask("Hi, introduce yourself")
    print(f"Response: {response}\n")
    
    messages = [
        SystemMessage(content="You are a Python expert"),
        HumanMessage(content="What is LangChain?")
    ]
    response = ask(messages)
    print(f"Expert Response: {response}\n")
    
    creative_response = ask("Write a poem", temperature=0.9)
    print(f"Response: {creative_response}\n")
    
    print("Streaming: ", end="")
    ask_stream("Explain AI")
    
    # Switch Model
    fast_response = ask("1+1=?", model="grok-4-fast")
    print(f"\nResponse: {fast_response}")

```

--------------------------------

### Stream Long Responses with Python

Source: https://github.com/stima-tech/docs/blob/main/docs/billing/rate-limits.md

Use streaming to reduce the perceived latency of long responses. This Python example sets `stream=True` and iterates over the response chunks, printing each one as it arrives instead of waiting for the full response. It assumes an already-configured `client`.

```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a long story"}],
    stream=True
)

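for chunk in response:
    # Skip chunks without text; delta.content is None on some chunks
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)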
```

--------------------------------

### Use LlamaIndex with Apertis API for LLM Completions

Source: https://github.com/stima-tech/docs/blob/main/i18n/en/docusaurus-plugin-content-docs/current/installation/llamaindex.md

Demonstrates how to initialize and use the OpenAI LLM interface from LlamaIndex to interact with the Apertis API. This requires an API key from Apertis and specifies the model and API base URL. The output is the completion generated by the language model.

```python
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="gpt-4o",
    api_key="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",  # Replace with your API key
    api_base="https://api.apertis.ai/v1",
)
ret = llm.complete("Donald Trump is ")
print(ret)
```

--------------------------------

### Generate Text Embeddings with Python

Source: https://github.com/stima-tech/docs/blob/main/docs/getting-started/quick-start.md

This snippet demonstrates how to generate text embeddings using the Apertis AI API client in Python. It requires the `openai` library, an API key, and an already-configured `client`. The code extracts the embedding vector from the response and prints its dimension.

```python
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello, world!"
)

embedding = response.data[0].embedding
print(f"Embedding dimension: {len(embedding)}")
```
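A common next step is comparing two embedding vectors. The sketch below computes cosine similarity in plain Python; it is a generic illustration, not part of the Apertis API, and the function name is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Similarity is undefined for a zero vector; report no similarity
        return 0.0
    return dot / (norm_a * norm_b)
```

Given two responses from `client.embeddings.create`, the vectors at `response.data[0].embedding` can be passed straight to this function.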

--------------------------------

### Handle API Errors with Python

Source: https://github.com/stima-tech/docs/blob/main/docs/getting-started/quick-start.md

This Python snippet demonstrates basic error handling when calling the Apertis AI API. It catches `RateLimitError` first and then the more general `APIError`; the order matters because `RateLimitError` is a subclass of `APIError`. It requires the `openai` library.

```python
from openai import OpenAI, APIError, RateLimitError

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.apertis.ai/v1"
)

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

except RateLimitError:
    print("Rate limited! Please wait and retry.")

except APIError as e:
    print(f"API error: {e}")
```
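Instead of only printing a message, a caller may want to retry after a rate limit. The following is a minimal sketch of exponential backoff with jitter; the helper `call_with_backoff`, its parameters, and the default delays are all assumptions for illustration and do not reflect documented Apertis retry guidance.

```python
import random
import time


def call_with_backoff(fn, retry_on, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff when retry_on is raised.

    Hypothetical helper: `retry_on` is an exception class (or tuple of
    classes), and `sleep` is injectable so the delay can be stubbed in tests.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                # Out of attempts: surface the original error
                raise
            delay = base_delay * (2 ** attempt)
            # Add up to 10% jitter so concurrent clients do not retry in lockstep
            sleep(delay + random.uniform(0, delay * 0.1))
```

With the client above, this could be used as `call_with_backoff(lambda: client.chat.completions.create(model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}]), retry_on=RateLimitError)`.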

--------------------------------

### Migrate from LiteLLM to Apertis Python Client

Source: https://github.com/stima-tech/docs/blob/main/docs/help/migration-guides.md

This Python snippet shows how to migrate from LiteLLM to the Apertis client. The first block is the original `litellm.completion` call; the second replaces it with the standard `openai.OpenAI` client, initialized with the Apertis base URL, and its `chat.completions.create` method.

```python
from litellm import completion

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-apertis-key",
    base_url="https://api.apertis.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

--------------------------------

### Enable Streaming Responses with Python

Source: https://github.com/stima-tech/docs/blob/main/docs/getting-started/quick-start.md

This Python snippet shows how to enable streaming for API responses. By setting `stream=True` in the `client.chat.completions.create` method, you receive the response in chunks, allowing for real-time output. The code iterates through the chunks and prints the content.

```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem"}],
    stream=True  # Enable streaming
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
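When the full text is needed after streaming, the chunks can be accumulated rather than only printed. The sketch below uses stand-in chunk objects built with `SimpleNamespace` so it runs without network access; `collect_stream` and `fake_chunk` are hypothetical names, and the chunk shape mirrors the attributes accessed in the loop above.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join the text deltas of a streamed response into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            # Defensive: skip chunks that carry no choices at all
            continue
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)


def fake_chunk(content):
    """Stand-in for a streamed chunk, for demonstration only."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Roses "), fake_chunk(None), fake_chunk("are red")]
text = collect_stream(demo)
print(text)
```

In real use, the `response` iterator returned by `client.chat.completions.create(..., stream=True)` would be passed in place of `demo`.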

--------------------------------

### Install Crush using Nix

Source: https://github.com/stima-tech/docs/blob/main/docs/installation/crush.md

Runs the Crush AI coding agent through the Nix package manager. `nix run` fetches and builds the package from the `numtide/nix-ai-tools` flake on GitHub and launches it without adding it to your profile.

```bash
nix run github:numtide/nix-ai-tools#crush
```