### Start Llama Stack Server

Source: https://github.com/rhel-lightspeed/docs2db-api/blob/main/demos/llama-stack/README.md

Starts the Llama Stack server once `setup_fresh_llama_stack.py` has finished. The setup script generates `start_server.py` inside the directory it creates, so run the command from that directory.

```bash
cd my-demo-dir
uv run python start_server.py
```

--------------------------------

### Setup Llama Stack with Docs2DB RAG

Source: https://github.com/rhel-lightspeed/docs2db-api/blob/main/demos/llama-stack/README.md

Creates an isolated Llama Stack installation with Docs2DB RAG preconfigured as a tool provider. This script sets up a uv-managed Python environment, configures the RAG provider, and generates a distribution configuration.

```bash
uv run python setup_fresh_llama_stack.py my-demo-dir
```

```python
# setup_fresh_llama_stack.py
# Creates an isolated installation of Llama Stack with Docs2DB-API RAG preconfigured as a tool runtime provider.
# The script sets up a complete uv-managed Python environment with all dependencies, configures the RAG provider
# in `docs2db_rag.yaml`, and creates a distribution config that includes the `docs2db::rag` tool group.

# What setup_fresh_llama_stack.py Does:
# 1. Creates isolated environment - Fresh directory with `uv` virtual environment
# 2. Installs dependencies - `llama-stack`, `llama-stack-client`, and all required packages
# 3. Installs Docs2DB RAG - Editable install with specific dependencies
# 4. Configures providers - Sets up Docs2DB RAG as Llama Stack tool runtime provider
# 5. Creates distribution - YAML configuration for Ollama + Docs2DB RAG integration
# 6. Generates startup script - Ready-to-use server startup script `start_server.py`
```

--------------------------------

### Install Docs2DB-API

Source: https://github.com/rhel-lightspeed/docs2db-api/blob/main/README.md

Installs the docs2db-api Python library using the 'uv' package manager. This is the first step to using the library in your project.

```bash
uv add docs2db-api
```

--------------------------------

### Create and Query RAG Database

Source: https://github.com/rhel-lightspeed/docs2db-api/blob/main/README.md

Walks through creating and loading a RAG database: install the `docs2db` tool, run its pipeline over a document directory to produce a database dump, then use `docs2db-api` to start the database, restore the dump, and check its status.

```bash
uv tool install docs2db
docs2db pipeline /path/to/documents

# Start database
uv run docs2db-api db-start

# Restore dump
uv run docs2db-api db-restore ragdb_dump.sql

# Check status
uv run docs2db-api db-status
```

--------------------------------

### UniversalRAGEngine for Document Retrieval (Python)

Source: https://context7.com/rhel-lightspeed/docs2db-api/llms.txt

This example uses the `UniversalRAGEngine` class for programmatic document retrieval. It shows the two-phase initialization pattern (a lightweight constructor followed by an async `start()`), runs a search, and reads the result's document list, features used, and refined questions. Apart from `docs2db_api`, it needs only `asyncio` from the standard library.

```python
import asyncio
from docs2db_api.rag.engine import UniversalRAGEngine, RAGConfig

async def main():
    # Initialize with default configuration (auto-detects from environment/database)
    engine = UniversalRAGEngine()
    await engine.start()

    # Search documents with all default settings
    result = await engine.search_documents("How do I configure authentication?")

    print(f"Found {len(result.documents)} documents")
    print(f"Features used: {result.metadata['features_used']}")

    if result.refined_questions:
        print(f"Refined questions:\n{result.refined_questions}")

    for doc in result.documents:
        print(f"\nScore: {doc['similarity_score']:.3f}")
        print(f"Source: {doc['document_path']}")
        print(f"Text: {doc['text'][:300]}...")

asyncio.run(main())
```
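Because each entry in `result.documents` is a dictionary, post-processing needs nothing beyond plain Python. The sketch below does not call the engine: it uses made-up sample data in the same shape (`similarity_score`, `document_path`, `text`) and a hypothetical `top_documents` helper to keep only the best-scoring hits.

```python
# Sample data shaped like the dictionaries printed above (values are made up).
documents = [
    {"similarity_score": 0.91, "document_path": "docs/auth.md", "text": "Configure authentication by ..."},
    {"similarity_score": 0.62, "document_path": "docs/intro.md", "text": "This guide introduces ..."},
    {"similarity_score": 0.84, "document_path": "docs/sso.md", "text": "Single sign-on requires ..."},
]


def top_documents(docs: list[dict], min_score: float, limit: int) -> list[dict]:
    """Keep documents scoring at least min_score, best first, at most `limit` of them."""
    kept = [d for d in docs if d["similarity_score"] >= min_score]
    kept.sort(key=lambda d: d["similarity_score"], reverse=True)
    return kept[:limit]


for doc in top_documents(documents, min_score=0.8, limit=2):
    print(f"{doc['similarity_score']:.3f}  {doc['document_path']}")
```

Filtering on the client side like this is independent of the engine's own `similarity_threshold`, which is applied during retrieval.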

--------------------------------

### Docs2DB API Database Management Commands (Bash)

Source: https://github.com/rhel-lightspeed/docs2db-api/blob/main/README.md

Commands to manage the PostgreSQL database used by the Docs2DB API. These include starting, stopping, destroying, checking status, and restoring the database. Ensure you have Podman or Docker installed for database operations.

```bash
docs2db-api db-start               # Start PostgreSQL with Podman/Docker
docs2db-api db-stop                # Stop PostgreSQL (data preserved)
docs2db-api db-destroy             # Stop and delete all data
docs2db-api db-status              # Check connection and stats
docs2db-api db-restore <file>      # Restore database from dump
docs2db-api manifest               # Generate list of documents
```

--------------------------------

### LlamaStack YAML Configuration for Docs2DB-API

Source: https://context7.com/rhel-lightspeed/docs2db-api/llms.txt

Example YAML configuration for setting up Docs2DB-API as a LlamaStack tool runtime provider, including inference and toolgroup definitions.

```yaml
# docs2db-distribution.yaml
version: "2"

providers:
  inference:
    - provider_id: ollama
      provider_type: remote::ollama
      config:
        url: http://localhost:11434

  tool_runtime:
    - provider_id: docs2db_rag
      provider_type: inline
      config:
        module: docs2db_api.rag.llama_stack
        config_class: Docs2DBRAGConfig
        model_name: ibm-granite/granite-embedding-30m-english
        similarity_threshold: 0.7
        max_chunks: 10
        enable_question_refinement: true

toolgroups:
  - toolgroup_id: docs2db::rag
    provider_id: docs2db_rag

shields: []

models:
  - model_id: qwen2.5:7b-instruct
    provider_id: ollama
```
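A quick consistency check can catch a mismatched `provider_id` before the server starts. The sketch below works on a plain dictionary that mirrors part of the YAML above, so it needs no YAML parser. `check_toolgroups` is a hypothetical helper and is not part of Llama Stack or docs2db-api.

```python
# Dictionary mirroring the relevant parts of docs2db-distribution.yaml.
distribution = {
    "providers": {
        "inference": [{"provider_id": "ollama", "provider_type": "remote::ollama"}],
        "tool_runtime": [{"provider_id": "docs2db_rag", "provider_type": "inline"}],
    },
    "toolgroups": [{"toolgroup_id": "docs2db::rag", "provider_id": "docs2db_rag"}],
    "models": [{"model_id": "qwen2.5:7b-instruct", "provider_id": "ollama"}],
}


def check_toolgroups(config: dict) -> list[str]:
    """Return toolgroup_ids whose provider_id is not a declared tool_runtime provider."""
    declared = {p["provider_id"] for p in config.get("providers", {}).get("tool_runtime", [])}
    return [
        group["toolgroup_id"]
        for group in config.get("toolgroups", [])
        if group["provider_id"] not in declared
    ]


missing = check_toolgroups(distribution)
print("Unresolved toolgroups:", missing or "none")
```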

--------------------------------

### Configure Docs2DB API using Pydantic Settings and Environment Variables (Python)

Source: https://context7.com/rhel-lightspeed/docs2db-api/llms.txt

The Docs2DB API leverages Pydantic Settings for configuration management, allowing settings to be controlled via environment variables. This example demonstrates setting various parameters for LLM, database, RAG, and logging subsystems. Requires 'docs2db-api' and 'pydantic'.

```python
import os
from docs2db_api.config import (
    Settings,
    LLMSettings,
    DatabaseSettings,
    RAGSettings,
    LoggingSettings,
    EmbeddingSettings,
    settings,  # Global settings instance
)

# Environment variable configuration examples
os.environ["DOCS2DB_DB_HOST"] = "postgres.example.com"
os.environ["DOCS2DB_DB_PORT"] = "5432"
os.environ["DOCS2DB_DB_DATABASE"] = "ragdb"
os.environ["DOCS2DB_DB_USER"] = "raguser"
os.environ["DOCS2DB_DB_PASSWORD"] = "secret"

os.environ["DOCS2DB_LLM_BASE_URL"] = "http://localhost:11434"
os.environ["DOCS2DB_LLM_MODEL"] = "qwen2.5:7b-instruct"
os.environ["DOCS2DB_LLM_TIMEOUT"] = "30.0"
os.environ["DOCS2DB_LLM_TEMPERATURE"] = "0.7"
os.environ["DOCS2DB_LLM_MAX_TOKENS"] = "500"

os.environ["DOCS2DB_RAG_SIMILARITY_THRESHOLD"] = "0.7"
os.environ["DOCS2DB_RAG_MAX_CHUNKS"] = "10"
os.environ["DOCS2DB_RAG_ENABLE_QUESTION_REFINEMENT"] = "true"
os.environ["DOCS2DB_RAG_ENABLE_RERANKING"] = "true"

os.environ["DOCS2DB_LOG_LEVEL"] = "INFO"
os.environ["DOCS2DB_OFFLINE"] = "false"

# Create fresh settings instance after setting env vars
fresh_settings = Settings()

```
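Environment variables are always strings, so flags such as `DOCS2DB_RAG_ENABLE_RERANKING` have to be converted to booleans. Pydantic Settings does this conversion itself. The sketch below only illustrates the idea with a hypothetical `parse_bool` helper that accepts the case-insensitive spellings `true`, `false`, `1`, `0`, `yes`, and `no`.

```python
import os

TRUE_VALUES = {"true", "1", "yes"}
FALSE_VALUES = {"false", "0", "no"}


def parse_bool(raw: str) -> bool:
    """Convert a string flag to bool (illustrative; not the library's parser)."""
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"not a boolean flag: {raw!r}")


os.environ["DOCS2DB_RAG_ENABLE_RERANKING"] = "Yes"
print(parse_bool(os.environ["DOCS2DB_RAG_ENABLE_RERANKING"]))
```

Raising on unknown spellings, rather than silently treating them as false, makes a mistyped value visible at startup.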

--------------------------------

### Database Management CLI

Source: https://context7.com/rhel-lightspeed/docs2db-api/llms.txt

Commands for managing PostgreSQL databases with pgvector, including starting, stopping, restoring, and checking status.

```APIDOC
## Database Management CLI

The CLI provides commands for managing PostgreSQL databases with pgvector. These commands handle starting, stopping, restoring, and checking database status.

### Commands

*   **`docs2db-api db-start`**: Start PostgreSQL database using Podman/Docker compose.
*   **`docs2db-api db-status [--host <host>] [--port <port>] [--db <db>] [--user <user>] [--password <password>]`**: Check database connectivity and display statistics. Optional parameters can specify connection details.
*   **`docs2db-api db-restore <dump_file> [--verbose]`**: Restore database from a docs2db dump file. The `--verbose` flag enables verbose output.
*   **`docs2db-api db-stop`**: Stop database (data preserved in volumes).
*   **`docs2db-api db-destroy`**: Stop and permanently delete all data.
*   **`docs2db-api manifest --output-file <output_file>`**: Generate manifest file listing all documents in the database. The output file path is specified using `--output-file`.
```

--------------------------------

### Manage PostgreSQL Database with Docs2DB-API CLI

Source: https://context7.com/rhel-lightspeed/docs2db-api/llms.txt

These commands utilize the docs2db-api CLI for managing PostgreSQL databases with pgvector. They cover starting, stopping, restoring, checking status, and destroying the database. Specific connection parameters can be provided for status checks and restoration.

```bash
# Start PostgreSQL database using Podman/Docker compose
docs2db-api db-start

# Check database connectivity and display statistics
docs2db-api db-status

# Check with specific connection parameters
docs2db-api db-status --host localhost --port 5432 --db myragdb --user postgres --password secret

# Restore database from a docs2db dump file
docs2db-api db-restore ragdb_dump.sql

# Restore with verbose output
docs2db-api db-restore ragdb_dump.sql --verbose

# Stop database (data preserved in volumes)
docs2db-api db-stop

# Stop and permanently delete all data
docs2db-api db-destroy

# Generate manifest file listing all documents in database
docs2db-api manifest --output-file document_list.txt
```

--------------------------------

### Configure UniversalRAGEngine with Custom Settings

Source: https://context7.com/rhel-lightspeed/docs2db-api/llms.txt

Demonstrates how to initialize the UniversalRAGEngine with custom RAG configurations, database settings, and refinement prompts. It shows how to override settings at query time and print detailed search results, including metadata and document scores. Dependencies include asyncio and specific classes from docs2db_api.rag.engine.

```python
import asyncio
from docs2db_api.rag.engine import UniversalRAGEngine, RAGConfig

async def main():
    # Custom RAG configuration
    config = RAGConfig(
        model_name="ibm-granite/granite-embedding-30m-english",
        similarity_threshold=0.75,
        max_chunks=15,
        max_tokens_in_context=8192,
        enable_question_refinement=True,
        enable_reranking=True,
        refinement_questions_count=5,
    )

    # Custom database configuration
    db_config = {
        "host": "localhost",
        "port": "5432",
        "database": "ragdb",
        "user": "postgres",
        "password": "postgres",
    }

    # Custom refinement prompt template
    refinement_prompt = """Generate 5 refined questions based on: {question}
    Focus on technical documentation retrieval."""

    engine = UniversalRAGEngine(
        config=config,
        db_config=db_config,
        refinement_prompt=refinement_prompt,
    )
    await engine.start()

    # Override settings at query time
    result = await engine.search_documents(
        "How do I set up SSH keys?",
        max_chunks=5,
        similarity_threshold=0.8,
        enable_reranking=False,
    )

    print(f"Model: {result.metadata['model_name']}")
    print(f"Dimensions: {result.metadata['model_dimensions']}")
    print(f"Threshold: {result.metadata['similarity_threshold']}")

    for i, doc in enumerate(result.documents, 1):
        print(f"\n[{i}] Score: {doc['similarity_score']:.4f}")
        print(f"    Path: {doc['document_path']}")
        print(f"    RRF: {doc.get('rrf_score', 'N/A')}")
        print(f"    BM25: {doc.get('bm25_rank', 'N/A')}")
        print(f"    Vector: {doc.get('vector_similarity', 'N/A')}")

asyncio.run(main())

```
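The `rrf_score`, `bm25_rank`, and `vector_similarity` fields printed above suggest that a keyword ranking and a vector ranking are combined with Reciprocal Rank Fusion (RRF). The sketch below shows the textbook RRF formula, where each document's score is the sum of `1 / (k + rank)` over the rankings it appears in, with the commonly used constant `k = 60`. It illustrates the technique under that assumption and is not the library's implementation; the function name and sample document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Fuse several ranked lists of document IDs; rank positions start at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return dict(scores)


bm25_ranking = ["ssh.md", "keys.md", "users.md"]      # keyword search order (made up)
vector_ranking = ["keys.md", "ssh.md", "network.md"]  # vector search order (made up)

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
for doc_id, score in sorted(fused.items(), key=lambda item: item[1], reverse=True):
    print(f"{doc_id}: {score:.5f}")
```

Documents that appear near the top of both lists accumulate the highest fused score, while a document found by only one method still contributes.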

--------------------------------

### Configure Database Connection

Source: https://context7.com/rhel-lightspeed/docs2db-api/llms.txt

Demonstrates three ways to configure the database connection: individual environment variables, a single connection URL, or a `postgres-compose.yml` file. It then retrieves and prints the resolved configuration.

```python
import os
from docs2db_api.database import get_db_config

# Option 1: Individual environment variables (highest priority)
os.environ["DOCS2DB_DB_HOST"] = "db.example.com"
os.environ["DOCS2DB_DB_PORT"] = "5432"
os.environ["DOCS2DB_DB_DATABASE"] = "myragdb"
os.environ["DOCS2DB_DB_USER"] = "admin"
os.environ["DOCS2DB_DB_PASSWORD"] = "secret"

# Option 2: Single URL (parsed automatically)
os.environ["DOCS2DB_DB_URL"] = "postgresql://user:password@host:5432/database"

# Option 3: postgres-compose.yml in current directory (lowest priority)
# Automatically parsed if present

# Get resolved configuration
config = get_db_config()
print(f"Host: {config['host']}")
print(f"Port: {config['port']}")
print(f"Database: {config['database']}")
print(f"User: {config['user']}")
# Password available but not printed for security
```
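The priority order in the comments above can be pictured as a layered merge. The following sketch is an assumption about how such a resolution might work, not the library's `get_db_config`. A hypothetical `resolve_db_config` starts from fallback values, overlays anything parsed from `DOCS2DB_DB_URL`, then overlays the individual `DOCS2DB_DB_*` variables. The compose-file layer is represented only by the `fallback` argument, whose values here are placeholders.

```python
from urllib.parse import urlparse

# Maps config keys to the individual environment variables shown above.
ENV_KEYS = {
    "host": "DOCS2DB_DB_HOST",
    "port": "DOCS2DB_DB_PORT",
    "database": "DOCS2DB_DB_DATABASE",
    "user": "DOCS2DB_DB_USER",
    "password": "DOCS2DB_DB_PASSWORD",
}


def resolve_db_config(env: dict[str, str], fallback: dict[str, str]) -> dict[str, str]:
    """Merge fallback < DOCS2DB_DB_URL < individual variables (illustrative only)."""
    config = dict(fallback)
    url = env.get("DOCS2DB_DB_URL")
    if url:
        parsed = urlparse(url)
        from_url = {
            "host": parsed.hostname,
            "port": str(parsed.port) if parsed.port else None,
            "database": parsed.path.lstrip("/") or None,
            "user": parsed.username,
            "password": parsed.password,
        }
        config.update({k: v for k, v in from_url.items() if v})
    for key, var in ENV_KEYS.items():
        if env.get(var):
            config[key] = env[var]
    return config


env = {
    "DOCS2DB_DB_URL": "postgresql://user:password@host:5432/database",
    "DOCS2DB_DB_HOST": "db.example.com",
}
# Placeholder fallback values standing in for the lowest-priority layer.
resolved = resolve_db_config(env, fallback={"host": "localhost", "port": "5432"})
print({k: v for k, v in resolved.items() if k != "password"})
```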

--------------------------------

### Run All Tests with Make

Source: https://github.com/rhel-lightspeed/docs2db-api/blob/main/tests/README.md

Executes all automated tests defined in the project using the 'make test' command. This is the primary command for running the full test suite.

```bash
make test
```

--------------------------------

### Access Configuration Settings

Source: https://context7.com/rhel-lightspeed/docs2db-api/llms.txt

Prints various configuration settings for database, LLM, RAG, logging, and embedding.

```python
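from docs2db_api.config import Settings

fresh_settings = Settings()  # built from the current environment variables
print(f"Database: {fresh_settings.database.host}:{fresh_settings.database.port}/{fresh_settings.database.database}")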
print(f"LLM: {fresh_settings.llm.base_url} model={fresh_settings.llm.model}")
print(f"RAG threshold: {fresh_settings.rag.similarity_threshold}")
print(f"Log level: {fresh_settings.logging.log_level}")
print(f"Offline mode: {fresh_settings.embedding.offline}")
```

--------------------------------

### Direct Tool Demo with Llama Stack RAG

Source: https://github.com/rhel-lightspeed/docs2db-api/blob/main/demos/llama-stack/README.md

Demonstrates direct tool calling by invoking the `search_documents` tool via `tool_runtime.invoke_tool()`. This script verifies the basic functionality of the RAG tool.

```bash
uv run python client.py
```

```python
# client.py
# Calls `search_documents` tool directly via `tool_runtime.invoke_tool()`
```

--------------------------------

### Configure Database Connection via CLI and Environment Variables

Source: https://github.com/rhel-lightspeed/docs2db-api/blob/main/README.md

Illustrates various methods for configuring the database connection for docs2db-api, including CLI arguments, environment variables, and the DOCS2DB_DB_URL.

```bash
# Use defaults
uv run docs2db-api db-status

# Environment variables
export DOCS2DB_DB_HOST=prod.example.com
export DOCS2DB_DB_DATABASE=mydb
uv run docs2db-api db-status

# DOCS2DB_DB_URL (cloud providers)
export DOCS2DB_DB_URL="postgresql://user:pass@host:5432/db"
uv run docs2db-api db-status

# CLI arguments
uv run docs2db-api db-status --host localhost --db mydb
```

--------------------------------

### UniversalRAGEngine with Custom Configuration

Source: https://context7.com/rhel-lightspeed/docs2db-api/llms.txt

Demonstrates how to configure the UniversalRAGEngine with custom settings for embedding models, similarity thresholds, chunk limits, and feature toggles. It also shows how to override these settings at query time.

```APIDOC
## UniversalRAGEngine with Custom Configuration

### Description
Configure the RAG engine with specific settings for embedding model, similarity thresholds, chunk limits, and feature toggles. The RAGConfig dataclass accepts optional values that fall through the settings hierarchy when set to None.

### Method
N/A (in-process Python API call; no HTTP method is involved)

### Endpoint
N/A (this is a code example for client-side configuration)

### Parameters
#### Request Body (Conceptual for RAGConfig and db_config)
- **config** (RAGConfig) - Optional custom RAG configuration.
  - **model_name** (str) - The name of the embedding model to use.
  - **similarity_threshold** (float) - The minimum similarity score for a document to be considered relevant.
  - **max_chunks** (int) - The maximum number of chunks to retrieve.
  - **max_tokens_in_context** (int) - The maximum number of tokens allowed in the context window.
  - **enable_question_refinement** (bool) - Whether to enable question refinement.
  - **enable_reranking** (bool) - Whether to enable result reranking.
  - **refinement_questions_count** (int) - The number of refined questions to generate.
- **db_config** (dict) - Optional database connection configuration.
  - **host** (str) - Database host.
  - **port** (str) - Database port.
  - **database** (str) - Database name.
  - **user** (str) - Database username.
  - **password** (str) - Database password.
- **refinement_prompt** (str) - Optional custom prompt template for question refinement.

### Request Example
```python
import asyncio
from docs2db_api.rag.engine import UniversalRAGEngine, RAGConfig

async def main():
    # Custom RAG configuration
    config = RAGConfig(
        model_name="ibm-granite/granite-embedding-30m-english",
        similarity_threshold=0.75,
        max_chunks=15,
        max_tokens_in_context=8192,
        enable_question_refinement=True,
        enable_reranking=True,
        refinement_questions_count=5,
    )

    # Custom database configuration
    db_config = {
        "host": "localhost",
        "port": "5432",
        "database": "ragdb",
        "user": "postgres",
        "password": "postgres",
    }

    # Custom refinement prompt template
    refinement_prompt = """Generate 5 refined questions based on: {question}
    Focus on technical documentation retrieval."""

    engine = UniversalRAGEngine(
        config=config,
        db_config=db_config,
        refinement_prompt=refinement_prompt,
    )
    await engine.start()

    # Override settings at query time
    result = await engine.search_documents(
        "How do I set up SSH keys?",
        max_chunks=5,
        similarity_threshold=0.8,
        enable_reranking=False,
    )

    print(f"Model: {result.metadata['model_name']}")
    print(f"Dimensions: {result.metadata['model_dimensions']}")
    print(f"Threshold: {result.metadata['similarity_threshold']}")

    for i, doc in enumerate(result.documents, 1):
        print(f"\n[{i}] Score: {doc['similarity_score']:.4f}")
        print(f"    Path: {doc['document_path']}")
        print(f"    RRF: {doc.get('rrf_score', 'N/A')}")
        print(f"    BM25: {doc.get('bm25_rank', 'N/A')}")
        print(f"    Vector: {doc.get('vector_similarity', 'N/A')}")

asyncio.run(main())
```

### Response
#### Return Value
- **metadata** (dict) - Contains search statistics and configuration details.
- **documents** (list) - A list of dictionaries, each representing a relevant document chunk.
  - **similarity_score** (float) - The similarity score of the chunk to the query.
  - **document_path** (str) - The path to the source document.
  - **rrf_score** (float, optional) - The Reciprocal Rank Fusion score.
  - **bm25_rank** (float, optional) - The BM25 ranking score.
  - **vector_similarity** (float, optional) - The vector similarity score.
  - **text** (str) - The content of the document chunk.

#### Response Example
```json
{
  "metadata": {
    "model_name": "ibm-granite/granite-embedding-30m-english",
    "model_dimensions": 384,
    "similarity_threshold": 0.8,
    "features_used": ["reranking"]
  },
  "documents": [
    {
      "similarity_score": 0.85,
      "document_path": "/path/to/document.md",
      "rrf_score": 0.7,
      "bm25_rank": 0.9,
      "vector_similarity": 0.85,
      "text": "Content of the document chunk..."
    }
  ]
}
```
```

--------------------------------

### Database Operations with DatabaseManager

Source: https://context7.com/rhel-lightspeed/docs2db-api/llms.txt

Demonstrates basic database operations using the DatabaseManager class. It includes connecting to the database, retrieving statistics, fetching RAG settings, and generating a document manifest. Requires database configuration to be available.

```python
import asyncio
from docs2db_api.database import DatabaseManager, get_db_config

async def database_operations():
    # Auto-detect configuration from environment/compose file
    config = get_db_config()
    print(f"Database config: {config}")

    # Create database manager
    db_manager = DatabaseManager(
        host=config["host"],
        port=int(config["port"]),
        database=config["database"],
        user=config["user"],
        password=config["password"],
    )

    # Get database statistics
    stats = await db_manager.get_stats()
    print(f"\nDatabase Statistics:")
    print(f"  Documents: {stats['documents']}")
    print(f"  Chunks: {stats['chunks']}")
    print(f"  Embedding models: {stats['embedding_models']}")

    # Get RAG settings stored in database
    rag_settings = await db_manager.get_rag_settings()
    if rag_settings:
        print(f"\nRAG Settings from DB:")
        print(f"  Enable refinement: {rag_settings.get('enable_refinement')}")
        print(f"  Enable reranking: {rag_settings.get('enable_reranking')}")
        print(f"  Similarity threshold: {rag_settings.get('similarity_threshold')}")

    # Generate document manifest
    await db_manager.generate_manifest("manifest.txt")
    print("\nManifest generated: manifest.txt")

asyncio.run(database_operations())
```

--------------------------------

### Configuration with Pydantic Settings

Source: https://context7.com/rhel-lightspeed/docs2db-api/llms.txt

The library utilizes Pydantic Settings for robust configuration management, allowing subsystems to be controlled via environment variables with specific prefixes.

```APIDOC
## Configuration with Pydantic Settings

### Description
The library uses Pydantic Settings for comprehensive configuration management. Environment variables with specific prefixes control different subsystems.

### Usage

Environment variables can be set to configure various aspects of the application. The `Settings` class provides a unified interface to access these configurations.

**Example Environment Variables:**

- **Database:**
  - `DOCS2DB_DB_HOST`
  - `DOCS2DB_DB_PORT`
  - `DOCS2DB_DB_DATABASE`
  - `DOCS2DB_DB_USER`
  - `DOCS2DB_DB_PASSWORD`

- **LLM:**
  - `DOCS2DB_LLM_BASE_URL`
  - `DOCS2DB_LLM_MODEL`
  - `DOCS2DB_LLM_TIMEOUT`
  - `DOCS2DB_LLM_TEMPERATURE`
  - `DOCS2DB_LLM_MAX_TOKENS`

- **RAG:**
  - `DOCS2DB_RAG_SIMILARITY_THRESHOLD`
  - `DOCS2DB_RAG_MAX_CHUNKS`
  - `DOCS2DB_RAG_ENABLE_QUESTION_REFINEMENT`
  - `DOCS2DB_RAG_ENABLE_RERANKING`

- **Logging:**
  - `DOCS2DB_LOG_LEVEL`

- **General:**
  - `DOCS2DB_OFFLINE`

### Accessing Settings

Import the `settings` object or create a fresh instance of `Settings`:

```python
import os
from docs2db_api.config import Settings, settings  # Global settings instance

# Set environment variables (example)
os.environ["DOCS2DB_DB_HOST"] = "postgres.example.com"
os.environ["DOCS2DB_LLM_MODEL"] = "qwen2.5:7b-instruct"

# Access global settings
print(f"Database Host: {settings.db.host}")
print(f"LLM Model: {settings.llm.model}")

# Create a fresh settings instance after changing environment variables
fresh_settings = Settings()
print(f"Fresh DB Host: {fresh_settings.db.host}")
```

### Configuration Classes

- **`Settings`**: The root settings class.
- **`LLMSettings`**: Configuration for the LLM client.
- **`DatabaseSettings`**: Configuration for the database connection.
- **`RAGSettings`**: Configuration for the Retrieval-Augmented Generation pipeline.
- **`LoggingSettings`**: Configuration for logging.
- **`EmbeddingSettings`**: Configuration for embedding providers.

### Data Types

- Environment variables are parsed into appropriate Python types (e.g., strings, integers, floats, booleans).
- Boolean environment variables like `DOCS2DB_RAG_ENABLE_RERANKING` can be set to `true`, `false`, `1`, `0`, `yes`, `no` (case-insensitive).
```

--------------------------------

### Agent Tool Calling Demo with Llama Stack RAG

Source: https://github.com/rhel-lightspeed/docs2db-api/blob/main/demos/llama-stack/README.md

Demonstrates agent-based tool calling where a Llama Stack agent is created to interact with the `docs2db::rag` tool. The agent uses tool-calling capabilities to query the RAG tool.

```bash
uv run python agent_tool_calling_client.py
```

```python
# agent_tool_calling_client.py
# Creates a Llama Stack agent that can call the `docs2db::rag` tool
# Queries this new agent, which uses tool-calling to call the RAG tool
```

--------------------------------

### Integrate with OpenAI-Compatible LLMs using LLMClient (Python)

Source: https://context7.com/rhel-lightspeed/docs2db-api/llms.txt

The `LLMClient` class integrates with OpenAI-compatible API endpoints, such as a local LLM server like Ollama. It can generate refined search queries or other text completions from a prompt. Requires the `docs2db-api` library and a running LLM server.

```python
import asyncio
from docs2db_api.rag.engine import LLMClient

async def llm_client_example():
    # Create client with defaults from environment
    # Reads: DOCS2DB_LLM_BASE_URL, DOCS2DB_LLM_MODEL, DOCS2DB_LLM_TIMEOUT, etc.
    client = LLMClient()

    # Or with explicit configuration
    client = LLMClient(
        base_url="http://localhost:11434",  # Ollama default
        model="qwen2.5:7b-instruct",
    )

    # Complete a prompt
    prompt = """Generate 3 refined search queries for: "How to configure networking"
    Return only the queries, one per line."""

    response = await client.acomplete(prompt)
    print(f"LLM Response:\n{response}")

    # Clean up
    await client.close()

asyncio.run(llm_client_example())
```
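The prompt above asks for one query per line, so the response can be split into a list before use. A minimal sketch, assuming the response is plain text: `parse_queries` is a hypothetical helper, and real model output may need more cleanup than the numbering and bullet prefixes handled here.

```python
import re


def parse_queries(response: str) -> list[str]:
    """Split an LLM response into non-empty lines, stripping list markers like '1.' or '-'."""
    queries = []
    for line in response.splitlines():
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if cleaned:
            queries.append(cleaned)
    return queries


# Made-up response text used only to exercise the helper.
sample = "1. Configure static IP address\n- Set up network bonding\n\nNetworkManager basics"
print(parse_queries(sample))
```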

--------------------------------

### Run Pytest with Verbose Output

Source: https://github.com/rhel-lightspeed/docs2db-api/blob/main/tests/README.md

Runs the test suite using pytest with verbose output enabled, showing detailed information about each test being executed. This is useful for debugging.

```bash
uv run pytest -v
```

--------------------------------

### Configure LLM for Query Refinement

Source: https://github.com/rhel-lightspeed/docs2db-api/blob/main/README.md

Sets environment variables to configure the Large Language Model (LLM) used for query refinement. This includes specifying the API base URL, model name, timeout, temperature, and maximum tokens.

```bash
export DOCS2DB_LLM_BASE_URL=http://localhost:11434      # OpenAI-compatible API (e.g., Ollama)
export DOCS2DB_LLM_MODEL=qwen2.5:7b-instruct            # Model name
export DOCS2DB_LLM_TIMEOUT=30.0                         # HTTP timeout (seconds)
export DOCS2DB_LLM_TEMPERATURE=0.7                      # Generation temperature
export DOCS2DB_LLM_MAX_TOKENS=500                       # Max tokens per response
```
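These variables arrive as strings, so a consumer has to convert the numeric ones. The library does this through Pydantic Settings. The stdlib-only sketch below just illustrates the conversion with a hypothetical `LLMEnv` dataclass and `load_llm_env` function. The fallback values are the example values from the block above, not confirmed library defaults.

```python
import os
from dataclasses import dataclass


@dataclass
class LLMEnv:
    base_url: str
    model: str
    timeout: float
    temperature: float
    max_tokens: int


def load_llm_env(env: dict[str, str]) -> LLMEnv:
    """Read DOCS2DB_LLM_* values from a mapping, converting numeric fields."""
    return LLMEnv(
        base_url=env.get("DOCS2DB_LLM_BASE_URL", "http://localhost:11434"),
        model=env.get("DOCS2DB_LLM_MODEL", "qwen2.5:7b-instruct"),
        timeout=float(env.get("DOCS2DB_LLM_TIMEOUT", "30.0")),
        temperature=float(env.get("DOCS2DB_LLM_TEMPERATURE", "0.7")),
        max_tokens=int(env.get("DOCS2DB_LLM_MAX_TOKENS", "500")),
    )


print(load_llm_env(dict(os.environ)))
```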

--------------------------------

### Run Pytest with Coverage

Source: https://github.com/rhel-lightspeed/docs2db-api/blob/main/tests/README.md

Executes the test suite and generates a code coverage report. This helps identify which parts of the codebase are being tested and which are not.

```bash
uv run pytest --cov
```

--------------------------------

### Restore Database from Dump File (Python)

Source: https://context7.com/rhel-lightspeed/docs2db-api/llms.txt

Restores a PostgreSQL database from a specified SQL dump file. Requires connection parameters (host, port, db, user, password) and the input file path. The 'verbose' flag controls the display of psql output. Returns a boolean indicating success.

```python
# Module path assumed to match the other `docs2db_api.database` imports in this document
from docs2db_api.database import restore_database

success = restore_database(
    input_file="ragdb_dump.sql",
    host="localhost",
    port=5432,
    db="ragdb",
    user="postgres",
    password="postgres",
    verbose=True,  # Show psql output
)
print(f"Restore successful: {success}")
```

--------------------------------

### LlamaStack Integration with Docs2DBRAGAdapter

Source: https://context7.com/rhel-lightspeed/docs2db-api/llms.txt

Shows how to integrate Docs2DB's RAG engine with LlamaStack by configuring the adapter, listing available tools, and invoking 'search_documents' and 'search_and_generate' tools.

```python
import asyncio
from docs2db_api.rag.llama_stack import (
    Docs2DBRAGConfig,
    Docs2DBRAGAdapter,
    get_provider_impl,
)

async def llama_stack_integration():
    # Configure the RAG provider
    config = Docs2DBRAGConfig(
        model_name="ibm-granite/granite-embedding-30m-english",
        similarity_threshold=0.7,
        max_chunks=10,
        max_tokens_in_context=4096,
        enable_question_refinement=True,
    )

    # Create adapter instance (normally done by LlamaStack)
    adapter = await get_provider_impl(config, deps={})

    # List available tools
    tools_response = await adapter.list_runtime_tools()
    print("Available tools:")
    for tool in tools_response.data:
        print(f"  - {tool.name}: {tool.description}")

    # Invoke search_documents tool
    search_result = await adapter.invoke_tool(
        tool_name="search_documents",
        kwargs={
            "query": "How do I configure SSH on RHEL?",
            "max_chunks": 5,
            "similarity_threshold": 0.6,
        },
    )

    if search_result.error_message:
        print(f"Error: {search_result.error_message}")
    else:
        print(f"\nSearch Result:\n{search_result.content[:500]}...")
        print(f"\nMetadata: {search_result.metadata}")

    # Invoke search_and_generate tool
    generate_result = await adapter.invoke_tool(
        tool_name="search_and_generate",
        kwargs={
            "query": "What are the steps to set up a firewall?",
            "max_chunks": 3,
        },
    )

    print(f"\nGenerate Result:\n{generate_result.content[:500]}...")

asyncio.run(llama_stack_integration())
```

--------------------------------

### UniversalRAGEngine Class

Source: https://context7.com/rhel-lightspeed/docs2db-api/llms.txt

The core RAG engine for framework-agnostic document retrieval with hybrid search, reranking, and question refinement.

```APIDOC
## UniversalRAGEngine Class

The core RAG engine provides framework-agnostic document retrieval with hybrid search, reranking, and question refinement. It uses a two-phase initialization pattern where the constructor is lightweight and `start()` performs async initialization.

### Initialization

```python
import asyncio
from docs2db_api.rag.engine import UniversalRAGEngine, RAGConfig

async def main():
    # Initialize with default configuration (auto-detects from environment/database)
    engine = UniversalRAGEngine()
    await engine.start()

    # ... rest of your code ...

asyncio.run(main())
```

### Methods

*   **`search_documents(query: str, **kwargs)`**: Performs document search based on the provided query. Accepts optional arguments for customization.

### Example Usage

```python
import asyncio
from docs2db_api.rag.engine import UniversalRAGEngine

async def main():
    engine = UniversalRAGEngine()
    await engine.start()

    result = await engine.search_documents("How do I configure authentication?")

    print(f"Found {len(result.documents)} documents")
    print(f"Features used: {result.metadata['features_used']}")

    if result.refined_questions:
        print(f"Refined questions:\n{result.refined_questions}")

    for doc in result.documents:
        print(f"\nScore: {doc['similarity_score']:.3f}")
        print(f"Source: {doc['document_path']}")
        print(f"Text: {doc['text'][:300]}...")

asyncio.run(main())
```

### Response Structure (for `search_documents`)

*   **`documents`** (list) - A list of dictionaries, where each dictionary represents a retrieved document and contains:
    *   `similarity_score` (float) - The similarity score of the document.
    *   `document_path` (string) - The path to the document.
    *   `text` (string) - The content of the document (potentially truncated).
*   **`metadata`** (dict) - Contains information about the search, including:
    *   `features_used` (list) - A list of features utilized during the search (e.g., 'hybrid_search', 'reranking').
*   **`refined_questions`** (string or None) - If question refinement is enabled and performed, this field contains the refined questions; otherwise, it's None.
```

--------------------------------

### Query CLI Command

Source: https://context7.com/rhel-lightspeed/docs2db-api/llms.txt

Performs hybrid search with optional question refinement, reranking, and configurable output formats.

```APIDOC
## Query CLI Command

The query command performs hybrid search with optional question refinement, reranking, and configurable output formats suitable for shell scripting and LLM integration.

### Usage

`docs2db-api query "<your query>" [--model <model>] [--limit <limit>] [--threshold <threshold>] [--no-refine] [--format <format>] [--max-chars <max_chars>] [--refinement-prompt <prompt>]`

### Parameters

*   **`<your query>`** (string) - Required - The search query.
*   **`--model <model>`** (string) - Optional - The embedding model to use (e.g., `ibm-granite/granite-embedding-30m-english`).
*   **`--limit <limit>`** (integer) - Optional - The maximum number of documents to return.
*   **`--threshold <threshold>`** (float) - Optional - The similarity threshold for results.
*   **`--no-refine`** (boolean) - Optional - Disable question refinement.
*   **`--format <format>`** (string) - Optional - Output format (`text` for shell-friendly output).
*   **`--max-chars <max_chars>`** (integer) - Optional - Truncate output to a maximum number of characters, useful for LLM token limits.
*   **`--refinement-prompt <prompt>`** (string) - Optional - Custom prompt for question refinement. Use `{question}` as a placeholder for the original question.
```
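
For scripting, the documented flags can be assembled programmatically. The following is a minimal sketch that uses only the options listed above. The helper names `build_query_command` and `run_query` are hypothetical and not part of docs2db-api, and `run_query` assumes the `docs2db-api` executable is available on `PATH`.

```python
import subprocess
from typing import List, Optional


def build_query_command(
    query: str,
    model: Optional[str] = None,
    limit: Optional[int] = None,
    threshold: Optional[float] = None,
    refine: bool = True,
    output_format: Optional[str] = None,
    max_chars: Optional[int] = None,
    refinement_prompt: Optional[str] = None,
) -> List[str]:
    """Assemble the argv list for `docs2db-api query` (hypothetical helper)."""
    cmd = ["docs2db-api", "query", query]
    if model is not None:
        cmd += ["--model", model]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    if threshold is not None:
        cmd += ["--threshold", str(threshold)]
    if not refine:
        cmd.append("--no-refine")
    if output_format is not None:
        cmd += ["--format", output_format]
    if max_chars is not None:
        cmd += ["--max-chars", str(max_chars)]
    if refinement_prompt is not None:
        cmd += ["--refinement-prompt", refinement_prompt]
    return cmd


def run_query(query: str, **options) -> str:
    """Run the CLI and return its stdout; raises CalledProcessError on failure."""
    completed = subprocess.run(
        build_query_command(query, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


# Building the command does not require the CLI to be installed.
print(build_query_command("configure SSH access", output_format="text", max_chars=4000))
```

Passing the arguments as a list (rather than a single shell string) avoids quoting problems when the query or refinement prompt contains spaces or braces.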

--------------------------------

### Docs2DB API Querying Commands (Bash)

Source: https://github.com/rhel-lightspeed/docs2db-api/blob/main/README.md

Commands for querying a Docs2DB database from the command line. Supports a basic natural-language query as well as options for choosing the embedding model (`--model`), capping the number of results (`--limit`), setting the similarity threshold (`--threshold`), and disabling question refinement (`--no-refine`).

```bash
# Basic search
docs2db-api query "How do I configure authentication?"

# Advanced options
docs2db-api query "deployment guide" \
  --model granite-30m-english \
  --limit 20 \
  --threshold 0.8 \
  --no-refine                     # Disable question refinement
```
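
To illustrate how a similarity threshold and a result limit typically interact, here is a small sketch over plain `(text, score)` pairs. It assumes the threshold is a minimum score applied before the limit; the actual order of operations inside docs2db-api is not documented here, and `apply_threshold_and_limit` is a hypothetical helper.

```python
from typing import List, Tuple

Scored = Tuple[str, float]


def apply_threshold_and_limit(
    scored: List[Scored], threshold: float, limit: int
) -> List[Scored]:
    """Keep items scoring at least `threshold`, best first, at most `limit` items."""
    kept = [item for item in scored if item[1] >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:limit]


candidates = [("chunk A", 0.91), ("chunk B", 0.79), ("chunk C", 0.85), ("chunk D", 0.83)]
print(apply_threshold_and_limit(candidates, threshold=0.8, limit=2))
```

Under this assumption, a higher threshold can return fewer results than the limit allows, while a higher limit never admits results below the threshold.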

--------------------------------

### Perform Hybrid Search with Docs2DB-API CLI

Source: https://context7.com/rhel-lightspeed/docs2db-api/llms.txt

The `query` command runs a hybrid document search with optional question refinement, reranking, and output formatting. The examples cover a basic search, an advanced search with explicit model, limit, and threshold settings, shell-friendly text output for LLM integration, output truncation, and a custom refinement prompt.

```bash
# Basic document search
docs2db-api query "How do I configure authentication?"

# Advanced search with all options
docs2db-api query "deployment guide" \
  --model ibm-granite/granite-embedding-30m-english \
  --limit 20 \
  --threshold 0.8 \
  --no-refine

# Shell-friendly text output for LLM prompt injection
docs2db-api query "configure SSH access" --format text

# Truncate output for LLM token budget constraints
docs2db-api query "system requirements" --format text --max-chars 4000

# Using custom refinement prompt
docs2db-api query "How to install packages?" --refinement-prompt "Generate 3 specific questions about: {question}"
```
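
The `{question}` placeholder in `--refinement-prompt` implies template substitution. How the CLI performs it internally is not shown in this section; the sketch below assumes literal replacement of the `{question}` token and uses a hypothetical helper, `render_refinement_prompt`, purely for illustration.

```python
def render_refinement_prompt(template: str, question: str) -> str:
    """Fill the {question} placeholder in a refinement prompt template.

    Uses str.replace rather than str.format so that any other braces in
    the template (for example a JSON snippet) are left untouched.
    """
    if "{question}" not in template:
        raise ValueError("template must contain the {question} placeholder")
    return template.replace("{question}", question)


prompt = render_refinement_prompt(
    "Generate 3 specific questions about: {question}",
    "How to install packages?",
)
print(prompt)
```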

--------------------------------

### Generate Document Manifest (Python)

Source: https://context7.com/rhel-lightspeed/docs2db-api/llms.txt

Generates a manifest file listing all documents present in the database. `generate_manifest` is asynchronous, takes the database connection parameters and an output file path, and returns a boolean indicating success.

```python
import asyncio
from docs2db_api.database import generate_manifest

async def generate_doc_manifest():
    # Generate manifest of all documents in database
    success = await generate_manifest(
        output_file="documents.txt",
        host="localhost",
        port=5432,
        db="ragdb",
        user="postgres",
        password="postgres",
    )
    print(f"Manifest generated: {success}")

asyncio.run(generate_doc_manifest())
```

--------------------------------

### Vector-Only and BM25-Only Search with DatabaseManager

Source: https://context7.com/rhel-lightspeed/docs2db-api/llms.txt

Demonstrates how to perform pure vector similarity search or pure BM25 lexical search, which is useful when hybrid search is not required. Requires a database connection and, for vector search, an embedding provider.

```python
import asyncio
from docs2db_api.database import DatabaseManager, get_db_config
from docs2db_api.embeddings import GraniteEmbeddingProvider, EMBEDDING_CONFIGS

async def search_methods():
    config = get_db_config()
    db_manager = DatabaseManager(
        host=config["host"],
        port=int(config["port"]),
        database=config["database"],
        user=config["user"],
        password=config["password"],
    )

    model_name = "ibm-granite/granite-embedding-30m-english"
    embedding_provider = GraniteEmbeddingProvider(
        model_name=model_name,
        config=EMBEDDING_CONFIGS[model_name],
        device="cpu",
    )

    query = "network configuration"

    # Pure vector similarity search (semantic)
    query_embedding = embedding_provider.generate_embeddings([query])[0]
    vector_results = await db_manager.search_vector(
        query_embedding=query_embedding,
        model_name=model_name,
        limit=5,
        similarity_threshold=0.6,
    )

    print("Vector Search Results:")
    for chunk in vector_results:
        print(f"  [{chunk['similarity']:.3f}] {chunk['text'][:80]}...")

    # Pure BM25 full-text search (lexical)
    bm25_results = await db_manager.search_bm25(
        query_text=query,
        limit=5,
    )

    print("\nBM25 Search Results:")
    for chunk in bm25_results:
        print(f"  [{chunk['bm25_rank']:.3f}] {chunk['text'][:80]}...")

asyncio.run(search_methods())
```
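
Hybrid search merges these two result lists, and the search response elsewhere in this document exposes an `rrf_score` (Reciprocal Rank Fusion) field. The sketch below shows the standard RRF formula on plain lists of chunk ids. It is an illustration under stated assumptions, not the library's implementation: the function name, the chunk ids, and the constant `k=60` (a commonly used default) are all assumptions.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Score each id as the sum of 1 / (k + rank) over the lists it appears in."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        # Ranks are 1-based: the top result of each list contributes 1 / (k + 1).
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return scores


# Hypothetical chunk ids, ordered best-first by each search method.
vector_ids = ["c1", "c2", "c3"]
bm25_ids = ["c3", "c1", "c4"]

fused = reciprocal_rank_fusion([vector_ids, bm25_ids])
ordered = sorted(fused, key=fused.get, reverse=True)
print(ordered)
```

Because RRF uses only rank positions, it can combine cosine similarities and BM25 ranks without normalizing their differently scaled scores.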

--------------------------------

### Python RAG Document Search

Source: https://github.com/rhel-lightspeed/docs2db-api/blob/main/README.md

Demonstrates how to use the UniversalRAGEngine in Python to search documents in a RAG database. It shows initialization with default or custom settings and processing search results.

```python
import asyncio
from docs2db_api.rag.engine import UniversalRAGEngine, RAGConfig

async def main():
    # Initialize engine with defaults (auto-detects database from environment)
    engine = UniversalRAGEngine()
    await engine.start()
    
    # # Or with specific settings
    # config = RAGConfig(
    #     model_name="granite-30m-english",
    #     max_chunks=5,
    #     similarity_threshold=0.7
    # )
    # db_config = {
    #     "host": "localhost",
    #     "port": "5432",
    #     "database": "ragdb",
    #     "user": "postgres",
    #     "password": "postgres"
    # }
    # engine = UniversalRAGEngine(config=config, db_config=db_config)
    # await engine.start()
    
    # Search
    result = await engine.search_documents("How do I configure authentication?")
    for doc in result.documents:
        print(f"Score: {doc['similarity_score']:.3f}")
        print(f"Source: {doc['document_path']}")
        print(f"Text: {doc['text'][:200]}...\n")

asyncio.run(main())
```
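
The result-processing loop above can be extended to build a prompt context for an LLM. The following sketch works on plain dictionaries with the `similarity_score`, `document_path`, and `text` keys used above, so it runs without a database. `build_context` and the sample data are hypothetical, and the character budget is an illustrative choice rather than library behavior.

```python
from typing import Dict, List


def build_context(documents: List[Dict], max_chars: int = 4000) -> str:
    """Concatenate chunk texts, best score first, within a character budget."""
    ranked = sorted(documents, key=lambda d: d["similarity_score"], reverse=True)
    parts: List[str] = []
    used = 0
    for doc in ranked:
        entry = f"[{doc['document_path']}]\n{doc['text']}"
        separator = 2 if parts else 0  # entries are joined with a blank line
        if used + separator + len(entry) > max_chars:
            break  # stop at the first chunk that would exceed the budget
        parts.append(entry)
        used += separator + len(entry)
    return "\n\n".join(parts)


sample = [
    {"similarity_score": 0.71, "document_path": "b.md", "text": "Second chunk."},
    {"similarity_score": 0.92, "document_path": "a.md", "text": "First chunk."},
]
print(build_context(sample, max_chars=4000))
```

With real results, `build_context(result.documents)` would be the equivalent call, assuming each entry is a dictionary as in the loop above.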

--------------------------------

### Perform Quick Document Searches with search_documents

Source: https://context7.com/rhel-lightspeed/docs2db-api/llms.txt

Uses the convenience function `search_documents` for quick document retrieval. The function handles engine initialization and lifecycle management internally. The example shows a basic search with defaults and a search that overrides the model, chunk limit, similarity threshold, and feature flags.

```python
import asyncio
from docs2db_api.rag.engine import search_documents

async def quick_search():
    # Simple search with defaults
    result = await search_documents("How do I install packages?")

    for doc in result.documents:
        print(f"[{doc['similarity_score']:.3f}] {doc['text'][:200]}...")

    # Search with custom options
    result = await search_documents(
        "Configure network settings",
        model_name="ibm-granite/granite-embedding-30m-english",
        max_chunks=10,
        similarity_threshold=0.6,
        enable_question_refinement=False,
        enable_reranking=True,
    )

    print(f"\nFound {len(result.documents)} documents")
    print(f"Features: {result.metadata['features_used']}")

asyncio.run(quick_search())
```

--------------------------------

### Database Status Check Utility

Source: https://context7.com/rhel-lightspeed/docs2db-api/llms.txt

Demonstrates how to use the `check_database_status` utility function to verify the health of the database, including connection, pgvector extension, schema, and statistics.

```python
import asyncio
from docs2db_api.database import check_database_status

async def database_utilities():
    # Check database status with full diagnostics
    # Verifies: connection, pgvector extension, schema, statistics
    await check_database_status(
        host="localhost",
        port=5432,
        db="ragdb",
        user="postgres",
        password="postgres",
    )

# Run the status check
asyncio.run(database_utilities())
```

--------------------------------

### Convenience Function search_documents

Source: https://context7.com/rhel-lightspeed/docs2db-api/llms.txt

Provides a simplified `search_documents` function that handles the engine initialization and search process in a single call, allowing for quick document retrieval with default or custom options.

```APIDOC
## Convenience Function search_documents

### Description
A simplified function for quick document searches without manually managing engine lifecycle. Creates an engine instance, initializes it, and performs the search in one call.

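### Method
Async Python function; call it with `await` (it is not an HTTP route)

### Endpoint
`docs2db_api.rag.engine.search_documents` (Python import path; no network endpoint is involved)

### Parameters
#### Function Arguments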
- **query** (str) - Required - The search query string.
- **model_name** (str) - Optional - The name of the embedding model to use.
- **similarity_threshold** (float) - Optional - The minimum similarity score for a document to be considered relevant.
- **max_chunks** (int) - Optional - The maximum number of chunks to retrieve per document.
- **max_tokens_in_context** (int) - Optional - The maximum number of tokens allowed in the context window.
- **enable_question_refinement** (bool) - Optional - Whether to enable question refinement.
- **enable_reranking** (bool) - Optional - Whether to enable result reranking.

### Request Example
```python
import asyncio
from docs2db_api.rag.engine import search_documents

async def quick_search():
    # Simple search with defaults
    result = await search_documents("How do I install packages?")

    for doc in result.documents:
        print(f"[{doc['similarity_score']:.3f}] {doc['text'][:200]}...")

    # Search with custom options
    result = await search_documents(
        "Configure network settings",
        model_name="ibm-granite/granite-embedding-30m-english",
        max_chunks=10,
        similarity_threshold=0.6,
        enable_question_refinement=False,
        enable_reranking=True,
    )

    print(f"\nFound {len(result.documents)} documents")
    print(f"Features: {result.metadata['features_used']}")

asyncio.run(quick_search())
```

### Response
#### Return Value (result object)
- **query** (str) - The original search query.
- **documents** (list) - A list of dictionaries, each representing a relevant document chunk.
  - **text** (str) - The content of the document chunk.
  - **similarity_score** (float) - The similarity score of the chunk to the query.
  - **document_path** (str) - The path to the source document.
  - **chunk_index** (int) - The index of the chunk within the document.
  - **rrf_score** (float, optional) - The Reciprocal Rank Fusion score.
  - **vector_similarity** (float, optional) - The vector similarity score.
  - **bm25_rank** (float, optional) - The BM25 ranking score.
  - **metadata** (dict, optional) - Additional metadata for the chunk.
- **refined_questions** (list, optional) - A list of refined questions generated from the original query.
- **metadata** (dict, optional) - Contains search statistics and configuration details.
  - **model_name** (str) - The name of the embedding model used.
  - **model_dimensions** (int) - The dimensions of the embedding model.
  - **similarity_threshold** (float) - The similarity threshold used for the search.
  - **features_used** (list) - A list of features that were enabled during the search.

#### Response Example
```json
{
  "query": "How do I install packages?",
  "documents": [
    {
      "text": "To install packages, use the following command...",
      "similarity_score": 0.88,
      "document_path": "/docs/package_management.md",
      "chunk_index": 1,
      "metadata": {}
    }
  ],
  "metadata": {
    "model_name": "default-embedding-model",
    "model_dimensions": 768,
    "similarity_threshold": 0.7,
    "features_used": []
  }
}
```
```