### APIServer with Route Configuration and Uvicorn Start

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/APIServer.md

Demonstrates the complete setup for running the APIServer, including client initialization, route configuration, and starting the server with uvicorn.

```python
from chatui.api import APIServer
from chatui.chat_client import ChatClient
import uvicorn

client = ChatClient("http://localhost:8000", "llama-3.3-70b")
api_server = APIServer(client)
api_server.configure_routes()

# Start server
uvicorn.run(api_server, host="0.0.0.0", port=8080, workers=1)
```

--------------------------------

### YAML Configuration Example

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/AppConfig.md

Example of a YAML file structure for AppConfig settings.

```yaml
serverUrl: http://localhost
serverPort: "8000"
serverPrefix: /rag-api/
modelName: llama-3.3-70b
```

--------------------------------

### JSON Configuration Example

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/AppConfig.md

Example of a JSON file structure for AppConfig settings.

```json
{
  "serverUrl": "http://localhost",
  "serverPort": "8000",
  "serverPrefix": "/rag-api/",
  "modelName": "llama-3.3-70b"
}
```
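
The camelCase keys above correspond to snake_case attributes such as `config.server_url` used elsewhere in these docs. Below is a minimal standard-library sketch of that mapping. `SketchConfig`, `camel_to_snake`, and `parse_config` are hypothetical names, not part of `chatui`, and treating `server_prefix` as optional is an assumption.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class SketchConfig:
    """Hypothetical stand-in for AppConfig (not the real class)."""
    server_url: str
    server_port: str
    model_name: str
    server_prefix: str = ""  # assumed optional, since some examples omit it


def camel_to_snake(name: str) -> str:
    """Convert a camelCase key such as 'serverUrl' to 'server_url'."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def parse_config(text: str) -> SketchConfig:
    """Parse JSON text and build a SketchConfig from its camelCase keys."""
    raw = json.loads(text)
    return SketchConfig(**{camel_to_snake(key): value for key, value in raw.items()})


config = parse_config(
    '{"serverUrl": "http://localhost", "serverPort": "8000", '
    '"serverPrefix": "/rag-api/", "modelName": "llama-3.3-70b"}'
)
print(config.server_url, config.server_port, config.model_name)
```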

--------------------------------

### APIServer Initialization Example

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/APIServer.md

Instantiates the APIServer with a ChatClient and configures its routes. This is a prerequisite for running the server.

```python
from chatui.api import APIServer
from chatui.chat_client import ChatClient

client = ChatClient("http://localhost:8000", "llama-3.3-70b")
app = APIServer(client)
app.configure_routes()

# Run with uvicorn
# uvicorn chatui.api:app --host 0.0.0.0 --port 8080
```

--------------------------------

### Configuration Guide

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/README.md

A comprehensive guide to application configuration, detailing AppConfig options, environment variables, and runtime configuration via GraphState.

```APIDOC
## Configuration

### Description
Provides a detailed guide to configuring the Agentic RAG system.

### Configuration Options
- AppConfig options
- Environment variables
- Runtime configuration via GraphState
- Includes example scenarios for different configurations.
```

--------------------------------

### Cloud-Only Configuration Scenario

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/configuration.md

Example JSON configuration for a cloud-only setup. This scenario requires setting NVIDIA and Tavily API keys as environment variables.

```json
{
  "serverUrl": "http://localhost",
  "serverPort": "8000",
  "modelName": "llama-3.3-70b"
}
```

```bash
export NVIDIA_API_KEY="your_nvidia_key"
export TAVILY_API_KEY="your_tavily_key"
```

--------------------------------

### Instantiate CustomChatOpenAI (with GPU Check)

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/CustomChatOpenAI.md

Example of instantiating CustomChatOpenAI with GPU type and count for compatibility checking.

```python
from chatui.utils.nim import CustomChatOpenAI

# With GPU compatibility checking
llm = CustomChatOpenAI(
    custom_endpoint="localhost",
    port="8000",
    model_name="meta/llama-3.3-70b-instruct",
    gpu_type="A100",
    gpu_count="2",
    temperature=0.5
)
```

--------------------------------

### Environment Variables Setup

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/README.md

Sets up the environment variables for the application. NVIDIA_API_KEY and TAVILY_API_KEY are mandatory (first block); the remaining variables are optional (second block).

```bash
export NVIDIA_API_KEY="your_nvidia_key"
export TAVILY_API_KEY="your_tavily_key"
```

```bash
export CHUNK_SIZE=250
export CHUNK_OVERLAP=0
export TAVILY_K=3
export RECURSION_LIMIT=10
export INTERNAL_API=no
```
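
Values read from the environment are always strings, so numeric settings such as `CHUNK_SIZE` need converting before use. Below is a minimal sketch of reading them with fallbacks. `env_int` is a hypothetical helper, and the fallback values simply reuse the numbers exported above; they are not confirmed application defaults.

```python
import os
from typing import Mapping


def env_int(name: str, fallback: int, env: Mapping[str, str] = os.environ) -> int:
    """Read an integer setting from env, using fallback when unset or not a number."""
    raw = env.get(name)
    if raw is None:
        return fallback
    try:
        return int(raw)
    except ValueError:
        return fallback


# Fallbacks mirror the exported example values (an assumption, not known defaults).
settings = {
    "chunk_size": env_int("CHUNK_SIZE", 250),
    "chunk_overlap": env_int("CHUNK_OVERLAP", 0),
    "tavily_k": env_int("TAVILY_K", 3),
    "recursion_limit": env_int("RECURSION_LIMIT", 10),
}
print(settings)
```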

--------------------------------

### Instantiate CustomChatOpenAI (Basic)

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/CustomChatOpenAI.md

Example of basic instantiation of CustomChatOpenAI with endpoint, port, model name, and temperature.

```python
from chatui.utils.nim import CustomChatOpenAI

# Basic usage
llm = CustomChatOpenAI(
    custom_endpoint="agentic-rag-local-nim-1",
    port="8000",
    model_name="meta/llama-3.1-8b-instruct",
    temperature=0.7
)
```

--------------------------------

### Initialize GraphState for Usage Example

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/GraphState.md

Imports GraphState, the state object passed through the agentic RAG workflow. The snippet covers only the import, which is the basic setup required before passing state through the graph.

```python
from chatui.utils.graph import GraphState

```

--------------------------------

### Sample Queries for Chatbot

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/build-page.md

A list of pre-configured example questions that can be used as clickable buttons in the chat interface.

```python
"How do I add an integration in the CLI?"
"How do I fix an inaccessible remote Location?"
"What are the NVIDIA-provided default base environments?"
"How do I create a support bundle for troubleshooting?"

```

--------------------------------

### Custom Page with Different Client Example

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/build-page.md

Demonstrates how to create a custom ChatClient and use it to build, customize, and launch a Gradio page.

```python
from chatui import pages
from chatui.chat_client import ChatClient

# Create custom client
client = ChatClient(
    server_url="http://custom.server:9000",
    model_name="custom-model"
)

# Build page
page = pages.converse.build_page(client)

# Customize and launch
page.queue(max_size=20)
page.launch(server_name="0.0.0.0", server_port=8080)

```

--------------------------------

### Set Verbosity to INFO (default)

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/entrypoint.md

Example of running the application with the default INFO logging level.

```bash
# Verbosity = 1 (INFO, default)
python -m chatui
```

--------------------------------

### Use Retriever to Get Documents

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/types.md

Example of obtaining a retriever instance and invoking it to fetch documents based on a query.

```python
from chatui.utils import database

retriever = database.get_retriever()
docs = retriever.invoke("How do I install AI Workbench?")
```

--------------------------------

### Mixed Mode Configuration Scenario

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/configuration.md

Example GraphState configuration for a mixed mode setup, utilizing both local NIM and cloud APIs. This involves setting specific flags and parameters for NIM usage and cloud model IDs.

```python
state = {
    "generator_use_nim": True,          # Use local NIM
    "nim_generator_ip": "localhost",
    "nim_generator_port": "8000",
    
    "router_use_nim": False,            # Use cloud API
    "router_model_id": "meta/llama-3.3-70b-instruct",
    
    # ... (remaining GraphState keys elided in the source)
}
```
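
One way to read such a state is to resolve, per component, whether requests go to a local NIM or to a cloud model. The sketch below is illustrative only: `resolve_backend` is a hypothetical helper, the key names follow the pattern in the example above, and the `http://host:port` target format is an assumption.

```python
from typing import Any, Dict, Mapping


def resolve_backend(state: Mapping[str, Any], component: str) -> Dict[str, str]:
    """Describe where requests for a component (e.g. 'router') would be sent."""
    if state.get(f"{component}_use_nim"):
        host = state[f"nim_{component}_ip"]
        port = state[f"nim_{component}_port"]
        return {"kind": "nim", "target": f"http://{host}:{port}"}
    return {"kind": "cloud", "target": state[f"{component}_model_id"]}


state = {
    "generator_use_nim": True,
    "nim_generator_ip": "localhost",
    "nim_generator_port": "8000",
    "router_use_nim": False,
    "router_model_id": "meta/llama-3.3-70b-instruct",
}
print(resolve_backend(state, "generator"))
print(resolve_backend(state, "router"))
```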

--------------------------------

### Self-Hosted NIM Configuration Scenario

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/configuration.md

Example JSON configuration for a self-hosted NIM setup. This scenario involves overriding GraphState with NIM service details.

```json
{
  "serverUrl": "http://localhost",
  "serverPort": "8000",
  "modelName": "llama-3.1-8b"
}
```

```python
state = {
    "router_use_nim": True,
    "nim_router_ip": "agentic-rag-local-nim-1",
    "nim_router_port": "8000",
    "nim_router_id": "meta/llama-3.1-8b-instruct",
    # ... (remaining GraphState keys elided in the source)
}
```

--------------------------------

### Document Type Example

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/types.md

Demonstrates how to create a LangChain Document object, which includes the main text content and associated metadata like the source URL.

```python
from langchain.schema import Document

doc = Document(
    page_content="AI Workbench is NVIDIA's IDE...",
    metadata={"source": "https://docs.nvidia.com/ai-workbench/"}
)
```

--------------------------------

### Handle File Access Errors in Configuration Loading

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/errors.md

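Example of handling a missing or unreadable configuration file. As the snippet shows, AppConfig.from_file signals FileNotFoundError and PermissionError conditions by returning None, so check the result before using it, and verify file paths and permissions if loading fails.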

```python
from chatui.configuration import AppConfig
import sys

config = AppConfig.from_file("/path/to/config.json")
if config is None:
    print("Failed to load configuration (file not found or permission denied)")
    sys.exit(1)
```
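
The check above relies on the loader returning `None` rather than raising when the file cannot be read. Below is a minimal sketch of that pattern for a JSON file. It assumes nothing about how `AppConfig.from_file` is implemented, and `load_json_or_none` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Optional


def load_json_or_none(path: str) -> Optional[Dict[str, Any]]:
    """Return the parsed JSON object, or None if the file is missing or unreadable."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return json.load(handle)
    except (FileNotFoundError, PermissionError) as err:
        # Report the cause so the caller can check paths and permissions.
        print(f"Could not read {path}: {err}")
        return None


missing = load_json_or_none("/nonexistent/config.json")
print(missing is None)
```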

--------------------------------

### GraphState Initialization with Llama 3 Prompts

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/prompts.md

Demonstrates how to initialize a GraphState with specific Llama 3 prompts for various agentic components. This setup is used when invoking the graph workflow.

```python
from chatui.prompts import prompts_llama3

state = {
    "question": "...",
    "prompt_router": prompts_llama3.router_prompt,
    "prompt_retrieval": prompts_llama3.retrieval_prompt,
    "prompt_generator": prompts_llama3.generator_prompt,
    "prompt_hallucination": prompts_llama3.hallucination_prompt,
    "prompt_answer": prompts_llama3.answer_prompt,
    # ... (remaining GraphState keys elided in the source)
}

result = app.invoke(state)
```

--------------------------------

### Environment Variable Precedence Example

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/AppConfig.md

Illustrates how environment variables override configuration file settings. This example shows setting environment variables before running a Python script that loads configuration.

```bash
# Environment variable override
export APP_SERVERURL="http://custom.server"
export APP_SERVERPORT="9000"

# Config file values are overridden by env vars
python -m chatui --config /path/to/config.json

# Result: server_url="http://custom.server", server_port="9000"
```
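
The precedence rule can be sketched as a merge in which any environment variable that is set replaces the value from the file. This is an illustration only: the `ENV_TO_KEY` table covers just the two variables shown above, and `apply_env_overrides` is a hypothetical helper rather than the `AppConfig` implementation.

```python
from typing import Dict, Mapping

# Environment variable name -> config file key (only the pairs shown above).
ENV_TO_KEY = {
    "APP_SERVERURL": "serverUrl",
    "APP_SERVERPORT": "serverPort",
}


def apply_env_overrides(file_values: Mapping[str, str], env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of file_values in which set environment variables win."""
    merged = dict(file_values)
    for env_name, key in ENV_TO_KEY.items():
        if env_name in env:
            merged[key] = env[env_name]
    return merged


file_values = {"serverUrl": "http://localhost", "serverPort": "8000", "modelName": "llama-3.3-70b"}
env = {"APP_SERVERURL": "http://custom.server", "APP_SERVERPORT": "9000"}
merged = apply_env_overrides(file_values, env)
print(merged)
```

In real use, `os.environ` would be passed as `env`; a plain dictionary is used here so the example is reproducible.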

--------------------------------

### Answer Grader Setup

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/code/langgraph_rag_agent_llama3_nvidia_nim.ipynb

Initializes the LLM for the answer grader. This component will be used to assess the quality and correctness of the final generated answer.

```python
# LLM
llm = ChatNVIDIA(model=model_id, temperature=0)
```

--------------------------------

### Usage Example: Redirecting Stdout and Logging

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/logger.md

Demonstrates how to redirect stdout to a file using the Logger class and shows that subsequent print statements are captured in the log file. Ensure the file path is valid.

```python
import sys
from chatui.utils import logger

# Redirect stdout to file
sys.stdout = logger.Logger("/path/to/output.log")

# All print statements now go to file
print("This message is logged to file")
request_data = {"query": "example"}  # placeholder value for illustration
print("Request processing: ", request_data)
```
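
Redirecting `sys.stdout` works with any object that provides a `write` method (and ideally `flush`). Below is a minimal sketch of such a file-backed writer. `FileWriter` is a hypothetical stand-in, not the `chatui.utils.logger.Logger` implementation, and the sketch uses `contextlib.redirect_stdout` so the redirection is undone automatically.

```python
import contextlib
import os
import tempfile


class FileWriter:
    """Minimal file-like object that appends everything written to a file."""

    def __init__(self, path: str) -> None:
        self._handle = open(path, "a", encoding="utf-8")

    def write(self, message: str) -> int:
        written = self._handle.write(message)
        self._handle.flush()  # keep the file current after every write
        return written

    def flush(self) -> None:
        self._handle.flush()

    def close(self) -> None:
        self._handle.close()


# A temporary directory keeps the example self-contained.
log_path = os.path.join(tempfile.mkdtemp(), "output.log")
writer = FileWriter(log_path)
with contextlib.redirect_stdout(writer):
    print("This message is logged to file")
writer.close()

with open(log_path, encoding="utf-8") as handle:
    logged = handle.read()
```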

--------------------------------

### Retrieval Grader Setup

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/code/langgraph_rag_agent_llama3_nvidia_nim.ipynb

Sets up a retrieval grader using LangChain, ChatNVIDIA, and JSON output parsing. It defines a prompt that assesses whether a retrieved document is relevant to a user question.

```python
from langchain.prompts import PromptTemplate
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_core.output_parsers import JsonOutputParser

model_id = "meta/llama3-70b-instruct"

# LLM
llm = ChatNVIDIA(model=model_id, temperature=0)

prompt = PromptTemplate(
    template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a grader assessing relevance 
    of a retrieved document to a user question. If the document contains keywords related to the user question, 
    grade it as relevant. It does not need to be a stringent test. The goal is to filter out erroneous retrievals. 
    Give a binary 'yes' or 'no' score to indicate whether the document is relevant to the question. 
    Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.
     <|eot_id|><|start_header_id|>user<|end_header_id|>
    Here is the retrieved document: 

 {document} 


    Here is the user question: {question} 
 <|eot_id|><|start_header_id|>assistant<|end_header_id|>    """,
    input_variables=["question", "document"],
)

retrieval_grader = prompt | llm | JsonOutputParser()
question = "agent memory"
docs = retriever.invoke(question)
doc_txt = docs[1].page_content
print(retrieval_grader.invoke({"question": question, "document": doc_txt}))
```

--------------------------------

### URL Validation Example

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/database.md

Demonstrates how to use the is_valid_url function to check if a given string is a properly formatted URL. It returns True for valid URLs and False otherwise.

```python
from chatui.utils import database

is_valid = database.is_valid_url("https://docs.nvidia.com/ai-workbench/")
print(is_valid)  # True

is_valid = database.is_valid_url("invalid-url")
print(is_valid)  # False
```
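
One common way to implement such a check with the standard library is to require both a scheme and a network location. This is a sketch of that approach, not necessarily how `database.is_valid_url` is implemented; `looks_like_url` is a hypothetical name, and restricting the scheme to http(s) is an assumption.

```python
from urllib.parse import urlparse


def looks_like_url(candidate: str) -> bool:
    """Return True when the string has an http(s) scheme and a host part."""
    try:
        parts = urlparse(candidate)
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(looks_like_url("https://docs.nvidia.com/ai-workbench/"))
print(looks_like_url("invalid-url"))
```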

--------------------------------

### Safe URL Loading Example

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/database.md

Shows how to use the safe_load function to load and parse content from a URL. It handles potential errors during loading and returns a list of Document objects or None if an error occurs.

```python
from chatui.utils import database

docs = database.safe_load("https://docs.nvidia.com/ai-workbench/")
if docs:
    print(f"Loaded {len(docs)} documents")
else:
    print("Failed to load URL")
```
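
The return-`None`-on-failure contract described above can be sketched with a small wrapper around any loader. `safe_call` and `fake_loader` are hypothetical names used for illustration; the real `safe_load` may catch different exceptions.

```python
from typing import Callable, List, Optional


def safe_call(loader: Callable[[str], List[str]], url: str) -> Optional[List[str]]:
    """Run loader(url) and return its result, or None if it raises."""
    try:
        return loader(url)
    except Exception as err:  # broad on purpose: any failure maps to None
        print(f"Failed to load {url}: {err}")
        return None


def fake_loader(url: str) -> List[str]:
    """Stand-in loader that fails for anything that is not https."""
    if not url.startswith("https://"):
        raise ValueError("unsupported URL")
    return [f"content of {url}"]


ok = safe_call(fake_loader, "https://docs.nvidia.com/ai-workbench/")
failed = safe_call(fake_loader, "invalid-url")
```

Note that a truthiness check such as `if docs:` treats an empty list the same as `None`; use `if docs is not None:` when the two cases need to be told apart.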

--------------------------------

### NVIDIA API Key Configuration

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/code/langgraph_rag_agent_llama3_nvidia_nim.ipynb

Prompts the user for an NVIDIA API key if a valid one is not already set as an environment variable, then checks that the entered key starts with 'nvapi-' before storing it.

```python
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import getpass
import os

if not os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    nvapi_key = getpass.getpass("Enter your NVIDIA API key: ")
    assert nvapi_key.startswith("nvapi-"), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key
```
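
For scripts that cannot prompt interactively, the same prefix check can be applied without `getpass`. Below is a minimal sketch; `has_nvidia_key` is a hypothetical helper, and the `nvapi-` prefix is the only rule it checks.

```python
import os
from typing import Mapping


def has_nvidia_key(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when NVIDIA_API_KEY is set and starts with 'nvapi-'."""
    return env.get("NVIDIA_API_KEY", "").startswith("nvapi-")


# Plain dictionaries stand in for the environment to keep the example reproducible.
print(has_nvidia_key({"NVIDIA_API_KEY": "nvapi-example"}))
print(has_nvidia_key({}))
```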

--------------------------------

### Hallucination Grader Setup

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/code/langgraph_rag_agent_llama3_nvidia_nim.ipynb

Configures a hallucination grader using an NVIDIA NIM LLM and a prompt template. This grader assesses if the generated answer is supported by the provided documents.

```python
# LLM
llm = ChatNVIDIA(model=model_id, temperature=0)

# Prompt
prompt = PromptTemplate(
    template=""" <|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a grader assessing whether 
    an answer is grounded in / supported by a set of facts. Give a binary 'yes' or 'no' score to indicate 
    whether the answer is grounded in / supported by a set of facts. Provide the binary score as a JSON with a 
    single key 'score' and no preamble or explanation. <|eot_id|><|start_header_id|>user<|end_header_id|>
    Here are the facts:
    \n ------- \n
    {documents} 
    \n ------- \n
    Here is the answer: {generation}  <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["generation", "documents"],
)

hallucination_grader = prompt | llm | JsonOutputParser()
hallucination_grader.invoke({"documents": docs, "generation": generation})
```

--------------------------------

### Integrate CustomChatOpenAI in a RAG Chain

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/CustomChatOpenAI.md

Demonstrates initializing CustomChatOpenAI with a custom endpoint and integrating it into a Retrieval Augmented Generation (RAG) chain using LangChain. This setup allows for custom LLM backends in RAG pipelines.

```python
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from chatui.utils.nim import CustomChatOpenAI
from langchain.schema import Document

# Initialize LLM with NIM
llm = CustomChatOpenAI(
    custom_endpoint="agentic-rag-local-nim-1",
    port="8000",
    model_name="meta/llama-3.1-8b-instruct",
    temperature=0.7
)

# Create RAG chain
prompt = PromptTemplate(
    template="Context: {context}\n\nQuestion: {question}\n\nAnswer:",
    input_variables=["context", "question"]
)

rag_chain = prompt | llm | StrOutputParser()

# Execute chain
documents = [
    Document(page_content="AI Workbench is NVIDIA's IDE for AI..."),
    Document(page_content="It provides GPU acceleration...")
]

context = "\n".join([doc.page_content for doc in documents])
response = rag_chain.invoke({
    "context": context,
    "question": "What is AI Workbench?"
})

print(response)
```

--------------------------------

### Entrypoint

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/README.md

Handles CLI and server bootstrap, including parsing command-line arguments, loading configuration, and initiating the server startup sequence.

```APIDOC
## Entrypoint

### Description
Serves as the main entry point for the application, handling both CLI and server startup.

### Responsibilities
- Parses command-line arguments.
- Loads application configuration.
- Manages the server startup sequence.
```

--------------------------------

### Run Application with Configuration File

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/entrypoint.md

Shows how to specify a configuration file for the application. This allows for custom settings to be loaded.

```bash
python -m chatui --config /path/to/config.json
```

--------------------------------

### Print AppConfig Help

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/AppConfig.md

Prints comprehensive configuration help documentation to a specified output. Use this to understand available configuration options and their defaults.

```python
from chatui.configuration import AppConfig
import sys

AppConfig.print_help(sys.stdout.write)
```

--------------------------------

### Display Configuration Help

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/entrypoint.md

Use this command to print the format help for the configuration file and exit.

```bash
python -m chatui --help-config
```

--------------------------------

### Load AppConfig from File and Initialize ChatClient

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/AppConfig.md

Demonstrates loading application configuration from a file, initializing a ChatClient from it, and building the UI page. The configuration path is read from the APP_CONFIG_FILE environment variable, falling back to /dev/null when it is unset.

```python
from chatui.configuration import AppConfig
from chatui.chat_client import ChatClient
from chatui import pages
import os

# Load configuration
config_file = os.environ.get("APP_CONFIG_FILE", "/dev/null")
config = AppConfig.from_file(config_file)

if not config:
    raise RuntimeError("Failed to load configuration")

# Build API URL from config
api_url = f"{config.server_url}:{config.server_port}"

# Initialize chat client
client = ChatClient(api_url, config.model_name)

# Build UI pages
blocks = pages.converse.build_page(client)
```

--------------------------------

### AppConfig.print_help

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/AppConfig.md

Prints comprehensive configuration help documentation, detailing each configuration field, its default value, type, and corresponding environment variable.

````APIDOC
## Class Method: print_help

### Description
Prints comprehensive configuration help documentation. It iterates through dataclass fields and displays their JSON key name, default value, help text, type information, and environment variable name.

### Method Signature
```python
@classmethod
def print_help(
    cls,
    help_printer: Callable[[str], Any],
    env_parent: Optional[str] = None,
    json_parent: Optional[Tuple[str, ...]] = None,
) -> None
```

### Parameters
#### Arguments
- **help_printer** (Callable) - Required - Function to write help text (e.g., `sys.stdout.write`)
- **env_parent** (Optional[str]) - Optional - Used internally for recursion
- **json_parent** (Optional[Tuple[str, ...]]) - Optional - Used internally for recursion

### Example
```python
from chatui.configuration import AppConfig
import sys

AppConfig.print_help(sys.stdout.write)
```
````

--------------------------------

### Set Verbosity to DEBUG

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/entrypoint.md

Example of setting the verbosity to DEBUG level using multiple `-v` flags.

```bash
# Verbosity = 2 (DEBUG)
python -m chatui -vv
```

--------------------------------

### Basic Application Execution

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/entrypoint.md

Demonstrates the basic command to run the chat UI application from the command line.

```bash
python -m chatui
```

--------------------------------

### Run ChatUI Application

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/README.md

Command-line instructions for running the chatui application with different options, including basic usage, configuration file, help, and debug logging.

```bash
python -m chatui
```

```bash
python -m chatui --config /path/to/config.json
```

```bash
python -m chatui --help-config
```

```bash
python -m chatui -vv  # Debug logging
```

--------------------------------

### Set Verbosity to WARN only

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/entrypoint.md

Example of setting the verbosity to WARN level by decreasing verbosity using the `-q` flag.

```bash
# Verbosity = -1 (WARN only)
python -m chatui -q
```

--------------------------------

### Handle TavilyAPIError in Web Search

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/errors.md

Example of catching TavilyAPIError during a web search operation. Use this to gracefully handle search failures.

```python
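from chatui.utils import graph  # assumes `graph` below refers to this module
from chatui.utils.graph import TavilyAPIError

# `state` is the GraphState dict being passed through the workflow
try:
    state = graph.web_search(state)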
except TavilyAPIError as e:
    print(f"Web search failed: {e}")
    # Handle gracefully: skip web search, use only docs
```

--------------------------------

### Increase Graph Recursion Limit

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/errors.md

Example of setting the RECURSION_LIMIT environment variable to manage LangGraph recursion depth. Use this to prevent GraphRecursionError.

```bash
# Increase recursion limit
export RECURSION_LIMIT=20
python -m chatui
```

--------------------------------

### Initialize ChatClient

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/ChatClient.md

Constructor signature for ChatClient, which is instantiated with the server URL and the desired model name.

```python
from chatui.chat_client import ChatClient

class ChatClient:
    def __init__(self, server_url: str, model_name: str) -> None: ...
```

--------------------------------

### Configure Structured Logging

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/logger.md

Sets up basic structured logging to a file named 'chatui.log' using a specified format and log level. This is configured via bootstrap_logging().

```python
logging.basicConfig(
    filename='chatui.log',
    format=_LOG_FMT,  # "[PID] TIMESTAMP [LEVEL] - LOGGER - MESSAGE"
    level=log_level
)
```

--------------------------------

### Handle RuntimeError in Configuration Loading

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/errors.md

Example of catching RuntimeError when loading application configuration fails. Ensure configuration data is valid and accessible.

```python
from chatui.configuration import AppConfig

try:
    config = AppConfig.from_file("/invalid/path.json")
    if not config:
        raise RuntimeError("Failed to load configuration")
except RuntimeError as e:
    print(f"Configuration error: {e}")
```

--------------------------------

### Handle ValueError for GPU Incompatibility

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/errors.md

Example of catching ValueError when initializing CustomChatOpenAI with incompatible GPU configurations. Verify GPU compatibility and configuration.

```python
from chatui.utils.nim import CustomChatOpenAI

try:
    llm = CustomChatOpenAI(
        custom_endpoint="localhost",
        gpu_type="A100",
        gpu_count="2",
        model_name="meta/llama-3.3-70b-instruct"
    )
except ValueError as e:
    print(f"GPU incompatibility: {e}")
```

--------------------------------

### Load Application Configuration

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/entrypoint.md

Loads the application configuration from a specified file. Exits if configuration loading fails.

```python
config = configuration.AppConfig.from_file(config_file)
if not config:
    sys.exit(1)  # Exit code 1 for config failure
```

--------------------------------

### Set Verbosity to Higher DEBUG Level

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/entrypoint.md

Example of setting the verbosity to a higher DEBUG level (e.g., 3) using multiple `-v` flags.

```bash
# Verbosity = 3 (DEBUG)
python -m chatui -vvv
```
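
The verbosity values documented for `-q`, the default, `-vv`, and `-vvv` suggest a mapping onto standard `logging` levels. Below is a minimal sketch of that mapping. `level_for_verbosity` is a hypothetical helper, and treating a verbosity of 0 as WARNING is an assumption, since these docs do not describe that value.

```python
import logging


def level_for_verbosity(verbosity: int) -> int:
    """Map a verbosity value to a logging level."""
    if verbosity >= 2:
        return logging.DEBUG    # -vv and -vvv
    if verbosity == 1:
        return logging.INFO     # default
    return logging.WARNING      # -q (and, by assumption, 0)


for value in (-1, 1, 2, 3):
    print(value, logging.getLevelName(level_for_verbosity(value)))
```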

--------------------------------

### Initialize Chroma Vector Store

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/types.md

Instantiate the Chroma vector store, specifying collection name, embedding function, and persistence directory. Defaults are provided for convenience.

```python
from langchain_community.vectorstores import Chroma

class Chroma(VectorStore):
    def __init__(
        self,
        collection_name: str = "rag-chroma",
        embedding_function: Embeddings = None,
        persist_directory: str = "/project/data"
    ): ...
```

--------------------------------

### RAG Chain for Generation

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/code/langgraph_rag_agent_llama3_nvidia_nim.ipynb

Implements the RAG chain for generating answers. It uses a prompt template to guide the LLM with retrieved context and formats the output.

```python
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Prompt
prompt = PromptTemplate(
    template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an assistant for question-answering tasks. 
    Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. 
    Use three sentences maximum and keep the answer concise <|eot_id|><|start_header_id|>user<|end_header_id|>
    Question: {question} 
    Context: {context} 
    Answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["question", "context"],  # must match the {question} and {context} placeholders
)

llm = ChatNVIDIA(model=model_id, temperature=0)


# Post-processing
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


# Chain
rag_chain = prompt | llm | StrOutputParser()

# Run
question = "agent memory"
docs = retriever.invoke(question)
generation = rag_chain.invoke({"context": docs, "question": question})
print(generation)
```
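
The `format_docs` helper above joins document text with blank lines so that it can fill the `{context}` slot of the prompt. Below is a standard-library sketch of that step. `Doc` is a hypothetical stand-in for a LangChain `Document`, and the template is shortened for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str


def format_docs(docs: Iterable[Doc]) -> str:
    """Join the text of each document, separated by a blank line."""
    return "\n\n".join(doc.page_content for doc in docs)


# Shortened template; the real prompt also carries the Llama 3 header tokens.
template = "Question: {question}\nContext: {context}\nAnswer:"
docs = [Doc("AI Workbench is NVIDIA's IDE for AI..."), Doc("It provides GPU acceleration...")]
filled = template.format(question="What is AI Workbench?", context=format_docs(docs))
print(filled)
```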

--------------------------------

### Assets

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/README.md

Handles UI theming, including loading the Kaizen theme and applying CSS styling.

```APIDOC
## Assets

### Description
Manages static assets for UI theming.

### Features
- Loads the Kaizen theme.
- Applies CSS styling for the user interface.
```

--------------------------------

### from_dict Class Method

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/AppConfig.md

Creates an AppConfig instance from a dictionary. Environment variables can override values provided in the dictionary.

````APIDOC
## Class Method: from_dict

### Description
Creates an AppConfig from a dictionary, with environment variable override.

### Method Signature
```python
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> AppConfig
```

### Parameters
#### Arguments
- **data** (Dict[str, Any]) - Required - Configuration dictionary

### Returns
- `AppConfig` - Configuration instance

### Raises
- `RuntimeError` - If data is not a dictionary

### Process
1. Validates that the input is a dictionary
2. Iterates through `cls.envvars()` (list of supported env vars)
3. For each env var that is set, parses it and updates the dictionary at the configured path
4. Binds LoadMeta with CAMEL case key transformation
5. Creates an instance from the merged data

### Example
```python
from chatui.configuration import AppConfig

data = {
    "serverUrl": "http://api.local",
    "serverPort": "8080",
    "modelName": "mistral-mixtral"
}

config = AppConfig.from_dict(data)
```
````

--------------------------------

### Configure and Launch Gradio Application

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/entrypoint.md

Enables queuing for the Gradio blocks and launches the application on a specified server address and port. The root path can also be configured.

```python
blocks.queue(max_size=10)
blocks.launch(
    server_name="0.0.0.0",
    server_port=8080,
    root_path=proxy_prefix
)
```

--------------------------------

### Initialize ChatNVIDIA Model

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/types.md

Set up the ChatNVIDIA model for interacting with NVIDIA cloud endpoints. Key parameters include the model ID, temperature, and API key for authentication.

```python
from langchain_nvidia_ai_endpoints import ChatNVIDIA

class ChatNVIDIA(BaseChatModel):
    def __init__(
        self,
        model: str,
        temperature: float = 0.7,
        api_key: str = None
    ): ...
```

--------------------------------

### Configure and Test NIM Container API

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/agentic-rag-docs/self-host.md

Instructions to check NIM container logs and test its API endpoint using curl. The model is configured automatically on container start.

```bash
# Check the container logs

docker logs nim

# Test the API endpoint

curl -X POST http://localhost:<remote-port>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello, NIM!"}
    ]
  }'
```
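The same request can also be built from Python using only the standard library. This is a sketch under stated assumptions: the helper names are hypothetical, and port `8000` is a placeholder for whatever port your NIM container actually exposes.

```python
import json
import urllib.request


def build_chat_request(port: int, prompt: str) -> urllib.request.Request:
    """Build the same POST request as the curl command, without sending it."""
    body = {"messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"http://localhost:{port}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (needs a running container)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Port 8000 is a placeholder; substitute the port your container exposes.
request = build_chat_request(8000, "Hello, NIM!")
print(request.get_method(), request.full_url)
```

Call `send_chat_request(request)` only once the container logs show the service is ready; otherwise `urlopen` raises a connection error.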

--------------------------------

### Run API Server with Uvicorn

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/configuration.md

Initialize the APIServer with a ChatClient instance and configure its routes. The server can then be run using uvicorn, specifying host, port, and worker count.

```python
api_server = APIServer(client=chat_client)
api_server.configure_routes()

# Run with uvicorn
# uvicorn chatui.api:app --host 0.0.0.0 --port 8080 --workers 1
```

--------------------------------

### AppConfig Loading

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/README.md

Provides methods to load application configuration from a file or a dictionary. Also includes utilities for printing configuration help and listing the supported environment variables.

```python
AppConfig.from_file(filepath: str)
```

```python
AppConfig.from_dict(data: Dict)
```

```python
AppConfig.print_help(help_printer: Callable)
```

```python
AppConfig.envvars()
```

--------------------------------

### Run Application with SSL Enabled

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/entrypoint.md

Enables SSL for the application server by passing the paths to the SSL private key and certificate files, so that clients can connect over HTTPS.

```bash
python -m chatui --ssl-keyfile /path/to/key.pem --ssl-certfile /path/to/cert.pem
```

--------------------------------

### configure_routes

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/APIServer.md

Configures all HTTP routes for the server, mounts Gradio applications, and sets up static file serving.

```APIDOC
## configure_routes

### Description
Configures all HTTP routes for the server, mounts Gradio applications, and sets up static file serving.

### Method
configure_routes

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Request Example
```python
from chatui.api import APIServer
from chatui.chat_client import ChatClient
import uvicorn

client = ChatClient("http://localhost:8000", "llama-3.3-70b")
api_server = APIServer(client)
api_server.configure_routes()

# Start server
uvicorn.run(api_server, host="0.0.0.0", port=8080, workers=1)
```

### Response
#### Success Response (200)
None

#### Response Example
None

### Routes Configured:
- GET `/` → Serves `converse.html` (main chat interface)
- GET `/converse` → Serves `converse.html` (chat page)
- GET `/kb` → Serves `kb.html` (knowledge base management page)
- POST `/content/call/` → Gradio chat interface endpoint
- POST `/content/kb/call/` → Gradio knowledge base endpoint
- GET `/*` → Static file serving from `/chatui/static/`

### Mounted Gradio Apps:
- Path: `/content/` → Chat interface (converse page)
- Path: `/content/kb/` → Knowledge base management (kb page)
```

--------------------------------

### Custom Router Prompt Variant

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/prompts.md

A custom variant of the router prompt, tailored for questions specifically about AI Workbench. It guides the model to choose between web search or a knowledge base based on the nature of the query.

```text
Determine the best data source for this question about AI Workbench:

Web search: Use for recent news, updates, or trending AI/ML topics
Knowledge base: Use for AI Workbench documentation, configuration, and usage

Question: {question}
```
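As a minimal sketch, the `{question}` placeholder can be filled with Python's built-in `str.format`. How the application itself renders the template is not shown in the source, so the `render_router_prompt` helper here is hypothetical.

```python
ROUTER_PROMPT = """Determine the best data source for this question about AI Workbench:

Web search: Use for recent news, updates, or trending AI/ML topics
Knowledge base: Use for AI Workbench documentation, configuration, and usage

Question: {question}"""


def render_router_prompt(question: str) -> str:
    """Substitute the user's question into the template."""
    return ROUTER_PROMPT.format(question=question)


rendered = render_router_prompt("How do I install AI Workbench?")
print(rendered)
```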

--------------------------------

### Create Chat API Client

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/entrypoint.md

Initializes the chat client using the server URL, port, and model name from the loaded configuration.

```python
api_url = f"{config.server_url}:{config.server_port}"
client = chat_client.ChatClient(api_url, config.model_name)
```
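The URL is a plain string join, so the configured `serverUrl` should not end in a slash or already contain a port. The sketch below shows the effect; `build_api_url_safe` is a hypothetical defensive variant and is not part of the project.

```python
def build_api_url(server_url: str, server_port: str) -> str:
    """Join base URL and port the same way as the f-string above."""
    return f"{server_url}:{server_port}"


def build_api_url_safe(server_url: str, server_port: str) -> str:
    """Hypothetical defensive variant: drop a trailing slash before joining."""
    return f"{server_url.rstrip('/')}:{server_port}"


print(build_api_url("http://localhost", "8000"))
print(build_api_url("http://localhost/", "8000"))
print(build_api_url_safe("http://localhost/", "8000"))
```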

--------------------------------

### Get Retriever from Chroma

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/database.md

Creates and returns a LangChain retriever instance from the Chroma vector store. This retriever is used for querying the vector database and fetching relevant documents. It defaults to retrieving the top 4 results.

```python
from chatui.utils import database

retriever = database.get_retriever()
docs = retriever.invoke("How do I install AI Workbench?")
print(f"Retrieved {len(docs)} documents")

for doc in docs:
    print(doc.page_content)
```

--------------------------------

### from_file Class Method

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/AppConfig.md

Loads application configuration from a specified JSON or YAML file. The file format is detected automatically, and environment variables override values read from the file.

```APIDOC
## Class Method: from_file

### Description
Loads configuration from JSON or YAML file with automatic format detection.

### Method Signature
```python
@classmethod
def from_file(cls, filepath: str) -> Optional[AppConfig]
```

### Parameters
#### Path Parameters
- **filepath** (str) - Required - Path to config file (JSON or YAML format)

### Returns
- `Optional[AppConfig]` — AppConfig instance or None if file cannot be loaded

### Process
1. Attempts to open file with UTF-8 encoding
2. Reads and parses JSON or YAML
3. Merges environment variables (highest priority)
4. Creates AppConfig via `from_dict()`
5. Returns config or None on error

### Error Handling
- FileNotFoundError → logs error, returns None
- PermissionError → logs error, returns None
- JSON/YAML parse errors → logs detailed errors
- Missing required fields → logs error, returns None
- Invalid field values → logs error, returns None

### Example
```python
from chatui.configuration import AppConfig

# Load from file
config = AppConfig.from_file("/path/to/config.json")

if config:
    print(f"Server: {config.server_url}:{config.server_port}")
    print(f"Model: {config.model_name}")
else:
    print("Failed to load configuration")
```

### Supported Formats

JSON:
```json
{
  "serverUrl": "http://localhost",
  "serverPort": "8000",
  "serverPrefix": "/rag-api/",
  "modelName": "llama-3.3-70b"
}
```

YAML:
```yaml
serverUrl: http://localhost
serverPort: "8000"
serverPrefix: /rag-api/
modelName: llama-3.3-70b
```
```
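The "log and return None" behavior described above can be sketched with the standard library. This is an illustration only: `load_config_dict` is a hypothetical helper, it handles JSON alone (the YAML branch is omitted to avoid a third-party dependency), and it stops before the `from_dict()` step.

```python
import json
import logging
import os
import tempfile
from typing import Any, Dict, Optional

logger = logging.getLogger("config_sketch")


def load_config_dict(filepath: str) -> Optional[Dict[str, Any]]:
    """Read a JSON config file, logging and returning None on failure."""
    try:
        with open(filepath, "r", encoding="utf-8") as handle:
            data = json.load(handle)
    except FileNotFoundError:
        logger.error("config file not found: %s", filepath)
        return None
    except PermissionError:
        logger.error("config file not readable: %s", filepath)
        return None
    except json.JSONDecodeError as exc:
        logger.error("config file is not valid JSON: %s", exc)
        return None
    if not isinstance(data, dict):
        logger.error("config root must be a mapping")
        return None
    return data


# Round-trip a temporary file to show both outcomes.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"serverUrl": "http://localhost", "serverPort": "8000"}, handle)
    loaded = load_config_dict(path)
    missing = load_config_dict(os.path.join(tmp, "absent.json"))

print(loaded, missing)
```

Returning `None` rather than raising lets the caller decide how to exit, which matches the `if config:` check in the example above.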

--------------------------------

### APIServer Initialization

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/README.md

Initializes the APIServer with a ChatClient instance. This server handles API requests.

```python
APIServer(client: ChatClient)
```

--------------------------------

### Main Application Execution Flow

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/entrypoint.md

The main script entry point for the application. It orchestrates argument parsing, configuration loading, client initialization, UI building, and server launching.

```python
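# Main script entry point
import os
import sys

# parse_args() is defined elsewhere in the entry-point module.
if __name__ == "__main__":
    # 1. Parse CLI arguments
    args = parse_args()

    # 2. Export logging verbosity and config file path as environment variables
    os.environ["APP_VERBOSITY"] = f"{args.verbose - args.quiet}"
    os.environ["APP_CONFIG_FILE"] = args.config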
    
    # 3. Load config
    from chatui import api, chat_client, configuration, pages
    
    config_file = os.environ.get("APP_CONFIG_FILE", "/dev/null")
    config = configuration.AppConfig.from_file(config_file)
    if not config:
        sys.exit(1)
    
    # 4. Connect to backend
    api_url = f"{config.server_url}:{config.server_port}"
    client = chat_client.ChatClient(api_url, config.model_name)
    proxy_prefix = os.environ.get("PROXY_PREFIX")
    
    # 5. Build and launch UI
    blocks = pages.converse.build_page(client)
    blocks.queue(max_size=10)
    blocks.launch(
        server_name="0.0.0.0",
        server_port=8080,
        root_path=proxy_prefix
    )
```
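The flow above reads `args.verbose`, `args.quiet`, and `args.config`, but the source does not show `parse_args()` itself. The following is a hypothetical stand-in built on `argparse`; the short flags, counting behavior, and defaults are assumptions made for illustration.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical stand-in for the entry point's parse_args()."""
    parser = argparse.ArgumentParser(prog="chatui")
    # Each repeated -v or -q increments its counter.
    parser.add_argument("-v", "--verbose", action="count", default=0)
    parser.add_argument("-q", "--quiet", action="count", default=0)
    parser.add_argument("--config", default="/dev/null")
    return parser.parse_args(argv)


args = parse_args(["-vv", "-q", "--config", "/path/to/config.yaml"])
verbosity = args.verbose - args.quiet
print(verbosity, args.config)
```

Subtracting the two counters gives a single signed verbosity level, which is the value the flow stores in `APP_VERBOSITY`.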

--------------------------------

### Create AppConfig from Dictionary

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/AppConfig.md

Creates an AppConfig instance from a dictionary. Environment variables are automatically merged to override dictionary values.

```python
from chatui.configuration import AppConfig

data = {
    "serverUrl": "http://api.local",
    "serverPort": "8080",
    "modelName": "mistral-mixtral"
}

config = AppConfig.from_dict(data)
```

--------------------------------

### Run Application on Custom Host and Port

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/entrypoint.md

Specifies a custom hostname and port for the application server. Useful for avoiding conflicts or for specific network configurations.

```bash
python -m chatui --host localhost --port 9000
```

--------------------------------

### CustomChatOpenAI Constructor

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/CustomChatOpenAI.md

Initializes the CustomChatOpenAI model. Supports specifying the custom endpoint, port, model name, GPU details, and temperature. Raises ValueError for incompatible GPU configurations.

```python
def __init__(
    self,
    custom_endpoint: str,
    port: str = "8000",
    model_name: str = "meta/llama-3.1-8b-instruct",
    gpu_type: Optional[str] = None,
    gpu_count: Optional[str] = None,
    temperature: float = 0.0,
    **kwargs
) -> None
```

--------------------------------

### ChatClient Constructor

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/ChatClient.md

Initializes the ChatClient with the server URL and model name.

```APIDOC
## Class: ChatClient

Client for communicating with the Agentic RAG backend API service. Handles document search requests, model inference, and document uploads.

### Constructor

```python
def __init__(self, server_url: str, model_name: str) -> None:
```

#### Parameters

- **server_url** (str) - Required - Base URL of the chat API server, e.g., `http://localhost:8000`
- **model_name** (str) - Required - Friendly name identifier of the LLM model being used
```

--------------------------------

### Load AppConfig from File

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/AppConfig.md

Loads application configuration from a specified file (JSON or YAML). Environment variables are merged with the highest priority. Returns an AppConfig instance or None if loading fails.

```python
from chatui.configuration import AppConfig

# Load from file
config = AppConfig.from_file("/path/to/config.json")

if config:
    print(f"Server: {config.server_url}:{config.server_port}")
    print(f"Model: {config.model_name}")
else:
    print("Failed to load configuration")
```

--------------------------------

### Question Router Prompt Template

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/code/langgraph_rag_agent_llama3_nvidia_nim.ipynb

Sets up a prompt template for a question router. The router LLM decides whether a user question should go to the vector store or to web search, and is instructed to return JSON with a single `datasource` key. Questions about LLM agents, prompt engineering, and adversarial attacks are routed to the vector store; everything else goes to web search.

```python
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate

# `llm` and `retriever` are assumed to be defined earlier in the notebook.
prompt = PromptTemplate(
    template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an expert at routing a 
    user question to a vectorstore or web search. Use the vectorstore for questions on LLM agents, 
    prompt engineering, and adversarial attacks. You do not need to be stringent with the keywords 
    in the question related to these topics. Otherwise, use web-search. Give a binary choice 'web_search' 
    or 'vectorstore' based on the question. Return a JSON with a single key 'datasource' and 
    no preamble or explanation. Question to route: {question} <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["question"],
)

# Chain: render the prompt, call the model, parse the reply as JSON
question_router = prompt | llm | JsonOutputParser()
question = "llm agent memory"
docs = retriever.invoke(question)
doc_txt = docs[1].page_content
print(question_router.invoke({"question": question}))
```
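Downstream code typically branches on the parsed `datasource` value. The sketch below shows one way to validate the router's output before branching; the `choose_branch` helper and the fallback to `web_search` on unexpected output are assumptions, not behavior taken from the notebook.

```python
from typing import Any, Dict

VALID_SOURCES = {"web_search", "vectorstore"}


def choose_branch(router_output: Dict[str, Any], default: str = "web_search") -> str:
    """Return the routed data source, falling back when the output is malformed."""
    if not isinstance(router_output, dict):
        return default
    source = router_output.get("datasource")
    return source if source in VALID_SOURCES else default


print(choose_branch({"datasource": "vectorstore"}))
print(choose_branch({"datasource": "something_else"}))
```

Validating against a fixed set guards against a model reply that parses as JSON but names a data source the application does not support.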

--------------------------------

### Retrieve AppConfig Environment Variables

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/api-reference/AppConfig.md

Returns a list of valid environment variables and their corresponding configuration paths. Useful for understanding how environment variables map to configuration settings.

```python
from chatui.configuration import AppConfig

env_vars = AppConfig.envvars()
for env_name, path, field_type in env_vars:
    print(f"{env_name} → {path} (type: {field_type})")
```

--------------------------------

### AppConfig Loading

Source: https://github.com/nvidia/workbench-example-agentic-rag/blob/main/_autodocs/README.md

Provides methods to load application configuration from a file or a dictionary, and utilities for environment variables and help printing.

```APIDOC
## AppConfig

### Description
Provides methods to load application configuration from a file or a dictionary, and utilities for environment variables and help printing.

### Methods
- **from_file(filepath: str)**: Loads configuration from a file.
- **from_dict(data: Dict)**: Loads configuration from a dictionary.
- **print_help(help_printer: Callable)**: Prints configuration help.
- **envvars()**: Returns the list of supported environment variables and their configuration paths.
```