### Running the Inference Gateway Tools Example in Bash

Source: https://github.com/inference-gateway/python-sdk/blob/main/examples/tools/README.md

This snippet provides instructions on how to run the Inference Gateway tools example. It shows how to optionally set the LLM model name using an environment variable and then execute the main Python script for the example.

```bash
# Set the model (optional, defaults to openai/gpt-4)
export LLM_NAME="openai/gpt-4"

# Run the example
python examples/tools/main.py
```
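
Inside a script, an optional override like `LLM_NAME` is usually read with a fallback. The sketch below shows that pattern, assuming the `openai/gpt-4` default stated in the shell comment above. The helper name `resolve_model_name` is hypothetical and is not taken from the example script.

```python
import os

DEFAULT_MODEL = "openai/gpt-4"  # default stated in the shell snippet above


def resolve_model_name(env=None):
    """Return LLM_NAME from the environment, or the default if unset or empty."""
    env = os.environ if env is None else env
    return env.get("LLM_NAME") or DEFAULT_MODEL


if __name__ == "__main__":
    print(resolve_model_name())
```

Passing a plain dict as `env` makes the helper easy to exercise without touching the real environment.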

--------------------------------

### Running the LLM Listing Example (Bash)

Source: https://github.com/inference-gateway/python-sdk/blob/main/examples/list/README.md

This command executes the `main.py` script, which uses the Inference Gateway Python SDK to list available LLMs. It runs the example with default settings, demonstrating basic usage.

```bash
python main.py
```

--------------------------------

### Running the Chat Completion Example - Bash

Source: https://github.com/inference-gateway/python-sdk/blob/main/examples/chat/README.md

This command executes the main Python script, `main.py`, which contains the chat completion examples. It assumes the necessary environment variables, like `LLM_NAME`, have already been set.

```bash
python main.py
```

--------------------------------

### Basic Chat Completion with Inference Gateway Python SDK

Source: https://github.com/inference-gateway/python-sdk/blob/main/README.md

This example demonstrates how to initialize the Inference Gateway client and perform a simple chat completion. It shows importing necessary classes, setting up the client with a base URL, and sending a basic user message to a specified model (e.g., 'openai/gpt-4') to get a response.

```python
from inference_gateway import InferenceGatewayClient, Message

# Initialize client
client = InferenceGatewayClient("http://localhost:8080/v1")

# Simple chat completion
response = client.create_chat_completion(
    model="openai/gpt-4",
    messages=[
        Message(role="system", content="You are a helpful assistant"),
        Message(role="user", content="Hello!")
    ]
)

print(response.choices[0].message.content)
```

--------------------------------

### Installing Inference Gateway Python SDK

Source: https://github.com/inference-gateway/python-sdk/blob/main/README.md

This snippet provides the command-line instruction for installing the Inference Gateway Python SDK using pip, the standard package installer for Python.

```sh
pip install inference-gateway
```

--------------------------------

### Running the MCP Example Script

Source: https://github.com/inference-gateway/python-sdk/blob/main/examples/mcp/README.md

This command executes the main Python script for the Model Context Protocol (MCP) example. The Inference Gateway must be running with MCP enabled (`MCP_ENABLE=true`) and exposed (`MCP_EXPOSE=true`).

```bash
python main.py
```

--------------------------------

### Running the LLM Listing Example with Specific Provider (Bash)

Source: https://github.com/inference-gateway/python-sdk/blob/main/examples/list/README.md

This command runs `main.py` with the `LLM_NAME` environment variable set to `openai/gpt-4` for this invocation only. The example then narrows the listed LLMs to those from the specified provider, showcasing how to filter models.

```bash
LLM_NAME=openai/gpt-4 python main.py
```

--------------------------------

### Setting LLM_NAME Environment Variable - Bash

Source: https://github.com/inference-gateway/python-sdk/blob/main/examples/chat/README.md

This command sets the `LLM_NAME` environment variable, which specifies the large language model to be used by the Inference Gateway SDK. This is a prerequisite for running the Python examples.

```bash
export LLM_NAME="groq/meta-llama/llama-4-scout-17b-16e-instruct"
```

--------------------------------

### Listing Models with Inference Gateway Python SDK

Source: https://github.com/inference-gateway/python-sdk/blob/main/README.md

This example demonstrates how to retrieve a list of available models from the Inference Gateway. It shows how to fetch all models and how to filter the list to display models from a specific provider, such as 'openai'.

```python
# List all available models
models = client.list_models()
print("All models:", models)

# Filter by provider
openai_models = client.list_models(provider="openai")
print("OpenAI models:", openai_models)
```

--------------------------------

### Setting LLM Provider Environment Variable (Shell)

Source: https://github.com/inference-gateway/python-sdk/blob/main/examples/README.md

This snippet sets the `LLM_NAME` environment variable, specifying the desired Large Language Model provider and model to be used by the Inference Gateway. This configuration is crucial for the examples to function correctly, allowing the gateway to connect to the specified LLM.

```sh
export LLM_NAME=groq/meta-llama/llama-4-scout-17b-16e-instruct
```
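
The values used for `LLM_NAME` in these examples have a `provider/model` shape, and the model part may itself contain slashes. The sketch below shows one way to split such a value, assuming the provider is everything before the first slash. The helper `split_model_id` is hypothetical and not part of the SDK.

```python
def split_model_id(model_id):
    """Split 'provider/model' at the first slash; the model part may contain slashes."""
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return provider, model


if __name__ == "__main__":
    print(split_model_id("groq/meta-llama/llama-4-scout-17b-16e-instruct"))
```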

--------------------------------

### Performing Standard Chat Completion - Python

Source: https://github.com/inference-gateway/python-sdk/blob/main/examples/chat/README.md

This Python snippet demonstrates how to perform a standard, non-streaming chat completion using the `InferenceGatewayClient`. It sends a system and user message to the specified model and prints the complete response content.

```python
from inference_gateway import InferenceGatewayClient, Message

client = InferenceGatewayClient("http://localhost:8080/v1")

response = client.create_chat_completion(
    model="groq/meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[
        Message(role="system", content="You are a helpful assistant"),
        Message(role="user", content="Hello! Please introduce yourself briefly."),
    ],
    max_tokens=100,
)

print(response.choices[0].message.content)
```

--------------------------------

### Running Inference Gateway Docker Container (Shell)

Source: https://github.com/inference-gateway/python-sdk/blob/main/examples/README.md

This command runs the Inference Gateway as a Docker container, mapping port 8080 from the container to the host. It uses the `.env` file for environment variables and passes the `LLM_NAME` variable, ensuring the gateway is configured with the specified LLM provider.

```sh
docker run --rm -it -p 8080:8080 --env-file .env -e LLM_NAME ghcr.io/inference-gateway/inference-gateway:0.7.1
```

--------------------------------

### Configuring Custom HTTP Headers for Inference Gateway Client in Python

Source: https://github.com/inference-gateway/python-sdk/blob/main/README.md

This example demonstrates how to initialize the `InferenceGatewayClient` with custom HTTP headers. This is useful for scenarios requiring specific headers, such as authentication tokens or custom routing information, to be sent with every request made by the client.

```python
# With custom headers
client = InferenceGatewayClient(
    "http://localhost:8080/v1",
    headers={"X-Custom-Header": "value"}
)
```

--------------------------------

### Performing Streaming Chat Completion - Python

Source: https://github.com/inference-gateway/python-sdk/blob/main/examples/chat/README.md

This Python snippet illustrates how to handle streaming chat completions. It iterates over chunks received from the API, parses them as JSON, and attempts to unmarshal them into structured models for type-safe access, printing content as it arrives.

```python
from inference_gateway import InferenceGatewayClient, Message
from inference_gateway.models import CreateChatCompletionStreamResponse
from pydantic import ValidationError
import json

client = InferenceGatewayClient("http://localhost:8080/v1")

stream = client.create_chat_completion_stream(
    model="groq/meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[
        Message(role="system", content="You are a helpful assistant"),
        Message(role="user", content="Tell me a short story."),
    ],
    max_tokens=200,
)

for chunk in stream:
    if chunk.data:
        try:
            # Parse the raw JSON data
            data = json.loads(chunk.data)

            # Unmarshal to structured model for type safety
            try:
                structured_chunk = CreateChatCompletionStreamResponse.model_validate(data)

                # Use the structured model for better type safety and IDE support
                if structured_chunk.choices and len(structured_chunk.choices) > 0:
                    choice = structured_chunk.choices[0]
                    if hasattr(choice.delta, 'content') and choice.delta.content:
                        print(choice.delta.content, end="", flush=True)

            except ValidationError:
                # Fallback to manual parsing for non-standard chunks
                if "choices" in data and len(data["choices"]) > 0:
                    delta = data["choices"][0].get("delta", {})
                    if "content" in delta and delta["content"]:
                        print(delta["content"], end="", flush=True)

        except json.JSONDecodeError:
            pass
```
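
The manual fallback path in the loop above can be factored into a small helper that needs only the standard library. This is a sketch, assuming each chunk payload is a JSON object with the `choices[0].delta.content` layout that the snippet reads. The helper name `extract_delta_content` is hypothetical, and the `[DONE]` end marker in the usage line is an assumption borrowed from OpenAI-style streams.

```python
import json


def extract_delta_content(raw):
    """Return the text carried by one streamed chunk payload, or '' if there is none."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ""  # non-JSON payloads are skipped, as in the loop above
    if not isinstance(data, dict):
        return ""
    choices = data.get("choices") or []
    if not choices:
        return ""
    delta = choices[0].get("delta") or {}
    return delta.get("content") or ""


if __name__ == "__main__":
    print(extract_delta_content('{"choices": [{"delta": {"content": "Hi"}}]}'))
    print(repr(extract_delta_content("[DONE]")))
```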

--------------------------------

### Handling Inference Gateway Errors - Python

Source: https://github.com/inference-gateway/python-sdk/blob/main/examples/chat/README.md

This snippet demonstrates how to implement error handling for calls to the Inference Gateway SDK. It catches specific exceptions like `InferenceGatewayAPIError` (for API-related issues) and `InferenceGatewayError` (for SDK-related issues) to gracefully manage potential failures.

```python
from inference_gateway.client import InferenceGatewayAPIError, InferenceGatewayError

try:
    response = client.create_chat_completion(...)
except (InferenceGatewayAPIError, InferenceGatewayError) as e:
    print(f"Error: {e}")
```

--------------------------------

### Making Proxy Requests with Inference Gateway Python SDK

Source: https://github.com/inference-gateway/python-sdk/blob/main/README.md

This snippet illustrates how to use the `proxy_request` method to forward a direct API call to a specific AI provider (e.g., OpenAI) through the Inference Gateway. It demonstrates making a GET request to list models on the OpenAI API.

```python
# Proxy request to OpenAI's API
response = client.proxy_request(
    provider="openai",
    path="/v1/models",
    method="GET"
)

print("OpenAI models:", response)
```

--------------------------------

### Defining a Weather Tool using Pydantic in Python

Source: https://github.com/inference-gateway/python-sdk/blob/main/examples/tools/README.md

This Python snippet demonstrates how to define a 'weather' tool using type-safe Pydantic models from the Inference Gateway SDK. It specifies the tool's name, description, and parameters (location, unit) using a JSON schema-like structure, ensuring type safety, IDE support, and automatic validation for the tool definition.

```python
from inference_gateway.models import ChatCompletionTool, FunctionObject, FunctionParameters

weather_tool = ChatCompletionTool(
    type="function",
    function=FunctionObject(
        name="get_current_weather",
        description="Get the current weather in a given location",
        parameters=FunctionParameters(
            type="object",
            properties={
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The temperature unit to use"
                }
            },
            required=["location"]
        )
    )
)
```
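
The `properties`, `enum`, and `required` fields describe which arguments a caller may send. The sketch below checks a dict of arguments against the same schema written as a plain dict. It is a hand-rolled illustration covering only string types, enums, and required names, not a full JSON Schema validator and not an SDK feature. `check_arguments` and `WEATHER_PARAMETERS` are hypothetical names.

```python
WEATHER_PARAMETERS = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location"],
}


def check_arguments(args, schema):
    """Return a list of problems found in args; an empty list means they pass."""
    problems = []
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required parameter: {name}")
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unknown parameter: {name}")
        elif spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{name} must be a string")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
    return problems


if __name__ == "__main__":
    print(check_arguments({"unit": "kelvin"}, WEATHER_PARAMETERS))
```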

--------------------------------

### Performing Health Checks with Inference Gateway Python SDK

Source: https://github.com/inference-gateway/python-sdk/blob/main/README.md

This example demonstrates how to check the operational status of the Inference Gateway API using the `health_check` method. It returns a boolean value indicating whether the API is currently healthy and available.

```python
if client.health_check():
    print("API is healthy")
else:
    print("API is unavailable")
```
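
Because the check returns a boolean, it can be polled while the gateway is starting up. The following sketch waits for a check to succeed. It accepts any zero-argument callable, so `client.health_check` could be passed in. The helper `wait_until_healthy` and its parameters are hypothetical and not part of the SDK.

```python
import time


def wait_until_healthy(check, attempts=5, delay=1.0, sleep=time.sleep):
    """Call check() up to `attempts` times; return True as soon as it succeeds."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)  # pause before the next probe
    return False
```

With a configured client, a call might look like `wait_until_healthy(client.health_check, attempts=10)`.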

--------------------------------

### Performing Streaming Chat Completion with Inference Gateway Python SDK

Source: https://github.com/inference-gateway/python-sdk/blob/main/README.md

This example demonstrates how to handle streaming chat completions, iterating over server-sent events (SSEvent objects). It includes robust error handling for JSON parsing and uses Pydantic models for type-safe unmarshalling of structured chunks, with a fallback for non-standard or partial data.

```python
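from inference_gateway import InferenceGatewayClient, Message
from inference_gateway.models import CreateChatCompletionStreamResponse
from pydantic import ValidationError
import json

client = InferenceGatewayClient("http://localhost:8080/v1")

# Streaming returns SSEvent objects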
for chunk in client.create_chat_completion_stream(
    model="ollama/llama2",
    messages=[
        Message(role="user", content="Tell me a story")
    ]
):
    if chunk.data:
        try:
            # Parse the raw JSON data
            data = json.loads(chunk.data)

            # Unmarshal to structured model for type safety
            try:
                structured_chunk = CreateChatCompletionStreamResponse.model_validate(data)

                # Use the structured model for better type safety and IDE support
                if structured_chunk.choices and len(structured_chunk.choices) > 0:
                    choice = structured_chunk.choices[0]
                    if hasattr(choice.delta, 'content') and choice.delta.content:
                        print(choice.delta.content, end="", flush=True)

            except ValidationError:
                # Fallback to manual parsing for non-standard chunks
                if "choices" in data and len(data["choices"]) > 0:
                    delta = data["choices"][0].get("delta", {})
                    if "content" in delta and delta["content"]:
                        print(delta["content"], end="", flush=True)

        except json.JSONDecodeError:
            pass
```
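
When the full reply is needed as one string rather than printed incrementally, the fragments can be collected with a generator. The sketch below runs without the SDK by using stand-in event objects that expose only a `data` attribute, as the loop above relies on. `iter_text` is a hypothetical helper, and the `[DONE]` marker in the fake stream is an assumption borrowed from OpenAI-style streams.

```python
import json
from types import SimpleNamespace


def iter_text(events):
    """Yield the text fragment from each event whose data is a JSON chunk with content."""
    for event in events:
        if not event.data:
            continue
        try:
            data = json.loads(event.data)
        except json.JSONDecodeError:
            continue  # skip payloads that are not JSON
        if not isinstance(data, dict):
            continue
        for choice in (data.get("choices") or [])[:1]:
            content = (choice.get("delta") or {}).get("content")
            if content:
                yield content


# Stand-in events; a real stream would come from create_chat_completion_stream.
fake_stream = [
    SimpleNamespace(data='{"choices": [{"delta": {"content": "Once "}}]}'),
    SimpleNamespace(data=None),
    SimpleNamespace(data='{"choices": [{"delta": {"content": "upon a time"}}]}'),
    SimpleNamespace(data="[DONE]"),
]
story = "".join(iter_text(fake_stream))
print(story)
```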

--------------------------------

### Configuring Inference Gateway Python Client

Source: https://github.com/inference-gateway/python-sdk/blob/main/README.md

This snippet illustrates various ways to configure the Inference Gateway client. It covers basic initialization, adding an API token for authentication, setting a custom request timeout, and optionally configuring the client to use 'httpx' instead of the default 'requests' library for HTTP operations.

```python
from inference_gateway import InferenceGatewayClient

# Basic configuration
client = InferenceGatewayClient("http://localhost:8080/v1")

# With authentication
client = InferenceGatewayClient(
    "http://localhost:8080/v1",
    token="your-api-token",
    timeout=60.0  # Custom timeout
)

# Using httpx instead of requests
client = InferenceGatewayClient(
    "http://localhost:8080/v1",
    use_httpx=True
)
```

--------------------------------

### Listing Available MCP Tools in Python

Source: https://github.com/inference-gateway/python-sdk/blob/main/README.md

This snippet illustrates how to retrieve a list of available Model Context Protocol (MCP) tools from the Inference Gateway. This functionality is particularly useful for UI applications that need to display connected tools to users, but it requires specific server-side configurations (`MCP_ENABLE` and `MCP_EXPOSE`) on the gateway.

```python
# List available MCP tools (requires MCP_ENABLE and MCP_EXPOSE to be set on the gateway)
tools = client.list_tools()
print("Available tools:", tools)
```

--------------------------------

### Configuring Proxy Settings for Inference Gateway Client in Python

Source: https://github.com/inference-gateway/python-sdk/blob/main/README.md

This snippet shows how to configure proxy settings when initializing the `InferenceGatewayClient`. By specifying a proxy, all client requests will be routed through the designated proxy server, which is essential for network environments requiring proxy usage.

```python
# With proxy settings
client = InferenceGatewayClient(
    "http://localhost:8080/v1",
    proxies={"http": "http://proxy.example.com"}
)
```

--------------------------------

### Defining a Chat Completion Tool (Weather) in Python

Source: https://github.com/inference-gateway/python-sdk/blob/main/README.md

This code defines a type-safe weather tool using Pydantic models provided by the SDK, specifically `ChatCompletionTool`, `FunctionObject`, and `FunctionParameters`. It specifies the tool's name, description, and the structure of its input parameters, including types, descriptions, and required fields, enabling the model to understand and utilize the tool.

```python
# Define a weather tool using type-safe Pydantic models
from inference_gateway.models import ChatCompletionTool, FunctionObject, FunctionParameters

weather_tool = ChatCompletionTool(
    type="function",
    function=FunctionObject(
        name="get_current_weather",
        description="Get the current weather in a given location",
        parameters=FunctionParameters(
            type="object",
            properties={
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The temperature unit to use"
                }
            },
            required=["location"]
        )
    )
)
```

--------------------------------

### Using Defined Tools in Chat Completion with Python

Source: https://github.com/inference-gateway/python-sdk/blob/main/README.md

This snippet demonstrates how to integrate a previously defined tool into a chat completion request by passing it to the `tools` parameter. It also shows how to inspect the model's response to check if a tool call was made and extract the function name and arguments, facilitating dynamic interaction based on the LLM's output.

```python
# Using tools in a chat completion
response = client.create_chat_completion(
    model="openai/gpt-4",
    messages=[
        Message(role="system", content="You are a helpful assistant with access to weather information"),
        Message(role="user", content="What is the weather like in New York?")
    ],
    tools=[weather_tool]  # Pass the tool definition
)

print(response.choices[0].message.content)

# Check if the model made a tool call
if response.choices[0].message.tool_calls:
    for tool_call in response.choices[0].message.tool_calls:
        print(f"Tool called: {tool_call.function.name}")
        print(f"Arguments: {tool_call.function.arguments}")
```
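
Once the function name and arguments have been extracted, the application still has to run the matching local function. The sketch below shows one way to dispatch a call, assuming `tool_call.function.arguments` is a JSON-encoded string, as in OpenAI-style APIs. The registry, `run_tool_call`, and the canned `get_current_weather` implementation are all hypothetical.

```python
import json


def get_current_weather(location, unit="celsius"):
    """Hypothetical local implementation; returns canned data for illustration."""
    return {"location": location, "temperature": 21, "unit": unit}


# Maps tool names, as declared in the tool definition, to local callables.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}


def run_tool_call(name, arguments):
    """Look up the named tool and call it with arguments decoded from a JSON string."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        raise KeyError(f"no handler registered for tool {name!r}")
    return handler(**json.loads(arguments))


if __name__ == "__main__":
    print(run_tool_call("get_current_weather", '{"location": "New York"}'))
```

Inside the loop above, this would be invoked as `run_tool_call(tool_call.function.name, tool_call.function.arguments)`.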

--------------------------------

### Performing Standard Chat Completion with Inference Gateway Python SDK

Source: https://github.com/inference-gateway/python-sdk/blob/main/README.md

This snippet shows how to perform a non-streaming chat completion request using the Inference Gateway client. It defines system and user messages, specifies the model, and sets an optional maximum token limit for the response, then prints the generated content.

```python
from inference_gateway import Message

response = client.create_chat_completion(
    model="openai/gpt-4",
    messages=[
        Message(role="system", content="You are a helpful assistant"),
        Message(role="user", content="Explain quantum computing")
    ],
    max_tokens=500
)

print(response.choices[0].message.content)
```

--------------------------------

### Handling Inference Gateway API Errors in Python

Source: https://github.com/inference-gateway/python-sdk/blob/main/README.md

This snippet demonstrates how to implement robust error handling using the SDK's specific exception types. It shows how to catch `InferenceGatewayAPIError` for API-related issues, `InferenceGatewayValidationError` for input validation failures, and a general `InferenceGatewayError` for other SDK-related problems, providing detailed error information.

```python
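# Assumption: all three exception classes are importable from inference_gateway.client
from inference_gateway.client import (
    InferenceGatewayAPIError,
    InferenceGatewayError,
    InferenceGatewayValidationError,
)

try: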
    response = client.create_chat_completion(...)
except InferenceGatewayAPIError as e:
    print(f"API Error: {e} (Status: {e.status_code})")
    print("Response:", e.response_data)
except InferenceGatewayValidationError as e:
    print(f"Validation Error: {e}")
except InferenceGatewayError as e:
    print(f"General Error: {e}")
```
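
The `status_code` carried by an API error can drive a retry decision. The sketch below retries a call with exponential backoff when a stand-in error reports a 5xx status and re-raises everything else immediately. `GatewayAPIError` is a local stand-in class so the example runs without the SDK. Treating only 5xx as retryable is a design choice made for this illustration, not documented SDK behavior.

```python
import time


class GatewayAPIError(Exception):
    """Stand-in for an API error that carries an HTTP status code."""

    def __init__(self, message, status_code):
        super().__init__(message)
        self.status_code = status_code


def call_with_retry(func, retries=3, base_delay=0.5, sleep=time.sleep):
    """Call func(); retry on 5xx stand-in errors with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return func()
        except GatewayAPIError as exc:
            retryable = exc.status_code >= 500
            if not retryable or attempt == retries:
                raise  # client errors and the final failure propagate
            sleep(base_delay * 2 ** attempt)
```

In real code, the `except` clause would name `InferenceGatewayAPIError` instead of the stand-in, and `func` would wrap the `create_chat_completion` call.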
