### Start Mock Server with Prism

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Start a mock server with Prism, which most tests require. `npx` ships with npm, so make sure npm is installed, and pass the path to your OpenAPI specification file.

```sh
npx prism mock path/to/your/openapi.yml
```

--------------------------------

### Run Example Script

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Execute your custom example script. This command assumes the script has execute permissions and is located in the examples directory.

```sh
./examples/<your-example>.py
```

--------------------------------

### Make Example Executable

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Grant execute permissions to your example script. This allows you to run it directly from the command line.

```sh
chmod +x examples/<your-example>.py
```

--------------------------------

### Install Llama API Client

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Install the library using pip.

```sh
pip install llama-api-client
```

--------------------------------

### Install with aiohttp backend

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

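Install the library with the `aiohttp` extra to enable the aiohttp-based HTTP backend for improved concurrency. The command below installs from the Git repository over SSH.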

```sh
# install from the production repo
pip install 'llama_api_client[aiohttp] @ git+ssh://git@github.com/meta-llama/llama-api-python.git'
```

--------------------------------

### Install from Local Wheel File

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Install the library using a locally built wheel file. Replace './path-to-wheel-file.whl' with the actual path to the generated .whl file in the dist/ directory.

```sh
pip install ./path-to-wheel-file.whl
```

--------------------------------

### Add and Run Python Example

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Create a new Python file in the 'examples/' directory and make it executable to run against your API. The shebang line ensures it runs with the correct Python interpreter via Rye.

```python
#!/usr/bin/env -S rye run python
# The shebang above must be the first line of examples/<your-example>.py
# add your example code below

```

--------------------------------

### Install from Git Repository

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Install the Llama API Python library directly from its Git repository using pip. This is useful for using the latest development version.

```sh
pip install git+ssh://git@github.com/meta-llama/llama-api-python.git
```

--------------------------------

### Run All Tests

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Execute the project's test suite. This command assumes the mock server is running and all necessary dependencies are installed.

```sh
./scripts/test
```

--------------------------------

### Install Dependencies with Pip

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Install development dependencies using pip if not using Rye. Ensure a virtual environment is created and the Python version matches .python-version.

```sh
pip install -r requirements-dev.lock
```

--------------------------------

### Async Client with aiohttp Backend

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Use the async client with the aiohttp backend for improved concurrency. Install the package with the `aiohttp` extra first.

```python
import asyncio
from llama_api_client import AsyncLlamaAPIClient, DefaultAioHttpClient

async def main():
    async with AsyncLlamaAPIClient(
        http_client=DefaultAioHttpClient(),
    ) as client:
        response = await client.chat.completions.create(
            model="Llama-3.3-70B-Instruct",
            messages=[{"role": "user", "content": "Hello"}],
        )
        print(response.completion_message.content.text)

asyncio.run(main())
```

--------------------------------

### Sync Dependencies with Rye

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Install all project dependencies, including every optional feature (`--all-features`), using Rye. If you did not run the bootstrap script, install Rye manually first.

```sh
rye sync --all-features
```

--------------------------------

### Determine Installed Llama API Python SDK Version

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Use this snippet to check the currently installed version of the llama_api_client library in your Python environment. Ensure you have the library installed.

```python
import llama_api_client
print(llama_api_client.__version__)
```
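When importing the package is undesirable (for example in a packaging or CI script), the version can usually also be read from the installed distribution metadata. A minimal sketch using only the standard library; it assumes the distribution name is `llama-api-client`, as in the `pip install` command, and the helper name `installed_version` is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "llama-api-client") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version() or "llama-api-client is not installed")
```

Returning `None` instead of raising lets a script decide for itself whether a missing package is an error.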

--------------------------------

### List and Retrieve Llama Models

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Retrieve a list of all available Llama models and their basic information. You can also get details for a specific model by its ID.

```python
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

models = client.models.list()
for model in models:
    print(f"Model: {model.id}")

# Get specific model details
model = client.models.retrieve("Llama-4-Maverick-17B-128E-Instruct-FP8")
print(model)
```

--------------------------------

### Format Code and Fix Issues

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Automatically format the code and fix linting issues using Ruff and Black. This command ensures code consistency and adherence to style guides.

```sh
./scripts/format
```

--------------------------------

### Access Raw Response Data

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Prefix HTTP method calls with `.with_raw_response.` to access the raw `APIResponse` object, which includes headers. Use `.parse()` to get the standard completion object.

```python
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()
response = client.chat.completions.with_raw_response.create(
    messages=[{
        "content": "string",
        "role": "user",
    }],
    model="model",
)
print(response.headers.get('X-My-Header'))

completion = response.parse()  # get the object that `chat.completions.create()` would have returned
print(completion.id)
```

--------------------------------

### Bootstrap Project with Rye

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Run this command to bootstrap the project using Rye, which manages dependencies and Python versions automatically.

```sh
./scripts/bootstrap
```

--------------------------------

### Initiate and Upload File

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Initiates an upload session and then uploads a file part. Ensure the file path is correct and the MIME type matches the file content.

```python
from pathlib import Path
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

num_bytes = Path("/path/to/file").stat().st_size

# 1. initiate upload session
r = client.uploads.create(
    bytes=num_bytes,
    filename="simpleqa.jsonl",
    mime_type="application/jsonl",
    purpose="messages_finetune",
)

# 2. upload part
client.uploads.part(
    upload_id=r.id,
    data=Path("/path/to/file"),
)
```

--------------------------------

### Build Distribution Package

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Create distributable package files (.tar.gz and .whl) for the library. This command can be run using Rye or the standard Python build module.

```sh
rye build
```

```sh
python -m build
```

--------------------------------

### Async Client with aiohttp

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Instantiate the asynchronous client with DefaultAioHttpClient for improved concurrency. The API key is read from the LLAMA_API_KEY environment variable by default.

```python
import os
import asyncio
from llama_api_client import DefaultAioHttpClient
from llama_api_client import AsyncLlamaAPIClient


async def main() -> None:
    async with AsyncLlamaAPIClient(
        api_key=os.environ.get("LLAMA_API_KEY"),  # This is the default and can be omitted
        http_client=DefaultAioHttpClient(),
    ) as client:
        create_chat_completion_response = await client.chat.completions.create(
            messages=[
                {
                    "content": "string",
                    "role": "user",
                }
            ],
            model="model",
        )
        print(create_chat_completion_response.completion_message)


asyncio.run(main())
```
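The concurrency benefit comes from keeping several requests in flight at once. A minimal sketch of that pattern with `asyncio.gather` and a semaphore; `fake_completion` is a hypothetical stand-in for `await client.chat.completions.create(...)` so the snippet runs without network access or an API key, and `max_in_flight` is an illustrative parameter, not an SDK option:

```python
import asyncio
from typing import List


async def fake_completion(prompt: str) -> str:
    """Hypothetical stub standing in for a real chat completion request."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"echo: {prompt}"


async def run_all(prompts: List[str], max_in_flight: int = 2) -> List[str]:
    """Run all prompts concurrently, with at most `max_in_flight` active at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def bounded(prompt: str) -> str:
        async with semaphore:
            return await fake_completion(prompt)

    # gather() returns results in the same order as its arguments
    return await asyncio.gather(*(bounded(p) for p in prompts))


results = asyncio.run(run_all(["a", "b", "c"]))
print(results)
```

Capping the number of in-flight requests is one way to stay under rate limits while still overlapping network waits.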

--------------------------------

### Upload Files for Fine-tuning

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Upload files for fine-tuning purposes by initiating an upload session, uploading file parts, and then checking the status. Ensure the file path and MIME type are correct.

```python
from pathlib import Path
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

file_path = Path("/path/to/training_data.jsonl")
num_bytes = file_path.stat().st_size

# Step 1: Initiate upload session
upload_response = client.uploads.create(
    bytes=num_bytes,
    filename="training_data.jsonl",
    mime_type="application/jsonl",
    purpose="messages_finetune",
)

print(f"Upload ID: {upload_response.id}")

# Step 2: Upload file data
client.uploads.part(
    upload_id=upload_response.id,
    data=file_path,
)

# Step 3: Check upload status
status = client.uploads.get(upload_response.id)
print(f"Upload status: {status}")
```
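The arguments to `client.uploads.create(...)` can be derived from the file itself, which avoids a mismatch between the declared size or name and the actual file. A minimal sketch under stated assumptions: `upload_params` is a hypothetical helper, not part of the SDK, and the MIME type is fixed to `application/jsonl` because that is the only value shown above:

```python
import tempfile
from pathlib import Path
from typing import Any, Dict


def upload_params(path: Path, purpose: str = "messages_finetune") -> Dict[str, Any]:
    """Build keyword arguments for `client.uploads.create(...)` from a local file."""
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    return {
        "bytes": path.stat().st_size,  # file size in bytes
        "filename": path.name,
        "mime_type": "application/jsonl",  # assumption: only JSONL appears in the source
        "purpose": purpose,
    }


# Demonstrate with a throwaway file instead of real training data
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "training_data.jsonl"
    sample.write_bytes(b'{"messages": []}\n')
    params = upload_params(sample)
    print(params)
```

With a real client, the result would be passed as `client.uploads.create(**params)`.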

--------------------------------

### Run Linter

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Execute the linter to check code quality and style according to the project's standards. This uses Ruff for linting.

```sh
./scripts/lint
```

--------------------------------

### Configure HTTP Client with Proxies and Transports

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Customize the underlying `httpx` client by passing a `DefaultHttpxClient` instance with proxy and transport configurations to the `LlamaAPIClient` constructor.

```python
import httpx
from llama_api_client import LlamaAPIClient, DefaultHttpxClient

client = LlamaAPIClient(
    # Or use the `LLAMA_API_CLIENT_BASE_URL` env var
    base_url="http://my.test.server.example.com:8083",
    http_client=DefaultHttpxClient(
        proxy="http://my.test.proxy.example.com",
        transport=httpx.HTTPTransport(local_address="0.0.0.0"),
    ),
)
```

--------------------------------

### Initialize Llama API Client

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Initialize the LlamaAPIClient. The API key can be provided directly or read from the LLAMA_API_KEY environment variable. Custom base URLs and timeouts can also be configured.

```python
import os
from llama_api_client import LlamaAPIClient

# Using environment variable (recommended)
client = LlamaAPIClient()

# Or explicitly passing the API key
client = LlamaAPIClient(
    api_key=os.environ.get("LLAMA_API_KEY"),
    timeout=20.0,  # Default is 10 minutes
    max_retries=2,  # Default is 2
)

# Configure custom base URL
client = LlamaAPIClient(
    base_url="http://custom.api.endpoint.com",
)
```
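An application may want to fail early with a clear message when no key is configured, rather than at the first request. The sketch below mirrors the lookup order described above (explicit argument first, then `LLAMA_API_KEY`); `resolve_api_key` is a hypothetical helper and not the library's own implementation:

```python
import os
from typing import Mapping, Optional


def resolve_api_key(explicit: Optional[str] = None, env: Mapping[str, str] = os.environ) -> str:
    """Return the explicit key if given, else LLAMA_API_KEY from the environment."""
    key = explicit or env.get("LLAMA_API_KEY")
    if not key:
        raise RuntimeError("Set LLAMA_API_KEY or pass api_key explicitly")
    return key


# Use a fake environment mapping so the demo does not depend on real credentials
print(resolve_api_key(env={"LLAMA_API_KEY": "example-key"}))
```

The returned value could then be passed as `LlamaAPIClient(api_key=...)`.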

--------------------------------

### Configure Custom HTTP Client

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Configure a custom httpx client for advanced use cases, including proxies and custom transports. This allows for fine-grained control over network communication.

```python
import httpx
from llama_api_client import LlamaAPIClient, DefaultHttpxClient

client = LlamaAPIClient(
    base_url="http://my.test.server.example.com:8083",
    http_client=DefaultHttpxClient(
        proxy="http://my.test.proxy.example.com",
        transport=httpx.HTTPTransport(local_address="0.0.0.0"),
    ),
)
```

--------------------------------

### Configure Retries and Timeouts

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Configure retry behavior and timeouts at the client level or per-request using `with_options()`. This allows control over network request resilience and responsiveness.

```python
import httpx
from llama_api_client import LlamaAPIClient

# Client-level configuration
client = LlamaAPIClient(
    max_retries=0,  # Disable retries
    timeout=20.0,   # 20 second timeout
)

# Fine-grained timeout control
client = LlamaAPIClient(
    timeout=httpx.Timeout(60.0, read=5.0, write=10.0, connect=2.0),
)

# Per-request override
response = client.with_options(max_retries=5, timeout=30.0).chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[{"role": "user", "content": "Hello"}],
)
```

--------------------------------

### Manage HTTP Resources with Context Managers

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Use context managers to ensure HTTP connections are properly closed after use. This is the recommended way to manage client resources.

```python
from llama_api_client import LlamaAPIClient

with LlamaAPIClient() as client:
    response = client.chat.completions.create(
        model="Llama-4-Maverick-17B-128E-Instruct-FP8",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.completion_message.content.text)

# HTTP client is automatically closed
```

--------------------------------

### Asynchronous Client Usage

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Instantiate the asynchronous client and make a chat completion request using await. The API key is read from the LLAMA_API_KEY environment variable by default.

```python
import os
import asyncio
from llama_api_client import AsyncLlamaAPIClient

client = AsyncLlamaAPIClient(
    api_key=os.environ.get("LLAMA_API_KEY"),  # This is the default and can be omitted
)


async def main() -> None:
    create_chat_completion_response = await client.chat.completions.create(
        messages=[
            {
                "content": "string",
                "role": "user",
            }
        ],
        model="model",
    )
    print(create_chat_completion_response.completion_message)


asyncio.run(main())
```

--------------------------------

### Synchronous Client Usage

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Instantiate the synchronous client and make a chat completion request. The API key is read from the LLAMA_API_KEY environment variable by default.

```python
import os
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient(
    api_key=os.environ.get("LLAMA_API_KEY"),  # This is the default and can be omitted
)

create_chat_completion_response = client.chat.completions.create(
    messages=[
        {
            "content": "string",
            "role": "user",
        }
    ],
    model="model",
)
print(create_chat_completion_response.completion_message)
```

--------------------------------

### Publish Manually to PyPI

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Manually release a package to PyPI by running the 'bin/publish-pypi' script. Ensure the PYPI_TOKEN environment variable is set with your PyPI API token.

```sh
# Ensure PYPI_TOKEN is set in your environment
# Example: export PYPI_TOKEN='your_pypi_token'
# Then run:
# bin/publish-pypi
```

--------------------------------

### Publish to PyPI with GitHub Action

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Utilize the 'Publish PyPI' GitHub action to release the package to PyPI. This requires setting up organization or repository secrets for authentication.

```sh
# This is a conceptual representation, actual execution is via GitHub Actions UI/config
```

--------------------------------

### Enable Logging via Environment Variable

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Enables logging for the Llama API client by setting the LLAMA_API_CLIENT_LOG environment variable. Use 'info' for general logs or 'debug' for verbose output.

```shell
$ export LLAMA_API_CLIENT_LOG=info
```

--------------------------------

### Async Chat Completions

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Utilize the AsyncLlamaAPIClient for non-blocking I/O in asynchronous applications. The async client mirrors the synchronous client's functionality using `await` syntax.

```python
import asyncio
from llama_api_client import AsyncLlamaAPIClient

client = AsyncLlamaAPIClient()

async def main():
    # Non-streaming
    response = await client.chat.completions.create(
        model="Llama-3.3-70B-Instruct",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.completion_message.content.text)

    # Streaming
    stream = await client.chat.completions.create(
        model="Llama-3.3-70B-Instruct",
        messages=[{"role": "user", "content": "Hello"}],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.event.delta.text, end="", flush=True)

asyncio.run(main())
```

--------------------------------

### Manage HTTP Client with Context Manager

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Use `LlamaAPIClient` as a context manager (`with LlamaAPIClient() as client:`) to ensure the underlying HTTP resources are properly closed upon exiting the block.

```python
from llama_api_client import LlamaAPIClient

with LlamaAPIClient() as client:
  # make requests here
  ...
```

--------------------------------

### Tool/Function Calling

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Enable the model to call external functions by defining tools. The model returns structured tool call requests based on user input.

```python
import json
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

def get_weather(location: str) -> str:
    return f"The weather in {location} is sunny."

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country e.g. Bogota, Colombia",
                    }
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    }
]

messages = [{"role": "user", "content": "Is it raining in Bellevue?"}]

# First call - model decides to use tool
response = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=messages,
    tools=tools,
    max_completion_tokens=2048,
)

# Process tool calls
messages.append(response.completion_message.model_dump())
# `tool_calls` may be empty or None when the model answers without calling a tool
for tool_call in response.completion_message.tool_calls or []:
    if tool_call.function.name == "get_weather":
        args = json.loads(tool_call.function.arguments)
        result = get_weather(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": result,
        })

# Second call - model processes tool results
final_response = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=messages,
    tools=tools,
    max_completion_tokens=2048,
)

print(final_response.completion_message.content.text)
```
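With more than one tool, the `if tool_call.function.name == ...` chain is often replaced by a lookup table. A minimal sketch of that dispatch step; `TOOL_REGISTRY` and `run_tool_call` are hypothetical names, and `fake_call` is a stand-in object shaped like the tool calls in the example above so the snippet runs without an API request:

```python
import json
from types import SimpleNamespace
from typing import Callable, Dict


def get_weather(location: str) -> str:
    return f"The weather in {location} is sunny."


# Registry mapping tool names to the Python callables that implement them
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {"get_weather": get_weather}


def run_tool_call(tool_call) -> Dict[str, str]:
    """Execute one tool call and return a `tool` message for the next request."""
    name = tool_call.function.name
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        content = f"Unknown tool: {name}"
    else:
        try:
            args = json.loads(tool_call.function.arguments)
            content = handler(**args)
        except (json.JSONDecodeError, TypeError) as exc:
            # Malformed JSON, or arguments that do not match the function signature
            content = f"Tool call failed: {exc}"
    return {"role": "tool", "tool_call_id": tool_call.id, "content": content}


# Stand-in tool call (hypothetical data)
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_weather", arguments='{"location": "Bellevue"}'),
)
tool_message = run_tool_call(fake_call)
print(tool_message)
```

Reporting failures back as tool messages, rather than raising, gives the model a chance to correct its arguments on the next turn; whether that is desirable depends on the application.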

--------------------------------

### Activate Virtual Environment

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Activate the project's virtual environment to run Python scripts without the 'rye run' prefix. This uses the standard Python venv activation method.

```sh
source .venv/bin/activate
```

--------------------------------

### Streaming Responses (Synchronous)

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Make a chat completion request with streaming enabled and iterate over the response chunks.

```python
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

stream = client.chat.completions.create(
    messages=[
        {
            "content": "string",
            "role": "user",
        }
    ],
    model="model",
    stream=True,
)
for chunk in stream:
    print(chunk.event.delta.text, end="", flush=True)
```

--------------------------------

### Streaming Responses (Asynchronous)

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Make an asynchronous chat completion request with streaming enabled and iterate over the response chunks.

```python
import asyncio

from llama_api_client import AsyncLlamaAPIClient

client = AsyncLlamaAPIClient()


async def main() -> None:
    # `await` and `async for` are only valid inside a coroutine
    stream = await client.chat.completions.create(
        messages=[
            {
                "content": "string",
                "role": "user",
            }
        ],
        model="model",
        stream=True,
    )
    async for chunk in stream:
        print(chunk.event.delta.text, end="", flush=True)


asyncio.run(main())

--------------------------------

### Make Undocumented POST Requests

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Use `client.post` for undocumented endpoints. Specify `cast_to=httpx.Response` to receive the raw `httpx.Response` object.

```python
import httpx
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

response = client.post(
    "/foo",
    cast_to=httpx.Response,
    body={"my_param": True},
)

print(response.headers.get("x-foo"))
```

--------------------------------

### Configure Granular Timeout

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Uses an httpx.Timeout object to configure specific timeout durations for connect, read, and write operations. This provides more control over request timing.

```python
from llama_api_client import LlamaAPIClient
import httpx

client = LlamaAPIClient(
    timeout=httpx.Timeout(60.0, read=5.0, write=10.0, connect=2.0),
)
```

--------------------------------

### Handle API Errors

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Demonstrates how to catch and handle different types of API errors, including connection issues and non-success status codes. Specific error types like RateLimitError are also handled.

```python
import llama_api_client
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

try:
    client.chat.completions.create(
        messages=[
            {
                "content": "string",
                "role": "user",
            }
        ],
        model="model",
    )
except llama_api_client.APIConnectionError as e:
    print("The server could not be reached")
    print(e.__cause__)  # an underlying Exception, likely raised within httpx.
except llama_api_client.RateLimitError as e:
    print("A 429 status code was received; we should back off a bit.")
except llama_api_client.APIStatusError as e:
    print("Another non-200-range status code was received")
    print(e.status_code)
    print(e.response)
```
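When a `RateLimitError` still reaches application code, "backing off a bit" is commonly done with capped exponential delays plus random jitter. The sketch below only computes the delays; it is an illustration of that general technique, not the SDK's built-in retry logic, and `backoff_delays` with its `base` and `cap` parameters is hypothetical:

```python
import random
from typing import List, Optional


def backoff_delays(
    max_retries: int,
    base: float = 0.5,
    cap: float = 8.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return one sleep duration per retry: exponential growth, capped, with jitter."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        # "Full jitter": pick uniformly in [0, ceiling] to spread out retries
        delays.append(rng.uniform(0.0, ceiling))
    return delays


print(backoff_delays(4, rng=random.Random(0)))
```

A caller would sleep for each delay in turn (for example with `time.sleep`) before retrying the request, and give up once the list is exhausted.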

--------------------------------

### Models API

Source: https://github.com/meta-llama/llama-api-python/blob/main/api.md

This API provides methods to list available models and retrieve details about a specific model.

````APIDOC
## GET /models/{model}

### Description
Retrieves details about a specific Llama model.

### Method
GET

### Endpoint
/models/{model}

### Parameters
#### Path Parameters
- **model** (string) - Required - The ID of the model to retrieve.

### Response
#### Success Response (200)
- **response** (LlamaModel) - An object containing details about the specified model.

#### Response Example
```json
{
  "id": "llama3-8b",
  "object": "model",
  "created": 1677610602,
  "owned_by": "meta"
}
```
````

````APIDOC
## GET /models

### Description
Lists all available Llama models.

### Method
GET

### Endpoint
/models

### Response
#### Success Response (200)
- **response** (ModelListResponse) - A list of available models.

#### Response Example
```json
{
  "object": "list",
  "data": [
    {
      "id": "llama3-8b",
      "object": "model",
      "created": 1677610602,
      "owned_by": "meta"
    },
    {
      "id": "llama3-70b",
      "object": "model",
      "created": 1677610602,
      "owned_by": "meta"
    }
  ]
}
```
````

--------------------------------

### Differentiate Null from Missing Fields

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

In an API response, a field can be explicitly `null` or missing entirely; in both cases the attribute is `None` in Python. Use the `.model_fields_set` attribute to tell the two cases apart.

```python
if response.my_field is None:
  if 'my_field' not in response.model_fields_set:
    print('Got json like {}, without a "my_field" key present at all.')
  else:
    print('Got json like {"my_field": null}.')
```
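The underlying distinction can be seen with plain JSON and the standard library, without any SDK objects. In this sketch, key membership in the decoded dictionary plays the role that `model_fields_set` plays on a response model; the payloads are made-up examples:

```python
import json

with_null = json.loads('{"my_field": null}')
without_key = json.loads('{}')

# Both lookups yield None, so the value alone cannot tell the cases apart
print(with_null.get("my_field"), without_key.get("my_field"))

# Key membership is what distinguishes an explicit null from a missing field
print("my_field" in with_null, "my_field" in without_key)
```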

--------------------------------

### Chat Completions API

Source: https://github.com/meta-llama/llama-api-python/blob/main/api.md

This API allows you to create chat completions by sending messages to the model. It supports streaming responses for real-time interaction.

````APIDOC
## POST /chat/completions

### Description
Creates a chat completion request. This endpoint can be used to generate text-based responses from the Llama model based on a conversation history.

### Method
POST

### Endpoint
/chat/completions

### Parameters
#### Request Body
- **params** (object) - Required - Parameters for creating a chat completion, including messages, model, and streaming options.

### Request Example
```json
{
  "messages": [
    {"role": "user", "content": "Hello!"}
  ],
  "model": "llama3-8b",
  "stream": false
}
```

### Response
#### Success Response (200)
- **response** (CreateChatCompletionResponse) - The response object containing the chat completion.

#### Response Example
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "llama3-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello there! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 15,
    "total_tokens": 25
  }
}
```
````

--------------------------------

### Stream Chat Completion Response

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Enable real-time streaming of chat completion responses using Server-Sent Events (SSE). With `stream=True`, the call returns an iterator that yields response chunks as they arrive.

```python
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

stream = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[
        {
            "role": "user",
            "content": "Write a short poem about Python programming.",
        }
    ],
    max_completion_tokens=1024,
    temperature=0.7,
    stream=True,
)

for chunk in stream:
    print(chunk.event.delta.text, end="", flush=True)
# Text is printed incrementally as each chunk arrives
```
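Applications often need the complete text as well as the incremental output. A minimal sketch that joins the text deltas of a stream; `collect_text` and `fake_chunk` are hypothetical names, and the stand-in chunks only imitate the `.event.delta.text` shape used above so the snippet runs without an API request:

```python
from types import SimpleNamespace
from typing import Iterable, List


def collect_text(stream: Iterable) -> str:
    """Concatenate the text deltas of a stream into the full response text."""
    parts: List[str] = []
    for chunk in stream:
        text = getattr(chunk.event.delta, "text", None)
        if text:  # skip chunks that carry no text delta
            parts.append(text)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in chunk with the `.event.delta.text` shape (hypothetical)."""
    return SimpleNamespace(event=SimpleNamespace(delta=SimpleNamespace(text=text)))


full_text = collect_text([fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None)])
print(full_text)
```

Collecting parts in a list and joining once avoids repeated string concatenation on long responses.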

--------------------------------

### Configure Default Timeout

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Sets a default timeout for all API requests. The timeout value is in seconds and can be a float or an httpx.Timeout object for more granular control.

```python
from llama_api_client import LlamaAPIClient

# Configure the default for all requests:
client = LlamaAPIClient(
    # 20 seconds (default is 10 minutes)
    timeout=20.0,
)
```

--------------------------------

### Uploads API

Source: https://github.com/meta-llama/llama-api-python/blob/main/api.md

This API allows for the creation and retrieval of file uploads, including multi-part uploads.

````APIDOC
## POST /uploads

### Description
Creates a new file upload.

### Method
POST

### Endpoint
/uploads

### Parameters
#### Request Body
- **params** (object) - Required - Parameters for creating an upload, such as file name and purpose.

### Response
#### Success Response (200)
- **response** (UploadCreateResponse) - The response object containing details of the created upload.

#### Response Example
```json
{
  "id": "file-abc123xyz",
  "object": "upload",
  "filename": "my_document.pdf",
  "purpose": "assistants"
}
```
````

````APIDOC
## GET /uploads/{upload_id}

### Description
Retrieves details about a specific file upload.

### Method
GET

### Endpoint
/uploads/{upload_id}

### Parameters
#### Path Parameters
- **upload_id** (string) - Required - The ID of the upload to retrieve.

### Response
#### Success Response (200)
- **response** (UploadGetResponse) - An object containing details about the specified upload.

#### Response Example
```json
{
  "id": "file-abc123xyz",
  "object": "upload",
  "filename": "my_document.pdf",
  "purpose": "assistants",
  "status": "uploaded"
}
```
````

````APIDOC
## POST /uploads/{upload_id}

### Description
Uploads a part of a file for a multi-part upload.

### Method
POST

### Endpoint
/uploads/{upload_id}

### Parameters
#### Path Parameters
- **upload_id** (string) - Required - The ID of the upload to which the part belongs.
#### Request Body
- **params** (object) - Required - Parameters for uploading a part, including the file content and part number.

### Response
#### Success Response (200)
- **response** (UploadPartResponse) - The response object confirming the part upload.

#### Response Example
```json
{
  "id": "file-abc123xyz",
  "object": "upload",
  "part": 1,
  "status": "part_uploaded"
}
```
````

--------------------------------

### Structured Output with JSON Schema

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Constrain the model's output to a JSON schema, here generated from a Pydantic model, and validate the returned text against the same model. This is useful for extracting structured data.

```python
from pydantic import BaseModel
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip: str

response = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant. Summarize the address in a JSON object.",
        },
        {
            "role": "user",
            "content": "123 Main St, Anytown, USA",
        },
    ],
    temperature=0.1,
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "Address",
            "schema": Address.model_json_schema(),
        },
    },
)

address = Address.model_validate_json(response.completion_message.content.text)
print(address)
```
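For readers without Pydantic at hand, it may help to see the payload as plain dictionaries. The sketch below hand-writes a schema of roughly the shape `Address.model_json_schema()` produces (an approximation, not captured output) and wraps it in the same `response_format` structure; `missing_required` is a hypothetical helper for a quick sanity check of returned JSON, not a full schema validator:

```python
import json
from typing import Any, Dict, List

# Hand-written JSON Schema, approximating what Pydantic generates for `Address`
address_schema: Dict[str, Any] = {
    "title": "Address",
    "type": "object",
    "properties": {
        "street": {"title": "Street", "type": "string"},
        "city": {"title": "City", "type": "string"},
        "state": {"title": "State", "type": "string"},
        "zip": {"title": "Zip", "type": "string"},
    },
    "required": ["street", "city", "state", "zip"],
}

# The same `response_format` structure as in the example above
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "Address", "schema": address_schema},
}


def missing_required(raw_json: str, schema: Dict[str, Any]) -> List[str]:
    """Return the required keys that are absent from a JSON object string."""
    data = json.loads(raw_json)
    return [key for key in schema.get("required", []) if key not in data]


# Hypothetical model output that omits two fields
print(missing_required('{"street": "123 Main St", "city": "Anytown"}', address_schema))
```

In practice `Address.model_validate_json(...)` already performs this check and more; the helper only illustrates what "conforming to the schema" means for required keys.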

--------------------------------

### Generate Chat Completion

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Generate a chat completion response using the Llama model. Specify messages, model, and parameters like max tokens and temperature.

```python
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

response = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[
        {
            "role": "user",
            "content": "Hello, how are you?",
        }
    ],
    max_completion_tokens=1024,
    temperature=0.7,
)

print(response.completion_message.content.text)
# Example output (actual text will vary): I'm doing well, thank you for asking! How can I assist you today?
```

--------------------------------

### Configure Default Retries

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Sets the default number of retries for all subsequent API requests. Setting max_retries to 0 disables retries.

```python
from llama_api_client import LlamaAPIClient

# Configure the default for all requests:
client = LlamaAPIClient(
    # default is 2
    max_retries=0,
)
```
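
To make the meaning of `max_retries` concrete, here is a rough, generic sketch of a retry loop with exponential backoff. It is not the SDK's implementation: `call_with_retries`, `base_delay`, and the injectable `sleep` parameter are hypothetical names chosen for illustration. It only shows that `max_retries=2` means up to three attempts in total, and that `max_retries=0` means a single attempt.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying up to max_retries times with exponential backoff."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted, or disabled when max_retries == 0
            sleep(base_delay * (2 ** attempt))  # waits 0.5s, 1s, 2s, ...
            attempt += 1
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.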

--------------------------------

### Handle API Errors with Typed Exceptions

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

The SDK raises typed exceptions so that each failure mode can be handled separately. `APIConnectionError` covers failures to reach the server, while `RateLimitError`, `AuthenticationError`, and `BadRequestError` correspond to specific HTTP status codes. Catch the specific exceptions before the general `APIStatusError`, which handles any other non-success status.

```python
import llama_api_client
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

try:
    response = client.chat.completions.create(
        model="Llama-4-Maverick-17B-128E-Instruct-FP8",
        messages=[{"role": "user", "content": "Hello"}],
    )
except llama_api_client.APIConnectionError as e:
    print("The server could not be reached")
    print(e.__cause__)
except llama_api_client.RateLimitError as e:
    print("Rate limit exceeded, backing off...")
    print(f"Status: {e.status_code}")
except llama_api_client.AuthenticationError as e:
    print("Invalid API key")
except llama_api_client.BadRequestError as e:
    print("Invalid request parameters")
except llama_api_client.APIStatusError as e:
    print(f"API error: {e.status_code}")
    print(e.response)
```
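
For logging or metrics it can help to know which exception to expect for a given status code. The lookup below is a sketch based on the status-code table as I understand it from the SDK README; treat the mapping as an assumption and verify it against your installed version. `error_name_for_status` is a hypothetical helper, and it returns class names as strings so that it runs without the SDK installed.

```python
from typing import Optional

# Assumed mapping of HTTP status codes to SDK exception class names.
STATUS_TO_ERROR = {
    400: "BadRequestError",
    401: "AuthenticationError",
    403: "PermissionDeniedError",
    404: "NotFoundError",
    422: "UnprocessableEntityError",
    429: "RateLimitError",
}


def error_name_for_status(status_code: Optional[int]) -> str:
    """Return the exception name expected for a status code (None = no response)."""
    if status_code is None:
        return "APIConnectionError"  # the server could not be reached
    if status_code in STATUS_TO_ERROR:
        return STATUS_TO_ERROR[status_code]
    if status_code >= 500:
        return "InternalServerError"
    return "APIStatusError"  # generic fallback for other non-success codes
```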

--------------------------------

### Stream Response Data

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Use `.with_streaming_response` with a context manager to stream the response body instead of reading it eagerly. Headers are available immediately; the body is only read when you call a method such as `.iter_lines()`.

```python
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

with client.chat.completions.with_streaming_response.create(
    messages=[
        {
            "content": "string",
            "role": "user",
        }
    ],
    model="model",
) as response:
    print(response.headers.get("X-My-Header"))

    for line in response.iter_lines():
        print(line)
```
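
When the body is consumed with `.iter_lines()`, each line usually needs decoding before use. The sketch below assumes the lines follow the server-sent events convention of `data: <json>`; that wire format is an assumption, not something the snippet above confirms. `iter_json_events` is a hypothetical helper, and the sample payloads are made up.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_json_events(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield decoded JSON payloads from `data: ...` lines, skipping everything else."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, or other fields
        payload = line[len("data:"):].strip()
        try:
            yield json.loads(payload)
        except ValueError:
            continue  # not JSON (for example an empty or sentinel payload)


# Made-up sample lines standing in for response.iter_lines().
sample_lines = ['data: {"a": 1}', "", ": keep-alive", 'data: {"b": 2}']
events = list(iter_json_events(sample_lines))
```

In the streaming example, `response.iter_lines()` would take the place of `sample_lines`.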

--------------------------------

### Configure Per-Request Retries

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Override the client's default retry count for a single API call with `with_options(max_retries=...)`.

```python
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

# Configure retries for a single request:
client.with_options(max_retries=5).chat.completions.create(
    messages=[
        {
            "content": "string",
            "role": "user",
        }
    ],
    model="model",
)
```

--------------------------------

### Vision - Image Analysis

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Analyze images by providing base64-encoded data or URLs. The model can process multiple images and answer questions about their content.

```python
import base64
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

def encode_image(image_path: str) -> str:
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

encoded_image = encode_image("photo.png")

response = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is in this image?",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{encoded_image}",
                    },
                },
            ],
        },
    ],
)

print(response.completion_message.content.text)
```
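
Since the description mentions multiple images, the message content can be built programmatically. This is a minimal sketch using only the standard library: `to_data_url` and `build_image_message` are hypothetical helper names, and the MIME type is guessed from the file name rather than hard-coded as `image/png`.

```python
import base64
import mimetypes
from typing import Any, Dict, List


def to_data_url(data: bytes, filename: str) -> str:
    """Encode raw image bytes as a base64 data URL, guessing the MIME type from the name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {filename!r}")
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


def build_image_message(question: str, image_urls: List[str]) -> Dict[str, Any]:
    """Build one user message with a text part followed by one part per image."""
    content: List[Dict[str, Any]] = [{"type": "text", "text": question}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}
```

The returned dictionary has the same shape as the user message in the example above and can be placed in the `messages` list.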

--------------------------------

### Moderations API

Source: https://github.com/meta-llama/llama-api-python/blob/main/api.md

This API allows you to moderate content to ensure it complies with safety guidelines.

```APIDOC
## POST /moderations

### Description
Creates a moderation request to check if content violates safety policies.

### Method
POST

### Endpoint
/moderations

### Parameters
#### Request Body
- **params** (object) - Required - Parameters for moderation, including the input text and model.

### Response
#### Success Response (200)
- **response** (ModerationCreateResponse) - The response object indicating whether the content is flagged and the reasons.

#### Response Example
{
  "id": "mod-abc123xyz",
  "object": "moderation",
  "model": "text-moderation-latest",
  "results": [
    {
      "flagged": false,
      "categories": {
        "hate": false,
        "harassment": false,
        "self-harm": false,
        "sexual": false,
        "violence": false
      },
      "category_scores": {
        "hate": 0.001,
        "harassment": 0.0005,
        "self-harm": 0.0001,
        "sexual": 0.0002,
        "violence": 0.0003
      }
    }
  ]
}
```

--------------------------------

### Access Raw HTTP Response Data

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Access raw HTTP response data, including headers, by using the `with_raw_response` prefix. This allows inspection of request IDs and status codes before parsing the response body.

```python
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

response = client.chat.completions.with_raw_response.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[{"role": "user", "content": "Hello"}],
)

print(response.headers.get("X-Request-ID"))
print(response.status_code)

# Parse the response body
completion = response.parse()
print(completion.completion_message.content.text)
```

--------------------------------

### Multi-turn Chat Conversation

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Maintain a multi-turn conversation by including previous messages and assistant responses in subsequent requests. The previous completion message can be directly appended to the messages list.

```python
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

# First turn
response = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)

# Second turn - include previous response
response2 = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[
        {"role": "user", "content": "What is the capital of France?"},
        response.completion_message,  # Previous assistant response
        {"role": "user", "content": "What is its population?"},
    ],
    max_completion_tokens=1024,
)

print(response2.completion_message.content.text)
```
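
For longer conversations, rebuilding the message list by hand becomes repetitive. The helper below is a small sketch of the same pattern; `next_turn` is a hypothetical name, and a plain dictionary stands in for `response.completion_message` so the snippet runs without the SDK.

```python
from typing import Any, List


def next_turn(history: List[Any], assistant_message: Any, user_text: str) -> List[Any]:
    """Return a new list: prior messages, the assistant reply, then the next user message."""
    return [*history, assistant_message, {"role": "user", "content": user_text}]


history = [{"role": "user", "content": "What is the capital of France?"}]
# Stand-in for response.completion_message from the first turn.
assistant_reply = {"role": "assistant", "content": "Paris."}

messages = next_turn(history, assistant_reply, "What is its population?")
```

Because the helper returns a new list instead of mutating `history`, earlier turns can be reused to branch the conversation.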

--------------------------------

### Override Per-Request Timeout

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Apply a timeout, in seconds, to a single API call with `with_options(timeout=...)`, overriding the client's default timeout. This is useful for time-sensitive operations.

```python
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

# Override per-request:
client.with_options(timeout=5.0).chat.completions.create(
    messages=[
        {
            "content": "string",
            "role": "user",
        }
    ],
    model="model",
)
```

--------------------------------

### Classify Messages for Harmful Content

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Use the moderation endpoint to analyze messages and classify them for potentially harmful content. This is useful for ensuring user-generated content adheres to safety guidelines.

```python
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

# Safe content
response = client.moderations.create(
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
)
print(response)
# Output: ModerationCreateResponse with safe classification

# Potentially unsafe content
response = client.moderations.create(
    messages=[
        {"role": "user", "content": "How to make dangerous items?"}
    ],
)
print(response)
# Output: ModerationCreateResponse with unsafe classification and categories
```
