### Start Mock Server with Prism

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Set up a mock server using Prism, which is required for running most tests. Ensure you have npm installed and provide the path to your OpenAPI specification file.

```sh
npx prism mock path/to/your/openapi.yml
```

--------------------------------

### Run Example Script

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Execute your custom example script. This command assumes the script has execute permissions and is located in the examples directory.

```sh
./examples/.py
```

--------------------------------

### Make Example Executable

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Grant execute permissions to your example script. This allows you to run it directly from the command line.

```sh
chmod +x examples/.py
```

--------------------------------

### Install Llama API Client

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Install the library using pip.

```sh
pip install llama-api-client
```

--------------------------------

### Install with aiohttp backend

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Install the library with the aiohttp extra for improved concurrency.

```sh
# install from the production repo
pip install 'llama_api_client[aiohttp] @ git+ssh://git@github.com/meta-llama/llama-api-python.git'
```

--------------------------------

### Install from Local Wheel File

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Install the library using a locally built wheel file. Replace './path-to-wheel-file.whl' with the actual path to the generated .whl file in the dist/ directory.
```sh
pip install ./path-to-wheel-file.whl
```

--------------------------------

### Add and Run Python Example

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Create a new Python file in the 'examples/' directory and make it executable to run against your API. The shebang line ensures it runs with the correct Python interpreter via Rye.

```python
# add an example to examples/.py

#!/usr/bin/env -S rye run python
```

--------------------------------

### Install from Git Repository

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Install the Llama API Python library directly from its Git repository using pip. This is useful for using the latest development version.

```sh
pip install git+ssh://git@github.com/meta-llama/llama-api-python.git
```

--------------------------------

### Run All Tests

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Execute the project's test suite. This command assumes the mock server is running and all necessary dependencies are installed.

```sh
./scripts/test
```

--------------------------------

### Install Dependencies with Pip

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Install development dependencies using pip if not using Rye. Ensure a virtual environment is created and the Python version matches .python-version.

```sh
pip install -r requirements-dev.lock
```

--------------------------------

### Async Client with aiohttp Backend

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Use the async client with aiohttp for improved concurrency. Ensure aiohttp is installed.
```python
import asyncio
from llama_api_client import AsyncLlamaAPIClient, DefaultAioHttpClient

async def main():
    async with AsyncLlamaAPIClient(
        http_client=DefaultAioHttpClient(),
    ) as client:
        response = await client.chat.completions.create(
            model="Llama-3.3-70B-Instruct",
            messages=[{"role": "user", "content": "Hello"}],
        )
        print(response.completion_message.content.text)

asyncio.run(main())
```

--------------------------------

### Sync Dependencies with Rye

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Install all project dependencies, including features, using Rye. Ensure Rye is installed manually if not using the bootstrap script.

```sh
rye sync --all-features
```

--------------------------------

### Determine Installed Llama API Python SDK Version

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Use this snippet to check the currently installed version of the llama_api_client library in your Python environment. Ensure you have the library installed.

```python
import llama_api_client
print(llama_api_client.__version__)
```

--------------------------------

### List and Retrieve Llama Models

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Retrieve a list of all available Llama models and their basic information. You can also get details for a specific model by its ID.

```python
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

models = client.models.list()
for model in models:
    print(f"Model: {model.id}")

# Get specific model details
model = client.models.retrieve("Llama-4-Maverick-17B-128E-Instruct-FP8")
print(model)
```

--------------------------------

### Format Code and Fix Issues

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Automatically format the code and fix linting issues using Ruff and Black. This command ensures code consistency and adherence to style guides.
```sh
./scripts/format
```

--------------------------------

### Access Raw Response Data

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Prefix HTTP method calls with `.with_raw_response.` to access the raw `APIResponse` object, which includes headers. Use `.parse()` to get the standard completion object.

```python
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()
response = client.chat.completions.with_raw_response.create(
    messages=[{
        "content": "string",
        "role": "user",
    }],
    model="model",
)
print(response.headers.get('X-My-Header'))

completion = response.parse()  # get the object that `chat.completions.create()` would have returned
print(completion.id)
```

--------------------------------

### Bootstrap Project with Rye

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Run this command to bootstrap the project using Rye, which manages dependencies and Python versions automatically.

```sh
./scripts/bootstrap
```

--------------------------------

### Initiate and Upload File

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Initiates an upload session and then uploads a file part. Ensure the file path is correct and the MIME type matches the file content.

```python
from pathlib import Path
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

num_bytes = Path("/path/to/file").stat().st_size

# 1. initiate upload session
r = client.uploads.create(
    bytes=num_bytes,
    filename="simpleqa.jsonl",
    mime_type="application/jsonl",
    purpose="messages_finetune",
)

# 2. upload part
client.uploads.part(
    upload_id=r.id,
    data=Path("/path/to/file"),
)
```

--------------------------------

### Build Distribution Package

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Create distributable package files (.tar.gz and .whl) for the library. This command can be run using Rye or the standard Python build module.
```sh
rye build
```

```sh
python -m build
```

--------------------------------

### Async Client with aiohttp

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Instantiate the asynchronous client with DefaultAioHttpClient for improved concurrency. The API key is read from the LLAMA_API_KEY environment variable by default.

```python
import os
import asyncio
from llama_api_client import DefaultAioHttpClient
from llama_api_client import AsyncLlamaAPIClient


async def main() -> None:
    async with AsyncLlamaAPIClient(
        api_key=os.environ.get("LLAMA_API_KEY"),  # This is the default and can be omitted
        http_client=DefaultAioHttpClient(),
    ) as client:
        create_chat_completion_response = await client.chat.completions.create(
            messages=[
                {
                    "content": "string",
                    "role": "user",
                }
            ],
            model="model",
        )
        print(create_chat_completion_response.completion_message)


asyncio.run(main())
```

--------------------------------

### Upload Files for Fine-tuning

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Upload files for fine-tuning purposes by initiating an upload session, uploading file parts, and then checking the status. Ensure the file path and MIME type are correct.
```python
from pathlib import Path
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

file_path = Path("/path/to/training_data.jsonl")
num_bytes = file_path.stat().st_size

# Step 1: Initiate upload session
upload_response = client.uploads.create(
    bytes=num_bytes,
    filename="training_data.jsonl",
    mime_type="application/jsonl",
    purpose="messages_finetune",
)
print(f"Upload ID: {upload_response.id}")

# Step 2: Upload file data
client.uploads.part(
    upload_id=upload_response.id,
    data=file_path,
)

# Step 3: Check upload status
status = client.uploads.get(upload_response.id)
print(f"Upload status: {status}")
```

--------------------------------

### Run Linter

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Execute the linter to check code quality and style according to the project's standards. This uses Ruff for linting.

```sh
./scripts/lint
```

--------------------------------

### Configure HTTP Client with Proxies and Transports

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Customize the underlying `httpx` client by passing a `DefaultHttpxClient` instance with proxy and transport configurations to the `LlamaAPIClient` constructor.

```python
import httpx
from llama_api_client import LlamaAPIClient, DefaultHttpxClient

client = LlamaAPIClient(
    # Or use the `LLAMA_API_CLIENT_BASE_URL` env var
    base_url="http://my.test.server.example.com:8083",
    http_client=DefaultHttpxClient(
        proxy="http://my.test.proxy.example.com",
        transport=httpx.HTTPTransport(local_address="0.0.0.0"),
    ),
)
```

--------------------------------

### Initialize Llama API Client

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Initialize the LlamaAPIClient. The API key can be provided directly or read from the LLAMA_API_KEY environment variable. Custom base URLs and timeouts can also be configured.
```python
import os
from llama_api_client import LlamaAPIClient

# Using environment variable (recommended)
client = LlamaAPIClient()

# Or explicitly passing the API key
client = LlamaAPIClient(
    api_key=os.environ.get("LLAMA_API_KEY"),
    timeout=20.0,  # Default is 10 minutes
    max_retries=2,  # Default is 2
)

# Configure custom base URL
client = LlamaAPIClient(
    base_url="http://custom.api.endpoint.com",
)
```

--------------------------------

### Configure Custom HTTP Client

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Configure a custom httpx client for advanced use cases, including proxies and custom transports. This allows for fine-grained control over network communication.

```python
import httpx
from llama_api_client import LlamaAPIClient, DefaultHttpxClient

client = LlamaAPIClient(
    base_url="http://my.test.server.example.com:8083",
    http_client=DefaultHttpxClient(
        proxy="http://my.test.proxy.example.com",
        transport=httpx.HTTPTransport(local_address="0.0.0.0"),
    ),
)
```

--------------------------------

### Configure Retries and Timeouts

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Configure retry behavior and timeouts at the client level or per-request using `with_options()`. This allows control over network request resilience and responsiveness.
```python
import httpx
from llama_api_client import LlamaAPIClient

# Client-level configuration
client = LlamaAPIClient(
    max_retries=0,  # Disable retries
    timeout=20.0,  # 20 second timeout
)

# Fine-grained timeout control
client = LlamaAPIClient(
    timeout=httpx.Timeout(60.0, read=5.0, write=10.0, connect=2.0),
)

# Per-request override
response = client.with_options(max_retries=5, timeout=30.0).chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[{"role": "user", "content": "Hello"}],
)
```

--------------------------------

### Manage HTTP Resources with Context Managers

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Use context managers to ensure HTTP connections are properly closed after use. This is the recommended way to manage client resources.

```python
from llama_api_client import LlamaAPIClient

with LlamaAPIClient() as client:
    response = client.chat.completions.create(
        model="Llama-4-Maverick-17B-128E-Instruct-FP8",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.completion_message.content.text)
# HTTP client is automatically closed
```

--------------------------------

### Asynchronous Client Usage

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Instantiate the asynchronous client and make a chat completion request using await. The API key is read from the LLAMA_API_KEY environment variable by default.
```python
import os
import asyncio
from llama_api_client import AsyncLlamaAPIClient

client = AsyncLlamaAPIClient(
    api_key=os.environ.get("LLAMA_API_KEY"),  # This is the default and can be omitted
)


async def main() -> None:
    create_chat_completion_response = await client.chat.completions.create(
        messages=[
            {
                "content": "string",
                "role": "user",
            }
        ],
        model="model",
    )
    print(create_chat_completion_response.completion_message)


asyncio.run(main())
```

--------------------------------

### Synchronous Client Usage

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Instantiate the synchronous client and make a chat completion request. The API key is read from the LLAMA_API_KEY environment variable by default.

```python
import os
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient(
    api_key=os.environ.get("LLAMA_API_KEY"),  # This is the default and can be omitted
)

create_chat_completion_response = client.chat.completions.create(
    messages=[
        {
            "content": "string",
            "role": "user",
        }
    ],
    model="model",
)
print(create_chat_completion_response.completion_message)
```

--------------------------------

### Publish Manually to PyPI

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Manually release a package to PyPI by running the 'bin/publish-pypi' script. Ensure the PYPI_TOKEN environment variable is set with your PyPI API token.

```sh
# Ensure PYPI_TOKEN is set in your environment
# Example: export PYPI_TOKEN='your_pypi_token'

# Then run:
# bin/publish-pypi
```

--------------------------------

### Publish to PyPI with GitHub Action

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Utilize the 'Publish PyPI' GitHub action to release the package to PyPI. This requires setting up organization or repository secrets for authentication.
```sh
# This is a conceptual representation, actual execution is via GitHub Actions UI/config
```

--------------------------------

### Enable Logging via Environment Variable

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Enables logging for the Llama API client by setting the LLAMA_API_CLIENT_LOG environment variable. Use 'info' for general logs or 'debug' for verbose output.

```shell
$ export LLAMA_API_CLIENT_LOG=info
```

--------------------------------

### Async Chat Completions

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Utilize the AsyncLlamaAPIClient for non-blocking I/O in asynchronous applications. The async client mirrors the synchronous client's functionality using `await` syntax.

```python
import asyncio
from llama_api_client import AsyncLlamaAPIClient

client = AsyncLlamaAPIClient()

async def main():
    # Non-streaming
    response = await client.chat.completions.create(
        model="Llama-3.3-70B-Instruct",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.completion_message.content.text)

    # Streaming
    stream = await client.chat.completions.create(
        model="Llama-3.3-70B-Instruct",
        messages=[{"role": "user", "content": "Hello"}],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.event.delta.text, end="", flush=True)

asyncio.run(main())
```

--------------------------------

### Manage HTTP Client with Context Manager

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Use `LlamaAPIClient` as a context manager (`with LlamaAPIClient() as client:`) to ensure the underlying HTTP resources are properly closed upon exiting the block.

```python
from llama_api_client import LlamaAPIClient

with LlamaAPIClient() as client:
    # make requests here
    ...
```

--------------------------------

### Tool/Function Calling

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Enable the model to call external functions by defining tools.
The model returns structured tool call requests based on user input.

```python
import json
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

def get_weather(location: str) -> str:
    return f"The weather in {location} is sunny."

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country e.g. Bogota, Colombia",
                    }
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    }
]

messages = [{"role": "user", "content": "Is it raining in Bellevue?"}]

# First call - model decides to use tool
response = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=messages,
    tools=tools,
    max_completion_tokens=2048,
)

# Process tool calls
messages.append(response.completion_message.model_dump())
# `or []` guards against a reply that contains no tool calls
for tool_call in response.completion_message.tool_calls or []:
    if tool_call.function.name == "get_weather":
        args = json.loads(tool_call.function.arguments)
        result = get_weather(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": result,
        })

# Second call - model processes tool results
final_response = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=messages,
    tools=tools,
    max_completion_tokens=2048,
)
print(final_response.completion_message.content.text)
```

--------------------------------

### Activate Virtual Environment

Source: https://github.com/meta-llama/llama-api-python/blob/main/CONTRIBUTING.md

Activate the project's virtual environment to run Python scripts without the 'rye run' prefix. This uses the standard Python venv activation method.
```sh
source .venv/bin/activate
```

--------------------------------

### Streaming Responses (Synchronous)

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Make a chat completion request with streaming enabled and iterate over the response chunks.

```python
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

stream = client.chat.completions.create(
    messages=[
        {
            "content": "string",
            "role": "user",
        }
    ],
    model="model",
    stream=True,
)
for chunk in stream:
    print(chunk.event.delta.text, end="", flush=True)
```

--------------------------------

### Streaming Responses (Asynchronous)

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Make an asynchronous chat completion request with streaming enabled and iterate over the response chunks.

```python
import asyncio
from llama_api_client import AsyncLlamaAPIClient

client = AsyncLlamaAPIClient()


async def main() -> None:
    # `await` and `async for` are only valid inside a coroutine
    stream = await client.chat.completions.create(
        messages=[
            {
                "content": "string",
                "role": "user",
            }
        ],
        model="model",
        stream=True,
    )
    async for chunk in stream:
        print(chunk.event.delta.text, end="", flush=True)


asyncio.run(main())
```

--------------------------------

### Make Undocumented POST Requests

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Use `client.post` for undocumented endpoints. Specify `cast_to=httpx.Response` to receive the raw `httpx.Response` object.

```python
import httpx
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

response = client.post(
    "/foo",
    cast_to=httpx.Response,
    body={"my_param": True},
)

print(response.headers.get("x-foo"))
```

--------------------------------

### Configure Granular Timeout

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Uses an httpx.Timeout object to configure specific timeout durations for connect, read, and write operations. This provides more control over request timing.
```python
from llama_api_client import LlamaAPIClient
import httpx

client = LlamaAPIClient(
    timeout=httpx.Timeout(60.0, read=5.0, write=10.0, connect=2.0),
)
```

--------------------------------

### Handle API Errors

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Demonstrates how to catch and handle different types of API errors, including connection issues and non-success status codes. Specific error types like RateLimitError are also handled.

```python
import llama_api_client
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

try:
    client.chat.completions.create(
        messages=[
            {
                "content": "string",
                "role": "user",
            }
        ],
        model="model",
    )
except llama_api_client.APIConnectionError as e:
    print("The server could not be reached")
    print(e.__cause__)  # an underlying Exception, likely raised within httpx.
except llama_api_client.RateLimitError as e:
    print("A 429 status code was received; we should back off a bit.")
except llama_api_client.APIStatusError as e:
    print("Another non-200-range status code was received")
    print(e.status_code)
    print(e.response)
```

--------------------------------

### Models API

Source: https://github.com/meta-llama/llama-api-python/blob/main/api.md

This API provides methods to list available models and retrieve details about a specific model.

```APIDOC
## GET /models/{model}

### Description
Retrieves details about a specific Llama model.

### Method
GET

### Endpoint
/models/{model}

### Parameters
#### Path Parameters
- **model** (string) - Required - The ID of the model to retrieve.

### Response
#### Success Response (200)
- **response** (LlamaModel) - An object containing details about the specified model.

#### Response Example
```json
{
  "id": "llama3-8b",
  "object": "model",
  "created": 1677610602,
  "owned_by": "meta"
}
```
```

```APIDOC
## GET /models

### Description
Lists all available Llama models.
### Method
GET

### Endpoint
/models

### Response
#### Success Response (200)
- **response** (ModelListResponse) - A list of available models.

#### Response Example
```json
{
  "object": "list",
  "data": [
    {
      "id": "llama3-8b",
      "object": "model",
      "created": 1677610602,
      "owned_by": "meta"
    },
    {
      "id": "llama3-70b",
      "object": "model",
      "created": 1677610602,
      "owned_by": "meta"
    }
  ]
}
```
```

--------------------------------

### Differentiate None from Null/Missing Fields

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

In an API response, a field that was explicitly set to `null` and a field that was missing entirely both read as `None` in Python. Check the `.model_fields_set` attribute to tell the two cases apart.

```python
if response.my_field is None:
    if 'my_field' not in response.model_fields_set:
        print('Got json like {}, without a "my_field" key present at all.')
    else:
        print('Got json like {"my_field": null}.')
```

--------------------------------

### Chat Completions API

Source: https://github.com/meta-llama/llama-api-python/blob/main/api.md

This API allows you to create chat completions by sending messages to the model. It supports streaming responses for real-time interaction.

```APIDOC
## POST /chat/completions

### Description
Creates a chat completion request. This endpoint can be used to generate text-based responses from the Llama model based on a conversation history.

### Method
POST

### Endpoint
/chat/completions

### Parameters
#### Request Body
- **params** (object) - Required - Parameters for creating a chat completion, including messages, model, and streaming options.

### Request Example
```json
{
  "messages": [
    {"role": "user", "content": "Hello!"}
  ],
  "model": "llama3-8b",
  "stream": false
}
```

### Response
#### Success Response (200)
- **response** (CreateChatCompletionResponse) - The response object containing the chat completion.
#### Response Example
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "llama3-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello there! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 15,
    "total_tokens": 25
  }
}
```
```

--------------------------------

### Stream Chat Completion Response

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Enable real-time streaming of chat completion responses using Server-Sent Events (SSE). The `stream=True` option returns an iterator yielding response chunks.

```python
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

stream = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[
        {
            "role": "user",
            "content": "Write a short poem about Python programming.",
        }
    ],
    max_completion_tokens=1024,
    temperature=0.7,
    stream=True,
)

for chunk in stream:
    print(chunk.event.delta.text, end="", flush=True)
# Output is printed incrementally as each chunk arrives
```

--------------------------------

### Configure Default Timeout

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Sets a default timeout for all API requests. The timeout value is in seconds and can be a float or an httpx.Timeout object for more granular control.

```python
from llama_api_client import LlamaAPIClient

# Configure the default for all requests:
client = LlamaAPIClient(
    # 20 seconds (default is 10 minutes)
    timeout=20.0,
)
```

--------------------------------

### Uploads API

Source: https://github.com/meta-llama/llama-api-python/blob/main/api.md

This API allows for the creation and retrieval of file uploads, including multi-part uploads.

```APIDOC
## POST /uploads

### Description
Creates a new file upload.
### Method
POST

### Endpoint
/uploads

### Parameters
#### Request Body
- **params** (object) - Required - Parameters for creating an upload, such as file name and purpose.

### Response
#### Success Response (200)
- **response** (UploadCreateResponse) - The response object containing details of the created upload.

#### Response Example
```json
{
  "id": "file-abc123xyz",
  "object": "upload",
  "filename": "my_document.pdf",
  "purpose": "assistants"
}
```
```

```APIDOC
## GET /uploads/{upload_id}

### Description
Retrieves details about a specific file upload.

### Method
GET

### Endpoint
/uploads/{upload_id}

### Parameters
#### Path Parameters
- **upload_id** (string) - Required - The ID of the upload to retrieve.

### Response
#### Success Response (200)
- **response** (UploadGetResponse) - An object containing details about the specified upload.

#### Response Example
```json
{
  "id": "file-abc123xyz",
  "object": "upload",
  "filename": "my_document.pdf",
  "purpose": "assistants",
  "status": "uploaded"
}
```
```

```APIDOC
## POST /uploads/{upload_id}

### Description
Uploads a part of a file for a multi-part upload.

### Method
POST

### Endpoint
/uploads/{upload_id}

### Parameters
#### Path Parameters
- **upload_id** (string) - Required - The ID of the upload to which the part belongs.

#### Request Body
- **params** (object) - Required - Parameters for uploading a part, including the file content and part number.

### Response
#### Success Response (200)
- **response** (UploadPartResponse) - The response object confirming the part upload.

#### Response Example
```json
{
  "id": "file-abc123xyz",
  "object": "upload",
  "part": 1,
  "status": "part_uploaded"
}
```
```

--------------------------------

### Structured Output with JSON Schema

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Force the model to output responses conforming to a JSON schema using Pydantic. This is useful for extracting structured data.
```python
from pydantic import BaseModel
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip: str

response = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant. Summarize the address in a JSON object.",
        },
        {
            "role": "user",
            "content": "123 Main St, Anytown, USA",
        },
    ],
    temperature=0.1,
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "Address",
            "schema": Address.model_json_schema(),
        },
    },
)

address = Address.model_validate_json(response.completion_message.content.text)
print(address)
```

--------------------------------

### Generate Chat Completion

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Generate a chat completion response using the Llama model. Specify messages, model, and parameters like max tokens and temperature.

```python
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

response = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[
        {
            "role": "user",
            "content": "Hello, how are you?",
        }
    ],
    max_completion_tokens=1024,
    temperature=0.7,
)

print(response.completion_message.content.text)
# Output: I'm doing well, thank you for asking! How can I assist you today?
```

--------------------------------

### Configure Default Retries

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Sets the default number of retries for all subsequent API requests. Setting max_retries to 0 disables retries.
```python
from llama_api_client import LlamaAPIClient

# Configure the default for all requests:
client = LlamaAPIClient(
    # default is 2
    max_retries=0,
)
```

--------------------------------

### Handle API Errors with Typed Exceptions

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

The SDK provides typed exceptions for different error scenarios based on HTTP status codes, allowing for precise error handling. Catch specific exceptions like APIConnectionError, RateLimitError, AuthenticationError, BadRequestError, and APIStatusError.

```python
import llama_api_client
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

try:
    response = client.chat.completions.create(
        model="Llama-4-Maverick-17B-128E-Instruct-FP8",
        messages=[{"role": "user", "content": "Hello"}],
    )
except llama_api_client.APIConnectionError as e:
    print("The server could not be reached")
    print(e.__cause__)
except llama_api_client.RateLimitError as e:
    print("Rate limit exceeded, backing off...")
    print(f"Status: {e.status_code}")
except llama_api_client.AuthenticationError as e:
    print("Invalid API key")
except llama_api_client.BadRequestError as e:
    print("Invalid request parameters")
except llama_api_client.APIStatusError as e:
    print(f"API error: {e.status_code}")
    print(e.response)
```

--------------------------------

### Stream Response Data

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Use `.with_streaming_response.` with a context manager to stream the response body. Read content using methods like `.iter_lines()` after the response is received.
```python
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

with client.chat.completions.with_streaming_response.create(
    messages=[
        {
            "content": "string",
            "role": "user",
        }
    ],
    model="model",
) as response:
    print(response.headers.get("X-My-Header"))

    for line in response.iter_lines():
        print(line)
```

--------------------------------

### Configure Per-Request Retries

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Overrides the default retry settings for a specific API call. This allows for fine-grained control over retries on a per-request basis.

```python
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

# Or, configure per-request:
client.with_options(max_retries=5).chat.completions.create(
    messages=[
        {
            "content": "string",
            "role": "user",
        }
    ],
    model="model",
)
```

--------------------------------

### Vision - Image Analysis

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Analyze images by providing base64-encoded data or URLs. The model can process multiple images and answer questions about their content.

```python
import base64
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

def encode_image(image_path: str) -> str:
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

encoded_image = encode_image("photo.png")

response = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is in this image?",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{encoded_image}",
                    },
                },
            ],
        },
    ],
)

print(response.completion_message.content.text)
```

--------------------------------

### Moderations API

Source: https://github.com/meta-llama/llama-api-python/blob/main/api.md

This API allows you to moderate content to ensure it complies with safety guidelines.

```APIDOC
## POST /moderations

### Description
Creates a moderation request to check if content violates safety policies.
### Method
POST

### Endpoint
/moderations

### Parameters
#### Request Body
- **params** (object) - Required - Parameters for moderation, including the input text and model.

### Response
#### Success Response (200)
- **response** (ModerationCreateResponse) - The response object indicating whether the content is flagged and the reasons.

#### Response Example
```json
{
  "id": "mod-abc123xyz",
  "object": "moderation",
  "model": "text-moderation-latest",
  "results": [
    {
      "flagged": false,
      "categories": {
        "hate": false,
        "harassment": false,
        "self-harm": false,
        "sexual": false,
        "violence": false
      },
      "category_scores": {
        "hate": 0.001,
        "harassment": 0.0005,
        "self-harm": 0.0001,
        "sexual": 0.0002,
        "violence": 0.0003
      }
    }
  ]
}
```
```

--------------------------------

### Access Raw HTTP Response Data

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Access raw HTTP response data, including headers, by using the `with_raw_response` prefix. This allows inspection of request IDs and status codes before parsing the response body.

```python
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

response = client.chat.completions.with_raw_response.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[{"role": "user", "content": "Hello"}],
)

print(response.headers.get("X-Request-ID"))
print(response.status_code)

# Parse the response body
completion = response.parse()
print(completion.completion_message.content.text)
```

--------------------------------

### Multi-turn Chat Conversation

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Maintain a multi-turn conversation by including previous messages and assistant responses in subsequent requests. The previous completion message can be directly appended to the messages list.

```python
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

# First turn
response = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)

# Second turn - include previous response
response2 = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[
        {"role": "user", "content": "What is the capital of France?"},
        response.completion_message,  # Previous assistant response
        {"role": "user", "content": "What is its population?"},
    ],
    max_completion_tokens=1024,
)
print(response2.completion_message.content.text)
```

--------------------------------

### Override Per-Request Timeout

Source: https://github.com/meta-llama/llama-api-python/blob/main/README.md

Applies a specific timeout duration to a single API call, overriding any default timeout configurations. This is useful for time-sensitive operations.

```python
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

# Override per-request:
client.with_options(timeout=5.0).chat.completions.create(
    messages=[
        {
            "content": "string",
            "role": "user",
        }
    ],
    model="model",
)
```

--------------------------------

### Classify Messages for Harmful Content

Source: https://context7.com/meta-llama/llama-api-python/llms.txt

Use the moderation endpoint to analyze messages and classify them for potentially harmful content. This is useful for ensuring user-generated content adheres to safety guidelines.

```python
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

# Safe content
response = client.moderations.create(
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
)
print(response)
# Output: ModerationCreateResponse with safe classification

# Potentially unsafe content
response = client.moderations.create(
    messages=[
        {"role": "user", "content": "How to make dangerous items?"}
    ],
)
print(response)
# Output: ModerationCreateResponse with unsafe classification and categories
```
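
A minimal sketch of acting on a moderation result, assuming the response can be serialized to the JSON shape shown in the Moderations API response example (a `results` list whose entries carry `flagged` and `categories`). The helper names `is_flagged` and `flagged_categories` are hypothetical and not part of the SDK, and the actual `ModerationCreateResponse` fields may differ, so check the SDK types before relying on this.

```python
import json
from typing import Any, Dict, List

# Sample payload, trimmed from the "Response Example" in the Moderations API section.
# ASSUMPTION: a real response serializes to this shape.
SAMPLE = """
{
  "id": "mod-abc123xyz",
  "object": "moderation",
  "model": "text-moderation-latest",
  "results": [
    {
      "flagged": false,
      "categories": {
        "hate": false,
        "harassment": false,
        "self-harm": false,
        "sexual": false,
        "violence": false
      }
    }
  ]
}
"""


def is_flagged(payload: Dict[str, Any]) -> bool:
    """Return True if any entry in `results` has `flagged` set (hypothetical helper)."""
    return any(result.get("flagged", False) for result in payload.get("results", []))


def flagged_categories(payload: Dict[str, Any]) -> List[str]:
    """Return the sorted names of categories marked true in any result (hypothetical helper)."""
    names = set()
    for result in payload.get("results", []):
        for name, hit in result.get("categories", {}).items():
            if hit:
                names.add(name)
    return sorted(names)


payload = json.loads(SAMPLE)
print(is_flagged(payload))
print(flagged_categories(payload))
```

Keeping the check in a small function over plain dictionaries makes it easy to gate a follow-up `chat.completions.create` call on the result and to unit-test the gating logic without network access.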