### Adding and Running Examples

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/CONTRIBUTING.md

Steps to add a new Python example file, make it executable, and run it against the API.

```python
# add an example to examples/.py

#!/usr/bin/env -S rye run python
…
```

```sh
$ chmod +x examples/.py

# run the example against your api
$ ./examples/.py
```

--------------------------------

### Install from Git

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/CONTRIBUTING.md

Command to install the SDK directly from a Git repository using pip.

```sh
$ pip install git+ssh://git@github.com/Cerebras/cerebras-cloud-sdk-python.git
```

--------------------------------

### Build and Install from Source

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/CONTRIBUTING.md

Commands to build the Python package distributable files (wheel) and install it locally.

```sh
$ rye build
# or
$ python -m build
```

```sh
$ pip install ./path-to-wheel-file.whl
```

--------------------------------

### Setup Environment with Rye

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/CONTRIBUTING.md

Commands to bootstrap the project using Rye, sync dependencies, and activate the virtual environment for running Python scripts.

```sh
$ ./scripts/bootstrap
```

```sh
$ rye sync --all-features
```

```sh
# Activate the virtual environment - https://docs.python.org/3/library/venv.html#how-venvs-work
$ source .venv/bin/activate

# now you can omit the `rye run` prefix
$ python script.py
```

--------------------------------

### Mock Server Setup for Tests

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/CONTRIBUTING.md

Instructions to set up a mock server using Prism against an OpenAPI specification, which is often required for running tests.
```sh
# you will need npm installed
$ npx prism mock path/to/your/openapi.yml
```

--------------------------------

### Setup Environment without Rye

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/CONTRIBUTING.md

Instructions for setting up the project using standard pip, including installing dependencies from a lock file.

```sh
$ pip install -r requirements-dev.lock
```

--------------------------------

### Install and Initialize Async Client

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/README.md

Installs the Cerebras Cloud SDK with aiohttp support and initializes an asynchronous client with an API key. This client is used to make API calls to the Cerebras Cloud.

```bash
pip install 'cerebras_cloud_sdk[aiohttp] @ git+ssh://git@github.com/Cerebras/cerebras-cloud-sdk-python-private.git'
```

```python
import asyncio
from cerebras.cloud.sdk import DefaultAioHttpClient
from cerebras.cloud.sdk import AsyncCerebras


async def main() -> None:
    async with AsyncCerebras(
        api_key="My API Key",
        http_client=DefaultAioHttpClient(),
    ) as client:
        chat_completion = await client.chat.completions.create(
            messages=[
                {
                    "role": "user",
                    "content": "Why is fast inference important?",
                }
            ],
            model="llama3.1-8b",
        )


asyncio.run(main())
```

--------------------------------

### Install Cerebras Python SDK

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/README.md

Installs the Cerebras Python SDK using pip. This is the first step to using the SDK for interacting with the Cerebras REST API.

```sh
pip install cerebras_cloud_sdk
```

--------------------------------

### Publishing to PyPI

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/CONTRIBUTING.md

Information on publishing packages to PyPI, either through a GitHub workflow or manually.
**Publish with a GitHub workflow**

You can release to package managers by using [the `Publish PyPI` GitHub action](https://www.github.com/Cerebras/cerebras-cloud-sdk-python/actions/workflows/publish-pypi.yml).

**Publish manually**

If you need to manually release a package, you can run the `bin/publish-pypi` script with a `PYPI_TOKEN` set on the environment.

--------------------------------

### Get Cerebras Cloud SDK Version

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/README.md

This snippet demonstrates how to import the Cerebras Cloud SDK and print its currently installed version at runtime. This is useful for debugging or verifying an upgrade.

```python
import cerebras.cloud.sdk

print(cerebras.cloud.sdk.__version__)
```

--------------------------------

### Linting and Formatting

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/CONTRIBUTING.md

Commands to run linting and formatting checks using Ruff and Black.

```sh
$ ./scripts/lint
```

```sh
$ ./scripts/format
```

--------------------------------

### Running Tests

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/CONTRIBUTING.md

Command to execute the project's test suite.

```sh
$ ./scripts/test
```

--------------------------------

### Enable aiohttp for Async Client

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/README.md

Instructions to enable aiohttp as the HTTP backend for the asynchronous Cerebras client to improve concurrency performance. This requires installing the aiohttp package.

```sh
pip install aiohttp
```

--------------------------------

### Using TypedDict for Nested Parameters

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/README.md

Illustrates how to use TypedDict for nested request parameters in the Cerebras SDK. This example shows passing a dictionary for `stream_options` to the `chat.completions.create` method and accessing the response.
```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

chat_completion = client.chat.completions.create(
    messages=[
        {
            "content": "content",
            "role": "system",
        }
    ],
    model="model",
    stream_options={},
)
print(chat_completion.stream_options)
```

--------------------------------

### Distinguishing Null vs. Missing Fields in API Responses

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/README.md

Provides a Python code example demonstrating how to differentiate between a field explicitly set to `null` and a field that is entirely missing from an API response using the `.model_fields_set` attribute.

```python
if response.my_field is None:
    if 'my_field' not in response.model_fields_set:
        print('Got json like {}, without a "my_field" key present at all.')
    else:
        print('Got json like {"my_field": null}.')
```

--------------------------------

### Per-Request HTTP Client Customization

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/README.md

Shows how to customize the HTTP client on a per-request basis using the `with_options()` method, allowing different configurations for specific calls without altering the main client instance.

```python
from cerebras.cloud.sdk import Cerebras, DefaultHttpxClient
import httpx

client = Cerebras()

client.with_options(http_client=DefaultHttpxClient(proxy="http://my.test.proxy.example.com"))
```

--------------------------------

### Configuring the HTTP Client

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/README.md

Demonstrates how to override the default httpx client to customize aspects like proxies, transports, and other advanced configurations. This allows fine-grained control over network requests.
```python
import httpx
from cerebras.cloud.sdk import Cerebras, DefaultHttpxClient

client = Cerebras(
    # Or use the `CEREBRAS_BASE_URL` env var
    base_url="http://my.test.server.example.com:8083",
    http_client=DefaultHttpxClient(
        proxy="http://my.test.proxy.example.com",
        transport=httpx.HTTPTransport(local_address="0.0.0.0"),
    ),
)
```

--------------------------------

### Managing HTTP Client Resources with Context Manager

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/README.md

Explains how to manage the lifecycle of the HTTP client, ensuring resources are properly closed. Using a context manager (`with Cerebras() as client:`) guarantees the client is closed upon exiting the block.

```python
from cerebras.cloud.sdk import Cerebras

with Cerebras() as client:
    # make requests here
    pass

# HTTP client is now closed
```

--------------------------------

### Text Completion (Synchronous)

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/README.md

Demonstrates how to perform a text completion request using the synchronous Cerebras client. It initializes the client and provides a prompt, max tokens, and model for text generation.

```python
import os
from cerebras.cloud.sdk import Cerebras

client = Cerebras(
    api_key=os.environ.get("CEREBRAS_API_KEY"),  # This is the default and can be omitted
)

completion = client.completions.create(
    prompt="It was a dark and stormy ",
    max_tokens=100,
    model="llama3.1-8b",
)
print(completion)
```

--------------------------------

### Enabling Logging in Cerebras SDK

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/README.md

Shows how to enable verbose logging for the Cerebras Python SDK by setting the `CEREBRAS_LOG` environment variable to 'info' or 'debug'.
```shell
$ export CEREBRAS_LOG=info
```

```shell
$ export CEREBRAS_LOG=debug
```

--------------------------------

### Text Completions API

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/api.md

Manages text generation tasks. The `create` method sends a request to the `/v1/completions` endpoint with parameters and returns a `Completion` object.

```APIDOC
POST /v1/completions
client.completions.create(**params) -> Completion

Description: Creates a text completion.

Parameters:
  params: A dictionary of parameters for text completion (e.g., prompt, model).

Returns:
  Completion: An object containing the text completion response.
```

--------------------------------

### Synchronous Text Completion Streaming

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/README.md

Demonstrates how to stream text completion responses using the synchronous Cerebras client. It iterates over the stream to print text chunks. The API key is fetched from an environment variable.

```python
import os
from cerebras.cloud.sdk import Cerebras

client = Cerebras(
    # This is the default and can be omitted
    api_key=os.environ.get("CEREBRAS_API_KEY"),
)

stream = client.completions.create(
    prompt="It was a dark and stormy ",
    max_tokens=100,
    model="llama3.1-8b",
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].text or "", end="")
```

--------------------------------

### Accessing Raw Response Data

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/README.md

Demonstrates how to access the raw HTTP response, including headers, and parse the content. This is useful for inspecting low-level response details or handling custom headers.
```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

response = client.chat.completions.with_raw_response.create(
    messages=[
        {
            "role": "user",
            "content": "Why is fast inference important?",
        }
    ],
    model="llama3.1-8b",
)
print(response.headers.get('X-My-Header'))

completion = response.parse()  # get the object that `chat.completions.create()` would have returned
print(completion)
```

--------------------------------

### Chat Completion (Synchronous)

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/README.md

Demonstrates how to perform a chat completion request using the synchronous Cerebras client. It initializes the client with an API key and sends a user message to a specified model.

```python
import os
from cerebras.cloud.sdk import Cerebras

client = Cerebras(
    api_key=os.environ.get("CEREBRAS_API_KEY"),  # This is the default and can be omitted
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Why is fast inference important?",
        }
    ],
    model="llama3.1-8b",
)
print(chat_completion)
```

--------------------------------

### Chat Completion (Asynchronous)

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/README.md

Demonstrates how to perform a chat completion request using the asynchronous Cerebras client. It initializes the client and uses `async`/`await` for non-blocking API calls.

```python
import os
import asyncio
from cerebras.cloud.sdk import AsyncCerebras

client = AsyncCerebras(
    api_key=os.environ.get("CEREBRAS_API_KEY"),  # This is the default and can be omitted
)


async def main() -> None:
    chat_completion = await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Why is fast inference important?",
            }
        ],
        model="llama3.1-8b",
    )
    print(chat_completion)


asyncio.run(main())
```

--------------------------------

### Chat Completions API

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/api.md

Handles chat-based interactions.
The `create` method sends a request to the `/v1/chat/completions` endpoint with specified parameters and returns a `ChatCompletion` object.

```APIDOC
POST /v1/chat/completions
client.chat.completions.create(**params) -> ChatCompletion

Description: Creates a chat completion.

Parameters:
  params: A dictionary of parameters for chat completion (e.g., messages, model).

Returns:
  ChatCompletion: An object containing the chat completion response.
```

--------------------------------

### Synchronous Chat Completion Streaming

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/README.md

Demonstrates how to stream chat completion responses using the synchronous Cerebras client. It iterates over the stream to print content chunks as they arrive. The API key is fetched from an environment variable.

```python
import os
from cerebras.cloud.sdk import Cerebras

client = Cerebras(
    # This is the default and can be omitted
    api_key=os.environ.get("CEREBRAS_API_KEY"),
)

stream = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Why is fast inference important?",
        }
    ],
    model="llama3.1-8b",
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```

--------------------------------

### Configuring Timeouts for Cerebras SDK Requests

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/README.md

Explains how to set request timeouts for the Cerebras Python SDK, both globally and per request. It covers using a float for simple timeouts and `httpx.Timeout` for more granular control over connect, read, and write timeouts.
```python
from cerebras.cloud.sdk import Cerebras
import httpx

# Configure the default for all requests:
client = Cerebras(
    # 20 seconds (default is 1 minute)
    timeout=20.0,
)

# More granular control:
client = Cerebras(
    timeout=httpx.Timeout(60.0, read=5.0, write=10.0, connect=2.0),
)

# Override per-request:
client.with_options(timeout=5.0).chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Why is fast inference important?",
        }
    ],
    model="llama3.1-8b",
)
```

--------------------------------

### Streaming Response Data

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/README.md

Shows how to stream response bodies using `.with_streaming_response` for efficient handling of large responses. It requires a context manager and allows reading the response content incrementally.

```python
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

with client.chat.completions.with_streaming_response.create(
    messages=[
        {
            "role": "user",
            "content": "Why is fast inference important?",
        }
    ],
    model="llama3.1-8b",
) as response:
    print(response.headers.get("X-My-Header"))

    for line in response.iter_lines():
        print(line)
```

--------------------------------

### Asynchronous Chat Completion Streaming

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/README.md

Demonstrates how to stream chat completion responses using the asynchronous Cerebras client. It uses an async for loop to process streamed chunks. The API key is fetched from an environment variable.
```python
import os
import asyncio
from cerebras.cloud.sdk import AsyncCerebras

client = AsyncCerebras(
    # This is the default and can be omitted
    api_key=os.environ.get("CEREBRAS_API_KEY"),
)


async def main() -> None:
    stream = await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Why is fast inference important?",
            }
        ],
        model="llama3.1-8b",
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="")


asyncio.run(main())
```

--------------------------------

### Handling API Errors with Cerebras SDK

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/README.md

Demonstrates how to catch and handle various API errors, including connection issues, rate limiting, and general status errors, using the Cerebras Python SDK. It shows how to access specific error details like status codes and responses.

```python
import cerebras.cloud.sdk
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

try:
    client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "This should cause an error!",
            }
        ],
        model="some-model-that-doesnt-exist",
    )
except cerebras.cloud.sdk.APIConnectionError as e:
    print("The server could not be reached")
    print(e.__cause__)  # an underlying Exception, likely raised within httpx.
except cerebras.cloud.sdk.RateLimitError as e:
    print("A 429 status code was received; we should back off a bit.")
except cerebras.cloud.sdk.APIStatusError as e:
    print("Another non-200-range status code was received")
    print(e.status_code)
    print(e.response)
```

--------------------------------

### Making Undocumented Endpoint Requests

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/README.md

Illustrates how to make requests to undocumented API endpoints using `client.post` (or other HTTP verbs). This method respects client options like retries and allows specifying the response casting.
```python
import httpx
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

response = client.post(
    "/foo",
    cast_to=httpx.Response,
    body={"my_param": True},
)

print(response.headers.get("x-foo"))
```

--------------------------------

### Models API

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/api.md

Provides functionality to retrieve and list available models. The `retrieve` method fetches details for a specific model by its ID via the `/v1/models/{model_id}` endpoint, while the `list` method retrieves all available models from the `/v1/models` endpoint.

```APIDOC
GET /v1/models/{model_id}
client.models.retrieve(model_id) -> ModelRetrieveResponse

Description: Retrieves a specific model by its ID.

Parameters:
  model_id: The unique identifier of the model.

Returns:
  ModelRetrieveResponse: An object containing the details of the specified model.
```

```APIDOC
GET /v1/models
client.models.list() -> ModelListResponse

Description: Lists all available models.

Parameters: None

Returns:
  ModelListResponse: An object containing a list of available models.
```

--------------------------------

### Configuring Retries for Cerebras SDK Requests

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/README.md

Illustrates how to manage automatic retries for requests made with the Cerebras Python SDK. It shows how to disable retries globally or configure them on a per-request basis using the `max_retries` option.
```python
from cerebras.cloud.sdk import Cerebras

# Configure the default for all requests:
client = Cerebras(
    # default is 2
    max_retries=0,
)

# Or, configure per-request:
client.with_options(max_retries=5).chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Why is fast inference important?",
        }
    ],
    model="llama3.1-8b",
)
```

--------------------------------

### Set Cerebras API Key

Source: https://github.com/cerebras/cerebras-cloud-sdk-python/blob/main/README.md

Sets the Cerebras API key as an environment variable. This key is required for authenticating requests to the Cerebras REST API.

```sh
export CEREBRAS_API_KEY="your-api-key-here"
```
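The snippets throughout this collection read the key with `os.environ.get("CEREBRAS_API_KEY")`, which silently yields `None` when the variable is unset. A minimal sketch of a fail-fast variant (the `require_api_key` helper is hypothetical, not part of the SDK):

```python
import os


def require_api_key() -> str:
    """Return the Cerebras API key from the environment, or fail fast.

    Hypothetical helper: the SDK itself already falls back to the
    CEREBRAS_API_KEY variable when `api_key` is omitted; this just
    surfaces a clearer error message before any client is constructed.
    """
    key = os.environ.get("CEREBRAS_API_KEY")
    if not key:
        raise RuntimeError(
            "CEREBRAS_API_KEY is not set; run `export CEREBRAS_API_KEY=...` first."
        )
    return key
```

Passing the result explicitly, e.g. `Cerebras(api_key=require_api_key())`, makes a missing key produce this message rather than the SDK's generic authentication error.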