### Installation

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/MIGRATING.md

Instructions for installing the Cartesia Python SDK v3.0.0, including optional WebSocket support.

```APIDOC
## Installation

Install the base SDK:

```bash
pip install cartesia==3.0.0
```

For WebSocket support:

```bash
pip install "cartesia[websockets]==3.0.0"
```
```

--------------------------------

### Install Cartesia SDK v3.x

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/MIGRATING.md

Installation commands for the base SDK and optional WebSocket support using pip.

```bash
pip install cartesia==3.0.0
pip install "cartesia[websockets]==3.0.0"
```

--------------------------------

### Initialize Cartesia Client

Source: https://context7.com/cartesia-ai/cartesia-python/llms.txt

How to instantiate the synchronous or asynchronous Cartesia client using an API key. Includes examples for custom configuration such as timeouts and base URLs.

```python
import os
from cartesia import Cartesia, AsyncCartesia

client = Cartesia(api_key=os.getenv("CARTESIA_API_KEY"))
async_client = AsyncCartesia(api_key=os.getenv("CARTESIA_API_KEY"))

client = Cartesia(
    api_key=os.getenv("CARTESIA_API_KEY"),
    timeout=30.0,
    max_retries=3,
    base_url="https://api.cartesia.ai"
)
```

--------------------------------

### Install Cartesia SDK

Source: https://context7.com/cartesia-ai/cartesia-python/llms.txt

Commands to install the Cartesia SDK via pip, including optional dependencies for WebSocket support and improved async performance.

```bash
pip install cartesia
pip install 'cartesia[websockets]'
pip install 'cartesia[aiohttp]'
```

--------------------------------

### Configure AsyncCartesia with aiohttp

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/README.md

Demonstrates how to use the aiohttp backend for improved concurrency performance in asynchronous Cartesia applications. Requires installing the 'cartesia[aiohttp]' package.
```bash
pip install 'cartesia[aiohttp]'
```

```python
import asyncio
import os
from cartesia import DefaultAioHttpClient
from cartesia import AsyncCartesia


async def main() -> None:
    async with AsyncCartesia(
        api_key=os.getenv("CARTESIA_API_KEY"),
        http_client=DefaultAioHttpClient(),
    ) as client:
        response = await client.tts.generate(
            model_id="sonic-3",
            output_format={
                "container": "wav",
                "encoding": "pcm_f32le",
                "sample_rate": 44100,
            },
            transcript="I have to say that I'd rather stay awake when I'm asleep.",
            voice={
                "mode": "id",
                "id": "e07c00bc-4134-4eae-9ea4-1a55fb45746b",
            },
        )


asyncio.run(main())
```

--------------------------------

### Install Cartesia Python Library

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/README.md

Commands to install the core Cartesia library and the optional dependencies for WebSocket support using pip.

```bash
pip install cartesia
pip install 'cartesia[websockets]'
```

--------------------------------

### Handle API Pagination

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/README.md

Shows how to iterate through paginated API results using built-in iterators for both synchronous and asynchronous clients. Includes examples for automatic iteration and manual page control.

```python
from cartesia import Cartesia

client = Cartesia()

all_voices = []
for voice in client.voices.list():
    all_voices.append(voice)
print(all_voices)
```

```python
import asyncio
from cartesia import AsyncCartesia

client = AsyncCartesia()


async def main() -> None:
    all_voices = []
    async for voice in client.voices.list():
        all_voices.append(voice)
    print(all_voices)


asyncio.run(main())
```

--------------------------------

### GET /

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/api.md

Retrieve the current status of the Cartesia service.

```APIDOC
## GET /

### Description
Retrieves the health and status information of the Cartesia API.
### Method
GET

### Endpoint
/

### Response
#### Success Response (200)
- **status** (GetStatusResponse) - The current status object of the service.
```

--------------------------------

### Voice Management - Get Voice

Source: https://context7.com/cartesia-ai/cartesia-python/llms.txt

Demonstrates how to retrieve details for a specific voice by its ID.

```APIDOC
## Voice Management - Get Voice

### Description
This endpoint retrieves detailed information about a specific voice using its unique ID.

### Method
`GET`

### Endpoint
`/v1/voices/{voice_id}` (Implied)

### Parameters
#### Path Parameters
- **voice_id** (string) - Required - The ID of the voice to retrieve.

### Request Example
```python
import os
from cartesia import Cartesia

client = Cartesia(api_key=os.getenv("CARTESIA_API_KEY"))

# Get a specific voice
voice = client.voices.get("voice-id-here")
print(f"Name: {voice.name}, Language: {voice.language}")
```

### Response
#### Success Response (200)
- **id** (string) - The unique identifier for the voice.
- **name** (string) - The display name of the voice.
- **language** (string) - The language the voice speaks.
- (Other potential voice-specific details)
```

--------------------------------

### Getting a Specific Voice by ID in Python

Source: https://context7.com/cartesia-ai/cartesia-python/llms.txt

Demonstrates how to retrieve details for a specific voice using its unique ID. It prints the name and language of the requested voice. Requires the cartesia library.

```python
import os
from cartesia import Cartesia

client = Cartesia(api_key=os.getenv("CARTESIA_API_KEY"))

# Replace "voice-id-here" with a real voice ID before running.
voice = client.voices.get("voice-id-here")
print(f"Name: {voice.name}, Language: {voice.language}")
```

--------------------------------

### Configuring the HTTP Client

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/README.md

Demonstrates how to override the default httpx client to support custom proxies, transports, or base URLs.
```python
import httpx
from cartesia import Cartesia, DefaultHttpxClient

client = Cartesia(
    base_url="http://my.test.server.example.com:8083",
    http_client=DefaultHttpxClient(
        proxy="http://my.test.proxy.example.com",
        transport=httpx.HTTPTransport(local_address="0.0.0.0"),
    ),
)
```

--------------------------------

### Generate Speech with Cartesia Synchronous Client

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/README.md

Demonstrates how to initialize the Cartesia client and perform a standard text-to-speech generation request, saving the output to a WAV file.

```python
import os
from cartesia import Cartesia

client = Cartesia(
    api_key=os.getenv("CARTESIA_API_KEY"),
)

response = client.tts.generate(
    model_id="sonic-3",
    output_format={
        "container": "wav",
        "encoding": "pcm_f32le",
        "sample_rate": 44100,
    },
    transcript="I have to say that I'd rather stay awake when I'm asleep.",
    voice={
        "mode": "id",
        "id": "e07c00bc-4134-4eae-9ea4-1a55fb45746b",
    },
)

response.write_to_file("cartesia_generated.wav")
```

--------------------------------

### Create and Clone Voices (Python)

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/MIGRATING.md

Demonstrates how to create voices using embeddings (v2.x) and clone voices from audio clips (v3.x). Cloning from audio is the recommended approach in v3.x.

```python
# v2.x - Create from embedding
voice = client.voices.create(
    name="My Voice",
    description="A custom voice",
    embedding=[1.0] * 192,  # 192-dimensional embedding
    language="en",
)
```

```python
# v3.x - Clone from audio clip
with open("sample.wav", "rb") as clip:
    voice = client.voices.clone(
        clip=clip,
        name="My Voice",
        description="A custom voice",
        language="en",
    )
```

--------------------------------

### Get Client Status

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/api.md

Retrieves the current status of the Cartesia client. Returns a GetStatusResponse object.
```python
from cartesia.types import GetStatusResponse

status = client.get_status()
```

--------------------------------

### GET /voices

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/README.md

Retrieves a paginated list of available voices. The SDK provides automatic pagination iterators.

```APIDOC
## GET /voices

### Description
Lists all available voices. The response is paginated, and the SDK provides helpers to iterate through all pages automatically.

### Method
GET

### Endpoint
/voices

### Parameters
#### Query Parameters
- **limit** (integer) - Optional - Number of items per page.
- **starting_after** (string) - Optional - Cursor for pagination.

### Response
#### Success Response (200)
- **data** (array) - List of voice objects.
- **starting_after** (string) - Cursor for the next page.

### Response Example
{
  "data": [{"id": "voice-1", "name": "Example Voice"}],
  "starting_after": "next_page_cursor_string"
}
```

--------------------------------

### Handle Server-Sent Events (SSE)

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/MIGRATING.md

Demonstrates how v3.x automatically decodes base64 audio chunks in SSE streams, simplifying the processing loop compared to v2.x.

```python
# v3.x
stream = client.tts.generate_sse(
    model_id="sonic-3",
    transcript="Hello, world!",
    voice={"mode": "id", "id": "voice-id"},
    output_format={"container": "raw", "encoding": "pcm_f32le", "sample_rate": 44100},
)

chunks = []
for event in stream:
    if event.type == "chunk":
        chunks.append(event.audio)  # v3.x puts decoded bytes in event.audio
    elif event.type == "done":
        break
```

--------------------------------

### Handling WebSocket Responses (Python)

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/MIGRATING.md

Demonstrates how to handle responses from WebSocket connections in v3.x, including audio chunks, timestamps, completion signals, and errors.
```python
for response in connection:
    if response.type == "chunk":
        process_audio(response.audio)
    elif response.type == "timestamps":
        process_timestamps(response.word_timestamps)
    elif response.type == "done" or response.done:
        break
    elif response.type == "error":
        raise Exception(response.error)
```

--------------------------------

### Create Datasets and Fine-Tune Models

Source: https://context7.com/cartesia-ai/cartesia-python/llms.txt

Orchestrates the creation of datasets, file uploads, and the initiation of fine-tuning jobs for custom voice models.

```python
from pathlib import Path  # needed for the file upload below

dataset = client.datasets.create(
    name="Custom Voice Dataset",
    description="Training data for custom voice model",
)

client.datasets.files.upload(
    id=dataset.id,
    file=Path("/path/to/audio_sample.wav"),
)

fine_tune = client.fine_tunes.create(
    name="My Custom Voice Model",
    description="Fine-tuned voice model",
    dataset=dataset.id,
    model_id="sonic-3",
    language="en",
)
```

--------------------------------

### Configuring the HTTP Client

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/README.md

Instructions on how to directly override the httpx client for advanced customization like proxies, transports, and other features.

```APIDOC
## Configuring the HTTP Client

You can directly override the [httpx client](https://www.python-httpx.org/api/#client) to customize it for your use case.
### Method
```python
import httpx
from cartesia import Cartesia, DefaultHttpxClient

client = Cartesia(
    # Or use the `CARTESIA_BASE_URL` env var
    base_url="http://my.test.server.example.com:8083",
    http_client=DefaultHttpxClient(
        proxy="http://my.test.proxy.example.com",
        transport=httpx.HTTPTransport(local_address="0.0.0.0"),
    ),
)
```

You can also customize the client on a per-request basis by using `with_options()`:

```python
client.with_options(http_client=DefaultHttpxClient(...))
```
```

--------------------------------

### Cartesia API Error Handling (Python)

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/MIGRATING.md

Provides an example of how to implement robust error handling for Cartesia API requests using specific exception classes like BadRequestError, AuthenticationError, and RateLimitError.

```python
from cartesia import (
    CartesiaError,
    APIError,
    APIStatusError,
    BadRequestError,
    AuthenticationError,
    NotFoundError,
    RateLimitError,
)

try:
    response = client.tts.generate(...)
except BadRequestError as e:
    print(f"Bad request: {e}")
except AuthenticationError as e:
    print(f"Auth failed: {e}")
except NotFoundError as e:
    print(f"Not found: {e}")
except RateLimitError as e:
    print(f"Rate limited: {e}")
except APIError as e:
    print(f"API error: {e}")
```

--------------------------------

### Infill API Usage (Python)

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/MIGRATING.md

Illustrates the usage of the infill API for text-to-speech generation, showing parameter differences between v2.x and v3.x. The v3.x version allows direct file paths or bytes for left/right audio.
```python
# v2.x
infill_audio, total_audio = client.tts.infill(
    model_id="sonic-3",
    language="en",
    transcript="Infill text",
    left_audio_path="left.wav",
    right_audio_path="right.wav",
    voice={"mode": "id", "id": "voice-id"},
    output_format={"container": "wav", "encoding": "pcm_f32le", "sample_rate": 44100},
)
```

```python
# v3.x
response = client.tts.infill(
    model_id="sonic-3",
    language="en",
    transcript="Infill text",
    left_audio="left.wav",    # left_audio and right_audio can be file paths or
    right_audio="right.wav",  # raw audio file bytes.
    voice_id="voice-id",
    output_format={"container": "wav", "encoding": "pcm_f32le", "sample_rate": 44100},
)
response.write_to_file("infill_output.wav")
```

--------------------------------

### Managing HTTP Resources

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/README.md

Guidance on managing HTTP resources, including automatic closing via garbage collection and manual closing using `.close()` or a context manager.

```APIDOC
## Managing HTTP Resources

By default the library closes underlying HTTP connections whenever the client is [garbage collected](https://docs.python.org/3/reference/datamodel.html#object.__del__). You can manually close the client using the `.close()` method if desired, or with a context manager that closes when exiting.

### Method
```python
from cartesia import Cartesia

with Cartesia() as client:
    # make requests here
    ...

# HTTP client is now closed
```
```

--------------------------------

### Generate Text-to-Speech Audio

Source: https://context7.com/cartesia-ai/cartesia-python/llms.txt

Demonstrates generating a complete audio file from text using the tts.generate method. Shows how to save the binary output to a file or iterate over bytes.

```python
import os
from cartesia import Cartesia

client = Cartesia(api_key=os.getenv("CARTESIA_API_KEY"))

response = client.tts.generate(
    model_id="sonic-3",
    transcript="Hello, world! This is a demonstration of text-to-speech synthesis.",
    voice={"mode": "id", "id": "6ccbfb76-1fc6-48f7-b71d-91ac6298247b"},
    output_format={"container": "wav", "encoding": "pcm_f32le", "sample_rate": 44100},
    language="en",
    generation_config={"speed": 1.0, "emotion": "neutral"}
)

response.write_to_file("output.wav")

for chunk in response.iter_bytes():
    process_audio_chunk(chunk)
```

--------------------------------

### Voice Management - List Voices

Source: https://context7.com/cartesia-ai/cartesia-python/llms.txt

Demonstrates how to list available voices using pagination.

```APIDOC
## Voice Management - List Voices

### Description
This endpoint retrieves a list of available voices. Pagination is supported to handle large numbers of voices.

### Method
`GET`

### Endpoint
`/v1/voices` (Implied)

### Parameters
#### Query Parameters
- **limit** (integer) - Optional - The maximum number of voices to return per page. Defaults to a reasonable value if not specified.
- **offset** (integer) - Optional - The number of voices to skip before starting to collect the result set.

### Request Example
```python
import os
from cartesia import Cartesia

client = Cartesia(api_key=os.getenv("CARTESIA_API_KEY"))

# List all voices with pagination
all_voices = []
for voice in client.voices.list(limit=50):
    all_voices.append(voice)
    print(f"Voice: {voice.name} (ID: {voice.id})")
```

### Response
#### Success Response (200)
- **voices** (array of objects) - A list of voice objects, each containing:
  - **id** (string) - The unique identifier for the voice.
  - **name** (string) - The display name of the voice.
  - **language** (string) - The language the voice speaks.
```

--------------------------------

### Async TTS Generation in Python

Source: https://context7.com/cartesia-ai/cartesia-python/llms.txt

Demonstrates asynchronous text-to-speech generation using the AsyncCartesia client. This is useful for integrating with async frameworks. It generates audio and saves it to a file.
Requires cartesia and asyncio.

```python
import asyncio
import os
from cartesia import AsyncCartesia


async def generate_speech():
    client = AsyncCartesia(api_key=os.getenv("CARTESIA_API_KEY"))

    response = await client.tts.generate(
        model_id="sonic-3",
        transcript="Async text-to-speech generation.",
        voice={"mode": "id", "id": "6ccbfb76-1fc6-48f7-b71d-91ac6298247b"},
        output_format={
            "container": "wav",
            "encoding": "pcm_f32le",
            "sample_rate": 44100,
        },
    )
    await response.write_to_file("async_output.wav")


asyncio.run(generate_speech())
```

--------------------------------

### Enable logging for Cartesia Python

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/README.md

Explains how to enable SDK logging by setting the CARTESIA_LOG environment variable.

```shell
$ export CARTESIA_LOG=info
```

--------------------------------

### Listing Voices with Pagination in Python

Source: https://context7.com/cartesia-ai/cartesia-python/llms.txt

Shows how to list available voices using the Cartesia client, including pagination to handle large numbers of voices. It prints the name and ID of each voice. Requires the cartesia library.

```python
import os
from cartesia import Cartesia

client = Cartesia(api_key=os.getenv("CARTESIA_API_KEY"))

all_voices = []
for voice in client.voices.list(limit=50):
    all_voices.append(voice)
    print(f"Voice: {voice.name} (ID: {voice.id})")
```

--------------------------------

### Manage Metrics and Results

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/api.md

Methods to create, retrieve, and list metrics, associate metrics with agents, and export metric results.
```python
metric = client.agents.metrics.create(name="latency")
client.agents.metrics.add_to_agent(metric_id="m_1", agent_id="a_1")
results = client.agents.metrics.results.list()
csv_data = client.agents.metrics.results.export()
```

--------------------------------

### Async TTS Generation

Source: https://context7.com/cartesia-ai/cartesia-python/llms.txt

Demonstrates asynchronous text-to-speech generation using the `AsyncCartesia` client.

```APIDOC
## Async TTS Generation

### Description
This example shows how to use the asynchronous client to generate speech from text without blocking the event loop.

### Method
`POST`

### Endpoint
`/v1/tts/generate` (Implied)

### Parameters
#### Request Body
- **model_id** (string) - Required - The ID of the TTS model to use (e.g., "sonic-3").
- **transcript** (string) - Required - The text to convert to speech.
- **voice** (object) - Required - Voice configuration.
  - **mode** (string) - Required - Mode of voice selection (e.g., "id").
  - **id** (string) - Required - The ID of the specific voice.
- **output_format** (object) - Required - Specifies the desired audio output format.
  - **container** (string) - Required - The audio container format (e.g., "wav", "raw").
  - **encoding** (string) - Required - The audio encoding format (e.g., "pcm_f32le").
  - **sample_rate** (integer) - Required - The sample rate of the audio.
### Request Example
```python
import asyncio
import os
from cartesia import AsyncCartesia


async def generate_speech():
    client = AsyncCartesia(api_key=os.getenv("CARTESIA_API_KEY"))

    # Generate audio asynchronously
    response = await client.tts.generate(
        model_id="sonic-3",
        transcript="Async text-to-speech generation.",
        voice={"mode": "id", "id": "6ccbfb76-1fc6-48f7-b71d-91ac6298247b"},
        output_format={
            "container": "wav",
            "encoding": "pcm_f32le",
            "sample_rate": 44100,
        },
    )
    await response.write_to_file("async_output.wav")


asyncio.run(generate_speech())
```

### Response
#### Success Response (200)
- **audio_data** (bytes) - The generated audio data.
- **write_to_file** (method) - A method to write the audio data to a file.
```

--------------------------------

### Manage Voice Operations with Cartesia Python SDK

Source: https://context7.com/cartesia-ai/cartesia-python/llms.txt

Demonstrates how to clone a voice from an audio file, update metadata, localize voices for different languages, and delete existing voices.

```python
with open("sample_voice.wav", "rb") as clip:
    cloned_voice = client.voices.clone(
        clip=clip,
        name="My Custom Voice",
        description="Cloned from sample recording",
        language="en",
    )
print(f"Cloned voice ID: {cloned_voice.id}")

updated = client.voices.update(
    "voice-id-here",
    name="Updated Voice Name",
    description="New description",
    gender="female",
)

localized = client.voices.localize(
    voice_id="original-voice-id",
    name="Spanish Voice",
    description="Localized to Spanish",
    language="es",
    original_speaker_gender="female",
    dialect="mx",
)

client.voices.delete("voice-id-to-delete")
```

--------------------------------

### Handle Agent Calls

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/api.md

Methods for retrieving call details, listing calls with pagination, and downloading audio recordings.
```python
call = client.agents.calls.retrieve(call_id="call_123")
calls = client.agents.calls.list()
client.agents.calls.download_audio(call_id="call_123")
```

--------------------------------

### WebSocket TTS Streaming with Continuations in Python

Source: https://context7.com/cartesia-ai/cartesia-python/llms.txt

Shows how to stream audio using WebSocket with continuations, suitable for simulating LLM output. It pushes text parts and then receives audio chunks. Requires the cartesia library.

```python
from cartesia import Cartesia

client = Cartesia(api_key="YOUR_API_KEY")

with client.tts.websocket_connect() as connection:
    ctx = connection.context(
        model_id="sonic-3",
        voice={"mode": "id", "id": "6ccbfb76-1fc6-48f7-b71d-91ac6298247b"},
        output_format={
            "container": "raw",
            "encoding": "pcm_f32le",
            "sample_rate": 44100,
        },
    )

    for part in ["The road ", "goes ever ", "on and ", "on."]:
        ctx.push(part)

    ctx.no_more_inputs()

    with open("continuation_output.pcm", "wb") as f:
        for response in ctx.receive():
            if response.type == "chunk" and response.audio:
                f.write(response.audio)
```

--------------------------------

### Asynchronous Speech Generation

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/README.md

Demonstrates the use of the AsyncCartesia client for non-blocking text-to-speech generation within an asyncio event loop.
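Non-blocking matters most when several requests are in flight at once. The following is a minimal sketch of that pattern using only `asyncio`; `fake_generate` and `generate_all` are hypothetical stand-ins (not SDK functions) for an awaitable call such as `client.tts.generate(...)`, so the snippet runs without network access or an API key.

```python
import asyncio


async def fake_generate(transcript: str) -> bytes:
    """Hypothetical stand-in for an awaitable TTS request (not part of the SDK)."""
    await asyncio.sleep(0.01)  # simulate network latency without blocking the event loop
    return transcript.encode("utf-8")


async def generate_all(transcripts: list[str]) -> list[bytes]:
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_generate(t) for t in transcripts))


results = asyncio.run(generate_all(["first", "second", "third"]))
print([len(r) for r in results])
```

With a real async client, each `fake_generate` call would presumably be replaced by an awaited SDK request; the scheduling logic stays the same.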
```python
import asyncio
import os  # required for os.getenv below
from cartesia import AsyncCartesia

client = AsyncCartesia(
    api_key=os.getenv("CARTESIA_API_KEY"),
)


async def main() -> None:
    response = await client.tts.generate(
        model_id="sonic-3",
        output_format={
            "container": "wav",
            "encoding": "pcm_f32le",
            "sample_rate": 44100,
        },
        transcript="I have to say that I'd rather stay awake when I'm asleep.",
        voice={
            "mode": "id",
            "id": "e07c00bc-4134-4eae-9ea4-1a55fb45746b",
        },
    )
    await response.write_to_file("cartesia_generated.wav")


asyncio.run(main())
```

--------------------------------

### Making Custom HTTP Requests

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/README.md

Explains how to perform requests to undocumented endpoints using standard HTTP verbs while maintaining client configuration settings.

```python
import httpx

response = client.post(
    "/foo",
    cast_to=httpx.Response,
    body={"my_param": True},
)

print(response.headers.get("x-foo"))
```

--------------------------------

### Handling SSE Events (Python)

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/MIGRATING.md

Shows how to process Server-Sent Events (SSE) streams in v3.x, which return typed events for audio chunks, word timestamps, and stream completion.

```python
for event in stream:
    match event.type:
        case "chunk":
            # Audio chunk - event.audio contains bytes
            process_audio(event.audio)
        case "timestamps":
            # Word timestamps - event.word_timestamps
            process_timestamps(event.word_timestamps)
        case "done":
            # Stream complete
            break
```

--------------------------------

### Managing HTTP Client Lifecycle

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/README.md

Shows how to use the Cartesia client as a context manager to ensure underlying HTTP connections are closed automatically.

```python
from cartesia import Cartesia

with Cartesia() as client:
    # make requests here
    ...
```

--------------------------------

### Upload files using Cartesia Python SDK

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/README.md

Demonstrates how to upload files by passing a PathLike object to the client's file upload method. This supports both synchronous and asynchronous clients.

```python
from pathlib import Path
from cartesia import Cartesia

client = Cartesia()

client.datasets.files.upload(
    id="id",
    file=Path("/path/to/file"),
)
```

--------------------------------

### Manage Fine-Tunes

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/api.md

Manage fine-tuning jobs and their associated voices. Provides methods to create, retrieve, list, delete jobs, and list voices for a specific fine-tune.

```python
fine_tune = client.fine_tunes.create(**params)
fine_tune = client.fine_tunes.retrieve(id="ft_123")
fine_tunes = client.fine_tunes.list(**params)
client.fine_tunes.delete(id="ft_123")
voices = client.fine_tunes.list_voices(id="ft_123", **params)
```

--------------------------------

### Stream Audio via WebSockets

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/README.md

Shows how to use the WebSocket connection to stream text inputs and receive audio chunks in real-time, suitable for latency-sensitive applications.
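The WebSocket examples in this document request `raw` / `pcm_f32le` output and write headerless `.pcm` files, which most audio players will not open directly. As a rough sketch using only the standard library, the bytes can be wrapped in a WAV container. The helper name `pcm_f32le_to_wav_bytes` is hypothetical, mono audio is an assumption (the channel count is not stated in the source), and samples are down-converted to 16-bit because Python's `wave` module writes integer PCM only.

```python
import io
import struct
import wave


def pcm_f32le_to_wav_bytes(pcm: bytes, sample_rate: int = 44100, channels: int = 1) -> bytes:
    """Convert headerless little-endian float32 PCM to 16-bit WAV bytes.

    Hypothetical helper; assumes mono audio unless `channels` says otherwise.
    Raises struct.error if len(pcm) is not a multiple of 4.
    """
    # Clamp each float sample to [-1.0, 1.0] and scale it to the signed 16-bit range.
    ints = [
        int(max(-1.0, min(1.0, sample)) * 32767)
        for (sample,) in struct.iter_unpack("<f", pcm)
    ]
    frames = struct.pack(f"<{len(ints)}h", *ints)

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return buf.getvalue()


# Four synthetic samples stand in for audio received over the WebSocket.
demo_pcm = struct.pack("<4f", 0.0, 1.0, -1.0, 0.5)
wav_bytes = pcm_f32le_to_wav_bytes(demo_pcm)
print(len(wav_bytes))
```

The 16-bit conversion loses precision relative to the float32 source; when that matters, requesting the `wav` container from the API, as the non-streaming examples do, avoids the conversion entirely.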
```python
import os
from cartesia import Cartesia

client = Cartesia(
    api_key=os.getenv("CARTESIA_API_KEY"),
)

with client.tts.websocket_connect() as connection:
    ctx = connection.context(
        model_id="sonic-3",
        voice={"mode": "id", "id": "6ccbfb76-1fc6-48f7-b71d-91ac6298247b"},
        output_format={
            "container": "raw",
            "encoding": "pcm_f32le",
            "sample_rate": 44100,
        },
    )

    for part in ["The road ", "goes ever ", "on and ", "on."]:
        ctx.push(part)

    ctx.no_more_inputs()

    filename = "cartesia_websocket_generated.pcm"
    with open(filename, "wb") as f:
        for response in ctx.receive():
            if response.type == "chunk" and response.audio:
                f.write(response.audio)

    print(f"Saved audio to {filename}")
```

--------------------------------

### Accessing Raw API Responses

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/README.md

Demonstrates how to access raw HTTP response headers and objects by using the .with_raw_response prefix on API methods.

```python
from cartesia import Cartesia

client = Cartesia()
response = client.voices.with_raw_response.list()
print(response.headers.get('X-My-Header'))

voice = response.parse()  # get the object that `voices.list()` would have returned
print(voice.id)
```

--------------------------------

### Generate Batch TTS Audio

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/MIGRATING.md

Comparison of batch TTS generation between v2.x and v3.x, demonstrating the new .generate() method and write_to_file helper.

```python
# v3.x
response = client.tts.generate(
    model_id="sonic-3",
    transcript="Hello, world!",
    voice={"mode": "id", "id": "voice-id"},
    output_format={"container": "wav", "encoding": "pcm_f32le", "sample_rate": 44100},
)
response.write_to_file("output.wav")
```

--------------------------------

### Create Access Tokens

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/api.md

Generate a new access token for authentication.
This method accepts parameters for token configuration and returns an AccessTokenCreateResponse object.

```python
token_response = client.access_token.create(**params)
```

--------------------------------

### Generate TTS with Nested Parameters

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/README.md

Demonstrates how to perform text-to-speech generation using nested dictionaries for model configuration, output formatting, and voice selection.

```python
from cartesia import Cartesia

client = Cartesia()

response = client.tts.generate(
    model_id="model_id",
    output_format={
        "encoding": "pcm_f32le",
        "sample_rate": 8000,
    },
    transcript="transcript",
    voice={
        "id": "id",
        "mode": "id",
    },
)
print(response.voice)
```

--------------------------------

### WebSocket TTS with Continuations

Source: https://context7.com/cartesia-ai/cartesia-python/llms.txt

Demonstrates streaming text parts to the TTS WebSocket API for continuous synthesis, simulating LLM output.

```APIDOC
## WebSocket TTS with Continuations

### Description
This example shows how to use the `context` feature of the TTS WebSocket API to stream multiple text parts sequentially, useful for synthesizing LLM-generated text.
### Method
WebSocket Connection with Context

### Endpoint
N/A (WebSocket)

### Parameters
N/A (Handled within the `websocket_connect` and `context` methods)

### Request Example
```python
with client.tts.websocket_connect() as connection:
    ctx = connection.context(
        model_id="sonic-3",
        voice={"mode": "id", "id": "6ccbfb76-1fc6-48f7-b71d-91ac6298247b"},
        output_format={
            "container": "raw",
            "encoding": "pcm_f32le",
            "sample_rate": 44100,
        },
    )

    # Stream text parts (simulating LLM output)
    for part in ["The road ", "goes ever ", "on and ", "on."]:
        ctx.push(part)

    ctx.no_more_inputs()  # Signal end of input

    # Receive audio chunks
    with open("continuation_output.pcm", "wb") as f:
        for response in ctx.receive():
            if response.type == "chunk" and response.audio:
                f.write(response.audio)
```

### Response
Audio data is streamed back in chunks and written to a file.
```

--------------------------------

### Manage Voices and Cloning

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/api.md

Comprehensive tools for voice management, including listing, updating, deleting, cloning, and localizing voices. These methods interact with the voice registry to provide metadata and voice configurations.

```python
from cartesia.types import Voice, VoiceMetadata

# Update voice
client.voices.update(id, **params)

# List voices
client.voices.list(**params)

# Clone a voice
client.voices.clone(**params)

# Localize a voice
client.voices.localize(**params)

# Delete a voice
client.voices.delete(id)
```

--------------------------------

### Concurrent WebSocket TTS with Async Client

Source: https://context7.com/cartesia-ai/cartesia-python/llms.txt

Demonstrates using the async client to manage multiple concurrent WebSocket TTS contexts.

```APIDOC
## Concurrent WebSocket TTS with Async Client

### Description
This example shows how to establish a single WebSocket connection and manage multiple independent TTS contexts concurrently using the asynchronous client.
### Method
WebSocket Connection with Multiple Contexts

### Endpoint
N/A (WebSocket)

### Parameters
N/A (Handled within the `websocket_connect` and `context` methods)

### Request Example
```python
import asyncio
import os
from cartesia import AsyncCartesia


async def concurrent_websocket():
    client = AsyncCartesia(api_key=os.getenv("CARTESIA_API_KEY"))

    async with client.tts.websocket_connect() as connection:
        # Create multiple concurrent contexts
        ctx1 = connection.context(
            model_id="sonic-3",
            voice={"mode": "id", "id": "6ccbfb76-1fc6-48f7-b71d-91ac6298247b"},
            output_format={"container": "raw", "encoding": "pcm_f32le", "sample_rate": 44100},
        )
        ctx2 = connection.context(
            model_id="sonic-3",
            voice={"mode": "id", "id": "6ccbfb76-1fc6-48f7-b71d-91ac6298247b"},
            output_format={"container": "raw", "encoding": "pcm_f32le", "sample_rate": 44100},
        )

        # Send to both contexts
        await ctx1.push("First context speaking.")
        await ctx1.no_more_inputs()
        await ctx2.push("Second context speaking.")
        await ctx2.no_more_inputs()

        # Collect audio from both
        async for response in ctx1.receive():
            if response.type == "chunk" and response.audio:
                # Process ctx1 audio
                pass


asyncio.run(concurrent_websocket())
```

### Response
Audio data is streamed back for each context and can be processed independently.
```

--------------------------------

### Manage Agent Deployments

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/api.md

Retrieve specific deployment details or list all deployments associated with an agent. These methods require a deployment ID or agent ID respectively.

```python
deployment = client.agents.deployments.retrieve(deployment_id="dep_123")
deployments = client.agents.deployments.list(agent_id="agent_123")
```

--------------------------------

### FineTunes API

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/api.md

Manage fine-tuning jobs. Create, retrieve, update, list, and delete fine-tuning jobs, and list associated voices.
```APIDOC
## POST /fine-tunes/

### Description
Creates a new fine-tuning job.

### Method
POST

### Endpoint
`/fine-tunes/`

### Parameters
#### Request Body
- **params** (object) - Required - Parameters for creating the fine-tuning job. Refer to `FineTuneCreateParams` for details.

### Request Example
```json
{
  "training_file_id": "file_abc123",
  "model": "base-model-name"
}
```

### Response
#### Success Response (200)
- **FineTune** (object) - The newly created fine-tuning job object.

### Response Example
```json
{
  "id": "ft_abcdef123",
  "status": "pending",
  "created_at": "2023-10-27T10:00:00Z"
}
```
```

```APIDOC
## GET /fine-tunes/{id}

### Description
Retrieves a specific fine-tuning job by its unique identifier.

### Method
GET

### Endpoint
`/fine-tunes/{id}`

### Parameters
#### Path Parameters
- **id** (string) - Required - The unique identifier of the fine-tuning job.

### Response
#### Success Response (200)
- **FineTune** (object) - The fine-tuning job object.

### Response Example
```json
{
  "id": "ft_abcdef123",
  "status": "succeeded",
  "created_at": "2023-10-27T10:00:00Z"
}
```
```

```APIDOC
## GET /fine-tunes/

### Description
Lists all fine-tuning jobs, with optional filtering and pagination.

### Method
GET

### Endpoint
`/fine-tunes/`

### Parameters
#### Query Parameters
- **params** (object) - Optional - Parameters for listing fine-tuning jobs. Refer to `FineTuneListParams` for details.

### Response
#### Success Response (200)
- **SyncCursorIDPage[FineTune]** (object) - A paginated list of fine-tuning jobs.

### Response Example
```json
{
  "data": [
    {
      "id": "ft_abcdef123",
      "status": "succeeded",
      "created_at": "2023-10-27T10:00:00Z"
    }
  ],
  "has_more": false
}
```
```

```APIDOC
## DELETE /fine-tunes/{id}

### Description
Deletes a fine-tuning job by its unique identifier.

### Method
DELETE

### Endpoint
`/fine-tunes/{id}`

### Parameters
#### Path Parameters
- **id** (string) - Required - The unique identifier of the fine-tuning job to delete.
### Response
#### Success Response (200)
- **None**

### Response Example
```json
null
```
```

```APIDOC
## GET /fine-tunes/{id}/voices

### Description
Lists all voices associated with a specific fine-tuning job.

### Method
GET

### Endpoint
`/fine-tunes/{id}/voices`

### Parameters
#### Path Parameters
- **id** (string) - Required - The unique identifier of the fine-tuning job.

#### Query Parameters
- **params** (object) - Optional - Parameters for listing voices. Refer to `FineTuneListVoicesParams` for details.

### Response
#### Success Response (200)
- **SyncCursorIDPage[Voice]** (object) - A paginated list of voices.

### Response Example
```json
{
  "data": [
    {
      "id": "voice_abc123",
      "name": "Custom Voice",
      "fine_tune_id": "ft_abcdef123"
    }
  ],
  "has_more": false
}
```
```

--------------------------------

### Basic WebSocket TTS Streaming in Python

Source: https://context7.com/cartesia-ai/cartesia-python/llms.txt

Demonstrates how to stream audio from text using a WebSocket connection. It sends a single text input and saves the received audio chunks to a file. Dependencies include the cartesia library.

```python
from cartesia import Cartesia

client = Cartesia(api_key="YOUR_API_KEY")

with client.tts.websocket_connect() as connection:
    connection.send({
        "model_id": "sonic-3",
        "transcript": "Hello from WebSocket streaming!",
        "voice": {"mode": "id", "id": "6ccbfb76-1fc6-48f7-b71d-91ac6298247b"},
        "output_format": {
            "container": "raw",
            "encoding": "pcm_f32le",
            "sample_rate": 44100,
        },
    })

    with open("ws_output.pcm", "wb") as f:
        for response in connection:
            if response.type == "chunk" and response.audio:
                f.write(response.audio)
            elif response.done:
                break
```

--------------------------------

### Streaming API Responses

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/README.md

Shows how to use .with_streaming_response with a context manager to process response bodies line-by-line without eager loading.
```python
with client.voices.with_streaming_response.list() as response:
    print(response.headers.get("X-My-Header"))

    for line in response.iter_lines():
        print(line)
```

--------------------------------

### Generate Text-to-Speech and Infill

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/api.md

Methods for generating audio from text using standard generation or SSE streams, as well as performing infill operations. These methods return binary data or handle streaming responses.

```python
from cartesia.types import GenerationConfig, GenerationRequest

# Generate audio bytes
client.tts.generate(**params)

# Generate via SSE
client.tts.generate_sse(**params)

# Perform infill
client.tts.infill(**params)
```

--------------------------------

### Speech-to-Text Transcription with Word Timestamps in Python

Source: https://context7.com/cartesia-ai/cartesia-python/llms.txt

Shows how to transcribe an audio file to text and retrieve word-level timestamps. It handles audio files and prints the transcription details. Requires the cartesia library.

```python
import os
from cartesia import Cartesia

client = Cartesia(api_key=os.getenv("CARTESIA_API_KEY"))

with open("recording.wav", "rb") as audio_file:
    response = client.stt.transcribe(
        file=audio_file,
        model="ink-whisper",
        language="en",
        timestamp_granularities=["word"],
    )

print(f"Transcription: {response.text}")
print(f"Duration: {response.duration} seconds")

if response.words:
    for word in response.words:
        print(f"  '{word.word}': {word.start:.2f}s - {word.end:.2f}s")
```

--------------------------------

### Concurrent Async WebSocket TTS in Python

Source: https://context7.com/cartesia-ai/cartesia-python/llms.txt

Illustrates how to manage multiple concurrent WebSocket TTS contexts asynchronously. This allows for simultaneous streaming to different contexts. Requires cartesia and asyncio.
```python
import asyncio
import os
from cartesia import AsyncCartesia

async def concurrent_websocket():
    client = AsyncCartesia(api_key=os.getenv("CARTESIA_API_KEY"))

    async with client.tts.websocket_connect() as connection:
        ctx1 = connection.context(
            model_id="sonic-3",
            voice={"mode": "id", "id": "6ccbfb76-1fc6-48f7-b71d-91ac6298247b"},
            output_format={"container": "raw", "encoding": "pcm_f32le", "sample_rate": 44100},
        )
        ctx2 = connection.context(
            model_id="sonic-3",
            voice={"mode": "id", "id": "6ccbfb76-1fc6-48f7-b71d-91ac6298247b"},
            output_format={"container": "raw", "encoding": "pcm_f32le", "sample_rate": 44100},
        )

        await ctx1.push("First context speaking.")
        await ctx1.no_more_inputs()
        await ctx2.push("Second context speaking.")
        await ctx2.no_more_inputs()

        # Only ctx1 is drained here; ctx2 would be consumed the same way
        # via ctx2.receive().
        async for response in ctx1.receive():
            if response.type == "chunk" and response.audio:
                pass

asyncio.run(concurrent_websocket())
```

--------------------------------

### Speech-to-Text Transcription with Specific Encoding in Python

Source: https://context7.com/cartesia-ai/cartesia-python/llms.txt

Demonstrates transcribing audio data with an explicitly specified encoding and sample rate. This is useful when the audio format is known and must be passed to the API rather than inferred. Requires the cartesia library.

```python
import os
from cartesia import Cartesia

client = Cartesia(api_key=os.getenv("CARTESIA_API_KEY"))

# Assuming audio_bytes is defined and contains audio data
# response = client.stt.transcribe(
#     file=audio_bytes,
#     model="ink-whisper",
#     encoding="pcm_s16le",
#     sample_rate=16000,
# )
```

--------------------------------

### Mix Voices (Python - Deprecated)

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/MIGRATING.md

Shows how to mix existing voices using weights. This functionality is deprecated in v3.x.
```python
# v2.x, deprecated in v3.x
output = client.voices.mix(
    voices=[
        {"id": "voice-1", "weight": 0.5},
        {"id": "voice-2", "weight": 0.5},
    ]
)
```

--------------------------------

### Manage Dataset Files

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/api.md

Handle files within a dataset, including listing existing files, uploading new ones, and deleting specific files by ID.

```python
files = client.datasets.files.list(id="ds_123", **params)
client.datasets.files.upload(id="ds_123", **params)
client.datasets.files.delete(file_id="file_123", id="ds_123")
```

--------------------------------

### Stream TTS via SSE

Source: https://context7.com/cartesia-ai/cartesia-python/llms.txt

Uses tts.generate_sse to stream audio generation in real-time. Includes handling for audio chunks and word-level timestamps.

```python
import os
from cartesia import Cartesia

client = Cartesia(api_key=os.getenv("CARTESIA_API_KEY"))

stream = client.tts.generate_sse(
    model_id="sonic-3",
    transcript="The quick brown fox jumps over the lazy dog.",
    voice={"mode": "id", "id": "6ccbfb76-1fc6-48f7-b71d-91ac6298247b"},
    output_format={"container": "raw", "encoding": "pcm_f32le", "sample_rate": 44100},
    add_timestamps=True
)

audio_chunks = []
for event in stream:
    if event.type == "chunk" and event.audio:
        audio_chunks.append(event.audio)
    elif event.type == "timestamps":
        print(f"Words: {event.word_timestamps.words}")
    elif event.type == "done":
        break
    elif event.type == "error":
        raise Exception(f"TTS Error: {event.error}")
```

--------------------------------

### Accessing Raw Response Data

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/README.md

Demonstrates how to access the raw HTTP response, including headers, and parse the response body.

```APIDOC
## Accessing Raw Response Data

The `raw` Response object can be accessed by prefixing `.with_raw_response.` to any HTTP method call.
### Method
```python
response = client.voices.with_raw_response.list()
print(response.headers.get('X-My-Header'))

voices = response.parse()  # get the object that `voices.list()` would have returned
for voice in voices:  # the parsed result is a page of voices, so iterate it
    print(voice.id)
```

These methods return an [`APIResponse`](https://github.com/cartesia-ai/cartesia-python/tree/main/src/cartesia/_response.py) object.

The async client returns an [`AsyncAPIResponse`](https://github.com/cartesia-ai/cartesia-python/tree/main/src/cartesia/_response.py) with the same structure, the only difference being `await`able methods for reading the response content.
```

--------------------------------

### Configure request timeouts in Cartesia Python

Source: https://github.com/cartesia-ai/cartesia-python/blob/main/README.md

Demonstrates how to adjust request timeout settings globally or per-request using float values or httpx.Timeout objects.

```python
from cartesia import Cartesia
import httpx

# Configure the default for all requests:
client = Cartesia(
    timeout=20.0,
)

# More granular control:
client = Cartesia(
    timeout=httpx.Timeout(60.0, read=5.0, write=10.0, connect=2.0),
)

# Override per-request:
client.with_options(timeout=5.0).voices.list()
```
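
Per-request overrides make it possible to retry a slow call with a longer timeout. The sketch below is illustrative only: `call_with_escalating_timeout` and `fake_list_voices` are hypothetical names, not part of the SDK, and the built-in `TimeoutError` stands in for whatever exception the SDK raises on a timeout, which this section does not show.

```python
from typing import Callable, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_escalating_timeout(
    make_call: Callable[[float], T],
    timeouts: Sequence[float] = (5.0, 20.0),
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
) -> T:
    """Call make_call(timeout) with each timeout in turn.

    Returns the first successful result; re-raises the last failure if
    every attempt raises one of the `retry_on` exceptions.
    """
    if not timeouts:
        raise ValueError("timeouts must contain at least one value")
    last_error = None
    for timeout in timeouts:
        try:
            return make_call(timeout)
        except retry_on as exc:
            last_error = exc
    raise last_error


# Hypothetical stand-in for a slow endpoint so the sketch runs offline:
# it "times out" unless given at least 10 seconds.
attempts = []


def fake_list_voices(timeout: float) -> str:
    attempts.append(timeout)
    if timeout < 10.0:
        raise TimeoutError(f"no response within {timeout}s")
    return "ok"


result = call_with_escalating_timeout(fake_list_voices)
print(result, attempts)  # ok [5.0, 20.0]
```

With a real client, `make_call` could be something like `lambda t: client.with_options(timeout=t).voices.list()`, with `retry_on` set to the SDK's timeout exception class (an assumption to verify against the SDK's error types).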