### Install Fish Audio Python SDK

Source: https://github.com/fishaudio/fish-audio-python/blob/main/README.md

Install the SDK using pip. Include '[utils]' for audio playback utilities.

```bash
pip install fish-audio-sdk
```

```bash
# With audio playback utilities
pip install fish-audio-sdk[utils]
```

--------------------------------

### Install and Initialize Fish Audio Python SDK

Source: https://context7.com/fishaudio/fish-audio-python/llms.txt

Install the SDK using pip. Initialize the client with an API key, either from an environment variable or directly. The client can be used as a context manager for automatic connection closing. Supports both synchronous and asynchronous clients.

```python
# Install
# pip install fish-audio-sdk
# pip install fish-audio-sdk[utils]  # with audio playback utilities

import os
from fishaudio import FishAudio, AsyncFishAudio

# From environment variable (FISH_API_KEY)
client = FishAudio()

# Or pass directly
client = FishAudio(api_key="your_api_key_here")

# With custom timeout and base URL
client = FishAudio(
    api_key="your_api_key_here",
    base_url="https://api.fish.audio",
    timeout=240.0,
)

# As a context manager (auto-closes HTTP connection)
with FishAudio(api_key="your_api_key_here") as client:
    audio = client.tts.convert(text="Hello from Fish Audio!")

# Async client
async_client = AsyncFishAudio(api_key="your_api_key_here")

# Async context manager
import asyncio

async def main():
    async with AsyncFishAudio(api_key="your_api_key_here") as client:
        audio = await client.tts.convert(text="Async hello!")

asyncio.run(main())
```

--------------------------------

### Install SDK and Set API Key

Source: https://github.com/fishaudio/fish-audio-python/blob/main/examples/getting_started.ipynb

Install the SDK using pip and set your API key as an environment variable. Alternatively, create a .env file with the API key.
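The `.env` route is not spelled out in the source. A minimal stdlib-only sketch, assuming a file of `KEY=value` lines such as `FISH_API_KEY=your_api_key`; `load_env_file` is a hypothetical helper, not part of the SDK (python-dotenv's `load_dotenv()`, which the getting-started notebook imports, is the ready-made option):

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Read KEY=value pairs from a .env-style file into os.environ.

    Hypothetical helper for illustration only. Variables that are
    already set in the environment are left untouched.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Tolerate shell-style "export KEY=value" lines.
        if key.startswith("export "):
            key = key[len("export "):].strip()
        # Drop surrounding single or double quotes from the value.
        value = value.strip().strip("'\"")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded
```

Call `load_env_file()` before constructing the client so that `FISH_API_KEY` is present in the environment. The shell commands for the pip and environment-variable route follow.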
```bash
pip install fish-audio-sdk
export FISH_API_KEY="your_api_key"
```

--------------------------------

### Client Initialization

Source: https://context7.com/fishaudio/fish-audio-python/llms.txt

Install the SDK and initialize the client using an API key. The key can be passed directly or set via the FISH_API_KEY environment variable. The client can be used as a context manager for automatic connection closing.

```APIDOC
## Installation

```bash
# Install
# pip install fish-audio-sdk
# pip install fish-audio-sdk[utils]  # with audio playback utilities
```

## Client Initialization

```python
import os
from fishaudio import FishAudio, AsyncFishAudio

# From environment variable (FISH_API_KEY)
client = FishAudio()

# Or pass directly
client = FishAudio(api_key="your_api_key_here")

# With custom timeout and base URL
client = FishAudio(
    api_key="your_api_key_here",
    base_url="https://api.fish.audio",
    timeout=240.0,
)

# As a context manager (auto-closes HTTP connection)
with FishAudio(api_key="your_api_key_here") as client:
    audio = client.tts.convert(text="Hello from Fish Audio!")

# Async client
async_client = AsyncFishAudio(api_key="your_api_key_here")

# Async context manager
import asyncio

async def main():
    async with AsyncFishAudio(api_key="your_api_key_here") as client:
        audio = await client.tts.convert(text="Async hello!")

asyncio.run(main())
```
```

--------------------------------

### Quick Start: Asynchronous TTS and Playback

Source: https://github.com/fishaudio/fish-audio-python/blob/main/README.md

Generate audio from text using the asynchronous client within an asyncio event loop.
```python
import asyncio
from fishaudio import AsyncFishAudio
from fishaudio.utils import play, save

async def main():
    client = AsyncFishAudio()
    audio = await client.tts.convert(text="Hello, world!")
    play(audio)
    save(audio, "output.mp3")

asyncio.run(main())
```

--------------------------------

### Quick Start: Synchronous TTS and Playback

Source: https://github.com/fishaudio/fish-audio-python/blob/main/README.md

Generate audio from text using the synchronous client, then play or save the output.

```python
from fishaudio import FishAudio
from fishaudio.utils import play, save

client = FishAudio()

# Generate audio
audio = client.tts.convert(text="Hello, world!")

# Play or save
play(audio)
save(audio, "output.mp3")
```

--------------------------------

### Get Package Information

Source: https://context7.com/fishaudio/fish-audio-python/llms.txt

Returns details about the current billing package, including the remaining balance and the total package amount.

```python
from fishaudio import FishAudio

client = FishAudio(api_key="your_api_key_here")

package = client.account.get_package()
print(f"Balance: {package.balance} / {package.total}")
```

--------------------------------

### Get Remaining Credits

Source: https://github.com/fishaudio/fish-audio-python/blob/main/examples/getting_started.ipynb

Retrieve the number of remaining credits on your Fish Audio account. This is useful for monitoring usage.

```python
credits = client.account.get_credits()
print(f"Remaining credits: {credits.credit}")
```

--------------------------------

### Get API Credit Balance

Source: https://context7.com/fishaudio/fish-audio-python/llms.txt

Retrieves the current API credit balance. Optionally checks for the availability of free credits.
```python
from fishaudio import FishAudio

client = FishAudio(api_key="your_api_key_here")

credits = client.account.get_credits()
print(f"Available credits: {float(credits.credit)}")

# Check free credit availability
credits = client.account.get_credits(check_free_credit=True)
if credits.has_free_credit:
    print("Free credits are available!")
```

--------------------------------

### Get Voice by ID

Source: https://context7.com/fishaudio/fish-audio-python/llms.txt

Fetches the full metadata for a single voice model using its unique ID. Requires the FishAudio client to be initialized with an API key.

```python
from fishaudio import FishAudio

client = FishAudio(api_key="your_api_key_here")

voice = client.voices.get("802e3bc2b27e49c2995d23ef70e6ac89")
print(voice.title)
print(voice.description)
print(voice.id)
```

--------------------------------

### client.asr.transcribe() — Speech-to-Text Transcription

Source: https://context7.com/fishaudio/fish-audio-python/llms.txt

Transcribes audio bytes to text and returns an `ASRResponse` containing the full transcript, audio duration (in milliseconds), and optionally a list of timestamped `ASRSegment` objects with `start`/`end` times in seconds.

```APIDOC
## `client.asr.transcribe()` — Speech-to-Text Transcription

Transcribes audio bytes to text and returns an `ASRResponse` containing the full transcript, audio duration (in milliseconds), and optionally a list of timestamped `ASRSegment` objects with `start`/`end` times in seconds.

```python
from fishaudio import FishAudio

client = FishAudio(api_key="your_api_key_here")

# Basic transcription (auto-detect language)
with open("audio.wav", "rb") as f:
    result = client.asr.transcribe(audio=f.read())

print(result.text)      # Full transcript
print(result.duration)  # Duration in milliseconds

# With explicit language and timestamps
with open("speech.mp3", "rb") as f:
    result = client.asr.transcribe(
        audio=f.read(),
        language="en",            # "en", "zh", "ja", etc.
        include_timestamps=True,  # Default: True
    )

for segment in result.segments:
    print(f"[{segment.start:.2f}s – {segment.end:.2f}s] {segment.text}")
# Output:
# [0.00s – 1.23s] Hello, this is a test.
# [1.30s – 2.85s] The transcription is working well.

# Without timestamps (faster)
with open("audio.wav", "rb") as f:
    result = client.asr.transcribe(audio=f.read(), include_timestamps=False)
print(result.text)  # result.segments will be empty

# Async version
import asyncio
from fishaudio import AsyncFishAudio
import aiofiles

async def transcribe_async():
    client = AsyncFishAudio(api_key="your_api_key_here")
    async with aiofiles.open("audio.mp3", "rb") as f:
        audio_bytes = await f.read()
    result = await client.asr.transcribe(audio=audio_bytes, language="en")
    print(result.text)

asyncio.run(transcribe_async())
```
```

--------------------------------

### Initialize Fish Audio Client

Source: https://github.com/fishaudio/fish-audio-python/blob/main/examples/getting_started.ipynb

Load environment variables and initialize the Fish Audio client. Ensure your API key is set before initialization.

```python
from dotenv import load_dotenv
from fishaudio import FishAudio
from fishaudio.utils import play
# from fishaudio.utils import save  # Uncomment if saving audio to file

load_dotenv()
client = FishAudio()
```

--------------------------------

### Authenticate Fish Audio Client

Source: https://github.com/fishaudio/fish-audio-python/blob/main/README.md

Set the FISH_API_KEY environment variable or provide the API key directly when initializing the FishAudio client.

```bash
export FISH_API_KEY=your_api_key_here
```

```python
from fishaudio import FishAudio

client = FishAudio(api_key="your_api_key")
```

--------------------------------

### Async Streaming

Source: https://context7.com/fishaudio/fish-audio-python/llms.txt

Demonstrates how to stream audio asynchronously using the AsyncFishAudio client. It shows how to process audio chunks as they arrive or collect the entire stream.
```APIDOC
## Async streaming

```python
import asyncio
from fishaudio import AsyncFishAudio

async def stream_audio():
    client = AsyncFishAudio(api_key="your_api_key_here")

    audio_stream = await client.tts.stream(text="Async chunk streaming.")
    async for chunk in audio_stream:
        # Send each chunk to a WebSocket, queue, etc.
        print(f"Got chunk: {len(chunk)} bytes")

    # Or collect
    stream = await client.tts.stream(text="Collect async.")
    audio_bytes = await stream.collect()

asyncio.run(stream_audio())
```
```

--------------------------------

### Handle Fish Audio SDK Exceptions in Python

Source: https://context7.com/fishaudio/fish-audio-python/llms.txt

Demonstrates how to use a try-except block to catch and handle specific FishAudioError exceptions. This is useful for gracefully managing API errors, network issues, and invalid requests.

```python
from fishaudio import FishAudio
from fishaudio.exceptions import (
    FishAudioError,
    AuthenticationError,  # HTTP 401
    PermissionError,      # HTTP 403
    NotFoundError,        # HTTP 404
    RateLimitError,       # HTTP 429
    ServerError,          # HTTP 5xx
    ValidationError,      # Invalid request parameters
    WebSocketError,       # WebSocket connection/streaming failure
    DependencyError,      # Missing optional dependency (e.g., ffplay)
)

client = FishAudio(api_key="your_api_key_here")

try:
    audio = client.tts.convert(
        text="Hello!",
        reference_id="nonexistent-voice-id",
    )
except AuthenticationError:
    print("Invalid or expired API key.")
except NotFoundError:
    print("Voice model not found.")
except RateLimitError as e:
    print(f"Rate limit hit (HTTP {e.status}). Retry later.")
except ValidationError as e:
    print(f"Bad request parameters: {e}")
except ServerError as e:
    print(f"Fish Audio server error: {e.status} — {e.message}")
except WebSocketError as e:
    print(f"WebSocket streaming failed: {e}")
except DependencyError as e:
    print(f"Missing dependency: {e.dependency}")
    print(f"Install with: {e.install_command}")
except FishAudioError as e:
    print(f"Unexpected SDK error: {e}")
```

--------------------------------

### Save Audio to File

Source: https://github.com/fishaudio/fish-audio-python/blob/main/examples/getting_started.ipynb

This commented-out snippet shows how to save the generated audio to an MP3 file. Uncomment and modify as needed.

```python
# audio = client.tts.convert(text="This audio will be saved to a file.")
# save(audio, "output.mp3")
```

--------------------------------

### play() and save() — Audio Utilities

Source: https://context7.com/fishaudio/fish-audio-python/llms.txt

Convenience utilities for playing or persisting audio bytes. `play()` supports FFmpeg (`ffplay`), Jupyter notebooks, and `sounddevice` as fallbacks. `save()` writes audio to disk.

```APIDOC
## Audio Utilities: play() and save()

### Description
Convenience utilities for playing or persisting audio bytes. `play()` supports FFmpeg (`ffplay`), Jupyter notebooks, and `sounddevice` as fallbacks. `save()` writes audio to disk.

### Functions
- **save(audio_bytes: bytes, filename: str)**: Writes the given audio bytes to a file with the specified filename.
- **play(audio_bytes: bytes, notebook: bool = False, use_ffmpeg: bool = True)**: Plays the audio bytes. Supports playback in Jupyter notebooks and via FFmpeg or sounddevice.

### Request Example
```python
from fishaudio.utils import play, save

audio = client.tts.convert(text="Utilities demo.")

# Save to file
save(audio, "output.mp3")
save(audio, "output.wav")

# Play via ffplay (requires ffmpeg installed)
play(audio)

# Play in Jupyter notebook
play(audio, notebook=True)

# Play via sounddevice (pip install fish-audio-sdk[utils])
play(audio, use_ffmpeg=False)

# Works with streaming iterators too
for chunk in client.tts.stream(text="Streamed audio"):
    pass  # process each chunk live

save(client.tts.stream(text="Save streamed.").collect(), "streamed.mp3")
```
```

--------------------------------

### Instant Voice Cloning with Reference Audio

Source: https://github.com/fishaudio/fish-audio-python/blob/main/README.md

Clone a voice on-the-fly by providing reference audio and the text spoken in the reference. This is useful for quick, one-off voice cloning tasks.

```python
from fishaudio.types import ReferenceAudio

# Clone voice on-the-fly
with open("reference.wav", "rb") as f:
    audio = client.tts.convert(
        text="Cloned voice speaking",
        references=[ReferenceAudio(
            audio=f.read(),
            text="Text spoken in reference"
        )]
    )
```

--------------------------------

### Text-to-Speech Conversion (Complete Audio)

Source: https://context7.com/fishaudio/fish-audio-python/llms.txt

Converts text to speech, returning the complete audio as bytes. Supports custom voices, speed, audio format, and latency settings. Can also perform instant voice cloning using reference audio files.
```python
from fishaudio import FishAudio
from fishaudio.types import TTSConfig, Prosody, ReferenceAudio
from fishaudio.utils import play, save

client = FishAudio(api_key="your_api_key_here")

# Basic synthesis
audio = client.tts.convert(text="Hello, world!")
save(audio, "output.mp3")

# With a specific voice ID
audio = client.tts.convert(
    text="Speaking with a custom voice.",
    reference_id="802e3bc2b27e49c2995d23ef70e6ac89"
)

# With speed and format control
audio = client.tts.convert(
    text="Speaking faster in WAV format!",
    speed=1.5,           # 0.5–2.0 range
    format="wav",        # "mp3", "wav", "pcm", "opus"
    latency="balanced",  # "normal" or "balanced"
)
play(audio)

# Reusable TTSConfig across multiple requests
config = TTSConfig(
    reference_id="933563129e564b19a115bedd57b7406a",
    format="wav",
    latency="balanced",
    prosody=Prosody(speed=1.2, volume=-5),
    temperature=0.7,
    top_p=0.7,
)
audio1 = client.tts.convert(text="First sentence.", config=config)
audio2 = client.tts.convert(text="Second sentence.", config=config)

# Instant voice cloning with reference audio
with open("reference.wav", "rb") as f:
    audio = client.tts.convert(
        text="Cloned voice speaking this text.",
        references=[ReferenceAudio(
            audio=f.read(),
            text="Exact words spoken in the reference audio file."
        )]
    )
save(audio, "cloned.mp3")
```

--------------------------------

### Audio Utilities: Play and Save

Source: https://context7.com/fishaudio/fish-audio-python/llms.txt

Convenience functions for playing audio directly or saving it to a file. Supports various playback methods and file formats.
```python
from fishaudio import FishAudio
from fishaudio.utils import play, save

client = FishAudio(api_key="your_api_key_here")
audio = client.tts.convert(text="Utilities demo.")

# Save to file
save(audio, "output.mp3")
save(audio, "output.wav")

# Play via ffplay (requires ffmpeg installed)
play(audio)

# Play in Jupyter notebook
play(audio, notebook=True)

# Play via sounddevice (pip install fish-audio-sdk[utils])
play(audio, use_ffmpeg=False)

# Works with streaming iterators too
for chunk in client.tts.stream(text="Streamed audio"):
    pass  # process each chunk live

save(client.tts.stream(text="Save streamed.").collect(), "streamed.mp3")
```

--------------------------------

### client.voices.list()

Source: https://context7.com/fishaudio/fish-audio-python/llms.txt

Lists available voice models with options for filtering by tags and language, searching by title, and controlling self-only results and pagination.

```APIDOC
## client.voices.list()

### Description
Lists available voice models with options for filtering by tags and language, searching by title, and controlling self-only results and pagination.

### Parameters
#### Query Parameters
- **tags** (list[str]) - Optional - Filters voices by the specified tags.
- **language** (str) - Optional - Filters voices by the specified language code.
- **sort_by** (str) - Optional - Specifies the sorting order. Can be 'task_count' or 'created_at'.
- **self_only** (bool) - Optional - If True, only returns voices owned by the current user.
- **page_size** (int) - Optional - The number of results to return per page.
- **title** (str) - Optional - Searches for voices by their title.

### Request Example
```python
# Filter by tags and language
english_male = client.voices.list(
    tags=["male", "english"],
    language="en",
    sort_by="task_count",  # or "created_at"
)

# Only show your own voices
my_voices = client.voices.list(self_only=True, page_size=50)

# Search by title
results = client.voices.list(title="narrator")
```
```

--------------------------------

### client.account.get_package()

Source: https://context7.com/fishaudio/fish-audio-python/llms.txt

Returns the current billing package details as a Package object with balance and total fields.

```APIDOC
## client.account.get_package()

### Description
Returns the current billing package details as a `Package` object with balance and total fields.

### Method
GET

### Endpoint
`/account/package`

### Request Example
```python
package = client.account.get_package()
print(f"Balance: {package.balance} / {package.total}")
```

### Response
#### Success Response (200)
- **balance** (int) - The current balance of the package.
- **total** (int) - The total capacity of the package.
```

--------------------------------

### client.voices.create()

Source: https://context7.com/fishaudio/fish-audio-python/llms.txt

Creates a new voice model by uploading one or more audio samples. Returns the created Voice object whose id can immediately be used in TTS calls via reference_id.

```APIDOC
## client.voices.create()

### Description
Creates a new voice model by uploading one or more audio samples. Returns the created `Voice` object whose `id` can immediately be used in TTS calls via `reference_id`.

### Method
POST

### Endpoint
`/voices`

### Parameters
#### Request Body
- **title** (str) - Required - The title for the new voice model.
- **voices** (list[bytes]) - Required - A list of audio samples in bytes.
- **description** (str) - Optional - A description for the voice model.
- **tags** (list[str]) - Optional - Tags to associate with the voice model.
- **visibility** (str) - Optional - Visibility setting for the voice model ('public', 'unlist', or 'private'). Defaults to 'private'.
- **enhance_audio_quality** (bool) - Optional - Whether to enhance the audio quality of the samples. Defaults to False.
- **texts** (list[str]) - Optional - Transcripts corresponding to the audio samples, used for higher quality cloning.

### Request Example
```python
# Single audio sample
with open("voice_sample.wav", "rb") as f:
    voice = client.voices.create(
        title="My Custom Voice",
        voices=[f.read()],
        description="A cloned voice for my assistant",
        tags=["custom", "english", "assistant"],
        visibility="private",  # "public", "unlist", or "private"
        enhance_audio_quality=True,
    )
print(f"Created voice ID: {voice.id}")

# Multiple samples with transcripts for better quality
with open("sample1.wav", "rb") as f1, open("sample2.wav", "rb") as f2:
    voice = client.voices.create(
        title="Multi-Sample Voice",
        voices=[f1.read(), f2.read()],
        texts=[
            "The quick brown fox jumps over the lazy dog.",
            "Pack my box with five dozen liquor jugs.",
        ],
        tags=["high-quality"],
        visibility="private",
    )

# Immediately use the created voice for TTS
audio = client.tts.convert(
    text="Testing my newly cloned voice!",
    reference_id=voice.id,
)
from fishaudio.utils import save
save(audio, "cloned_test.mp3")
```

### Response
#### Success Response (200)
- **id** (str) - The unique identifier of the newly created voice model.
```

--------------------------------

### Create a New Voice Model

Source: https://context7.com/fishaudio/fish-audio-python/llms.txt

Creates a new voice model by uploading one or more audio samples. The created voice's ID can be immediately used for TTS. Supports single or multiple samples with optional transcripts for better quality.
```python
from fishaudio import FishAudio

client = FishAudio(api_key="your_api_key_here")

# Single audio sample
with open("voice_sample.wav", "rb") as f:
    voice = client.voices.create(
        title="My Custom Voice",
        voices=[f.read()],
        description="A cloned voice for my assistant",
        tags=["custom", "english", "assistant"],
        visibility="private",  # "public", "unlist", or "private"
        enhance_audio_quality=True,
    )
print(f"Created voice ID: {voice.id}")

# Multiple samples with transcripts for better quality
with open("sample1.wav", "rb") as f1, open("sample2.wav", "rb") as f2:
    voice = client.voices.create(
        title="Multi-Sample Voice",
        voices=[f1.read(), f2.read()],
        texts=[
            "The quick brown fox jumps over the lazy dog.",
            "Pack my box with five dozen liquor jugs.",
        ],
        tags=["high-quality"],
        visibility="private",
    )

# Immediately use the created voice for TTS
audio = client.tts.convert(
    text="Testing my newly cloned voice!",
    reference_id=voice.id,
)
from fishaudio.utils import save
save(audio, "cloned_test.mp3")
```

--------------------------------

### client.account.get_credits()

Source: https://context7.com/fishaudio/fish-audio-python/llms.txt

Returns the current API credit balance as a Credits object. Optionally checks whether free credits are available.

```APIDOC
## client.account.get_credits(check_free_credit: bool = False)

### Description
Returns the current API credit balance as a `Credits` object. Optionally checks whether free credits are available.

### Method
GET

### Endpoint
`/account/credits`

### Parameters
#### Query Parameters
- **check_free_credit** (bool) - Optional - If True, also checks for the availability of free credits. Defaults to False.

### Request Example
```python
credits = client.account.get_credits()
print(f"Available credits: {float(credits.credit)}")

# Check free credit availability
credits = client.account.get_credits(check_free_credit=True)
if credits.has_free_credit:
    print("Free credits are available!")
```

### Response
#### Success Response (200)
- **credit** (float) - The current API credit balance.
- **has_free_credit** (bool) - True if free credits are available (only present if `check_free_credit` is True).
```

--------------------------------

### client.tts.convert() — Text-to-Speech (Complete Audio)

Source: https://context7.com/fishaudio/fish-audio-python/llms.txt

Converts text to speech and returns the complete audio as `bytes`. This method accepts optional voice reference IDs, audio format, speed, and a reusable `TTSConfig` object. It also supports instant voice cloning with reference audio.

```APIDOC
## `client.tts.convert()` — Text-to-Speech (Complete Audio)

Converts text to speech and returns the complete audio as `bytes`. This is the simplest TTS method — it collects all streamed chunks internally and returns them at once. Accepts optional voice reference IDs, audio format, speed, and a reusable `TTSConfig` object.

```python
from fishaudio import FishAudio
from fishaudio.types import TTSConfig, Prosody, ReferenceAudio
from fishaudio.utils import play, save

client = FishAudio(api_key="your_api_key_here")

# Basic synthesis
audio = client.tts.convert(text="Hello, world!")
save(audio, "output.mp3")

# With a specific voice ID
audio = client.tts.convert(
    text="Speaking with a custom voice.",
    reference_id="802e3bc2b27e49c2995d23ef70e6ac89"
)

# With speed and format control
audio = client.tts.convert(
    text="Speaking faster in WAV format!",
    speed=1.5,           # 0.5–2.0 range
    format="wav",        # "mp3", "wav", "pcm", "opus"
    latency="balanced",  # "normal" or "balanced"
)
play(audio)

# Reusable TTSConfig across multiple requests
config = TTSConfig(
    reference_id="933563129e564b19a115bedd57b7406a",
    format="wav",
    latency="balanced",
    prosody=Prosody(speed=1.2, volume=-5),
    temperature=0.7,
    top_p=0.7,
)
audio1 = client.tts.convert(text="First sentence.", config=config)
audio2 = client.tts.convert(text="Second sentence.", config=config)

# Instant voice cloning with reference audio
with open("reference.wav", "rb") as f:
    audio = client.tts.convert(
        text="Cloned voice speaking this text.",
        references=[ReferenceAudio(
            audio=f.read(),
            text="Exact words spoken in the reference audio file."
        )]
    )
save(audio, "cloned.mp3")
```
```

--------------------------------

### Async Streaming TTS

Source: https://context7.com/fishaudio/fish-audio-python/llms.txt

Demonstrates asynchronous streaming of text-to-speech audio chunks. Use this for applications requiring real-time audio processing without blocking the main thread.

```python
import asyncio
from fishaudio import AsyncFishAudio

async def stream_audio():
    client = AsyncFishAudio(api_key="your_api_key_here")

    audio_stream = await client.tts.stream(text="Async chunk streaming.")
    async for chunk in audio_stream:
        # Send each chunk to a WebSocket, queue, etc.
        print(f"Got chunk: {len(chunk)} bytes")

    # Or collect
    stream = await client.tts.stream(text="Collect async.")
    audio_bytes = await stream.collect()

asyncio.run(stream_audio())
```

--------------------------------

### Persistent Voice Model Creation and Usage

Source: https://github.com/fishaudio/fish-audio-python/blob/main/README.md

Create a persistent voice model for reuse by uploading a voice sample. This model can then be used for multiple text-to-speech conversions, saving time and resources.

```python
# Create voice model for reuse
with open("voice_sample.wav", "rb") as f:
    voice = client.voices.create(
        title="My Voice",
        voices=[f.read()],
        description="Custom voice clone"
    )

# Use the created model
audio = client.tts.convert(
    text="Using my saved voice",
    reference_id=voice.id
)
```

--------------------------------

### Use a Specific Voice Model

Source: https://github.com/fishaudio/fish-audio-python/blob/main/examples/getting_started.ipynb

Specify a custom voice model for text-to-speech conversion using the `reference_id` parameter. Replace the placeholder with your actual voice model ID.

```python
# Replace with your voice model ID
# audio = client.tts.convert(
#     text="Hello from a custom voice!",
#     reference_id="your-voice-model-id"
# )
# play(audio, notebook=True)
```

--------------------------------

### Convert Text to Speech and Play

Source: https://github.com/fishaudio/fish-audio-python/blob/main/examples/getting_started.ipynb

Convert a given text string into speech and play it directly within the notebook environment. This is the most basic TTS operation.

```python
audio = client.tts.convert(text="Hello! Welcome to Fish Audio.")
play(audio, notebook=True)
```

--------------------------------

### client.voices.list() — List Voice Models

Source: https://context7.com/fishaudio/fish-audio-python/llms.txt

Lists available voice models from the Fish Audio platform with pagination and filtering support.
Returns a `PaginatedResponse[Voice]` with `total` count and `items` list.

```APIDOC
## `client.voices.list()` — List Voice Models

Lists available voice models from the Fish Audio platform with pagination and filtering support. Returns a `PaginatedResponse[Voice]` with `total` count and `items` list.

```python
from fishaudio import FishAudio

client = FishAudio(api_key="your_api_key_here")

# List first page of voices
voices = client.voices.list(page_size=20, page_number=1)
print(f"Total voices: {voices.total}")
for voice in voices.items:
    print(f"  {voice.title} — ID: {voice.id}")
```
```

--------------------------------

### List Voice Models

Source: https://context7.com/fishaudio/fish-audio-python/llms.txt

Lists available voice models from the Fish Audio platform with support for pagination and filtering. Returns a `PaginatedResponse[Voice]` containing the total count and a list of voice items.

```python
from fishaudio import FishAudio

client = FishAudio(api_key="your_api_key_here")

# List first page of voices
voices = client.voices.list(page_size=20, page_number=1)
print(f"Total voices: {voices.total}")
for voice in voices.items:
    print(f"  {voice.title} — ID: {voice.id}")
```

--------------------------------

### client.voices.update()

Source: https://context7.com/fishaudio/fish-audio-python/llms.txt

Updates the title, description, tags, visibility, or cover image of an existing voice model.

```APIDOC
## client.voices.update(voice_id: str, ...)

### Description
Updates the title, description, tags, visibility, or cover image of an existing voice model.

### Method
PUT

### Endpoint
`/voices/{voice_id}`

### Parameters
#### Path Parameters
- **voice_id** (str) - Required - The unique identifier of the voice model to update.

#### Request Body
- **title** (str) - Optional - The new title for the voice model.
- **description** (str) - Optional - The new description for the voice model.
- **tags** (list[str]) - Optional - The new list of tags for the voice model.
- **visibility** (str) - Optional - The new visibility setting ('public', 'unlist', or 'private').
- **cover_image** (bytes) - Optional - The new cover image for the voice model in bytes.

### Request Example
```python
# Update title and make public
client.voices.update(
    "802e3bc2b27e49c2995d23ef70e6ac89",
    title="Refined Voice v2",
    description="Updated description with improvements.",
    visibility="public",
    tags=["english", "narrator", "professional"],
)

# Update cover image
with open("cover.png", "rb") as f:
    client.voices.update(
        "802e3bc2b27e49c2995d23ef70e6ac89",
        cover_image=f.read(),
    )
```
```

--------------------------------

### client.voices.get()

Source: https://context7.com/fishaudio/fish-audio-python/llms.txt

Fetches the full metadata for a single voice model by its ID.

```APIDOC
## client.voices.get(voice_id: str)

### Description
Fetches the full metadata for a single voice model by its ID.

### Method
GET

### Endpoint
`/voices/{voice_id}`

### Parameters
#### Path Parameters
- **voice_id** (str) - Required - The unique identifier of the voice model.

### Request Example
```python
voice = client.voices.get("802e3bc2b27e49c2995d23ef70e6ac89")
print(voice.title)
print(voice.description)
print(voice.id)
```

### Response
#### Success Response (200)
- **title** (str) - The title of the voice.
- **description** (str) - The description of the voice.
- **id** (str) - The unique identifier of the voice.
```

--------------------------------

### Reusable TTS Configuration

Source: https://github.com/fishaudio/fish-audio-python/blob/main/README.md

Define and reuse a TTS configuration object for consistent prosody, voice, format, and latency across multiple generations.
```python
from fishaudio.types import TTSConfig, Prosody

config = TTSConfig(
    prosody=Prosody(speed=1.2, volume=-5),
    reference_id="933563129e564b19a115bedd57b7406a",
    format="wav",
    latency="balanced"
)

# Reuse across generations
audio1 = client.tts.convert(text="First message", config=config)
audio2 = client.tts.convert(text="Second message", config=config)
```

--------------------------------

### Text-to-Speech with Custom Voice

Source: https://github.com/fishaudio/fish-audio-python/blob/main/README.md

Generate speech using a specific voice by providing its reference ID.

```python
# Use a specific voice by ID
audio = client.tts.convert(
    text="Custom voice",
    reference_id="802e3bc2b27e49c2995d23ef70e6ac89"
)
```

--------------------------------

### Text-to-Speech with Speed Control

Source: https://github.com/fishaudio/fish-audio-python/blob/main/README.md

Adjust the speaking rate of the generated audio by setting the 'speed' parameter.

```python
audio = client.tts.convert(
    text="Speaking faster!",
    speed=1.5  # 1.5x speed
)
```

--------------------------------

### Error Handling for Fish Audio API Calls

Source: https://github.com/fishaudio/fish-audio-python/blob/main/README.md

Implement robust error handling for API requests to gracefully manage issues such as authentication failures, rate limits, validation errors, and general API errors.
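For the rate-limit case, one common pattern is retrying with exponential backoff. The sketch below is generic and not part of the SDK: `call_with_backoff` and its parameters are hypothetical names, and which exceptions are safe to retry is an assumption to check against your own usage.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on the given exception types.

    Hypothetical helper for illustration. The wait doubles after each
    failed attempt (base_delay, 2x, 4x, ...) with up to 10% jitter added.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")
```

With the SDK this might be called as `call_with_backoff(lambda: client.tts.convert(text="Hello!"), retry_on=(RateLimitError,))`. The basic try/except form follows.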
```python from fishaudio.exceptions import ( AuthenticationError, RateLimitError, ValidationError, FishAudioError ) try: audio = client.tts.convert(text="Hello!") except AuthenticationError: print("Invalid API key") except RateLimitError: print("Rate limit exceeded") except ValidationError as e: print(f"Invalid request: {e}") except FishAudioError as e: print(f"API error: {e}") ``` -------------------------------- ### Text-to-Speech Chunk Streaming Source: https://github.com/fishaudio/fish-audio-python/blob/main/README.md Stream audio content in chunks as they are generated, suitable for real-time processing or collecting the full audio data. ```python # Stream and process chunks as they arrive for chunk in client.tts.stream(text="Long content..."): send_to_websocket(chunk) # Or collect all chunks audio = client.tts.stream(text="Hello!").collect() ``` -------------------------------- ### Stream Audio for Long Text Source: https://github.com/fishaudio/fish-audio-python/blob/main/examples/getting_started.ipynb Use the `stream()` method for longer text inputs to process audio in chunks as they become available. The `collect()` method gathers all streamed audio. ```python stream = client.tts.stream(text="This is a longer piece of text that will be streamed.") audio = stream.collect() play(audio, notebook=True) ``` -------------------------------- ### List Voices with Filters Source: https://context7.com/fishaudio/fish-audio-python/llms.txt Filter voices by tags and language, or retrieve only your own voices. Voices can also be searched by title. 
```python english_male = client.voices.list( tags=["male", "english"], language="en", sort_by="task_count", # or "created_at" ) # Only show your own voices my_voices = client.voices.list(self_only=True, page_size=50) # Search by title results = client.voices.list(title="narrator") ``` -------------------------------- ### Update Voice Metadata Source: https://context7.com/fishaudio/fish-audio-python/llms.txt Updates the title, description, tags, visibility, or cover image of an existing voice model. Requires the voice ID and the FishAudio client initialized with an API key. ```python from fishaudio import FishAudio client = FishAudio(api_key="your_api_key_here") # Update title and make public client.voices.update( "802e3bc2b27e49c2995d23ef70e6ac89", title="Refined Voice v2", description="Updated description with improvements.", visibility="public", tags=["english", "narrator", "professional"], ) # Update cover image with open("cover.png", "rb") as f: client.voices.update( "802e3bc2b27e49c2995d23ef70e6ac89", cover_image=f.read(), ) ``` -------------------------------- ### Synchronous TTS WebSocket Streaming Source: https://github.com/fishaudio/fish-audio-python/blob/main/README.md Stream audio dynamically generated from text chunks using a synchronous WebSocket connection. The 'play' utility can consume the stream directly. ```python def text_chunks(): yield "Hello, " yield "this is " yield "streaming!" audio_stream = client.tts.stream_websocket(text_chunks(), latency="balanced") play(audio_stream) ``` -------------------------------- ### Asynchronous TTS WebSocket Streaming Source: https://github.com/fishaudio/fish-audio-python/blob/main/README.md Stream audio dynamically generated from text chunks using an asynchronous WebSocket connection within an asyncio context. ```python async def text_chunks(): yield "Hello, " yield "this is " yield "streaming!" 
audio_stream = await client.tts.stream_websocket(text_chunks(), latency="balanced") play(audio_stream) ``` -------------------------------- ### Text-to-Speech Conversion Source: https://github.com/fishaudio/fish-audio-python/blob/main/README.md Generate speech from text using the TTS service. Supports custom voices, speed control, reusable configurations, and streaming. ```APIDOC ## Text-to-Speech Conversion ### Description Converts text into speech. ### Method `client.tts.convert(text: str, reference_id: Optional[str] = None, speed: Optional[float] = None, config: Optional[TTSConfig] = None, format: Optional[str] = None, latency: Optional[str] = None) -> AudioData` ### Parameters #### Path Parameters None #### Query Parameters None #### Request Body None ### Request Example ```python from fishaudio import FishAudio client = FishAudio() audio = client.tts.convert(text="Hello, world!") ``` ### Response #### Success Response (200) - **audio_data** (AudioData) - The generated audio data. ### Response Example ```json { "audio_data": "..." } ``` ## Text-to-Speech with Custom Voice ### Description Generates speech using a specific voice identified by `reference_id`. ### Method `client.tts.convert(text: str, reference_id: str, ...)` ### Parameters #### Path Parameters None #### Query Parameters None #### Request Body None ### Request Example ```python audio = client.tts.convert( text="Custom voice", reference_id="802e3bc2b27e49c2995d23ef70e6ac89" ) ``` ## Text-to-Speech with Speed Control ### Description Generates speech with adjustable speaking speed. ### Method `client.tts.convert(text: str, speed: float, ...)` ### Parameters #### Path Parameters None #### Query Parameters None #### Request Body None ### Request Example ```python audio = client.tts.convert( text="Speaking faster!", speed=1.5 # 1.5x speed ) ``` ## Text-to-Speech with Reusable Configuration ### Description Applies a reusable TTS configuration for consistent audio generation. 
### Method `client.tts.convert(text: str, config: TTSConfig, ...)` ### Parameters #### Path Parameters None #### Query Parameters None #### Request Body None ### Request Example ```python from fishaudio.types import TTSConfig, Prosody config = TTSConfig( prosody=Prosody(speed=1.2, volume=-5), reference_id="933563129e564b19a115bedd57b7406a", format="wav", latency="balanced" ) audio1 = client.tts.convert(text="First message", config=config) audio2 = client.tts.convert(text="Second message", config=config) ``` ## Text-to-Speech Streaming ### Description Streams audio data in chunks as it is generated, suitable for processing large amounts of text or real-time applications. ### Method `client.tts.stream(text: str) -> Iterator[AudioChunk]` ### Parameters #### Path Parameters None #### Query Parameters None #### Request Body None ### Request Example ```python # Stream and process chunks as they arrive for chunk in client.tts.stream(text="Long content..."): send_to_websocket(chunk) # Or collect all chunks audio = client.tts.stream(text="Hello!").collect() ``` ``` -------------------------------- ### Text-to-Speech Streaming (Chunked Audio) Source: https://context7.com/fishaudio/fish-audio-python/llms.txt Streams TTS audio in chunks, suitable for low-latency applications or incremental file writing. Chunks can be processed as they arrive or collected into a single bytes object. 
```python from fishaudio import FishAudio client = FishAudio(api_key="your_api_key_here") # Process chunks as they arrive with open("streamed_output.mp3", "wb") as f: for chunk in client.tts.stream(text="Streaming audio chunk by chunk."): f.write(chunk) # Collect all chunks at once (equivalent to convert()) audio = client.tts.stream( text="Collect me into bytes.", format="wav", reference_id="802e3bc2b27e49c2995d23ef70e6ac89", ).collect() ``` -------------------------------- ### Real-time Streaming TTS Source: https://github.com/fishaudio/fish-audio-python/blob/main/README.md Streams audio generated from text in real-time, suitable for conversational AI and live applications. ```APIDOC ## Real-time Streaming TTS (Synchronous) ### Description Streams audio generated from a sequence of text chunks using a synchronous client. ### Method `client.tts.stream_websocket(text_chunks: Iterable[str], latency: Optional[str] = None) -> AudioStream` ### Parameters #### Path Parameters None #### Query Parameters None #### Request Body None ### Request Example ```python def text_chunks(): yield "Hello, " yield "this is " yield "streaming!" audio_stream = client.tts.stream_websocket(text_chunks(), latency="balanced") play(audio_stream) ``` ## Real-time Streaming TTS (Asynchronous) ### Description Streams audio generated from a sequence of text chunks using an asynchronous client. ### Method `await client.tts.stream_websocket(text_chunks: AsyncIterable[str], latency: Optional[str] = None) -> AudioStream` ### Parameters #### Path Parameters None #### Query Parameters None #### Request Body None ### Request Example ```python async def text_chunks(): yield "Hello, " yield "this is " yield "streaming!" 
audio_stream = await client.tts.stream_websocket(text_chunks(), latency="balanced") play(audio_stream) ``` ``` -------------------------------- ### client.tts.stream_websocket() — Real-Time WebSocket Streaming Source: https://context7.com/fishaudio/fish-audio-python/llms.txt Streams dynamically generated text into a WebSocket and receives audio output in real-time as an iterator of `bytes`. Ideal for conversational AI pipelines where text is produced token-by-token from an LLM. Accepts plain strings, `TextEvent`, or `FlushEvent` objects. ```APIDOC ## `client.tts.stream_websocket()` — Real-Time WebSocket Streaming Streams dynamically generated text into a WebSocket and receives audio output in real-time as an iterator of `bytes`. Ideal for conversational AI pipelines where text is produced token-by-token from an LLM. Accepts plain strings, `TextEvent`, or `FlushEvent` objects. ```python from fishaudio import FishAudio, TextEvent, FlushEvent, WebSocketOptions from fishaudio.utils import play client = FishAudio(api_key="your_api_key_here") # Generator that yields text chunks (e.g., LLM output) def llm_output(): yield "Hello, " yield "this is real-time " yield "streaming text-to-speech!" 
# Collect and play the streamed audio audio_chunks = list(client.tts.stream_websocket(llm_output())) play(b"".join(audio_chunks)) # Write chunks to file as they arrive with open("realtime.mp3", "wb") as f: for chunk in client.tts.stream_websocket( llm_output(), reference_id="802e3bc2b27e49c2995d23ef70e6ac89", format="mp3", speed=1.0, latency="balanced", ): f.write(chunk) # Using FlushEvent to force immediate synthesis of buffered text def text_with_flush(): yield TextEvent(text="Urgent line one.") yield FlushEvent() # Force synthesis now yield TextEvent(text="Line two follows.") for chunk in client.tts.stream_websocket(text_with_flush()): pass # process chunk # Long-running generations: increase WebSocket timeout ws_options = WebSocketOptions( keepalive_ping_timeout_seconds=60.0, keepalive_ping_interval_seconds=30.0, max_message_size_bytes=1024 * 1024, # 1 MiB ) for chunk in client.tts.stream_websocket(llm_output(), ws_options=ws_options): pass # Async WebSocket streaming import asyncio from fishaudio import AsyncFishAudio import aiofiles async def async_ws(): client = AsyncFishAudio(api_key="your_api_key_here") async def text_gen(): yield "Async streaming " yield "with WebSocket!" async with aiofiles.open("async_realtime.mp3", "wb") as f: async for chunk in client.tts.stream_websocket(text_gen(), format="mp3"): await f.write(chunk) asyncio.run(async_ws()) ``` ``` -------------------------------- ### Speech-to-Text Transcription Source: https://github.com/fishaudio/fish-audio-python/blob/main/README.md Transcribe an audio file by reading its content and specifying the language. Access the full text and timestamped segments. 
```python # Transcribe audio with open("audio.wav", "rb") as f: result = client.asr.transcribe(audio=f.read(), language="en") print(result.text) # Access timestamped segments for segment in result.segments: print(f"[{segment.start:.2f}s - {segment.end:.2f}s] {segment.text}") ``` -------------------------------- ### client.tts.stream() — Text-to-Speech (Chunked Streaming) Source: https://context7.com/fishaudio/fish-audio-python/llms.txt Streams TTS audio as an `AudioStream`, which is an iterable of `bytes` chunks. This is useful for low-latency processing or incremental writing. The `.collect()` method can be used to gather all chunks into a single `bytes` object. ```APIDOC ## `client.tts.stream()` — Text-to-Speech (Chunked Streaming) Streams TTS audio as an `AudioStream` — an iterable of `bytes` chunks. Useful for low-latency processing, forwarding audio to a WebSocket, or writing chunks incrementally to a file. Calling `.collect()` on the stream collects all chunks into a single `bytes` object. ```python from fishaudio import FishAudio client = FishAudio(api_key="your_api_key_here") # Process chunks as they arrive with open("streamed_output.mp3", "wb") as f: for chunk in client.tts.stream(text="Streaming audio chunk by chunk."): f.write(chunk) # Collect all chunks at once (equivalent to convert()) audio = client.tts.stream( text="Collect me into bytes.", format="wav", reference_id="802e3bc2b27e49c2995d23ef70e6ac89", ).collect() ``` ```
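The incremental-write pattern described for `client.tts.stream()` can be sketched without a live API call. The block below is a minimal illustration, not SDK code: `write_chunks` and `fake_stream` are hypothetical helper names, and `fake_stream` is a stand-in generator that yields placeholder bytes rather than real audio. The only assumption carried over from the text above is that the stream is an iterable of `bytes` chunks.

```python
import io
from typing import BinaryIO, Iterable, Iterator


def write_chunks(chunks: Iterable[bytes], sink: BinaryIO) -> int:
    """Write each chunk to `sink` as it arrives and return the total bytes written."""
    total = 0
    for chunk in chunks:
        sink.write(chunk)
        total += len(chunk)
    return total


def fake_stream() -> Iterator[bytes]:
    """Hypothetical stand-in for client.tts.stream(...); placeholder bytes, not audio."""
    yield b"abc"
    yield b"de"
    yield b"f"


# An in-memory buffer is used here so the sketch has no file-system side effects.
buffer = io.BytesIO()
written = write_chunks(fake_stream(), buffer)

# The buffer now holds the same bytes that joining every chunk would produce.
collected = buffer.getvalue()
```

With a real client, the same helper would presumably take `client.tts.stream(text="...")` as `chunks` and a file opened in `"wb"` mode as `sink`. Writing chunk by chunk keeps memory use flat for long generations, whereas `.collect()` holds the whole result in memory at once.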