# ElevenLabs Python SDK

The ElevenLabs Python SDK is the official client library for accessing ElevenLabs' AI-powered audio generation services. It provides developers with a comprehensive interface to convert text to natural-sounding speech, clone voices, generate sound effects, transcribe audio, dub content into multiple languages, and build conversational AI agents. The SDK supports both synchronous and asynchronous operations, making it suitable for a wide range of applications, from simple scripts to production-grade systems.

The library offers multiple voice generation models optimized for different use cases: Eleven v3 for dramatic performances across 70+ languages, Eleven Multilingual v2 for stability and accent accuracy in 29 languages, and Flash v2.5/Turbo v2.5 for low-latency applications. Key features include real-time audio streaming, voice cloning from audio samples, pronunciation dictionaries, conversational AI with custom tool integration, and comprehensive history management for all generated content.

## Installation

```bash
pip install elevenlabs
```

## Client Initialization

Initialize the ElevenLabs client with your API key to access all SDK features.

```python
from elevenlabs import ElevenLabs

# Initialize with an API key (the ELEVENLABS_API_KEY environment variable also works)
client = ElevenLabs(
    api_key="YOUR_API_KEY",
)
```

## Text-to-Speech - Convert

Convert text into speech using a voice of your choice. Returns audio in the specified format, ready to be played or saved.
```python
from elevenlabs import ElevenLabs, play, save

client = ElevenLabs(api_key="YOUR_API_KEY")

# Generate speech from text
audio = client.text_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",  # George voice
    text="The first move is what sets everything in motion.",
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128",
)

# Play the generated audio
play(audio)

# Or save to file
save(audio, "output.mp3")
```

## Text-to-Speech - Stream

Stream audio in real time as it is being generated, ideal for low-latency applications.

```python
from elevenlabs import ElevenLabs, stream

client = ElevenLabs(api_key="YOUR_API_KEY")

# Stream audio in real time
audio_stream = client.text_to_speech.stream(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    text="This is a streaming example that generates audio in real-time.",
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128",
)

# Play streamed audio
stream(audio_stream)

# Or process chunks manually (the stream can only be consumed once,
# so use either stream() or the loop below, not both)
for chunk in audio_stream:
    if isinstance(chunk, bytes):
        # Process audio chunk (e.g., send to an audio player)
        print(f"Received {len(chunk)} bytes")
```

## Text-to-Speech with Timestamps

Generate speech with precise character-level timing information for audio-text synchronization.
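Character-level alignment is typically consumed by grouping characters into word-level timings for captions or highlighting. The following is a standalone sketch of that grouping in pure Python, with no SDK calls; the parallel list-of-characters and list-of-start-times shape is an assumption mirroring the alignment fields in the example that follows.

```python
def word_timings(characters, start_times):
    """Group character-level timings into (word, start_time) pairs.

    `characters` and `start_times` are assumed to be parallel lists,
    as in the alignment data returned by the timestamps endpoint.
    """
    words, current, start = [], "", None
    for ch, t in zip(characters, start_times):
        if ch.isspace():
            # Whitespace closes the current word, if any
            if current:
                words.append((current, start))
                current, start = "", None
        else:
            if not current:
                start = t  # first character of a new word
            current += ch
    if current:  # flush the trailing word
        words.append((current, start))
    return words

chars = list("Hi there")
starts = [0, 40, 80, 120, 140, 160, 180, 200]
assert word_timings(chars, starts) == [("Hi", 0), ("there", 120)]
```

The same helper works whether the timing unit is milliseconds or seconds, since it only carries the start values through.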
```python
from elevenlabs import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")

# Generate speech with timing data
response = client.text_to_speech.convert_with_timestamps(
    voice_id="21m00Tcm4TlvDq8ikWAM",
    text="This is a test for the API of ElevenLabs.",
    output_format="mp3_44100_128",
    model_id="eleven_multilingual_v2",
)

# Access audio and alignment data
audio_base64 = response.audio_base64
alignment = response.alignment

# Alignment contains character-level timing (in seconds)
print(f"Characters: {alignment.characters}")
print(f"Start times: {alignment.character_start_times_seconds}")
print(f"End times: {alignment.character_end_times_seconds}")
```

## Voice Management - List Voices

List all available voices in your account, with their settings and metadata.

```python
from elevenlabs import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")

# Get all available voices
response = client.voices.search()

for voice in response.voices:
    print(f"Voice: {voice.name}")
    print(f"  ID: {voice.voice_id}")
    print(f"  Category: {voice.category}")
    print(f"  Labels: {voice.labels}")
    print()
```

## Voice Cloning

Clone a voice from audio samples to create a custom voice for your applications.

```python
from elevenlabs import ElevenLabs, play

client = ElevenLabs(api_key="YOUR_API_KEY")

# Create a cloned voice from samples (instant voice cloning);
# files are passed as file objects, not path strings
voice = client.voices.ivc.create(
    name="Alex",
    description="An old American male voice with a slight hoarseness",
    files=[open(path, "rb") for path in ["./sample_0.mp3", "./sample_1.mp3", "./sample_2.mp3"]],
)

print(f"Created voice ID: {voice.voice_id}")

# Use the cloned voice for text-to-speech
audio = client.text_to_speech.convert(
    voice_id=voice.voice_id,
    text="Hello, this is my cloned voice speaking.",
    model_id="eleven_multilingual_v2",
)
play(audio)
```

## Text-to-Sound Effects

Generate sound effects from text descriptions for videos, games, and other media.
```python
from elevenlabs import ElevenLabs, play

client = ElevenLabs(api_key="YOUR_API_KEY")

# Generate a sound effect from a description
audio = client.text_to_sound_effects.convert(
    text="Spacious braam suitable for high-impact movie trailer moments",
    duration_seconds=5.0,
    prompt_influence=0.3,
)
play(audio)

# Generate a looping sound effect
looping_audio = client.text_to_sound_effects.convert(
    text="Gentle forest ambience with birds chirping",
    duration_seconds=10.0,
    loop=True,
)
play(looping_audio)
```

## Speech-to-Text

Transcribe audio files to text, with support for multiple languages and formats.

```python
from elevenlabs import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")

# Transcribe an audio file
with open("audio.mp3", "rb") as audio_file:
    response = client.speech_to_text.convert(
        file=audio_file,
        model_id="scribe_v1",
        language_code="en",
    )

print(f"Transcription: {response.text}")
print(f"Words: {response.words}")
```

## Dubbing

Dub audio or video content into different languages automatically.

```python
import time

from elevenlabs import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")

# Start a dubbing job
with open("video.mp4", "rb") as video_file:
    response = client.dubbing.dub(
        file=video_file,
        source_lang="en",
        target_lang="es",  # Spanish
    )

dubbing_id = response.dubbing_id
print(f"Dubbing job started: {dubbing_id}")

# Poll for completion
while True:
    status = client.dubbing.get(dubbing_id=dubbing_id)
    if status.status == "dubbed":
        print("Dubbing complete!")
        break
    elif status.status == "failed":
        print("Dubbing failed")
        break
    time.sleep(5)

# Download the dubbed content
dubbed_audio = client.dubbing.audio.get(dubbing_id=dubbing_id, language_code="es")
```

## History Management

Access and manage your generated audio history.
```python
from elevenlabs import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")

# List recent generations
history = client.history.list(
    page_size=10,
    sort_direction="desc",
)

for item in history.history:
    print(f"ID: {item.history_item_id}")
    print(f"Text: {item.text[:50]}...")
    print(f"Voice: {item.voice_name}")
    print(f"Created: {item.date_unix}")
    print()

# Get audio for a specific history item
audio = client.history.get_audio(history_item_id="VW7YKqPnjY4h39yTbx2L")

# Download multiple history items as a ZIP
audio_zip = client.history.download(
    history_item_ids=["item1", "item2", "item3"],
    output_format="wav",
)

# Delete a history item
client.history.delete(history_item_id="VW7YKqPnjY4h39yTbx2L")
```

## Models

List available TTS models and their capabilities.

```python
from elevenlabs import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")

# List all available models
models = client.models.list()

for model in models:
    print(f"Model: {model.name}")
    print(f"  ID: {model.model_id}")
    print(f"  Description: {model.description}")
    print(f"  Languages: {[lang.language_id for lang in model.languages]}")
    print(f"  Can do TTS: {model.can_do_text_to_speech}")
    print()

# Main models:
# - eleven_v3: dramatic delivery, 70+ languages
# - eleven_multilingual_v2: stability and accuracy, 29 languages
# - eleven_flash_v2_5: ultra-low latency, 32 languages
# - eleven_turbo_v2_5: balanced quality/latency, 32 languages
```

## Async Client

Use the async client for non-blocking API calls in async applications.
```python
import asyncio

from elevenlabs import AsyncElevenLabs

async def generate_speech():
    client = AsyncElevenLabs(api_key="YOUR_API_KEY")

    # List models asynchronously
    models = await client.models.list()
    print(f"Found {len(models)} models")

    # Generate speech asynchronously; convert returns an async iterator
    # of audio chunks, so it is consumed with `async for` rather than awaited
    audio = client.text_to_speech.convert(
        voice_id="JBFqnCBsd6RMkjVDRZzb",
        text="This is async speech generation.",
        model_id="eleven_multilingual_v2",
    )

    # Save the audio
    with open("async_output.mp3", "wb") as f:
        async for chunk in audio:
            f.write(chunk)

    print("Audio saved!")

asyncio.run(generate_speech())
```

## Conversational AI - Basic Usage

Build interactive AI agents with real-time audio capabilities using the SDK's conversational AI module.

```python
from elevenlabs import ElevenLabs
from elevenlabs.conversational_ai.conversation import Conversation
from elevenlabs.conversational_ai.default_audio_interface import DefaultAudioInterface

client = ElevenLabs(api_key="YOUR_API_KEY")

# Create an audio interface for real-time audio input/output
audio_interface = DefaultAudioInterface()

# Create a conversation with callbacks
conversation = Conversation(
    client=client,
    agent_id="your-agent-id",
    requires_auth=True,
    audio_interface=audio_interface,
    callback_agent_response=lambda response: print(f"Agent: {response}"),
    callback_user_transcript=lambda transcript: print(f"User: {transcript}"),
)

# Start the conversation
conversation.start_session()

# The conversation runs until you call:
# conversation.end_session()

# Wait for the session to end and get the conversation ID
conversation_id = conversation.wait_for_session_end()
print(f"Conversation ID: {conversation_id}")
```

## Conversational AI - Text-Only Chat Mode

Use conversational AI in text-only mode, without an audio interface.
```python
from elevenlabs import ElevenLabs
from elevenlabs.conversational_ai.conversation import Conversation

client = ElevenLabs(api_key="YOUR_API_KEY")

# Text-only mode (no audio_interface)
conversation = Conversation(
    client=client,
    agent_id="your-agent-id",
    requires_auth=True,
    callback_agent_response=lambda response: print(f"Agent: {response}"),
)

conversation.start_session()

# Send text messages
conversation.send_user_message("Hello, how are you?")
conversation.send_user_message("Tell me about the weather")

# Send contextual updates (non-interrupting)
conversation.send_contextual_update("User is located in New York")

# End the session
conversation.end_session()
```

## Conversational AI - Custom Tools

Register custom tools that the AI agent can call during conversations.

```python
import asyncio

from elevenlabs import ElevenLabs
from elevenlabs.conversational_ai.conversation import ClientTools, Conversation

client = ElevenLabs(api_key="YOUR_API_KEY")

# Create client tools
client_tools = ClientTools()

# Register a sync tool
def calculate_sum(params):
    numbers = params.get("numbers", [])
    result = sum(numbers)
    return f"The sum is {result}"

client_tools.register("calculate_sum", calculate_sum, is_async=False)

# Register an async tool
async def fetch_weather(params):
    location = params.get("location", "Unknown")
    # Simulate an API call
    await asyncio.sleep(0.1)
    return f"Weather in {location}: Sunny, 72F"

client_tools.register("fetch_weather", fetch_weather, is_async=True)

# Use the tools in a conversation
conversation = Conversation(
    client=client,
    agent_id="your-agent-id",
    requires_auth=True,
    client_tools=client_tools,
    callback_agent_response=lambda response: print(f"Agent: {response}"),
)

conversation.start_session()
# The agent can now call the registered tools during the conversation
```

## Conversational AI - Async Conversation

Use the async conversation class for better integration with async applications.
```python
import asyncio

from elevenlabs import ElevenLabs
from elevenlabs.conversational_ai.conversation import AsyncConversation

async def async_callback(response):
    print(f"Agent: {response}")

async def async_user_callback(transcript):
    print(f"User: {transcript}")

async def run_conversation():
    client = ElevenLabs(api_key="YOUR_API_KEY")

    conversation = AsyncConversation(
        client=client,
        agent_id="your-agent-id",
        requires_auth=True,
        callback_agent_response=async_callback,
        callback_user_transcript=async_user_callback,
    )

    await conversation.start_session()

    # Send messages
    await conversation.send_user_message("Hello!")

    # Wait some time for responses
    await asyncio.sleep(5)

    # End the session
    await conversation.end_session()
    conversation_id = await conversation.wait_for_session_end()
    print(f"Conversation ended: {conversation_id}")

asyncio.run(run_conversation())
```

## Pronunciation Dictionaries

Create and manage pronunciation dictionaries for consistent speech output.

```python
from elevenlabs import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")

# Create a pronunciation dictionary from rules
dictionary = client.pronunciation_dictionaries.create_from_rules(
    name="Tech Terms",
    rules=[
        {"type": "alias", "string_to_replace": "API", "alias": "A P I"},
        {"type": "alias", "string_to_replace": "SDK", "alias": "S D K"},
        {"type": "phoneme", "string_to_replace": "ElevenLabs", "phoneme": "ɪˈlevənlæbz", "alphabet": "ipa"},
    ],
)

print(f"Dictionary ID: {dictionary.id}")

# Use the dictionary in TTS
audio = client.text_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    text="The ElevenLabs API and SDK are powerful tools.",
    model_id="eleven_multilingual_v2",
    pronunciation_dictionary_locators=[
        {"pronunciation_dictionary_id": dictionary.id, "version_id": dictionary.version_id}
    ],
)
```

## Voice Settings

Customize voice parameters for fine-tuned audio output.
```python
from elevenlabs import ElevenLabs
from elevenlabs.types import VoiceSettings

client = ElevenLabs(api_key="YOUR_API_KEY")

# Generate speech with custom voice settings
audio = client.text_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    text="This is a customized voice output with adjusted settings.",
    model_id="eleven_multilingual_v2",
    voice_settings=VoiceSettings(
        stability=0.5,           # 0-1: lower = more variable, higher = more consistent
        similarity_boost=0.75,   # 0-1: how closely to match the original voice
        style=0.0,               # 0-1: style exaggeration (model dependent)
        use_speaker_boost=True,  # boost speaker clarity
    ),
)
```

## User and Subscription Information

Access account and subscription details.

```python
from elevenlabs import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")

# Get user information
user = client.user.get()
print(f"API key: {user.xi_api_key}")

# Get subscription details
subscription = client.user.get_subscription()
print(f"Tier: {subscription.tier}")
print(f"Character count: {subscription.character_count}")
print(f"Character limit: {subscription.character_limit}")
print(f"Voice limit: {subscription.voice_limit}")
```

## Error Handling

Handle common errors gracefully in your applications.

```python
from elevenlabs import ElevenLabs
from elevenlabs.errors import (
    BadRequestError,
    UnauthorizedError,
    ForbiddenError,
    NotFoundError,
    UnprocessableEntityError,
)

client = ElevenLabs(api_key="YOUR_API_KEY")

try:
    audio = client.text_to_speech.convert(
        voice_id="invalid-voice-id",
        text="Test",
        model_id="eleven_multilingual_v2",
    )
except UnauthorizedError:
    print("Invalid API key")
except NotFoundError:
    print("Voice not found")
except BadRequestError as e:
    print(f"Bad request: {e}")
except UnprocessableEntityError as e:
    print(f"Validation error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
```

## Summary

The ElevenLabs Python SDK enables seamless integration of AI-powered voice and audio capabilities into Python applications.
Core use cases include text-to-speech conversion with multiple premium voice models, real-time audio streaming for interactive applications, voice cloning for personalized experiences, sound effect generation, speech transcription, and multi-language dubbing. The conversational AI module supports building sophisticated voice agents with custom tool integration, for applications such as customer service bots, virtual assistants, and interactive voice response systems.

Integration typically follows a simple pattern: initialize the client with an API key, select the appropriate voice and model, and call the conversion methods. For real-time applications, the streaming APIs and async client provide low-latency options. The SDK handles authentication, request formatting, and response parsing automatically, allowing developers to focus on application logic.

Production deployments can leverage the comprehensive history API for auditing and the pronunciation dictionary system for domain-specific terminology. Enterprise features include zero-retention mode for privacy-sensitive applications and workspace management for team collaboration.
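As a closing sketch, the three-step pattern above (client, voice and model, convert) can be wrapped in a small testable helper. The SDK call is injected as a parameter so the example runs without network access: `synthesize` is a stand-in for `client.text_to_speech.convert`, and the chunked-bytes return shape is an assumption of this sketch.

```python
def generate_and_collect(synthesize, voice_id, text, model_id):
    """Run a convert-style call and join its byte chunks into one bytes object."""
    chunks = synthesize(voice_id=voice_id, text=text, model_id=model_id)
    return b"".join(chunks)

# Stub standing in for the real API call (yields audio as bytes chunks)
def fake_convert(voice_id, text, model_id):
    yield b"ID3"   # placeholder bytes, not a real MP3 header
    yield b"data"

audio = generate_and_collect(
    fake_convert, "JBFqnCBsd6RMkjVDRZzb", "Hello", "eleven_multilingual_v2"
)
assert audio == b"ID3data"
```

In production, passing the real `client.text_to_speech.convert` in place of the stub keeps the collection logic unit-testable without hitting the API.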