# ElevenLabs Python SDK

The ElevenLabs Python SDK is the official client library for accessing ElevenLabs' AI-powered audio generation services. It provides developers with a comprehensive interface to convert text to natural-sounding speech, clone voices, generate sound effects, transcribe audio, dub content into multiple languages, and build conversational AI agents. The SDK supports both synchronous and asynchronous operations, making it suitable for a wide range of applications, from simple scripts to production-grade systems.

The library offers multiple voice generation models optimized for different use cases: Eleven v3 for dramatic performances across 70+ languages, Eleven Multilingual v2 for stability and accent accuracy in 29 languages, and Flash v2.5/Turbo v2.5 for low-latency applications. Key features include real-time audio streaming, voice cloning from audio samples, pronunciation dictionaries, conversational AI with custom tool integration, and comprehensive history management for all generated content.

## Installation

```bash
pip install elevenlabs
```

## Client Initialization

Initialize the ElevenLabs client with your API key to access all SDK features.

```python
from elevenlabs import ElevenLabs

# Initialize with an API key (the ELEVENLABS_API_KEY environment variable also works)
client = ElevenLabs(
    api_key="YOUR_API_KEY",
)
```

## Text-to-Speech - Convert

Convert text into speech using a voice of your choice. Returns audio in the specified format, ready to be played or saved.
```python
from elevenlabs import ElevenLabs, play, save

client = ElevenLabs(api_key="YOUR_API_KEY")

# Generate speech from text
audio = client.text_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",  # George voice
    text="The first move is what sets everything in motion.",
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128",
)

# Play the generated audio
play(audio)

# Or save to file
save(audio, "output.mp3")
```

## Text-to-Speech - Stream

Stream audio in real time as it is being generated, ideal for low-latency applications.

```python
from elevenlabs import ElevenLabs, stream

client = ElevenLabs(api_key="YOUR_API_KEY")

# Stream audio in real time
audio_stream = client.text_to_speech.stream(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    text="This is a streaming example that generates audio in real-time.",
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128",
)

# Play streamed audio
stream(audio_stream)

# Or process chunks manually (the stream can only be consumed once,
# so use either stream() or the loop below, not both)
for chunk in audio_stream:
    if isinstance(chunk, bytes):
        # Process audio chunk (e.g., send to an audio player)
        print(f"Received {len(chunk)} bytes")
```

## Text-to-Speech with Timestamps

Generate speech with precise character-level timing information for audio-text synchronization.
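Character-level alignment is typically consumed by grouping characters into word-level timings for captions or highlighting. The following is a standalone sketch of that grouping in pure Python, with no SDK calls; the parallel list-of-characters and list-of-start-times shape is an assumption mirroring the alignment fields in the example that follows.

```python
def word_timings(characters, start_times):
    """Group character-level timings into (word, start_time) pairs.

    `characters` and `start_times` are assumed to be parallel lists,
    as in the alignment data returned by the timestamps endpoint.
    """
    words, current, start = [], "", None
    for ch, t in zip(characters, start_times):
        if ch.isspace():
            # Whitespace closes the current word, if any
            if current:
                words.append((current, start))
                current, start = "", None
        else:
            if not current:
                start = t  # first character of a new word
            current += ch
    if current:  # flush the trailing word
        words.append((current, start))
    return words

chars = list("Hi there")
starts = [0, 40, 80, 120, 140, 160, 180, 200]
assert word_timings(chars, starts) == [("Hi", 0), ("there", 120)]
```

The same helper works whether the timing unit is milliseconds or seconds, since it only carries the start values through.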
```python
from elevenlabs import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")

# Generate speech with timing data
response = client.text_to_speech.convert_with_timestamps(
    voice_id="21m00Tcm4TlvDq8ikWAM",
    text="This is a test for the API of ElevenLabs.",
    output_format="mp3_44100_128",
    model_id="eleven_multilingual_v2",
)

# Access audio and alignment data
audio_base64 = response.audio_base64
alignment = response.alignment

# Alignment contains character-level timing (in seconds)
print(f"Characters: {alignment.characters}")
print(f"Start times: {alignment.character_start_times_seconds}")
print(f"End times: {alignment.character_end_times_seconds}")
```

## Voice Management - List Voices

List all available voices in your account, with their settings and metadata.

```python
from elevenlabs import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")

# Get all available voices
response = client.voices.search()

for voice in response.voices:
    print(f"Voice: {voice.name}")
    print(f"  ID: {voice.voice_id}")
    print(f"  Category: {voice.category}")
    print(f"  Labels: {voice.labels}")
    print()
```

## Voice Cloning

Clone a voice from audio samples to create a custom voice for your applications.

```python
from elevenlabs import ElevenLabs, play

client = ElevenLabs(api_key="YOUR_API_KEY")

# Create a cloned voice from samples (instant voice cloning);
# files are passed as file objects, not path strings
voice = client.voices.ivc.create(
    name="Alex",
    description="An old American male voice with a slight hoarseness",
    files=[open(path, "rb") for path in ["./sample_0.mp3", "./sample_1.mp3", "./sample_2.mp3"]],
)

print(f"Created voice ID: {voice.voice_id}")

# Use the cloned voice for text-to-speech
audio = client.text_to_speech.convert(
    voice_id=voice.voice_id,
    text="Hello, this is my cloned voice speaking.",
    model_id="eleven_multilingual_v2",
)
play(audio)
```

## Text-to-Sound Effects

Generate sound effects from text descriptions for videos, games, and other media.
```python
from elevenlabs import ElevenLabs, play

client = ElevenLabs(api_key="YOUR_API_KEY")

# Generate a sound effect from a description
audio = client.text_to_sound_effects.convert(
    text="Spacious braam suitable for high-impact movie trailer moments",
    duration_seconds=5.0,
    prompt_influence=0.3,
)
play(audio)

# Generate a looping sound effect
looping_audio = client.text_to_sound_effects.convert(
    text="Gentle forest ambience with birds chirping",
    duration_seconds=10.0,
    loop=True,
)
play(looping_audio)
```

## Speech-to-Text

Transcribe audio files to text, with support for multiple languages and formats.

```python
from elevenlabs import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")

# Transcribe an audio file
with open("audio.mp3", "rb") as audio_file:
    response = client.speech_to_text.convert(
        file=audio_file,
        model_id="scribe_v1",
        language_code="en",
    )

print(f"Transcription: {response.text}")
print(f"Words: {response.words}")
```

## Dubbing

Dub audio or video content into different languages automatically.

```python
import time

from elevenlabs import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")

# Start a dubbing job
with open("video.mp4", "rb") as video_file:
    response = client.dubbing.dub(
        file=video_file,
        source_lang="en",
        target_lang="es",  # Spanish
    )

dubbing_id = response.dubbing_id
print(f"Dubbing job started: {dubbing_id}")

# Poll for completion
while True:
    status = client.dubbing.get(dubbing_id=dubbing_id)
    if status.status == "dubbed":
        print("Dubbing complete!")
        break
    elif status.status == "failed":
        print("Dubbing failed")
        break
    time.sleep(5)

# Download the dubbed content
dubbed_audio = client.dubbing.audio.get(dubbing_id=dubbing_id, language_code="es")
```

## History Management

Access and manage your generated audio history.
```python
from elevenlabs import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")

# List recent generations
history = client.history.list(
    page_size=10,
    sort_direction="desc",
)

for item in history.history:
    print(f"ID: {item.history_item_id}")
    print(f"Text: {item.text[:50]}...")
    print(f"Voice: {item.voice_name}")
    print(f"Created: {item.date_unix}")
    print()

# Get audio for a specific history item
audio = client.history.get_audio(history_item_id="VW7YKqPnjY4h39yTbx2L")

# Download multiple history items as a ZIP
audio_zip = client.history.download(
    history_item_ids=["item1", "item2", "item3"],
    output_format="wav",
)

# Delete a history item
client.history.delete(history_item_id="VW7YKqPnjY4h39yTbx2L")
```

## Models

List available TTS models and their capabilities.

```python
from elevenlabs import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")

# List all available models
models = client.models.list()

for model in models:
    print(f"Model: {model.name}")
    print(f"  ID: {model.model_id}")
    print(f"  Description: {model.description}")
    print(f"  Languages: {[lang.language_id for lang in model.languages]}")
    print(f"  Can do TTS: {model.can_do_text_to_speech}")
    print()

# Main models:
# - eleven_v3: dramatic delivery, 70+ languages
# - eleven_multilingual_v2: stability and accuracy, 29 languages
# - eleven_flash_v2_5: ultra-low latency, 32 languages
# - eleven_turbo_v2_5: balanced quality/latency, 32 languages
```

## Async Client

Use the async client for non-blocking API calls in async applications.
```python
import asyncio

from elevenlabs import AsyncElevenLabs

async def generate_speech():
    client = AsyncElevenLabs(api_key="YOUR_API_KEY")

    # List models asynchronously
    models = await client.models.list()
    print(f"Found {len(models)} models")

    # Generate speech asynchronously; convert returns an async iterator
    # of audio chunks, so it is consumed with `async for` rather than awaited
    audio = client.text_to_speech.convert(
        voice_id="JBFqnCBsd6RMkjVDRZzb",
        text="This is async speech generation.",
        model_id="eleven_multilingual_v2",
    )

    # Save the audio
    with open("async_output.mp3", "wb") as f:
        async for chunk in audio:
            f.write(chunk)

    print("Audio saved!")

asyncio.run(generate_speech())
```

## Conversational AI - Basic Usage

Build interactive AI agents with real-time audio capabilities using the SDK's conversational AI module.

```python
from elevenlabs import ElevenLabs
from elevenlabs.conversational_ai.conversation import Conversation
from elevenlabs.conversational_ai.default_audio_interface import DefaultAudioInterface

client = ElevenLabs(api_key="YOUR_API_KEY")

# Create an audio interface for real-time audio input/output
audio_interface = DefaultAudioInterface()

# Create a conversation with callbacks
conversation = Conversation(
    client=client,
    agent_id="your-agent-id",
    requires_auth=True,
    audio_interface=audio_interface,
    callback_agent_response=lambda response: print(f"Agent: {response}"),
    callback_user_transcript=lambda transcript: print(f"User: {transcript}"),
)

# Start the conversation
conversation.start_session()

# The conversation runs until you call:
# conversation.end_session()

# Wait for the session to end and get the conversation ID
conversation_id = conversation.wait_for_session_end()
print(f"Conversation ID: {conversation_id}")
```

## Conversational AI - Text-Only Chat Mode

Use conversational AI in text-only mode, without an audio interface.
```python
from elevenlabs import ElevenLabs
from elevenlabs.conversational_ai.conversation import Conversation

client = ElevenLabs(api_key="YOUR_API_KEY")

# Text-only mode (no audio_interface)
conversation = Conversation(
    client=client,
    agent_id="your-agent-id",
    requires_auth=True,
    callback_agent_response=lambda response: print(f"Agent: {response}"),
)

conversation.start_session()

# Send text messages
conversation.send_user_message("Hello, how are you?")
conversation.send_user_message("Tell me about the weather")

# Send contextual updates (non-interrupting)
conversation.send_contextual_update("User is located in New York")

# End the session
conversation.end_session()
```

## Conversational AI - Custom Tools

Register custom tools that the AI agent can call during conversations.

```python
import asyncio

from elevenlabs import ElevenLabs
from elevenlabs.conversational_ai.conversation import ClientTools, Conversation

client = ElevenLabs(api_key="YOUR_API_KEY")

# Create client tools
client_tools = ClientTools()

# Register a sync tool
def calculate_sum(params):
    numbers = params.get("numbers", [])
    result = sum(numbers)
    return f"The sum is {result}"

client_tools.register("calculate_sum", calculate_sum, is_async=False)

# Register an async tool
async def fetch_weather(params):
    location = params.get("location", "Unknown")
    # Simulate an API call
    await asyncio.sleep(0.1)
    return f"Weather in {location}: Sunny, 72F"

client_tools.register("fetch_weather", fetch_weather, is_async=True)

# Use the tools in a conversation
conversation = Conversation(
    client=client,
    agent_id="your-agent-id",
    requires_auth=True,
    client_tools=client_tools,
    callback_agent_response=lambda response: print(f"Agent: {response}"),
)

conversation.start_session()
# The agent can now call the registered tools during the conversation
```

## Conversational AI - Async Conversation

Use the async conversation class for better integration with async applications.
```python
import asyncio

from elevenlabs import ElevenLabs
from elevenlabs.conversational_ai.conversation import AsyncConversation

async def async_callback(response):
    print(f"Agent: {response}")

async def async_user_callback(transcript):
    print(f"User: {transcript}")

async def run_conversation():
    client = ElevenLabs(api_key="YOUR_API_KEY")

    conversation = AsyncConversation(
        client=client,
        agent_id="your-agent-id",
        requires_auth=True,
        callback_agent_response=async_callback,
        callback_user_transcript=async_user_callback,
    )

    await conversation.start_session()

    # Send messages
    await conversation.send_user_message("Hello!")

    # Wait some time for responses
    await asyncio.sleep(5)

    # End the session
    await conversation.end_session()
    conversation_id = await conversation.wait_for_session_end()
    print(f"Conversation ended: {conversation_id}")

asyncio.run(run_conversation())
```

## Pronunciation Dictionaries

Create and manage pronunciation dictionaries for consistent speech output.

```python
from elevenlabs import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")

# Create a pronunciation dictionary from rules
dictionary = client.pronunciation_dictionaries.create_from_rules(
    name="Tech Terms",
    rules=[
        {"type": "alias", "string_to_replace": "API", "alias": "A P I"},
        {"type": "alias", "string_to_replace": "SDK", "alias": "S D K"},
        {"type": "phoneme", "string_to_replace": "ElevenLabs", "phoneme": "ɪˈlevənlæbz", "alphabet": "ipa"},
    ],
)

print(f"Dictionary ID: {dictionary.id}")

# Use the dictionary in TTS
audio = client.text_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    text="The ElevenLabs API and SDK are powerful tools.",
    model_id="eleven_multilingual_v2",
    pronunciation_dictionary_locators=[
        {"pronunciation_dictionary_id": dictionary.id, "version_id": dictionary.version_id}
    ],
)
```

## Voice Settings

Customize voice parameters for fine-tuned audio output.
```python
from elevenlabs import ElevenLabs
from elevenlabs.types import VoiceSettings

client = ElevenLabs(api_key="YOUR_API_KEY")

# Generate speech with custom voice settings
audio = client.text_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    text="This is a customized voice output with adjusted settings.",
    model_id="eleven_multilingual_v2",
    voice_settings=VoiceSettings(
        stability=0.5,           # 0-1: lower = more variable, higher = more consistent
        similarity_boost=0.75,   # 0-1: how closely to match the original voice
        style=0.0,               # 0-1: style exaggeration (model dependent)
        use_speaker_boost=True,  # boost speaker clarity
    ),
)
```

## User and Subscription Information

Access account and subscription details.

```python
from elevenlabs import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")

# Get user information
user = client.user.get()
print(f"API key: {user.xi_api_key}")

# Get subscription details
subscription = client.user.get_subscription()
print(f"Tier: {subscription.tier}")
print(f"Character count: {subscription.character_count}")
print(f"Character limit: {subscription.character_limit}")
print(f"Voice limit: {subscription.voice_limit}")
```

## Error Handling

Handle common errors gracefully in your applications.

```python
from elevenlabs import ElevenLabs
from elevenlabs.errors import (
    BadRequestError,
    UnauthorizedError,
    ForbiddenError,
    NotFoundError,
    UnprocessableEntityError,
)

client = ElevenLabs(api_key="YOUR_API_KEY")

try:
    audio = client.text_to_speech.convert(
        voice_id="invalid-voice-id",
        text="Test",
        model_id="eleven_multilingual_v2",
    )
except UnauthorizedError:
    print("Invalid API key")
except NotFoundError:
    print("Voice not found")
except BadRequestError as e:
    print(f"Bad request: {e}")
except UnprocessableEntityError as e:
    print(f"Validation error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
```

## Summary

The ElevenLabs Python SDK enables seamless integration of AI-powered voice and audio capabilities into Python applications.
Core use cases include text-to-speech conversion with multiple premium voice models, real-time audio streaming for interactive applications, voice cloning for personalized experiences, sound effect generation, speech transcription, and multi-language dubbing. The conversational AI module supports building sophisticated voice agents with custom tool integration, for applications such as customer service bots, virtual assistants, and interactive voice response systems.

Integration typically follows a simple pattern: initialize the client with an API key, select the appropriate voice and model, and call the conversion methods. For real-time applications, the streaming APIs and async client provide low-latency options. The SDK handles authentication, request formatting, and response parsing automatically, allowing developers to focus on application logic.

Production deployments can leverage the comprehensive history API for auditing and the pronunciation dictionary system for domain-specific terminology. Enterprise features include zero-retention mode for privacy-sensitive applications and workspace management for team collaboration.
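As a closing sketch, the three-step pattern above (client, voice and model, convert) can be wrapped in a small testable helper. The SDK call is injected as a parameter so the example runs without network access: `synthesize` is a stand-in for `client.text_to_speech.convert`, and the chunked-bytes return shape is an assumption of this sketch.

```python
def generate_and_collect(synthesize, voice_id, text, model_id):
    """Run a convert-style call and join its byte chunks into one bytes object."""
    chunks = synthesize(voice_id=voice_id, text=text, model_id=model_id)
    return b"".join(chunks)

# Stub standing in for the real API call (yields audio as bytes chunks)
def fake_convert(voice_id, text, model_id):
    yield b"ID3"   # placeholder bytes, not a real MP3 header
    yield b"data"

audio = generate_and_collect(
    fake_convert, "JBFqnCBsd6RMkjVDRZzb", "Hello", "eleven_multilingual_v2"
)
assert audio == b"ID3data"
```

In production, passing the real `client.text_to_speech.convert` in place of the stub keeps the collection logic unit-testable without hitting the API.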