### Install Dependencies

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/examples/tts/tts_autoplay/README.md

Install the Python packages required by the example.

```bash
pip install -r requirements.txt
```

--------------------------------

### Install Voice SDK and Dependencies

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/examples/voice/README.md

Commands to set up a virtual environment, install the Voice SDK with its development and smart extras, and install PyAudio (plus its PortAudio system library) for audio handling.

```shell
# Voice examples
cd

# Create a virtual environment
python -m venv .venv

# Activate : macOS / Ubuntu / Debian
source .venv/bin/activate

# Activate : Windows
.venv\Scripts\activate

# Update pip
python -m pip install --upgrade pip
```

```shell
# macOS
brew install portaudio

# Ubuntu/Debian
sudo apt-get install portaudio19-dev
```

```shell
# Voice SDK
python -m pip install -e 'sdk/voice[dev,smart]'

# Required audio package for examples
python -m pip install pyaudio
```

--------------------------------

### JWT Authentication Setup

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/rt/README.md

Demonstrates how to use JWT authentication with the AsyncClient. This method is recommended to minimize API key exposure, especially in browser applications. Requires installing the package with the `jwt` extra.

```python
import asyncio
from speechmatics.rt import AsyncClient, JWTAuth

async def main():
    # Create JWT auth (requires: pip install 'speechmatics-rt[jwt]')
    auth = JWTAuth("your-api-key", ttl=60)
    async with AsyncClient(auth=auth) as client:
        pass  # Use the client here

asyncio.run(main())
```

--------------------------------

### Quick Start Transcription

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/batch/README.md

A simple example that transcribes an audio file using the AsyncClient with default settings, reading the API key for authentication from an environment variable.
````APIDOC
## Transcribe Audio File (Quick Start)

### Description
This operation performs a basic transcription of an audio file using default configurations. It demonstrates the simplest way to get started with the client, relying on environment variables for API key authentication.

### Method
`client.transcribe(audio_file)`

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Request Example
```python
import asyncio
from speechmatics.batch import AsyncClient

async def main():
    # Create a client using environment variable SPEECHMATICS_API_KEY
    async with AsyncClient() as client:
        # Simple transcription
        result = await client.transcribe("audio.wav")
        print(result.transcript_text)

asyncio.run(main())
```

### Response
#### Success Response (200)
- **transcript_text** (string) - The transcribed text of the audio file.
- **confidence** (float) - The confidence score of the transcription.

#### Response Example
```json
{
  "transcript_text": "This is the transcribed text.",
  "confidence": 0.95
}
```
````

--------------------------------

### Install Development Dependencies

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/tests/voice/README.md

Install the dependencies needed for development and testing.

```bash
make install-dev
```

--------------------------------

### Run TTS Streaming Example

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/examples/tts/tts_autoplay/README.md

Run the Python script to start TTS audio generation and playback.

```bash
python tts_stream_example.py
```

--------------------------------

### Microphone Transcription Example

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/utilities-and-helpers.md

An example demonstrating real-time transcription of microphone audio using AsyncClient and Microphone. It handles session start, audio sending, and transcript reception.
```python
import asyncio
from speechmatics.rt import (
    AsyncClient,
    Microphone,
    AudioFormat,
    AudioEncoding,
    TranscriptionConfig,
    ServerMessageType,
    TranscriptResult
)

async def microphone_transcription():
    client = AsyncClient(api_key="your-api-key")
    mic = Microphone(sample_rate=16000, chunk_size=320)

    @client.on(ServerMessageType.ADD_TRANSCRIPT)
    def on_final(message):
        result = TranscriptResult.from_message(message)
        print(f"Final: {result.metadata.transcript}")

    if not mic.start():
        print("PyAudio not available")
        return

    try:
        await client.start_session(
            transcription_config=TranscriptionConfig(language="en"),
            audio_format=AudioFormat(
                encoding=AudioEncoding.PCM_S16LE,
                sample_rate=16000
            )
        )
        print("Speak now...")

        while True:
            chunk = await mic.read(320)
            await client.send_audio(chunk)
    except KeyboardInterrupt:
        print("\nStopping...")
    finally:
        mic.stop()
        await client.close()

asyncio.run(microphone_transcription())
```

--------------------------------

### Quick Start Transcription

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/batch/README.md

A simple example that transcribes an audio file using default credentials (environment variable SPEECHMATICS_API_KEY).

```python
import asyncio
from speechmatics.batch import AsyncClient

async def main():
    # Create a client using environment variable SPEECHMATICS_API_KEY
    async with AsyncClient() as client:
        # Simple transcription
        result = await client.transcribe("audio.wav")
        print(result.transcript_text)

asyncio.run(main())
```

--------------------------------

### Install Speechmatics Voice SDK

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/voice/README.md

Use pip to install the SDK. Install with `[smart]` for VAD and SMART_TURN features.
```bash
pip install speechmatics-voice
```

```bash
pip install 'speechmatics-voice[smart]'
```

--------------------------------

### SPEAKER_STARTED Payload Example

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/voice/README.md

Example JSON payload for the SPEAKER_STARTED event, giving the speaker ID and the time they started speaking.

```json
{
  "message": "SpeakerStarted",
  "is_active": true,
  "speaker_id": "S1",
  "time": 1.28
}
```

--------------------------------

### Install Git LFS

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/tests/voice/README.md

Install Git LFS, which the audio file tests require, on Linux or macOS.

```bash
# Linux (Debian/Ubuntu)
sudo apt install git-lfs
```

```bash
# macOS
brew install git-lfs
```

--------------------------------

### RECOGNITION_STARTED Payload Example

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/voice/README.md

Example JSON payload for the RECOGNITION_STARTED event, with session information and language pack details.

```json
{
  "message": "RecognitionStarted",
  "id": "a8779b0b-a238-43de-8211-c70f5fcbe191",
  "orchestrator_version": "2025.08.29127+289170c022.HEAD",
  "language_pack_info": {
    "language_description": "English",
    "word_delimiter": " ",
    "writing_direction": "left-to-right",
    "itn": true,
    "adapted": false
  }
}
```

--------------------------------

### Realtime Streaming Transcription Example

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/README.md

This example transcribes live microphone audio using the real-time streaming service and prints both final and partial transcripts. Make sure `speechmatics-rt`, `python-dotenv`, and `pyaudio` are installed.
```python
import asyncio
import os

from dotenv import load_dotenv
from speechmatics.rt import (
    AsyncClient,
    ServerMessageType,
    TranscriptionConfig,
    TranscriptResult,
    AudioFormat,
    AudioEncoding,
    Microphone,
)

load_dotenv()

CHUNK_SIZE = 4096


async def main():
    client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY"))
    mic = Microphone(sample_rate=16000, chunk_size=CHUNK_SIZE)

    @client.on(ServerMessageType.ADD_TRANSCRIPT)
    def on_final(message):
        result = TranscriptResult.from_message(message)
        if result.metadata.transcript:
            print(f"[final]: {result.metadata.transcript}")

    @client.on(ServerMessageType.ADD_PARTIAL_TRANSCRIPT)
    def on_partial(message):
        result = TranscriptResult.from_message(message)
        if result.metadata.transcript:
            print(f"[partial]: {result.metadata.transcript}")

    mic.start()

    try:
        await client.start_session(
            transcription_config=TranscriptionConfig(language="en", enable_partials=True),
            audio_format=AudioFormat(encoding=AudioEncoding.PCM_S16LE, sample_rate=16000),
        )
        print("Speak now...")

        while True:
            await client.send_audio(await mic.read(CHUNK_SIZE))
    finally:
        mic.stop()
        await client.close()


asyncio.run(main())
```

```bash
pip install speechmatics-rt python-dotenv pyaudio
```

--------------------------------

### Quick Start: Single-Stream Transcription

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/rt/README.md

Basic example showing how to create an async client, register a transcript handler, and transcribe an audio file. The client reads the SPEECHMATICS_API_KEY environment variable for authentication.
```python
import asyncio
from speechmatics.rt import AsyncClient, ServerMessageType

async def main():
    # Create a client using environment variable SPEECHMATICS_API_KEY
    async with AsyncClient() as client:
        # Register event handlers
        @client.on(ServerMessageType.ADD_TRANSCRIPT)
        def handle_final_transcript(msg):
            print(f"Final: {msg['metadata']['transcript']}")

        # Transcribe audio file
        with open("audio.wav", "rb") as audio_file:
            await client.transcribe(audio_file)

# Run the async function
asyncio.run(main())
```

--------------------------------

### Install speechmatics-batch SDK

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/batch/MIGRATION.md

Install the new Speechmatics Batch SDK using pip.

```bash
pip install speechmatics-batch
```

--------------------------------

### Example NotificationConfig Usage

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/batch-models.md

Shows how to instantiate and configure NotificationConfig to send job completion notifications.

```python
from speechmatics.batch import NotificationConfig, NotificationContents, NotificationMethod

config = NotificationConfig(
    url="https://example.com/webhook",
    method=NotificationMethod.POST,
    contents=[NotificationContents.TRANSCRIPT, NotificationContents.SENTIMENT],
    auth_headers=["Authorization: Bearer token"]
)
```

--------------------------------

### Install speechmatics-flow

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/flow/README.md

Install the Speechmatics Flow API client using pip.

```bash
pip install speechmatics-flow
```

--------------------------------

### Initialize VoiceAgentClient with Preset

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/voice-agent-client.md

Example of initializing the VoiceAgentClient with a predefined preset such as `adaptive`. Replace `your-api-key` with your actual API key.
```python
import asyncio
from speechmatics.voice import VoiceAgentClient

async def main():
    # Using a preset
    async with VoiceAgentClient(api_key="your-api-key", preset="adaptive") as client:
        await client.connect()
        # Use client...

asyncio.run(main())
```

--------------------------------

### Install Speechmatics Real-time SDK and PyAudio

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/00-START-HERE.md

Install the packages for real-time audio streaming. PyAudio is required for microphone access.

```bash
# Real-time streaming
pip install speechmatics-rt pyaudio
```

--------------------------------

### Voice Agent with Speaker Diarization Example

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/README.md

This example sets up a voice agent for real-time transcription with speaker diarization and turn detection, printing each segment with its speaker ID. Make sure `speechmatics-voice`, `speechmatics-rt`, `python-dotenv`, and `pyaudio` are installed.

```python
import asyncio
import os

from dotenv import load_dotenv
from speechmatics.rt import Microphone
from speechmatics.voice import VoiceAgentClient, VoiceAgentConfigPreset, AgentServerMessageType

load_dotenv()


async def main():
    client = VoiceAgentClient(
        api_key=os.getenv("SPEECHMATICS_API_KEY"),
        config=VoiceAgentConfigPreset.load("adaptive")
    )

    @client.on(AgentServerMessageType.ADD_SEGMENT)
    def on_segment(message):
        for segment in message.get("segments", []):
            print(f"[{segment.get('speaker_id', 'S1')}]: {segment.get('text', '')}")

    @client.on(AgentServerMessageType.END_OF_TURN)
    def on_turn_end(message):
        print("[END OF TURN]")

    mic = Microphone(sample_rate=16000, chunk_size=320)
    mic.start()

    try:
        await client.connect()
        print("Voice agent ready. Speak now...")

        while True:
            await client.send_audio(await mic.read(320))
    finally:
        mic.stop()
        await client.disconnect()


asyncio.run(main())
```

```bash
pip install speechmatics-voice speechmatics-rt python-dotenv pyaudio
```

--------------------------------

### Install Speechmatics Text-to-Speech (TTS) SDK

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/00-START-HERE.md

Install the package for text-to-speech functionality.

```bash
# Text-to-speech
pip install speechmatics-tts
```

--------------------------------

### Install Speechmatics Batch SDK

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/README.md

Install the packages needed for the Speechmatics batch transcription client and for loading environment variables.

```bash
pip install speechmatics-batch python-dotenv
```

--------------------------------

### Install Pre-commit Hooks

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/README.md

Install pre-commit hooks to enforce code quality and consistency before committing changes.

```bash
pre-commit install
```

--------------------------------

### Start Microphone Capture

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/utilities-and-helpers.md

Starts capturing audio from the microphone. `start()` returns `True` on success and `False` if PyAudio is not available.

```python
import sys
from speechmatics.rt import Microphone

mic = Microphone(sample_rate=16000, chunk_size=320)
if not mic.start():
    print("PyAudio not installed")
    sys.exit(1)
```

--------------------------------

### Microphone Initialization

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/utilities-and-helpers.md

Initializes microphone input with a configurable sample rate, chunk size, and device index. Make sure PyAudio is installed.
```python
from typing import Optional

class Microphone:
    """Wrapper around PyAudio for microphone input."""

    def __init__(
        self,
        sample_rate: int = 16000,
        chunk_size: int = 4096,
        device_index: Optional[int] = None
    ) -> None:
        """
        Initialize microphone input device.

        Args:
            sample_rate: Sample rate in Hz (16000 recommended for Speechmatics).
            chunk_size: Bytes to read per chunk.
            device_index: PyAudio device index (None for default).
        """
```

--------------------------------

### Basic Transcription with AsyncClient

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/batch-client.md

A simple example that submits an audio file with the AsyncClient and waits for the transcription to complete.

```python
import asyncio
from speechmatics.batch import AsyncClient, TranscriptionConfig

async def transcribe():
    async with AsyncClient(api_key="key") as client:
        job = await client.submit_job(
            "audio.wav",
            transcription_config=TranscriptionConfig(language="en")
        )
        result = await client.wait_for_completion(job.id)
        return result.transcript_text

print(asyncio.run(transcribe()))
```

--------------------------------

### Install Speechmatics Batch SDK

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/00-START-HERE.md

Install the package for batch transcription if you need to process audio files.

```bash
# Batch transcription
pip install speechmatics-batch
```

--------------------------------

### Basic Batch Transcription Example

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/INDEX.md

An asynchronous Python example that submits an audio file for transcription, waits for the job to complete, and prints the transcript text. You need an API key and an audio file named `audio.wav`.
```python
import asyncio
from speechmatics.batch import AsyncClient, TranscriptionConfig

async def transcribe():
    async with AsyncClient(api_key="your-api-key") as client:
        # Submit job
        job = await client.submit_job(
            "audio.wav",
            transcription_config=TranscriptionConfig(language="en")
        )

        # Wait for completion
        result = await client.wait_for_completion(job.id)

        # Get result
        print(result.transcript_text)

asyncio.run(transcribe())
```

--------------------------------

### Context Manager Example

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/tts-client.md

Shows how to use the AsyncClient as an asynchronous context manager so its resources are cleaned up automatically.

````APIDOC
## Context Manager

```python
from speechmatics.tts import AsyncClient

async def context_manager_example():
    async with AsyncClient(api_key="key") as client:
        response = await client.generate(
            text="Automatic cleanup on exit",
            voice="THEO"
        )
        audio = await response.read()
    # Client is automatically closed after exiting the with block
```
````

--------------------------------

### Install Speechmatics Voice SDK

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/00-START-HERE.md

Install the base package for voice agents. Add the optional `smart` extra for advanced ML features.

```bash
# Voice agents
pip install speechmatics-voice
pip install 'speechmatics-voice[smart]'  # For ML-based features
```

--------------------------------

### connect()

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/voice-agent-client.md

Establishes a WebSocket connection and starts an agent session. It raises distinct errors for connection, session, and authentication failures.

````APIDOC
## connect()

### Description
Establishes a WebSocket connection and starts an agent session.

### Method
`async def connect() -> None`

### Raises
- `ConnectionError`: WebSocket connection failed
- `SessionError`: Session initialization failed
- `AuthenticationError`: Invalid API key

### Example
```python
client = VoiceAgentClient(api_key="key", preset="adaptive")

try:
    await client.connect()
    print("Agent ready for audio")
    # Send audio...
finally:
    await client.disconnect()
```
````

--------------------------------

### Microphone Transcription Quick Start

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/examples/voice/cli/README.md

Starts real-time transcription from the microphone. Use `-p` for pretty-printed output.

```bash
python cli.py -k YOUR_API_KEY -p
```

--------------------------------

### Install Speechmatics Realtime Client

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/00-START-HERE.md

Install the Speechmatics Realtime client package for live audio transcription over WebSockets. Use this for streaming transcription.

```bash
pip install speechmatics-rt
```

--------------------------------

### Install Speechmatics Voice with Smart Features

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/utilities-and-helpers.md

Install the speechmatics-voice package with the `smart` extra to include smart turn detection.

```bash
pip install 'speechmatics-voice[smart]'
```

--------------------------------

### Install Speechmatics TTS Client

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/00-START-HERE.md

Install the Speechmatics TTS client package for text-to-speech generation. Use this when you need to convert text into spoken audio.
```bash
pip install speechmatics-tts
```

--------------------------------

### Install Speechmatics Real-time SDK

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/README.md

Install the packages required for the Speechmatics real-time transcription client, including PyAudio for microphone input and python-dotenv for environment variables.

```bash
pip install speechmatics-rt python-dotenv pyaudio
```

--------------------------------

### Migrate JWT Authentication: Before

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/rt/MIGRATION.md

Example of configuring JWT authentication through ConnectionConfig in the legacy client.

```python
from speechmatics.client import WebsocketClient
from speechmatics.models import TranscriptionConfig, ConnectionConfig

conn_config = ConnectionConfig(auth_token="API-KEY", generate_temp_token=True)
client = WebsocketClient(conn_config)

conf = TranscriptionConfig(language="en")
await client.run(audio_stream, conf)
```

--------------------------------

### Install Speechmatics Voice SDK with Smart Features

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/utilities-and-helpers.md

Installs the speechmatics-voice package together with the ML libraries, such as PyTorch and ONNX, needed for advanced features.

```bash
pip install 'speechmatics-voice[smart]'
# Installs: torch, onnx, torchaudio, etc.
```

--------------------------------

### Install Speechmatics Voice Agent Client

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/00-START-HERE.md

Install the Speechmatics Voice client package for building conversational AI applications and voice agents. Use this for interactive voice experiences.
```bash
pip install speechmatics-voice
```

--------------------------------

### Install Speechmatics Voice, RT, Python Dotenv, and PyAudio

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/README.md

Installs the packages required for real-time voice agent applications: the voice client, the real-time components, environment variable loading, and microphone support.

```bash
pip install speechmatics-voice speechmatics-rt python-dotenv pyaudio
```

--------------------------------

### Environment Variables for Configuration

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/flow/MIGRATION.md

Shows how to configure the SDK with environment variables for the API key and Flow URL, then initialize the client without explicit parameters.

```bash
# Set environment variables
export SPEECHMATICS_API_KEY=your-api-key
export SPEECHMATICS_FLOW_URL=wss://flow.api.speechmatics.com/v1/flow
```

```python
# Use without explicit configuration
from speechmatics.flow import AsyncClient

async with AsyncClient() as client:
    await client.start_conversation(audio_stream)
```

--------------------------------

### Migrate Basic Client Usage: After

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/rt/MIGRATION.md

Shows the new async context manager pattern with AsyncClient for transcription.

```python
from speechmatics.rt import AsyncClient, TranscriptionConfig

async with AsyncClient("API-KEY") as client:
    conf = TranscriptionConfig(language="en")
    await client.transcribe(audio_stream, transcription_config=conf)
```

--------------------------------

### Show Complete Configuration from Preset

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/examples/voice/cli/README.md

Displays the full configuration loaded from a specified preset. Use this to review all available settings.
```bash
python cli.py -P scribe -W
```

--------------------------------

### Instantiate TranscriptionConfig

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/batch-models.md

Example of creating a `TranscriptionConfig` object with specific settings for language, model, entity recognition, diarization, and custom vocabulary.

```python
from speechmatics.batch import TranscriptionConfig, Model

config = TranscriptionConfig(
    language="en",
    model=Model.ENHANCED,
    enable_entities=True,
    diarization="speaker",
    additional_vocab=[
        {"content": "Kubernetes", "sounds_like": ["koo ber net ees"]},
        {"content": "API"}
    ]
)
```

--------------------------------

### Basic Microphone Transcription with Preset

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/examples/voice/cli/README.md

Starts transcription from the default microphone using a specified preset and API key. Either a preset or a custom config is required.

```bash
python cli.py -k YOUR_KEY -P adaptive -p
```

--------------------------------

### Deploy Speechmatics Realtime on Kubernetes

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/README.md

Install the Speechmatics realtime chart on Kubernetes using Helm. This command configures the chart and sets the ingress URL for external access.

```bash
# Install the sm-realtime chart
helm upgrade --install speechmatics-realtime \
  oci://speechmaticspublic.azurecr.io/sm-charts/sm-realtime \
  --version 0.7.0 \
  --set proxy.ingress.url="speechmatics.example.com"
```

--------------------------------

### Text-to-Speech Generation Example

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/README.md

Convert text into audio using the Text-to-Speech service. This snippet saves the generated audio to `output.wav`. Make sure `speechmatics-tts` and `python-dotenv` are installed.
```python
import asyncio
import os

from dotenv import load_dotenv
from speechmatics.tts import AsyncClient, Voice, OutputFormat

load_dotenv()


async def main():
    client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY"))

    response = await client.generate(
        text="Hello! Welcome to Speechmatics Text-to-Speech",
        voice=Voice.SARAH,
        output_format=OutputFormat.WAV_16000
    )

    audio_data = await response.read()
    with open("output.wav", "wb") as f:
        f.write(audio_data)

    print("Audio saved to output.wav")
    await client.close()


asyncio.run(main())
```

```bash
pip install speechmatics-tts python-dotenv
```

--------------------------------

### Set Up Python Virtual Environment

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/README.md

Create and activate a Python virtual environment for Speechmatics SDK development.

```bash
python -m venv .venv
# Windows
.venv\Scripts\activate
```

```bash
# macOS / Linux
source .venv/bin/activate
```

--------------------------------

### Configure Voice Agent Client with Presets

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/voice/README.md

Shows how to create a VoiceAgentClient from the predefined presets for different use cases, such as low latency, conversation, or note-taking. Use `VoiceAgentConfigPreset.list_presets()` to see all available options.
```python
import os
from speechmatics.voice import VoiceAgentClient, VoiceAgentConfigPreset

api_key = os.getenv("SPEECHMATICS_API_KEY")

# Low latency preset - for fast responses (may split speech into smaller segments)
client = VoiceAgentClient(api_key=api_key, preset="fast")

# Conversation preset - for natural dialogue
client = VoiceAgentClient(api_key=api_key, preset="adaptive")

# Advanced conversation with ML turn detection
client = VoiceAgentClient(api_key=api_key, preset="smart_turn")

# External end of turn preset - endpointing handled by the client
client = VoiceAgentClient(api_key=api_key, preset="external")

# Scribe preset - for note-taking
client = VoiceAgentClient(api_key=api_key, preset="scribe")

# Captions preset - for live captioning
client = VoiceAgentClient(api_key=api_key, preset="captions")

# To view all available presets, use:
presets = VoiceAgentConfigPreset.list_presets()
```

--------------------------------

### Batch Transcription Example

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/README.md

Use this snippet to transcribe audio files with the batch transcription service. The API key is loaded from a `.env` file, and the `speechmatics-batch` and `python-dotenv` libraries must be installed.

```python
import asyncio
import os

from dotenv import load_dotenv
from speechmatics.batch import AsyncClient

load_dotenv()


async def main():
    client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY"))
    result = await client.transcribe("audio.wav")
    print(result.transcript_text)
    await client.close()


asyncio.run(main())
```

```bash
pip install speechmatics-batch python-dotenv
```

--------------------------------

### Migrate Basic Usage: Before

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/flow/MIGRATION.md

Shows the old way of initializing and running the WebsocketClient for basic audio processing.
```python
from speechmatics_flow.client import WebsocketClient
from speechmatics_flow.models import (
    ConnectionSettings,
    Interaction,
    AudioSettings,
    ConversationConfig,
)

client = WebsocketClient(
    ConnectionSettings(
        url="wss://flow.api.speechmatics.com/v1/flow",
        auth_token="API-KEY",
    )
)

await client.run(
    interactions=[Interaction(audio_stream)],
    audio_settings=AudioSettings(),
    conversation_config=ConversationConfig(),
)
```

--------------------------------

### Migrate Basic Usage: After

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/flow/MIGRATION.md

Shows the new approach: AsyncClient starts a conversation from an audio stream, an audio format, and a conversation configuration.

```python
from speechmatics.flow import AsyncClient, AudioFormat, ConversationConfig

async with AsyncClient("API-KEY") as client:
    audio_format = AudioFormat()
    conversation_config = ConversationConfig()

    await client.start_conversation(
        audio_stream,
        audio_format=audio_format,
        conversation_config=conversation_config,
    )
```

--------------------------------

### Instantiate Voice Agent Presets Directly

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/voice-agent-client.md

Alternatively, instantiate preset configurations directly using their class methods.

```python
from speechmatics.voice import VoiceAgentConfigPreset

adaptive = VoiceAgentConfigPreset.ADAPTIVE()
scribe = VoiceAgentConfigPreset.SCRIBE()
```

--------------------------------

### Configure Voice Agent Client with Presets and Overlays

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/voice/README.md

Shows how to use a preset as a base and customize it with specific overrides supplied as a `VoiceAgentConfig` object. This balances predefined configurations against custom adjustments.
```python
from speechmatics.voice import VoiceAgentConfigPreset, VoiceAgentConfig

# Use preset with custom overrides
config = VoiceAgentConfigPreset.SCRIBE(
    VoiceAgentConfig(
        language="es",
        max_delay=0.8
    )
)
```

--------------------------------

### END_OF_TURN Payload Example

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/voice/README.md

Example JSON payload for the END_OF_TURN event, including the turn ID and the turn's start and end times.

```json
{
  "message": "EndOfTurn",
  "turn_id": 0,
  "metadata": {
    "start_time": 1.28,
    "end_time": 8.04
  }
}
```

--------------------------------

### Quick Start: Generate Speech and Save to WAV

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/tts/README.md

Generate speech from text with the async client and save it to a WAV file.

```python
import asyncio
from speechmatics.tts import AsyncClient, Voice, OutputFormat

# Generate speech data from text and save to WAV file
async def main():
    async with AsyncClient() as client:
        async with await client.generate(
            text="Welcome to the future of voice AI!",
            voice=Voice.SARAH,
            output_format=OutputFormat.WAV_16000
        ) as response:
            audio = b''.join([chunk async for chunk in response.content.iter_chunked(1024)])
            with open("output.wav", "wb") as f:
                f.write(audio)

# Run the async main function
if __name__ == "__main__":
    asyncio.run(main())
```

--------------------------------

### SPEAKER_ENDED Payload Example

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/voice/README.md

Example JSON payload for the SPEAKER_ENDED event, giving the speaker ID and the time they stopped speaking.
```json
{
  "message": "SpeakerEnded",
  "is_active": false,
  "speaker_id": "S1",
  "time": 2.64
}
```

--------------------------------

### Show Compact Configuration from Preset

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/examples/voice/cli/README.md

Displays a compact view of the configuration derived from a specified preset. Useful for quickly checking key settings.

```bash
python cli.py -P scribe -w
```

--------------------------------

### Migrate Basic Client Usage: Before

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/rt/MIGRATION.md

Shows basic client instantiation and a transcription call with the legacy WebsocketClient.

```python
from speechmatics.client import WebsocketClient
from speechmatics.models import TranscriptionConfig

client = WebsocketClient("API-KEY")
conf = TranscriptionConfig(language="en")
await client.run(audio_stream, conf)
```

--------------------------------

### Troubleshoot `ModuleNotFoundError: No module named 'pyaudio'`

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/00-START-HERE.md

This error means the PyAudio library is not installed. Install it with pip.

```bash
pip install pyaudio
```

--------------------------------

### Basic Voice Agent Usage

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/voice-agent-client.md

A basic example that initializes the Voice Agent Client, connects to the service, streams audio chunks from a microphone, and handles incoming segment and end-of-turn messages.
```python
import asyncio

from speechmatics.voice import VoiceAgentClient, AgentServerMessageType
from speechmatics.rt import Microphone


async def simple_agent():
    client = VoiceAgentClient(
        api_key="your-api-key",
        preset="adaptive"
    )

    @client.on(AgentServerMessageType.ADD_SEGMENT)
    def on_segment(message):
        for segment in message.get("segments", []):
            speaker = segment.get("speaker_id", "S1")
            text = segment.get("text", "")
            print(f"[{speaker}]: {text}")

    @client.on(AgentServerMessageType.END_OF_TURN)
    def on_turn_end(message):
        print(">>> Your turn to respond!")

    mic = Microphone(sample_rate=16000, chunk_size=320)
    mic.start()

    try:
        await client.connect()
        print("Agent ready. Speak now...")

        while True:
            chunk = await mic.read(320)
            await client.send_audio(chunk)
    except (KeyboardInterrupt, asyncio.CancelledError):
        # On Python 3.11+, Ctrl+C cancels the main task, so CancelledError
        # (not KeyboardInterrupt) is usually what arrives here
        print("\nStopping...")
    finally:
        mic.stop()
        await client.disconnect()


asyncio.run(simple_agent())
```

--------------------------------

### ADD_PARTIAL_SEGMENT Payload Example

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/voice/README.md

Example JSON payload for the ADD_PARTIAL_SEGMENT event, showing interim transcription segments with speaker information and timestamps.

```json
{
  "message": "AddPartialSegment",
  "segments": [
    {
      "speaker_id": "S1",
      "is_active": true,
      "timestamp": "2025-11-11T23:18:37.189+00:00",
      "language": "en",
      "text": "Welcome to",
      "metadata": {
        "start_time": 1.28,
        "end_time": 1.6
      }
    }
  ],
  "metadata": {
    "start_time": 1.28,
    "end_time": 1.6,
    "processing_time": 0.307
  }
}
```

--------------------------------

### ADD_SEGMENT Payload Example

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/voice/README.md

Example JSON payload for the ADD_SEGMENT event, showing finalized transcription segments with speaker, text, and timing details.
```json
{
  "message": "AddSegment",
  "segments": [
    {
      "speaker_id": "S1",
      "is_active": true,
      "timestamp": "2025-11-11T23:18:37.189+00:00",
      "language": "en",
      "text": "Welcome to Speechmatics.",
      "metadata": {
        "start_time": 1.28,
        "end_time": 8.04
      }
    }
  ],
  "metadata": {
    "start_time": 1.28,
    "end_time": 8.04,
    "processing_time": 0.187
  }
}
```

--------------------------------

### Custom Vocabulary JSON Example

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/examples/voice/scribe/README.md

Create a vocab.json file to improve accuracy for domain-specific terms. This example includes 'Speechmatics' with a 'sounds_like' pronunciation, and 'API'.

```json
[
  {
    "content": "Speechmatics",
    "sounds_like": ["speech matics"]
  },
  {
    "content": "API"
  }
]
```

--------------------------------

### Initialize VoiceAgentClient with Custom Configuration

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/voice-agent-client.md

Example of initializing the VoiceAgentClient with a custom VoiceAgentConfig, enabling features like diarization and smart turn detection. Replace 'key' with your actual API key and configure 'smart_turn_config' as needed.

```python
from speechmatics.voice import VoiceAgentClient, VoiceAgentConfig

config = VoiceAgentConfig(
    language="en",
    enable_diarization=True,
    smart_turn_config={"..."}  # placeholder: supply your smart turn settings
)

# Run inside an async function
async with VoiceAgentClient(api_key="key", config=config) as client:
    await client.connect()
```

--------------------------------

### Error Handling Example

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/batch-client.md

Provides an example of how to handle various errors that may occur during the transcription process, including authentication, batch, job, and timeout errors.
```APIDOC
## Error Handling

```python
from speechmatics.batch import (
    AsyncClient,
    AuthenticationError,
    BatchError,
    JobError,
    TimeoutError as SmError
)


async def transcribe_with_errors():
    client = AsyncClient(api_key="key")
    try:
        job = await client.submit_job("audio.wav")
        result = await client.wait_for_completion(job.id, timeout=300)
        print(result.transcript_text)
    # Catch the specific errors before the general BatchError,
    # in case they are subclasses of it
    except AuthenticationError as e:
        print(f"Auth failed: {e}")
    except JobError as e:
        print(f"Job error: {e}")
    except SmError as e:
        print(f"Timeout waiting for job: {e}")
    except BatchError as e:
        print(f"Batch error: {e}")
    finally:
        await client.close()
```
```

--------------------------------

### Starting a Real-time Transcription Session

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/rt-client.md

Initiate a WebSocket connection and start a transcription session with optional configuration for language, partial transcripts, diarization, and audio format.

```python
from speechmatics.rt import (
    AsyncClient,
    TranscriptionConfig,
    AudioFormat,
    AudioEncoding
)


async def stream_audio():
    client = AsyncClient(api_key="key")

    # Start session with microphone audio
    await client.start_session(
        transcription_config=TranscriptionConfig(
            language="en",
            enable_partials=True,
            diarization="speaker"
        ),
        audio_format=AudioFormat(
            encoding=AudioEncoding.PCM_S16LE,
            sample_rate=16000
        )
    )

    # Now send audio...
    await client.close()
```

--------------------------------

### Configure Voice Agent Client with Custom Settings

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/voice/README.md

Shows how to create a VoiceAgentClient with a custom `VoiceAgentConfig` object, allowing fine-grained control over parameters like language, diarization, and end-of-utterance detection.
```python
import os

from speechmatics.voice import VoiceAgentClient, VoiceAgentConfig, EndOfUtteranceMode

# Read the API key from the environment
api_key = os.environ["SPEECHMATICS_API_KEY"]

# Define your custom configuration
config = VoiceAgentConfig(
    language="en",
    enable_diarization=True,
    max_delay=0.7,
    end_of_utterance_mode=EndOfUtteranceMode.ADAPTIVE,
)

client = VoiceAgentClient(api_key=api_key, config=config)
```

--------------------------------

### Install Speechmatics SDK Dependencies

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/INDEX.md

Install the necessary packages for different Speechmatics SDK functionalities using pip. Includes batch, real-time, TTS, and voice agent modules.

```bash
# Batch
pip install speechmatics-batch

# Real-time
pip install speechmatics-rt pyaudio

# TTS
pip install speechmatics-tts

# Voice agents
pip install speechmatics-voice
pip install 'speechmatics-voice[smart]'  # For smart turn detection (quoted so shells like zsh don't expand the brackets)
```

--------------------------------

### Initialize AsyncClient with API Key

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/batch-client.md

Instantiate the AsyncClient using an API key. The client will use the SPEECHMATICS_API_KEY environment variable if no key is provided.

```python
import asyncio

from speechmatics.batch import AsyncClient, JWTAuth


async def main():
    # Using API key
    client = AsyncClient(api_key="your-api-key")
    await client.close()

    # Or using authentication object
    auth = JWTAuth(api_key="your-key", ttl=3600)
    client = AsyncClient(auth=auth)
    await client.close()


asyncio.run(main())
```

--------------------------------

### Build a Voice Agent

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/INDEX.md

Create a real-time voice agent using VoiceAgentClient. This example streams audio from a microphone, prints transcribed segments, and indicates when it's the user's turn. It requires an API key and uses the 'adaptive' preset.
```python
import asyncio

from speechmatics.voice import (
    VoiceAgentClient,
    AgentServerMessageType
)
from speechmatics.rt import Microphone


async def voice_agent():
    client = VoiceAgentClient(
        api_key="your-api-key",
        preset="adaptive"
    )

    @client.on(AgentServerMessageType.ADD_SEGMENT)
    def on_segment(message):
        for segment in message.get("segments", []):
            speaker = segment.get("speaker_id", "S1")
            text = segment.get("text", "")
            print(f"[{speaker}]: {text}")

    @client.on(AgentServerMessageType.END_OF_TURN)
    def on_turn_end(message):
        print("[Your turn!]")

    mic = Microphone(sample_rate=16000, chunk_size=320)
    mic.start()

    try:
        await client.connect()
        print("Agent ready...")

        while True:
            chunk = await mic.read(320)
            await client.send_audio(chunk)
    finally:
        mic.stop()
        await client.disconnect()


asyncio.run(voice_agent())
```

--------------------------------

### Microphone Input Handling

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/INDEX.md

Use the Microphone class to list available audio input devices, open a microphone with specific settings, and read audio data in chunks. Call `start()` before reading and `stop()` when you are done.

```python
import asyncio

from speechmatics.rt import Microphone


async def read_one_chunk():
    # List devices
    devices = Microphone.list_devices()
    print(devices)

    # Use device
    mic = Microphone(sample_rate=16000, device_index=0)
    mic.start()
    try:
        chunk = await mic.read(320)
    finally:
        mic.stop()
    return chunk


asyncio.run(read_one_chunk())
```

--------------------------------

### SmartTurnConfig

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/voice-agent-client.md

Configuration for ML-based turn detection. Requires the `speechmatics-voice[smart]` extra to be installed.

```APIDOC
## SmartTurnConfig

Configuration for ML-based turn detection. Requires the `speechmatics-voice[smart]` extra to be installed.

### Fields

- **enabled** (bool) - Default: `False` - Whether smart turn detection is enabled.
- **model_name** (str) - Default: `None` - The name of the smart turn model to use.
- **minimum_listening_time** (float) - Default: `None` - The minimum listening time before a turn is considered complete.
```

--------------------------------

### Voice Agents: Connect

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/INDEX.md

Establishes a connection to the Voice Agent service and starts a session.

```APIDOC
## connect()

### Description

Establish a connection and start a session for the Voice Agent.

### Method

`connect()`
```

--------------------------------

### Complete Realtime Transcription Pipeline

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/utilities-and-helpers.md

This example demonstrates a full real-time transcription pipeline, including microphone input, session configuration, audio streaming, and processing of transcript results. It stops cleanly on Ctrl+C and then prints the collected final transcripts.

```python
import asyncio

from speechmatics.rt import (
    AsyncClient,
    Microphone,
    AudioFormat,
    AudioEncoding,
    TranscriptionConfig,
    ServerMessageType,
    TranscriptResult
)


async def realtime_pipeline():
    client = AsyncClient(api_key="key")
    mic = Microphone(sample_rate=16000, chunk_size=320)
    transcripts = []

    @client.on(ServerMessageType.ADD_TRANSCRIPT)
    def on_final(message):
        result = TranscriptResult.from_message(message)
        transcripts.append({
            "speaker": result.metadata.speaker or "Unknown",
            "text": result.metadata.transcript,
            "confidence": result.metadata.confidence,
            "start": result.metadata.start_time,
            "end": result.metadata.end_time
        })

    if not mic.start():
        raise RuntimeError("PyAudio not available")

    try:
        await client.start_session(
            transcription_config=TranscriptionConfig(
                language="en",
                diarization="speaker"
            ),
            audio_format=AudioFormat(
                encoding=AudioEncoding.PCM_S16LE,
                sample_rate=16000
            )
        )
        print("Recording... (Ctrl+C to stop)")

        while True:
            chunk = await mic.read(320)
            await client.send_audio(chunk)
    except (KeyboardInterrupt, asyncio.CancelledError):
        # On Python 3.11+, Ctrl+C cancels the main task, so CancelledError
        # (not KeyboardInterrupt) is usually what arrives here
        print("\nProcessing final results...")
    finally:
        mic.stop()
        await client.close()

    # Output results
    for item in transcripts:
        print(f"[{item['speaker']} {item['start']:.1f}s]: {item['text']}")


asyncio.run(realtime_pipeline())
```

--------------------------------

### Get Job Info

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/INDEX.md

Fetches the current status and details of a specific transcription job.

```APIDOC
## get_job_info()

### Description

Retrieves the current status and metadata for a given job.

### Method

`get_job_info(job_id: str)`

### Parameters

#### Path Parameters

- **job_id** (str) - Required - The ID of the job to query.

### Returns

- **JobDetails** - An object containing the job's ID, status, and other relevant information.
```

--------------------------------

### JWT Authentication for AsyncClient

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/sdk/flow/README.md

Configure JWT authentication for the AsyncClient. Requires 'speechmatics-flow[jwt]' to be installed.

```python
from speechmatics.flow import AsyncClient, JWTAuth

# Create JWT auth (requires: pip install 'speechmatics-flow[jwt]')
auth = JWTAuth("your-api-key", ttl=60)

# Run inside an async function
async with AsyncClient(auth=auth) as client:
    pass
```

--------------------------------

### Get Transcript

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/INDEX.md

Retrieves the transcript for a job that has already completed. Supports various output formats.

```APIDOC
## get_transcript()

### Description

Retrieves the transcript for a completed job. The format of the returned transcript depends on the job's output settings.

### Method

`get_transcript(job_id: str, output_format: str = 'json')`

### Parameters

#### Path Parameters

- **job_id** (str) - Required - The ID of the completed job.
- **output_format** (str) - Optional - The desired format for the transcript (e.g., 'json', 'txt', 'srt'). Defaults to 'json'.

### Returns

- **Transcript or str** - The transcript data, either as a `Transcript` object (for JSON) or a string (for TXT/SRT).
```

--------------------------------

### start_session

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/rt-client.md

Establish a WebSocket connection and begin a transcription session with optional configuration for transcription and audio format.

```APIDOC
## start_session()

Establish a WebSocket connection and begin a transcription session.

```python
async def start_session(
    transcription_config: Optional[TranscriptionConfig] = None,
    audio_format: Optional[AudioFormat] = None,
    *,
    wait_for_init: bool = True
) -> None
```

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `transcription_config` | `TranscriptionConfig` | No | `TranscriptionConfig(language="en")` | Transcription settings. |
| `audio_format` | `AudioFormat` | No | `AudioFormat()` for file mode | Audio encoding and sample rate specification. |
| `wait_for_init` | `bool` | No | `True` | Wait for the server to confirm the session with a `RecognitionStarted` message. |

**Raises:**

- `ConnectionError`: WebSocket connection failed
- `SessionError`: Session initialization failed
- `AuthenticationError`: Invalid API key/JWT
- `ConfigurationError`: Invalid configuration

**Example:**

```python
from speechmatics.rt import (
    AsyncClient,
    TranscriptionConfig,
    AudioFormat,
    AudioEncoding
)


async def stream_audio():
    client = AsyncClient(api_key="key")

    # Start session with microphone audio
    await client.start_session(
        transcription_config=TranscriptionConfig(
            language="en",
            enable_partials=True,
            diarization="speaker"
        ),
        audio_format=AudioFormat(
            encoding=AudioEncoding.PCM_S16LE,
            sample_rate=16000
        )
    )

    # Now send audio...
    await client.close()
```
```

--------------------------------

### SmartTurnConfig Dataclass

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/voice-agent-client.md

Configuration for enabling ML-based turn detection. Requires the 'speechmatics-voice[smart]' extra to be installed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmartTurnConfig:
    enabled: bool = False
    model_name: Optional[str] = None
    minimum_listening_time: Optional[float] = None
```

--------------------------------

### FastAPI Integration

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/tts-client.md

Provides an example of integrating the TTS client with FastAPI to create an audio streaming endpoint.

```APIDOC
### With FastAPI

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from speechmatics.tts import AsyncClient, OutputFormat, Voice

app = FastAPI()


@app.post("/speak/{voice}")
async def text_to_speech(text: str, voice: str):
    async def audio_stream():
        # Keep the client and response open until the last chunk is sent.
        # Closing the client in the endpoint's own `finally` would end the
        # connection before StreamingResponse starts reading the body.
        async with AsyncClient(api_key="key") as client:
            async with await client.generate(
                text=text,
                voice=Voice[voice.upper()],
                output_format=OutputFormat.WAV_16000,
            ) as response:
                async for chunk in response.content.iter_chunked(1024):
                    yield chunk

    return StreamingResponse(audio_stream(), media_type="audio/wav")
```
```

--------------------------------

### Receive Real-time Session Metrics

Source: https://github.com/speechmatics/speechmatics-python-sdk/blob/main/_autodocs/voice-agent-client.md

Example of how to subscribe to and process real-time session metrics from the Voice Agent Client. This includes total audio processed and the number of segments transcribed. Assumes a 'client' object is already initialized.

```python
from speechmatics.voice import AgentServerMessageType


@client.on(AgentServerMessageType.SESSION_METRICS)
def on_metrics(message):
    metrics = message.get("metrics", {})
    print(f"Audio processed: {metrics.get('audio_duration_sec')}")
    print(f"Segments: {metrics.get('segment_count')}")
```
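The metrics handler above prints raw values, so a missing field shows up as `None`. Below is a minimal sketch of a more defensive formatter, assuming the `metrics` dictionary carries the `audio_duration_sec` and `segment_count` keys used above; `format_metrics` is a hypothetical helper, not part of the SDK. It has no SDK import, so it can be unit-tested with a hand-built message dict.

```python
from typing import Any, Mapping


def format_metrics(message: Mapping[str, Any]) -> str:
    """Summarize a SESSION_METRICS-style message dict in one line.

    Hypothetical helper (not part of the SDK). Assumes the payload shape used
    in the example above: {"metrics": {"audio_duration_sec": ..., "segment_count": ...}}.
    Missing fields are shown as "n/a" instead of "None".
    """
    metrics = message.get("metrics") or {}
    duration = metrics.get("audio_duration_sec")
    segments = metrics.get("segment_count")
    duration_text = f"{duration:.1f}s" if isinstance(duration, (int, float)) else "n/a"
    segments_text = str(segments) if segments is not None else "n/a"
    return f"Audio processed: {duration_text} | Segments: {segments_text}"


# Hand-built message; the values are illustrative, not real server output
print(format_metrics({"metrics": {"audio_duration_sec": 12.34, "segment_count": 3}}))
```

Inside the `on_metrics` handler you could then call `print(format_metrics(message))` instead of the two `print` calls.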
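The voice agent examples on this page print each `AddSegment` as it arrives and only announce `EndOfTurn`. A common next step is to collect everything said in a turn and hand it off once the turn ends. The sketch below does that on plain dicts, assuming the payload shapes shown in the AddSegment and EndOfTurn examples (`segments[].speaker_id`, `segments[].text`, `turn_id`); `TurnBuffer` is a hypothetical helper, not an SDK class.

```python
from typing import Any, Dict, List, Mapping


class TurnBuffer:
    """Collect finalized segments until an EndOfTurn message arrives.

    Hypothetical helper, not part of the SDK. Assumes the AddSegment and
    EndOfTurn payload shapes shown in the JSON examples on this page.
    """

    def __init__(self) -> None:
        self._lines: List[str] = []

    def on_segment(self, message: Mapping[str, Any]) -> None:
        # One AddSegment message may carry several finalized segments
        for segment in message.get("segments", []):
            speaker = segment.get("speaker_id", "S1")
            text = segment.get("text", "")
            if text:  # skip empty segments
                self._lines.append(f"[{speaker}]: {text}")

    def on_end_of_turn(self, message: Mapping[str, Any]) -> Dict[str, Any]:
        # Return everything said in this turn, then reset for the next turn
        turn = {"turn_id": message.get("turn_id"), "lines": self._lines}
        self._lines = []
        return turn


buffer = TurnBuffer()
buffer.on_segment({"message": "AddSegment", "segments": [
    {"speaker_id": "S1", "text": "Welcome to Speechmatics."}
]})
print(buffer.on_end_of_turn({"message": "EndOfTurn", "turn_id": 0}))
```

In the Basic Voice Agent Usage example, `on_segment` and `on_turn_end` could forward their `message` argument to `buffer.on_segment` and `buffer.on_end_of_turn`, passing the returned turn to whatever generates the agent's reply.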
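Several examples on this page open `Microphone(sample_rate=16000, chunk_size=320)` and stream `AudioEncoding.PCM_S16LE` audio. The arithmetic below shows how much audio one chunk represents. It is a sketch that assumes `chunk_size` counts mono 16-bit samples (frames) rather than bytes, which this page does not state; `chunk_stats` is a hypothetical helper.

```python
from typing import Tuple


def chunk_stats(
    chunk_size: int,
    sample_rate: int,
    bytes_per_sample: int = 2,  # PCM_S16LE uses 16-bit (2-byte) samples
    channels: int = 1,
) -> Tuple[float, int]:
    """Return (duration in ms, size in bytes) of one audio chunk.

    Hypothetical helper. Assumes chunk_size counts frames (samples per
    channel), not bytes.
    """
    duration_ms = chunk_size * 1000 / sample_rate
    size_bytes = chunk_size * bytes_per_sample * channels
    return duration_ms, size_bytes


# Settings used by the Microphone examples on this page
print(chunk_stats(320, 16000))  # (20.0, 640)
```

Under that assumption, each `mic.read(320)` call yields 20 ms of audio (640 bytes), so the send loops above push about 50 chunks per second.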