### Reply Started Message Example

Source: https://www.assemblyai.com/docs/api-reference/voice-agent-api/voice-agent-web-socket

This JSON object indicates that the agent has started generating a reply. It includes a unique reply ID.

```json
{
  "type": "reply.started",
  "reply_id": "reply_abc123"
}
```

--------------------------------

### Complete Response Example

Source: https://www.assemblyai.com/docs/api-reference/llm-gateway/create-speech-understanding

This is a comprehensive example of a successful response from the speech understanding API, including translation and speaker identification details.

```yaml
complete_response:
  summary: Complete response example
  value:
    speech_understanding:
      request:
        translation:
          target_languages:
            - es
            - de
          formal: true
          match_original_utterance: true
        speaker_identification:
          speaker_type: name
          speakers:
            - name: Michel Martin
```

--------------------------------

### Reply Done Example

Source: https://www.assemblyai.com/docs/api-reference/voice-agent-api/voice-agent-web-socket

Example payload indicating the agent has finished speaking. The status can be 'completed' or 'interrupted' if the user barged in.

```json
{
  "type": "reply.done"
}
```

--------------------------------

### Chat Completion with Structured Output (JSON Schema)

Source: https://www.assemblyai.com/docs/api-reference/llm-gateway/create-chat-completion

Use this example to guide the model to return output in a specific JSON format defined by a JSON schema. This is ideal for extracting structured data or ensuring consistent response formats. The `strict: true` option enforces the schema.

```yaml
model: gemini-2.5-flash-lite
messages:
  - role: system
    content: >-
      You are a helpful math tutor. Guide the user through the
      solution step by step.
  - role: user
    content: how can I solve 8x + 7 = -23
response_format:
  type: json_schema
  json_schema:
    name: math_reasoning
    schema:
      type: object
      properties:
        steps:
          type: array
          items:
            type: object
            properties:
              explanation:
                type: string
              output:
                type: string
            required:
              - explanation
              - output
          additionalProperties: false
        final_answer:
          type: string
      required:
        - steps
        - final_answer
    additionalProperties: false
  strict: true
```

--------------------------------

### Authenticated Request Example

Source: https://www.assemblyai.com/docs/api-reference/overview

Example of how to make an authenticated request to the AssemblyAI API using cURL. Replace '<YOUR_API_KEY>' with your actual API key.

```bash
curl https://api.assemblyai.com/v2/transcript \
  --header 'Authorization: <YOUR_API_KEY>'
```

--------------------------------

### Receive session begins

Source: https://www.assemblyai.com/docs/api-reference/streaming-api/universal-streaming

Receive confirmation that the streaming session has successfully started.

```APIDOC
## Receive session begins

### Description
Receive confirmation that the streaming session has successfully started.

### Type
Receive

### Message
#### sessionBegins

##### Payload
- **type** (string) - Required - Identifies the type of the message.
- **id** (string) - Required - Unique identifier for the streaming session.
- **expires_at** (integer) - Required - Unix timestamp indicating when the session will expire.

### Response Example
```json
{
  "type": "Begin",
  "id": "<string>",
  "expires_at": <integer>
}
```
```

--------------------------------

### Tool Call Message Example

Source: https://www.assemblyai.com/docs/api-reference/voice-agent-api/voice-agent-web-socket

This example demonstrates the structure of a 'tool.call' message, which the agent sends when it needs to invoke a registered tool. Ensure the 'call_id' is included in the subsequent 'tool.result' message.

```json
{
  "type": "tool.call",
  "call_id": "call_abc123",
  "name": "get_weather",
  "arguments": {
    "location": "Tokyo"
  }
}
```

--------------------------------

### Session Ready Payload Example

Source: https://www.assemblyai.com/docs/api-reference/voice-agent-api/voice-agent-web-socket

Server confirms the session is established and ready for audio. Save session_id for reconnection.

```json
{
  "type": "session.ready",
  "session_id": "sess_abc123"
}
```

--------------------------------

### Authenticated Request Example

Source: https://www.assemblyai.com/docs/api-reference/overview

This example shows how to make an authenticated request to the transcript endpoint using a cURL command. Replace '<YOUR_API_KEY>' with your actual API key.

```APIDOC
## POST /v2/transcript

### Description
Submits audio data for transcription.

### Method
POST

### Endpoint
/v2/transcript

### Parameters
#### Headers
- **Authorization** (string) - Required - Your API key for authentication.

### Request Example
```bash
curl https://api.assemblyai.com/v2/transcript \
  --header 'Authorization: <YOUR_API_API_KEY>'
```

### Response
#### Success Response (200)
- **status** (string) - Indicates the status of the transcription.
- **error** (string) - Provides details if the transcription failed.

#### Response Example
```json
{
  "status": "error",
  "error": "Download error to https://foo.bar, 403 Client Error: Forbidden for url: https://foo.bar",
  ...
}
```
```

--------------------------------

### Receive Reply Started

Source: https://www.assemblyai.com/docs/api-reference/voice-agent-api/voice-agent-web-socket

Agent has begun generating a response. This message indicates the start of a reply and includes a unique reply ID.

```APIDOC
## Receive Reply Started

### Description
Agent has begun generating a response. This message indicates the start of a reply and includes a unique reply ID.

### Message Type
receive

### Payload Schema
```json
{
  "type": "reply.started",
  "reply_id": "string"
}
```

### Example
```json
{
  "type": "reply.started",
  "reply_id": "reply_abc123"
}
```
```

--------------------------------

### Input Audio Chunk Example

Source: https://www.assemblyai.com/docs/api-reference/voice-agent-api/voice-agent-web-socket

Client streams a chunk of PCM16 audio as base64. This message type is 'input.audio'.

```json
{
  "type": "input.audio",
  "audio": "EAAgADAAQAAwACAAEAAAAPD/4P/Q/8D/"
}
```

--------------------------------

### Custom Formatting Request Example

Source: https://www.assemblyai.com/docs/api-reference/llm-gateway/create-speech-understanding

This example shows how to request custom formatting for dates, phone numbers, and email addresses within the speech understanding results. Specify the desired format for each type.

```yaml
custom_formatting_example:
  summary: Custom formatting request
  value:
    transcript_id: '12345'
    speech_understanding:
      request:
        custom_formatting:
          date: mm/dd/yyyy
          phone_number: (xxx)xxx-xxxx
          email: username@domain.com
```

--------------------------------

### Reply Create Payload Example

Source: https://www.assemblyai.com/docs/api-reference/voice-agent-api/voice-agent-web-socket

Client asks the agent to generate a reply now, optionally with one-shot instructions.

```json
{
  "type": "reply.create",
  "instructions": "Let the customer know we're still processing the transfer."
}
```

--------------------------------

### Agent Transcript Example

Source: https://www.assemblyai.com/docs/api-reference/voice-agent-api/voice-agent-web-socket

Example payload for a transcript message from the agent. This includes the spoken text, reply ID, item ID, and interruption status.

```json
{
  "type": "transcript.agent",
  "text": "It's currently 22°C and sunny in Tokyo.",
  "reply_id": "reply_abc123",
  "item_id": "item_abc123",
  "interrupted": false
}
```

--------------------------------

### Tool Result Example

Source: https://www.assemblyai.com/docs/api-reference/voice-agent-api/voice-agent-web-socket

Client returns the result of a tool invocation to the agent. The 'result' field should be a JSON-encoded string.

```json
{
  "type": "tool.result",
  "call_id": "call_abc123",
  "result": "{\"temp_c\": 22, \"description\": \"Sunny\"}"
}
```

--------------------------------

### Resume Session Example

Source: https://www.assemblyai.com/docs/api-reference/voice-agent-api/voice-agent-web-socket

Client message to resume a previous session using its `session_id`. This is used to re-establish a connection after an interruption.

```json
{
  "type": "session.resume",
  "session_id": "sess_abc123"
}
```

--------------------------------

### Speaker Identification Request with Known Values

Source: https://www.assemblyai.com/docs/api-reference/llm-gateway/create-speech-understanding

This example demonstrates how to perform speaker identification by providing a list of known speaker types, such as 'interviewer' or 'candidate'.

```yaml
speaker_identification_example:
  summary: Speaker identification request with known_values
  value:
    transcript_id: '12345'
    speech_understanding:
      request:
        speaker_identification:
          speaker_type: role
          known_values:
            - interviewer
            - candidate
```

--------------------------------

### User Started Speaking Event

Source: https://www.assemblyai.com/docs/api-reference/voice-agent-api/voice-agent-web-socket

Signals that the server's turn detection has identified the user beginning to speak. This event is of type `input.speech.started`.

```json
{
  "type": "input.speech.started"
}
```

--------------------------------

### Session Begins Confirmation

Source: https://www.assemblyai.com/docs/api-reference/streaming-api/universal-3-pro-streaming

Server message indicating the streaming session has successfully started. This message confirms the session initiation and provides an expiration timestamp.

```APIDOC
## Session Begins Confirmation

### Description
Server message indicating the streaming session has successfully started.

### Message Type
`SessionBegins`

### Payload Schema
```json
{
  "type": "Begin",
  "id": "string (uuid)",
  "expires_at": "integer"
}
```

### Example
```json
{
  "type": "Begin",
  "id": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
  "expires_at": 1678886400
}
```
```

--------------------------------

### Session Begins Confirmation

Source: https://www.assemblyai.com/docs/api-reference/streaming-api/universal-streaming

Server message indicating the streaming session has successfully started. This message confirms the session ID and provides an expiration timestamp.

```APIDOC
## Session Begins Confirmation

### Description
Server message indicating the streaming session has successfully started. This message confirms the session ID and provides an expiration timestamp.

### Message Type
`session_begins`

### Payload
- **type** (string) - Required - The type of message, always `session_begins`.
- **id** (string) - Required - The unique identifier for the streaming session.
- **expires_at** (integer) - Required - Unix timestamp indicating when the session will expire.

### Example
```json
{
  "type": "session_begins",
  "id": "<string>",
  "expires_at": 123
}
```
```

--------------------------------

### Chat Completion with Simple Prompt

Source: https://www.assemblyai.com/docs/api-reference/llm-gateway/create-chat-completion

Use this example for a single-turn prompt where you provide a direct instruction to the model. Specify the model, prompt, and optionally max tokens and temperature.

```yaml
model: claude-sonnet-4-6
prompt: Write a haiku about coding
max_tokens: 50
temperature: 0.5
```

--------------------------------

### User Started Speaking

Source: https://www.assemblyai.com/docs/api-reference/voice-agent-api/voice-agent-web-socket

Turn detection determined the user has started speaking. This event signals the beginning of user input.

```APIDOC
## User Started Speaking

### Description
Turn detection determined the user has started speaking.

### Event
input.speech.started

### Payload
- **type** (string) - Required - `input.speech.started`

### Response Example
```json
{
  "type": "input.speech.started"
}
```
```

--------------------------------

### Session Ready

Source: https://www.assemblyai.com/docs/api-reference/voice-agent-api/voice-agent-web-socket

Server confirms the session is established and ready for audio. The `session_id` should be saved for potential reconnection using `session.resume`.

```APIDOC
## Receive session ready

### Description

Server confirms the session is established and ready for audio. Save `session_id` for reconnection and start streaming audio.

### Message Type

`session.ready`

### Payload

- **type** (string) - Required - Must be `session.ready`.
- **session_id** (string) - Required - Unique identifier for this session. Save this to reconnect with `session.resume`.

### Response Example

```json
{
  "type": "session.ready",
  "session_id": "sess_abc123"
}
```
```

--------------------------------

### Chat Completion with Messages

Source: https://www.assemblyai.com/docs/api-reference/llm-gateway/create-chat-completion

Use this example to send a series of messages to the model for a chat-like interaction. Specify the model, messages, and optionally max tokens and temperature.

```yaml
model: claude-sonnet-4-6
messages:
  - role: user
    content: Hello, how are you?
max_tokens: 100
temperature: 0.7
```

--------------------------------

### Get sentences in transcript

Source: https://www.assemblyai.com/docs/api-reference/transcripts/get-sentences

Get the transcript split by sentences. The API will attempt to semantically segment the transcript into sentences to create more reader-friendly transcripts.

```APIDOC
## GET /v2/transcript/{transcript_id}/sentences

### Description
Get the transcript split by sentences. The API will attempt to semantically segment the transcript into sentences to create more reader-friendly transcripts.

### Method
GET

### Endpoint
/v2/transcript/{transcript_id}/sentences

### Parameters
#### Path Parameters
- **transcript_id** (string) - Required - ID of the transcript

### Response
#### Success Response (200)
- **id** (string) - The unique identifier for the transcript
- **confidence** (number) - The confidence score for the transcript
- **audio_duration** (number) - The duration of the audio file in seconds
- **sentences** (array) - An array of sentences in the transcript

#### Response Example
```json
{
  "sentences": [
    {
      "text": "Smoke from hundreds of wildfires in Canada is triggering air quality alerts throughout the US.",
      "start": 250,
      "end": 6350,
      "confidence": 0.72412,
      "words": [
        {
          "text": "Smoke",
          "start": 250,
          "end": 650,
          "confidence": 0.72412,
          "speaker": null
        },
        {
          "text": "from",
          "start": 730,
          "end": 1022,
          "confidence": 0.99996,
          "speaker": null
        },
        {
          "text": "hundreds",
          "start": 1076,
          "end": 1466,
          "confidence": 0.99992,
          "speaker": null
        },
        {
          "text": "of",
          "start": 1498,
          "end": 1646,
          "confidence": 1,
          "speaker": null
        }
      ],
      "speaker": null
    },
    {
      "text": "Skylines from Maine to Maryland to Minnesota are gray and smoggy.",
      "start": 6500,
      "end": 11050,
      "confidence": 0.99819,
      "words": [
        {
          "text": "Skylines",
          "start": 6500,
          "end": 7306,
          "confidence": 0.99819,
          "speaker": null
        }
      ],
      "speaker": null
    }
  ]
}
```
```

--------------------------------

### OpenAPI Specification for GET /v2/transcript/{transcript_id}

Source: https://www.assemblyai.com/docs/api-reference/transcripts/get

This OpenAPI specification defines the GET endpoint for retrieving a transcript by its ID. It includes server details and security information.

```yaml
openapi: 3.1.0
info:
  title: AssemblyAI API
  description: AssemblyAI API
  version: 1.3.4
  termsOfService: https://www.assemblyai.com/legal/terms-of-service
  contact:
    name: API Support
    email: support@assemblyai.com
    url: https://www.assemblyai.com/docs/
servers:
  - url: https://api.assemblyai.com
    description: AssemblyAI API
    x-fern-server-name: Default
security:
  - ApiKey: []
tags:
  - name: transcript
    description: Transcript related operations
    externalDocs:
      url: https://www.assemblyai.com/docs/guides/transcribing-an-audio-file
  - name: streaming
    description: Streaming Speech-to-Text
    externalDocs:
      url: https://www.assemblyai.com/docs/streaming/universal-streaming
paths:
  /v2/transcript/{transcript_id}:
    get:
      tags:
        - transcript
      summary: Get transcript
      description: >
        <Note>To retrieve your transcriptions on our EU server, replace
        `api.assemblyai.com` with `api.eu.assemblyai.com`.</Note>

```

--------------------------------

### OpenAPI Specification for Get Subtitles

Source: https://www.assemblyai.com/docs/api-reference/transcripts/get-subtitles

This OpenAPI specification defines the GET /v2/transcript/{transcript_id}/{subtitle_format} endpoint, which allows you to export transcripts in SRT or VTT format for use as subtitles.

```yaml
openapi: 3.1.0
info:
  title: AssemblyAI API
  description: AssemblyAI API
  version: 1.3.4
  termsOfService: https://www.assemblyai.com/legal/terms-of-service
  contact:
    name: API Support
    email: support@assemblyai.com
    url: https://www.assemblyai.com/docs/
servers:
  - url: https://api.assemblyai.com
    description: AssemblyAI API
    x-fern-server-name: Default
security:
  - ApiKey: []
tags:
  - name: transcript
    description: Transcript related operations
    externalDocs:
      url: https://www.assemblyai.com/docs/guides/transcribing-an-audio-file
  - name: streaming
    description: Streaming Speech-to-Text
    externalDocs:
      url: https://www.assemblyai.com/docs/streaming/universal-streaming
paths:
  /v2/transcript/{transcript_id}/{subtitle_format}:
    get:
      tags:
        - transcript
      summary: Get subtitles for transcript
      description: >
        <Note>To retrieve your transcriptions on our EU server, replace
        `api.assemblyai.com` with `api.eu.assemblyai.com`.</Note>

        Export your transcript in SRT or VTT format to use with a video player
        for subtitles and closed captions.
      operationId: getSubtitles
      parameters:
        - name: transcript_id
          x-label: Transcript ID
          in: path
          description: ID of the transcript
          required: true
          schema:
            type: string
        - name: subtitle_format
          x-label: Subtitle format
          in: path
          description: The format of the captions
          required: true
          schema:
            $ref: '#/components/schemas/SubtitleFormat'
        - name: chars_per_caption
          x-label: Number of characters per caption
          in: query
          description: The maximum number of characters per caption
          schema:
            type: integer
      responses:
        '200':
          description: The exported captions as text
          content:
            text/plain:
              schema:
                type: string
                example: >
                  WEBVTT

                  00:12.340 --> 00:16.220

                  Last year I showed these two slides said that demonstrate

                  00:16.200 --> 00:20.040

                  that the Arctic ice cap which for most of the last 3,000,000
                  years has been the

                  00:20.020 --> 00:25.040

                  size of the lower 48 States has shrunk by 40% but this
                  understates
              examples:
                srt:
                  $ref: '#/components/examples/SrtSubtitlesResponse'
                vtt:
                  $ref: '#/components/examples/VttSubtitlesResponse'
            text/html:
              schema:
                type: string
                example: >
                  WEBVTT

                  00:12.340 --> 00:16.220

                  Last year I showed these two slides said that demonstrate

                  00:16.200 --> 00:20.040

```

--------------------------------

### OpenAPI Specification for Get Transcript Sentences

Source: https://www.assemblyai.com/docs/api-reference/transcripts/get-sentences

This OpenAPI specification defines the GET /v2/transcript/{transcript_id}/sentences endpoint. It outlines the request parameters, possible responses, and the structure of the SentencesResponse schema.

```yaml
openapi: 3.1.0
info:
  title: AssemblyAI API
  description: AssemblyAI API
  version: 1.3.4
  termsOfService: https://www.assemblyai.com/legal/terms-of-service
  contact:
    name: API Support
    email: support@assemblyai.com
    url: https://www.assemblyai.com/docs/
servers:
  - url: https://api.assemblyai.com
    description: AssemblyAI API
    x-fern-server-name: Default
security:
  - ApiKey: []
tags:
  - name: transcript
    description: Transcript related operations
    externalDocs:
      url: https://www.assemblyai.com/docs/guides/transcribing-an-audio-file
  - name: streaming
    description: Streaming Speech-to-Text
    externalDocs:
      url: https://www.assemblyai.com/docs/streaming/universal-streaming
paths:
  /v2/transcript/{transcript_id}/sentences:
    get:
      tags:
        - transcript
      summary: Get sentences in transcript
      description: >
        <Note>To retrieve your transcriptions on our EU server, replace
        `api.assemblyai.com` with `api.eu.assemblyai.com`.</Note>

        Get the transcript split by sentences. The API will attempt to
        semantically segment the transcript into sentences to create more
        reader-friendly transcripts.
      operationId: getTranscriptSentences
      parameters:
        - name: transcript_id
          x-label: Transcript ID
          in: path
          description: ID of the transcript
          required: true
          schema:
            type: string
      responses:
        '200':
          description: Exported sentences
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/SentencesResponse'
        '400':
          $ref: '#/components/responses/BadRequest'
        '401':
          $ref: '#/components/responses/Unauthorized'
        '404':
          $ref: '#/components/responses/NotFound'
        '429':
          $ref: '#/components/responses/TooManyRequests'
        '500':
          $ref: '#/components/responses/InternalServerError'
        '503':
          $ref: '#/components/responses/ServiceUnavailable'
        '504':
          $ref: '#/components/responses/GatewayTimeout'
components:
  schemas:
    SentencesResponse:
      x-label: Sentences response
      type: object
      x-fern-sdk-group-name: transcripts
      additionalProperties: false
      required:
        - id
        - confidence
        - audio_duration
        - sentences
      properties:
        id:
          x-label: Transcript ID
          description: The unique identifier for the transcript
          type: string
          format: uuid
        confidence:
          x-label: Confidence
          description: The confidence score for the transcript
          type: number
          format: double
          minimum: 0
          maximum: 1
        audio_duration:
          x-label: Audio duration
          description: The duration of the audio file in seconds
          type: number
        sentences:
          x-label: Sentences
          description: An array of sentences in the transcript
          type: array
          items:
            $ref: '#/components/schemas/TranscriptSentence'
            x-label: Sentence
      example:
        sentences:
          - text: >-
              Smoke from hundreds of wildfires in Canada is triggering air
              quality alerts throughout the US.
            start: 250
            end: 6350
            confidence: 0.72412
            words:
              - text: Smoke
                start: 250
                end: 650
                confidence: 0.72412
                speaker: null
              - text: from
                start: 730
                end: 1022
                confidence: 0.99996
                speaker: null
              - text: hundreds
                start: 1076
                end: 1466
                confidence: 0.99992
                speaker: null
              - text: of
                start: 1498
                end: 1646
                confidence: 1
                speaker: null
            speaker: null
          - text: Skylines from Maine to Maryland to Minnesota are gray and smoggy.
            start: 6500
            end: 11050
            confidence: 0.99819
            words:
              - text: Skylines
                start: 6500
                end: 7306
                confidence: 0.99819
                speaker: null

```

--------------------------------

### HttpToolConfig Schema Example

Source: https://www.assemblyai.com/docs/api-reference/voice-agent-api/create-agent

Defines the structure for configuring an HTTP tool, including its URL, HTTP method, and headers. AssemblyAI calls this endpoint when the model invokes the tool.

```json
{
  "url": "https://api.example.com/weather",
  "http_method": "POST",
  "headers": {
    "Authorization": "***"
  }
}
```

--------------------------------

### OpenAPI Specification for Get Redacted Audio

Source: https://www.assemblyai.com/docs/api-reference/transcripts/get-redacted-audio

This OpenAPI specification defines the GET request for retrieving redacted audio. It includes details on server URLs, request parameters, and security. Note that redacted audio is only available for 24 hours.

```yaml
openapi: 3.1.0
info:
  title: AssemblyAI API
  description: AssemblyAI API
  version: 1.3.4
  termsOfService: https://www.assemblyai.com/legal/terms-of-service
  contact:
    name: API Support
    email: support@assemblyai.com
    url: https://www.assemblyai.com/docs/
servers:
  - url: https://api.assemblyai.com
    description: AssemblyAI API
    x-fern-server-name: Default
security:
  - ApiKey: []
tags:
  - name: transcript
    description: Transcript related operations
    externalDocs:
      url: https://www.assemblyai.com/docs/guides/transcribing-an-audio-file
  - name: streaming
    description: Streaming Speech-to-Text
    externalDocs:
      url: https://www.assemblyai.com/docs/streaming/universal-streaming
paths:
  /v2/transcript/{transcript_id}/redacted-audio:
    get:
      tags:
        - transcript
      summary: Get redacted audio
      description: >
        <Note>To retrieve the redacted audio on the EU server, replace
        `api.assemblyai.com` with `api.eu.assemblyai.com` in the `GET` request
        above.</Note>

        <Note>Redacted audio files are only available for 24 hours. Make sure to
        download the file within this time frame.</Note>

```

--------------------------------

### Session Updated Payload Example

Source: https://www.assemblyai.com/docs/api-reference/voice-agent-api/voice-agent-web-socket

Server acknowledges that a `session.update` was applied successfully.

```json
{
  "type": "session.updated"
}
```

--------------------------------

### API Error Response

Source: https://www.assemblyai.com/docs/api-reference/overview

Example of a JSON response when an authentication error occurs.

```json
{
  "error": "Authentication error, API token missing/invalid"
}
```

--------------------------------

### Reply Create

Source: https://www.assemblyai.com/docs/api-reference/voice-agent-api/voice-agent-web-socket

Client asks the agent to generate a reply now, optionally with one-shot instructions.

```APIDOC
## Reply Create

### Description

Client asks the agent to generate a reply now, optionally with one-shot instructions.

### Message Type

`reply.create`

### Payload

- **type** (string) - Required - Must be `reply.create`.
- **instructions** (string) - Optional - One-shot instructions for generating the reply.

### Request Example

```json
{
  "type": "reply.create",
  "instructions": "Let the customer know we're still processing the transfer."
}
```
```

--------------------------------

### Send Session Update

Source: https://www.assemblyai.com/docs/api-reference/voice-agent-api/voice-agent-web-socket

Configure the session. Send immediately on connect — before `session.ready` — to set the system prompt, greeting, voice, tools, and turn detection behavior. Can also be sent mid-conversation to update mutable fields.

```APIDOC
## Send session update

### Description
Configure the session. Send immediately on connect — before `session.ready` — to set the system prompt, greeting, voice, tools, and turn detection behavior. Can also be sent mid-conversation to update mutable fields (e.g. `system_prompt`, `input.turn_detection`).

`greeting` and `output` are immutable after `session.ready` and changing them returns `immutable_field`.

### Method
SEND

### Endpoint
/v1/ws

### Parameters
#### Path Parameters
- **ApiKey** (string) - Required - Pass your API key as a Bearer token in the `Authorization` header on the WebSocket upgrade request. For browser apps (which can't set custom headers on WebSockets), generate a [temporary token](/api-reference/voice-agent-api/voice-agent-web-socket/generate-voice-agent-token) and pass it via the `token` query parameter instead. See [Browser integration](/voice-agents/voice-agent-api/browser-integration).
- **token** (string) - Required - Temporary authentication token for client-side connections. Generate one with [`GET /v1/token`](/api-reference/voice-agent-api/voice-agent-web-socket/generate-voice-agent-token) on your server and pass it here so you don't expose your permanent API key in the browser. Each token is one-time use.

#### Request Body
- **type** (string) - Required - session.update
- **session** (object) - Required - Session configuration fields. All fields are optional — only include the ones you want to change.
  - **system_prompt** (string) - Optional - The agent's personality and context. Can be updated mid-session.
  - **greeting** (string) - Optional - What the agent says at the start of the conversation. Sent directly to the TTS engine and spoken verbatim. It is NOT run through the LLM.
```

--------------------------------

### Get Sentences

Source: https://www.assemblyai.com/docs/api-reference/transcripts/get-sentences

Retrieves a list of sentences from a transcription. This endpoint is part of the AssemblyAI API.

```APIDOC
## GET /v2/realtime/{id}/transcript/sentences

### Description
Retrieves a list of sentences from a transcription. This endpoint is part of the AssemblyAI API.

### Method
GET

### Endpoint
/v2/realtime/{id}/transcript/sentences

### Parameters
#### Path Parameters
- **id** (string) - Required - The ID of the transcription to retrieve sentences from.

### Response
#### Success Response (200)
- **text** (string) - The text of the sentence.
- **start** (integer) - The start timestamp of the sentence in milliseconds.
- **end** (integer) - The end timestamp of the sentence in milliseconds.
- **confidence** (number) - The confidence score of the sentence.
- **channel** (string or null) - The channel the sentence belongs to, if speaker diarization is enabled.
- **speaker** (string or null) - The speaker of the sentence, if speaker diarization is enabled.

#### Response Example
{
  "example": {
    "text": "Smoke",
    "start": 250,
    "end": 650,
    "confidence": 0.97465,
    "channel": null,
    "speaker": null
  }
}

ERROR HANDLING:
- **BadRequest**: Bad request
- **Unauthorized**: Unauthorized
- **NotFound**: Not found
- **TooManyRequests**: Too many requests
  - **Retry-After** (integer) - The number of seconds to wait before retrying the request
- **InternalServerError**: An error occurred while processing the request
- **ServiceUnavailable**: Service unavailable
- **GatewayTimeout**: Gateway timeout
```

--------------------------------

### Failed Transcription Response

Source: https://www.assemblyai.com/docs/api-reference/overview

Example of a JSON response when a transcription fails. Includes status and error details.

```json
{
    "status": "error",
    "error": "Download error to https://foo.bar, 403 Client Error: Forbidden for url: https://foo.bar",
    ...
}
```

--------------------------------

### Update Session Configuration

Source: https://www.assemblyai.com/docs/api-reference/voice-agent-api/voice-agent-web-socket

Send this message to configure the session, including system prompt, greeting, input/output formats, and tools. Use this to customize agent behavior and capabilities.

```json
{
  "type": "session.update",
  "session": {
    "system_prompt": "You are a concise assistant.",
    "greeting": "Hi — how can I help?",
    "input": {
      "format": {
        "encoding": "audio/pcm"
      },
      "turn_detection": {
        "vad_threshold": 0.5
      }
    },
    "output": {
      "voice": "ivy",
      "format": {
        "encoding": "audio/pcm"
      },
      "volume": 100
    },
    "tools": [
      {
        "type": "function",
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string"
            }
          },
          "required": [
            "city"
          ]
        }
      }
    ]
  }
}
```