### Install TypeAgent-py

Source: https://github.com/microsoft/typeagent-py/blob/main/docs/getting-started.md

Installs the TypeAgent-py library using pip. It's recommended to use a virtual environment or a package manager like poetry or uv for managing dependencies.

```shell
pip install typeagent
```

--------------------------------

### Example Text Data File

Source: https://github.com/microsoft/typeagent-py/blob/main/docs/getting-started.md

A sample text file (`testdata.txt`) containing lines that represent conversation messages, with each line starting with a speaker's name followed by their utterance. This format is used for ingestion by the `ingest.py` script.

```text
STEVE We should really make a Python library for Structured RAG.
UMESH Who would be a good person to do the Python library?
GUIDO I volunteer to do the Python library. Give me a few months.
```

--------------------------------

### Query Ingested Data with TypeAgent-py (Python)

Source: https://github.com/microsoft/typeagent-py/blob/main/docs/getting-started.md

Creates a conversation object, defines a question, and queries the ingested data using TypeAgent-py. Requires the same environment setup as the ingestion program. Prints the question and the retrieved answer.

```python
from typeagent import create_conversation
from typeagent.transcripts.transcript import TranscriptMessage


async def main():
    conversation = await create_conversation("demo.db", TranscriptMessage)
    question = "Who volunteered to do the python library?"
    print("Q:", question)
    answer = await conversation.query(question)
    print("A:", answer)


if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
```

--------------------------------

### Set OpenAI Environment Variables (Shell)

Source: https://github.com/microsoft/typeagent-py/blob/main/docs/getting-started.md

Sets the necessary environment variables for using OpenAI's API with TypeAgent-py. This includes the API key and the desired model. Additional variables might be needed for specific setups, including Azure OpenAI.

```shell
export OPENAI_API_KEY=your-very-secret-openai-api-key
export OPENAI_MODEL=gpt-4o
```

--------------------------------

### Ingest Text Data with TypeAgent-py (Python)

Source: https://github.com/microsoft/typeagent-py/blob/main/docs/getting-started.md

Reads messages from a text file, parses them into TranscriptMessage objects, and indexes them using TypeAgent-py. Requires OpenAI API key and model to be set as environment variables. Outputs the number of messages indexed and semantic references created.

```python
from typeagent import create_conversation
from typeagent.transcripts.transcript import (
    TranscriptMessage,
    TranscriptMessageMeta,
)


def read_messages(filename) -> list[TranscriptMessage]:
    messages: list[TranscriptMessage] = []
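    with open(filename, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # Skip blank lines, which have no speaker to split off
            # Parse "SPEAKER text..." into a TranscriptMessage; strip() above
            # also drops the trailing newline from the message text
            speaker, text_chunk = line.split(None, 1)
            message = TranscriptMessage(
                text_chunks=[text_chunk],
                metadata=TranscriptMessageMeta(speaker=speaker),
            )
            messages.append(message)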
    return messages


async def main():
    conversation = await create_conversation("demo.db", TranscriptMessage)
    messages = read_messages("testdata.txt")
    print(f"Indexing {len(messages)} messages...")
    results = await conversation.add_messages_with_indexing(messages)
    print(f"Indexed {results.messages_added} messages.")
    print(f"Got {results.semrefs_added} semantic refs.")


if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
```

--------------------------------

### Query Conversation with TypeAgent-Py

Source: https://context7.com/microsoft/typeagent-py/llms.txt

This example shows how to create a conversation instance, ask a single question, and then enter an interactive loop for continuous querying. It handles basic user input and potential errors.

```python
from typeagent import create_conversation
from typeagent.transcripts.transcript import TranscriptMessage
import asyncio

async def query_example():
    # Connect to existing conversation
    conversation = await create_conversation("demo.db", TranscriptMessage)

    # Single query
    question = "Who volunteered to work on authentication?"
    print(f"Q: {question}")
    answer = await conversation.query(question)
    print(f"A: {answer}")

    # Interactive query loop
    print("\nInteractive mode (type 'quit' to exit):")
    while True:
        try:
            user_question = input("typeagent> ")
            if not user_question.strip():
                continue
            if user_question.lower() in ('quit', 'exit', 'q'):
                break

            response = await conversation.query(user_question)
            print(response)

            # Check if no answer was found
            if response.startswith("No answer found:"):
                print("(Insufficient information in conversation)")
        except (KeyboardInterrupt, EOFError):  # Ctrl-C or Ctrl-D ends the session
            break
        except Exception as e:
            print(f"Error: {e}")

asyncio.run(query_example())
```

--------------------------------

### Example Main Function and Execution in Python

Source: https://context7.com/microsoft/typeagent-py/llms.txt

Demonstrates batch ingestion of a large dataset. The script creates a conversation, generates 500 synthetic messages, ingests them in batches with per-batch error handling, and then checks the message count. The TypeAgent classes and `create_conversation` are replaced by simple mock stand-ins so that the script runs on its own; swap in the real imports for actual use. It uses asyncio for asynchronous operations and basic logging.

```python
import asyncio
import logging
from types import SimpleNamespace

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Mock stand-ins so this script runs without typeagent or an API key.
# For real use, delete them and import the genuine ones instead:
#   from typeagent import create_conversation
#   from typeagent.transcripts.transcript import TranscriptMessage, TranscriptMessageMeta
class TranscriptMessageMeta:
    def __init__(self, speaker):
        self.speaker = speaker

class TranscriptMessage:
    def __init__(self, text_chunks, metadata):
        self.text_chunks = text_chunks
        self.metadata = metadata

class MockMessages:
    def __init__(self):
        self._items = []

    async def size(self):
        # Awaitable, matching `await conversation.messages.size()` below
        return len(self._items)

class MockConversation:
    def __init__(self):
        # A plain attribute: an `async` property would return a coroutine,
        # and `conversation.messages.size()` would then fail
        self.messages = MockMessages()

    async def add_messages_with_indexing(self, messages):
        self.messages._items.extend(messages)
        # Fake result exposing the same two counters the example reads
        return SimpleNamespace(
            messages_added=len(messages), semrefs_added=len(messages) // 2
        )

async def create_conversation(db_path, message_type):
    # Mock implementation; the real function is backed by the database at db_path
    logger.info(f"Creating conversation with db: {db_path} and type: {message_type.__name__}")
    return MockConversation()

async def batch_ingest(conversation, messages, batch_size=50):
    # Helper defined by this example (not a library function): ingest in
    # fixed-size batches, logging and skipping any batch that fails.
    total_messages = 0
    total_semrefs = 0
    for i in range(0, len(messages), batch_size):
        batch = messages[i:i + batch_size]
        batch_num = i // batch_size + 1
        try:
            logger.info(f"Processing batch {batch_num} ({len(batch)} messages)...")
            result = await conversation.add_messages_with_indexing(batch)
            total_messages += result.messages_added
            total_semrefs += result.semrefs_added
            logger.info(f"Batch {batch_num} complete: {result.messages_added} messages, {result.semrefs_added} semantic refs")
        except Exception as e:
            logger.error(f"Error in batch {batch_num}: {e}")
            continue
    return total_messages, total_semrefs

async def main():
    conversation = await create_conversation("large_dataset.db", TranscriptMessage)

    # Generate large dataset
    messages = [
        TranscriptMessage(
            text_chunks=[f"This is message number {i} about topic {i % 10}."],
            metadata=TranscriptMessageMeta(speaker=f"Speaker{i % 5}")
        )
        for i in range(500)
    ]

    logger.info(f"Starting ingestion of {len(messages)} messages...")
    total_msgs, total_refs = await batch_ingest(conversation, messages, batch_size=100)
    logger.info(f"Ingestion complete: {total_msgs} messages, {total_refs} semantic refs")

    # Verify conversation state
    size = await conversation.messages.size()
    logger.info(f"Total messages in conversation: {size}")

if __name__ == "__main__":
    asyncio.run(main())

```

--------------------------------

### TypeAgent-Py Environment Configuration

Source: https://context7.com/microsoft/typeagent-py/llms.txt

This example shows various ways to configure API credentials for OpenAI and Azure OpenAI services. It covers setting environment variables directly, using a .env file, and specifying custom endpoints for OpenAI-compatible services and embedding servers.

```python
from typeagent.aitools.utils import load_dotenv
import os

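# Pick ONE of the options below. They are listed together for reference only;
# running them all in sequence would mix their settings.

# Option 1: Set environment variables directly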
os.environ['OPENAI_API_KEY'] = 'sk-...'
os.environ['OPENAI_MODEL'] = 'gpt-4o'

# Option 2: Use .env file (recommended)
# Create a .env file in your project directory:
# OPENAI_API_KEY=sk-your-secret-key
# OPENAI_MODEL=gpt-4o
load_dotenv()

# Option 3: Azure OpenAI configuration
os.environ['AZURE_OPENAI_API_KEY'] = 'your-azure-key'
os.environ['AZURE_OPENAI_ENDPOINT'] = 'https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT/chat/completions?api-version=2023-05-15'

# Option 4: Custom OpenAI-compatible endpoint
os.environ['OPENAI_API_KEY'] = 'dummy-key'
os.environ['OPENAI_MODEL'] = 'llama3.2:1b'
os.environ['OPENAI_ENDPOINT'] = 'http://localhost:11434/v1'  # Ollama example

# Option 5: Custom embedding server
os.environ['OPENAI_API_KEY'] = 'dummy-key'
os.environ['OPENAI_BASE_URL'] = 'http://localhost:7997'  # Infinity embedding server
```

--------------------------------

### WebVTT Transcript Format Example

Source: https://github.com/microsoft/typeagent-py/blob/main/typeagent/transcripts/README.md

Illustrates the standard WebVTT file format for captions and subtitles, including timestamps and speaker information. This format is recognized and parsed by the `ingest_vtt_transcript` function.

```webvtt
WEBVTT
Kind: captions
Language: en

00:00:07.599 --> 00:00:10.559
SPEAKER: Hello, this is a test.

00:00:10.560 --> 00:00:15.000
[Another Speaker] This is another line.
```

--------------------------------

### Read Messages from File and Index with TypeAgent-Py

Source: https://context7.com/microsoft/typeagent-py/llms.txt

This example demonstrates how to read messages from a text file, parse them with speaker attribution, and then add them to a conversation for indexing. It includes error handling for file not found and malformed lines.

```python
from typeagent import create_conversation
from typeagent.transcripts.transcript import (
    TranscriptMessage,
    TranscriptMessageMeta
)
import asyncio

def read_messages_from_file(filename):
    """
    Parse messages from a text file.
    Expected format: SPEAKER message text
    Example file (testdata.txt):
        ALICE We should add a new feature.
        BOB I agree, let's start next week.
        CHARLIE I can help with the design.
    """
    messages = []
    try:
        with open(filename, 'r') as f:
            for line_num, line in enumerate(f, 1):
                line = line.strip()
                if not line or line.startswith('#'):
                    continue  # Skip empty lines and comments

                try:
                    # Split on first whitespace
                    speaker, text_chunk = line.split(None, 1)
                    message = TranscriptMessage(
                        text_chunks=[text_chunk],
                        metadata=TranscriptMessageMeta(speaker=speaker)
                    )
                    messages.append(message)
                except ValueError:
                    print(f"Warning: Skipping malformed line {line_num}")
                    continue
    except FileNotFoundError:
        print(f"Error: File '{filename}' not found")
        return []

    return messages

async def main():
    conversation = await create_conversation("demo.db", TranscriptMessage)
    messages = read_messages_from_file("testdata.txt")

    if messages:
        print(f"Indexing {len(messages)} messages...")
        results = await conversation.add_messages_with_indexing(messages)
        print(f"Indexed {results.messages_added} messages.")
        print(f"Got {results.semrefs_added} semantic refs.")
    else:
        print("No messages to index")

asyncio.run(main())
```

--------------------------------

### Python Type Hinting for Interfaces

Source: https://github.com/microsoft/typeagent-py/blob/main/AGENTS.md

Interfaces (classes whose names start with 'I' followed by a capital letter) should be defined as subclasses of 'Protocol' from the 'typing' module. This enables structural subtyping: any class with matching methods satisfies the interface, whether or not it inherits from it.

```python
from typing import Protocol

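class IListService(Protocol):  # Interface: name is "I" + capital letter
    ...

class SomeClass(IListService):  # Explicit subclassing is allowed but not required
    ...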

```
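
The following sketch shows what `Protocol` buys in practice. Under structural subtyping, a class satisfies the interface simply by having compatible methods. `IGreeter`, `EnglishGreeter`, and `welcome` are hypothetical names used only for illustration. `@runtime_checkable` is needed only for the `isinstance` check at the end, and that check confirms the methods exist without verifying their signatures.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class IGreeter(Protocol):
    def greet(self, name: str) -> str: ...


class EnglishGreeter:
    # No IGreeter base class: a compatible greet() is enough
    def greet(self, name: str) -> str:
        return f"Hello, {name}!"


def welcome(greeter: IGreeter, name: str) -> str:
    # A type checker such as pyright accepts any object with a matching greet()
    return greeter.greet(name)


print(welcome(EnglishGreeter(), "Guido"))
print(isinstance(EnglishGreeter(), IGreeter))
```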

--------------------------------

### Test VTT Transcript Ingestion - Python

Source: https://github.com/microsoft/typeagent-py/blob/main/typeagent/transcripts/README.md

An example test case for the `ingest_vtt_transcript` function using pytest. It demonstrates setting up `ConversationSettings` with an embedding model and asserting that the ingested transcript contains messages. Dependencies include pytest and fixtures for authentication and embedding models.

```python
import pytest
from fixtures import needs_auth, embedding_model
from typeagent.knowpro.convsettings import ConversationSettings
from typeagent.transcripts.transcript_ingest import ingest_vtt_transcript

@pytest.mark.asyncio
async def test_my_transcript(needs_auth, embedding_model):
    settings = ConversationSettings(embedding_model)

    transcript = await ingest_vtt_transcript(
        "test.vtt",
        settings,
        dbname="test.db",
    )

    assert await transcript.messages.size() > 0
```

--------------------------------

### Python Conversation Metadata Usage Examples

Source: https://github.com/microsoft/typeagent-py/blob/main/spec/storage_spec.md

Demonstrates how to interact with conversation metadata using an instance of IStorageProvider. It covers retrieving metadata (with default creation if none exists), setting metadata with various options (updating specific fields, providing explicit timestamps, overriding all fields), handling schema version validation errors, and using None to retain existing values. This snippet requires an asynchronous context to run.

```python
# Assume 'provider' is an initialized instance of IStorageProvider

# Get metadata (always returns a valid object, creates defaults if needed)
# metadata = await provider.get_conversation_metadata()
# print(f"Name: {metadata.name_tag}")  # Never None, could be empty string
# print(f"Tags: {metadata.tags}")      # Never None, could be empty list []
# print(f"Extra: {metadata.extra}")    # Never None, could be empty dict {}

# Create new conversation with defaults (timestamps auto-set to now UTC)
# await provider.set_conversation_metadata(name_tag="my_conversation")

# Update just the tags, timestamp gets updated automatically
# await provider.set_conversation_metadata(tags=["important", "work"])

# Update timestamp only (equivalent to refresh/touch)
# await provider.set_conversation_metadata()

# Override everything explicitly with timezone handling
# from datetime import datetime, timezone
# await provider.set_conversation_metadata(
#     name_tag="conversation",
#     created_at=datetime(2025, 1, 1, 12, 0, 0),      # Assumes local TZ, converted to UTC
#     updated_at=datetime.now(timezone.utc),          # Explicit UTC
#     tags=["tag1", "tag2"],                          # Never None
#     extra={"custom_field": "value"}                 # Never None
# )

# Schema version validation (raises ValueError if mismatch)
# try:
#     await provider.set_conversation_metadata(schema_version="1.0")  # Would raise error
# except ValueError as e:
#     print(f"Schema mismatch: {e}")

# Baseline behavior - use None to keep existing values
# await provider.set_conversation_metadata(
#     name_tag="new_name",     # Update name
#     tags=None,               # Keep existing tags from baseline
#     extra=None,              # Keep existing extra from baseline
#     # created_at not specified -> keeps existing, updated_at -> current time
# )

```
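
According to the comments above, a naive `created_at` is treated as local time and converted to UTC. In standard Python, `datetime.astimezone()` applies exactly that rule to naive values, so a normalization step could look like the sketch below. `to_utc` is a hypothetical helper, not the storage provider's actual code.

```python
from datetime import datetime, timedelta, timezone


def to_utc(dt: datetime) -> datetime:
    """Normalize a datetime to UTC; naive values are interpreted as local time."""
    # astimezone() treats a naive datetime as system local time
    return dt.astimezone(timezone.utc)


# Aware input: the instant is unchanged, only the offset changes
aware = datetime(2025, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=2)))
print(to_utc(aware))  # 2025-01-01 10:00:00+00:00

# Naive input: the resulting instant depends on the machine's local timezone
print(to_utc(datetime(2025, 1, 1, 12, 0)).tzinfo)  # UTC
```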

--------------------------------

### Configure Embedding Models in TypeAgent Python

Source: https://context7.com/microsoft/typeagent-py/llms.txt

This example shows how to configure and use custom embedding models with TypeAgent Python for semantic search. It covers using default OpenAI embeddings, specifying custom models with different sizes, and generating embeddings for text. Dependencies include 'typeagent.aitools.embeddings', 'asyncio', and 'numpy'.

```python
from typeagent.aitools.embeddings import AsyncEmbeddingModel
import asyncio
import numpy as np

async def embedding_example():
    # Option 1: Default OpenAI embeddings (text-embedding-ada-002)
    default_model = AsyncEmbeddingModel()

    # Option 2: Specify model explicitly
    small_model = AsyncEmbeddingModel(
        model_name="text-embedding-small",
        embedding_size=512  # Smaller, faster embeddings
    )

    # Option 3: Large model for better quality
    large_model = AsyncEmbeddingModel(
        model_name="text-embedding-large",
        embedding_size=3072
    )

    # Generate embeddings for text
    texts = [
        "Machine learning is a subset of artificial intelligence.",
        "Python is a popular programming language.",
        "Deep learning uses neural networks."
    ]

    try:
        embeddings = await default_model.create_embeddings(texts)
        print(f"Generated {len(embeddings)} embeddings")
        print(f"Embedding shape: {embeddings[0].shape}")
        print(f"Embedding size: {default_model.embedding_size}")

        # Calculate similarity between first two texts
        similarity = np.dot(embeddings[0], embeddings[1])
        print(f"Similarity between texts: {similarity:.4f}")

    except Exception as e:
        print(f"Error generating embeddings: {e}")

asyncio.run(embedding_example())
```
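
`np.dot` returns the cosine similarity only when both vectors have unit length. If you are not sure whether a model's embeddings come pre-normalized, normalizing them yourself is cheap. In the sketch below, toy 3-dimensional vectors stand in for real embeddings, and `cosine_similarity` is a hypothetical helper.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; equals np.dot(a, b) when both vectors are unit length."""
    a_unit = a / np.linalg.norm(a)
    b_unit = b / np.linalg.norm(b)
    return float(np.dot(a_unit, b_unit))


# Toy vectors standing in for real embeddings
v1 = np.array([3.0, 4.0, 0.0])
v2 = np.array([4.0, 3.0, 0.0])
print(f"raw dot: {np.dot(v1, v2):.4f}")              # 24.0000, not a similarity
print(f"cosine:  {cosine_similarity(v1, v2):.4f}")   # 0.9600
```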

--------------------------------

### Run TypeAgent demo UI with default podcast data

Source: https://github.com/microsoft/typeagent-py/blob/main/docs/demos.md

This command runs the TypeAgent demo UI without specifying a database, which defaults to using the provided podcast index files. Users can then interactively ask questions about the podcast content. The demo utilizes Azure OpenAI for processing.

```sh
python tools/query.py
```

--------------------------------

### Activating Virtual Environment

Source: https://github.com/microsoft/typeagent-py/blob/main/AGENTS.md

To activate the project's virtual environment, first create it using 'make venv' and then source the activate script located in '.venv/bin/activate'. This ensures that the correct Python interpreter and packages are used.

```bash
make venv
source .venv/bin/activate

```

--------------------------------

### Ingest WebVTT files into SQLite DB

Source: https://github.com/microsoft/typeagent-py/blob/main/docs/demos.md

This tool ingests WebVTT format files into a SQLite database for querying. It requires one or more .vtt files and an output database file name. The process can take a significant amount of time depending on the number and size of the input files.

```sh
python tools/ingest_vtt.py FILE1.vtt ... FILEN.vtt -d mp.db
```

--------------------------------

### Formatting Code with Black

Source: https://github.com/microsoft/typeagent-py/blob/main/AGENTS.md

Run the 'make format' command to automatically format all files in the project using the Black code formatter. This ensures consistent code style across the codebase.

```bash
make format

```

--------------------------------

### Get Nearest Indexes (Python)

Source: https://github.com/microsoft/typeagent-py/blob/main/spec/indexes_overview.md

Retrieves indexes of nearest neighbors based on a given embedding. This method supports optional parameters for maximum matches, minimum score, and a predicate for filtering results. It returns a list of scored integer identifiers.

```python
def get_indexes_of_nearest(
    self,
    embedding: NormalizedEmbedding,
    max_matches: int | None = None,
    min_score: float | None = None,
    predicate: Callable[[int], bool] | None = None,
) -> list[ScoredInt]: ...
```

--------------------------------

### Query data from SQLite DB

Source: https://github.com/microsoft/typeagent-py/blob/main/docs/demos.md

This tool allows querying a SQLite database. It can be used to ask questions about data that has been previously ingested. The database file path is provided as an argument. This tool is used for both the Monty Python and GMail demos.

```sh
python tools/query.py -d mp.db
```

--------------------------------

### Analyze VTT Transcript - Python

Source: https://github.com/microsoft/typeagent-py/blob/main/typeagent/transcripts/README.md

Provides functions to analyze WebVTT files, including getting the total duration and extracting speaker information. It also includes a utility to extract speaker names from text lines. Dependencies include typeagent.transcripts.transcript_ingest.

```python
from typeagent.transcripts.transcript_ingest import (
    get_transcript_duration,
    get_transcript_speakers,
    extract_speaker_from_text,
)

# Get basic information
duration = get_transcript_duration("transcript.vtt")
speakers = get_transcript_speakers("transcript.vtt")

print(f"Duration: {duration/60:.1f} minutes")
print(f"Speakers: {speakers}")

# Test speaker extraction
speaker, text = extract_speaker_from_text("NARRATOR: Once upon a time...")
print(f"Speaker: {speaker}, Text: {text}")
```

--------------------------------

### Package Management with uv: Adding Dependencies

Source: https://github.com/microsoft/typeagent-py/blob/main/AGENTS.md

Use the 'uv add' command to incorporate new packages into the project's dependencies. uv will automatically update the 'pyproject.toml' file to reflect the changes.

```bash
uv add <package>

```

--------------------------------

### Download GMail messages using GMail API

Source: https://github.com/microsoft/typeagent-py/blob/main/docs/demos.md

A tool to download GMail messages using the GMail API. A Google Cloud app must be created and configured first; the source documentation links to a GeeksForGeeks guide for that setup. The tool downloads a specified number of messages (50 by default).

```python
from gmail import gmail_dump

# Example usage (assuming configuration is done)
# gmail_dump.download_messages(num_messages=50)
```

--------------------------------

### Memory Storage Provider Implementation (Python)

Source: https://github.com/microsoft/typeagent-py/blob/main/spec/storage_immediate_implementation.md

Provides a concrete implementation of the `IStorageProvider` interface using in-memory storage. The `MemoryStorageProvider` class ensures that all required index getter methods are implemented, returning the corresponding in-memory index instances. This serves as a basic storage solution for testing and development purposes.

```python
class MemoryStorageProvider[TMessage: IMessage](IStorageProvider[TMessage]):
    async def get_conversation_index(self) -> ITermToSemanticRefIndex:
        return self._conversation_index

    async def get_property_index(self) -> IPropertyToSemanticRefIndex:
        return self._property_index

    # ... all other index getters implemented
```

--------------------------------

### SQL: Create SemanticRefs Table

Source: https://github.com/microsoft/typeagent-py/blob/main/spec/storage_spec.md

Defines the schema for the SemanticRefs table, used to store semantic references. It decomposes text ranges into start and end message IDs and chunk orders, specifying the knowledge type and the knowledge content. Foreign key constraints ensure referential integrity with the Messages table.

```sql
CREATE TABLE SemanticRefs (
    semref_id INTEGER PRIMARY KEY AUTOINCREMENT,
    -- TextRange decomposed into separate columns for efficient querying
    -- Forms a half-open interval [start, end)
    -- If in-memory TextRange has no end, defaults to: end_msg_id = start_msg_id, end_chunk_ord = start_chunk_ord + 1
    start_msg_id INTEGER NOT NULL,
    start_chunk_ord INTEGER NOT NULL,
    end_msg_id INTEGER NOT NULL,
    end_chunk_ord INTEGER NOT NULL,  -- Points past the last included chunk
    ktype TEXT NOT NULL CHECK (ktype IN ('entity', 'action', 'topic', 'tag')),
    knowledge JSON NOT NULL,

    FOREIGN KEY (start_msg_id) REFERENCES Messages(msg_id) ON DELETE RESTRICT,
    FOREIGN KEY (end_msg_id) REFERENCES Messages(msg_id) ON DELETE RESTRICT
);

CREATE INDEX idx_semantic_refs_start_msg ON SemanticRefs(start_msg_id);
CREATE INDEX idx_semantic_refs_end_msg ON SemanticRefs(end_msg_id);
CREATE INDEX idx_semantic_refs_ktype ON SemanticRefs(ktype);
```
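
The schema can be exercised with Python's built-in `sqlite3` module. This sketch uses an in-memory database and a one-column stand-in for `Messages`; that stand-in is an assumption, since the real table has more columns. By default, SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` is issued on the connection. Without it, the `ON DELETE RESTRICT` clauses have no effect.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

# Minimal stand-in for the Messages table (assumed; real schema has more columns)
conn.execute("CREATE TABLE Messages (msg_id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE SemanticRefs (
        semref_id INTEGER PRIMARY KEY AUTOINCREMENT,
        start_msg_id INTEGER NOT NULL,
        start_chunk_ord INTEGER NOT NULL,
        end_msg_id INTEGER NOT NULL,
        end_chunk_ord INTEGER NOT NULL,
        ktype TEXT NOT NULL CHECK (ktype IN ('entity', 'action', 'topic', 'tag')),
        knowledge JSON NOT NULL,
        FOREIGN KEY (start_msg_id) REFERENCES Messages(msg_id) ON DELETE RESTRICT,
        FOREIGN KEY (end_msg_id) REFERENCES Messages(msg_id) ON DELETE RESTRICT
    )
""")
with conn:
    conn.execute("INSERT INTO Messages (msg_id) VALUES (0)")


def insert_semref(ktype: str, msg_id: int = 0) -> bool:
    """Insert a semref covering chunk 0 of one message; False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO SemanticRefs (start_msg_id, start_chunk_ord, end_msg_id,"
                " end_chunk_ord, ktype, knowledge) VALUES (?, ?, ?, ?, ?, ?)",
                # Half-open range [(msg_id, 0), (msg_id, 1)) covers exactly one chunk
                (msg_id, 0, msg_id, 1, ktype, json.dumps({"name": "Guido"})),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(insert_semref("entity"))             # True
print(insert_semref("opinion"))            # False: fails the CHECK on ktype
print(insert_semref("entity", msg_id=99))  # False: no Messages row 99
```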

--------------------------------

### Running Project Tests

Source: https://github.com/microsoft/typeagent-py/blob/main/AGENTS.md

To run the project's tests, use 'make test' or invoke pytest directly with 'pytest test'. Either way, pytest discovers and runs all defined tests.

```bash
make test
```

--------------------------------

### Running Type Checking with Pyright

Source: https://github.com/microsoft/typeagent-py/blob/main/AGENTS.md

To perform type checking on the project's code, utilize the 'pyright' command or the 'make check' command. This helps in identifying type-related errors before runtime.

```bash
pyright

```

```bash
make check

```

--------------------------------

### Package Management with uv: Upgrading Dependencies

Source: https://github.com/microsoft/typeagent-py/blob/main/AGENTS.md

To upgrade existing packages to their latest compatible versions, use 'uv add <package> --upgrade'. This command ensures that packages are updated and 'pyproject.toml' is synchronized.

```bash
uv add <package> --upgrade

```

--------------------------------

### Running All Checks and Tests

Source: https://github.com/microsoft/typeagent-py/blob/main/AGENTS.md

Execute 'make check test' to first run the type checker ('make check') and, if it passes, subsequently run all tests ('make test'). This is a comprehensive validation step.

```bash
make check test

```

--------------------------------

### Implement MemoryStorageProvider Index Management (Python)

Source: https://github.com/microsoft/typeagent-py/blob/main/spec/storage_immediate_implementation.md

This Python code snippet implements index management within the MemoryStorageProvider class. It initializes various dictionaries to hold different types of indexes and provides methods to asynchronously retrieve or create these indexes for a given conversation ID. It also includes methods to ensure all necessary indexes are created or dropped for a conversation.

```python
class MemoryStorageProvider[TMessage: IMessage](IStorageProvider[TMessage]):
    def __init__(self):
        # ... existing init ...
        self._conversation_indexes: dict[str, SemanticRefIndex] = {}
        self._property_indexes: dict[str, PropertyIndex] = {}
        self._timestamp_indexes: dict[str, TimestampToTextRangeIndex] = {}
        self._message_text_indexes: dict[str, MessageTextIndex] = {}
        self._related_terms_indexes: dict[str, RelatedTermsIndex] = {}
        self._conversation_threads: dict[str, ConversationThreads] = {}

    async def get_conversation_index(
        self, conversation_id: str
    ) -> ITermToSemanticRefIndex:
        if conversation_id not in self._conversation_indexes:
            self._conversation_indexes[conversation_id] = SemanticRefIndex()
        return self._conversation_indexes[conversation_id]

    async def get_related_terms_index(
        self, conversation_id: str
    ) -> ITermToRelatedTermsIndex:
        if conversation_id not in self._related_terms_indexes:
            # Use default settings for now
            from .reltermsindex import RelatedTermsIndex, RelatedTermIndexSettings
            settings = RelatedTermIndexSettings()
            self._related_terms_indexes[conversation_id] = RelatedTermsIndex(settings)
        return self._related_terms_indexes[conversation_id]

    async def get_conversation_threads(
        self, conversation_id: str
    ) -> IConversationThreads:
        if conversation_id not in self._conversation_threads:
            self._conversation_threads[conversation_id] = ConversationThreads()
        return self._conversation_threads[conversation_id]

    # ... similar methods for other index types ...

    async def create_indexes_for_conversation(
        self, conversation_id: str
    ) -> None:
        # Ensure all indexes exist for this conversation
        await self.get_conversation_index(conversation_id)
        await self.get_property_index(conversation_id)
        await self.get_timestamp_index(conversation_id)
        await self.get_message_text_index(conversation_id)
        await self.get_related_terms_index(conversation_id)
        await self.get_conversation_threads(conversation_id)

    async def drop_indexes_for_conversation(
        self, conversation_id: str
    ) -> None:
        self._conversation_indexes.pop(conversation_id, None)
        self._property_indexes.pop(conversation_id, None)
        self._timestamp_indexes.pop(conversation_id, None)
        self._message_text_indexes.pop(conversation_id, None)
        self._related_terms_indexes.pop(conversation_id, None)
        self._conversation_threads.pop(conversation_id, None)
```

--------------------------------

### Python Copyright Header

Source: https://github.com/microsoft/typeagent-py/blob/main/AGENTS.md

When creating a new Python file, a standard copyright and license header must be included at the top of the file. This ensures proper attribution and licensing for the code.

```python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

```

--------------------------------

### Ingest EML email files into SQLite DB

Source: https://github.com/microsoft/typeagent-py/blob/main/docs/demos.md

This tool ingests email messages from .eml files in a specified directory into a SQLite database named gmail.db. It is an interactive tool, and the primary command is to add messages from a given path. The process can be time-consuming and may encounter errors with large files or timeouts.

```sh
python tools/test_email.py .
```

```text
@add_messages --path "email-folder"
```

--------------------------------

### Interface: IStorageProvider Index Methods

Source: https://github.com/microsoft/typeagent-py/blob/main/spec/storage_immediate_implementation.md

This Python code snippet demonstrates the extended IStorageProvider interface, which includes methods for retrieving various types of indexes. These methods are crucial for accessing and managing indexes within the storage layer of the TypeAgent-Py project.

```python
from __future__ import annotations  # annotations below are not evaluated at import time

from typing import Protocol

# The index interface types named in the annotations (ITermToSemanticRefIndex,
# IPropertyToSemanticRefIndex, and so on) are assumed to be imported from the
# project's interface definitions; they are not defined in this snippet.


class IStorageProvider(Protocol):
    def get_semantic_ref_index(self) -> ITermToSemanticRefIndex: ...

    def get_property_index(self) -> IPropertyToSemanticRefIndex: ...

    def get_timestamp_to_text_range_index(self) -> ITimestampToTextRangeIndex: ...

    def get_message_text_index(self) -> IMessageTextEmbeddingIndex: ...

    def get_related_terms_index(self) -> ITermToRelatedTermsIndex: ...

    def get_conversation_threads(self) -> IConversationThreads: ...

    def get_embedding_index(self) -> EmbeddingIndex: ...
```

--------------------------------

### Accessing Secondary Indexes via Coordinator

Source: https://github.com/microsoft/typeagent-py/blob/main/spec/indexes_overview.md

Demonstrates the recommended way to access secondary indexes in TypeAgent-Py. It shows how to obtain the index coordinator and subsequently access specific indexes like `property_to_semantic_ref_index`. This pattern abstracts away the underlying storage provider details.

```python
# Access via the coordinator (recommended)
idx = conversation.secondary_indexes
prop_idx = idx.property_to_semantic_ref_index
# … use prop_idx, timestamp_index, message_index, etc.
```

--------------------------------

### Build Semantic Reference Index using Storage Provider (Python)

Source: https://github.com/microsoft/typeagent-py/blob/main/spec/storage_immediate_implementation.md

This asynchronous function builds the semantic reference index after fetching the conversation index from the storage provider instead of reading it from a conversation property. The existing building logic is unchanged.

```python
async def build_semantic_ref[TMessage: IMessage](
    conversation: IConversation[TMessage, SemanticRefIndex],
    conversation_settings: importing.ConversationSettings,
    event_handler: IndexingEventHandlers | None = None,
) -> IndexingResults:
    # Get indexes from storage provider instead of conversation properties
    storage_provider = conversation.storage_provider
    conversation_index = await storage_provider.get_conversation_index(conversation.conversation_id)

    # Keep existing building logic, just use storage provider index
    result = IndexingResults()
    result.semantic_refs = await build_semantic_ref_index(
        conversation,
        conversation_settings.semantic_ref_index_settings,
        event_handler,
    )
    # ... rest of building logic stays the same ...
    return result
```

--------------------------------

### Loading Environment Variables

Source: https://github.com/microsoft/typeagent-py/blob/main/AGENTS.md

To load environment variables, particularly API keys, for ad-hoc code execution, call the 'typeagent.aitools.utils.load_dotenv()' function. This is useful for local development and testing.

```python
from typeagent.aitools.utils import load_dotenv

load_dotenv()

```

--------------------------------

### Test All Index Creation with Storage Provider (Python)

Source: https://github.com/microsoft/typeagent-py/blob/main/spec/storage_immediate_implementation.md

This pytest test, run under asyncio, verifies that all six index types can be created and accessed through the storage provider. It asserts that the conversation and property indexes are not `None`.

```python
# ✅ IMPLEMENTED in test/test_storage_indexes.py
@pytest.mark.asyncio
async def test_all_index_creation(storage, needs_auth):
    """Test that all 6 index types are created and accessible."""
    conv_index = await storage.get_conversation_index()
    assert conv_index is not None

    prop_index = await storage.get_property_index()
    assert prop_index is not None
    # ... tests for all index types

```

--------------------------------

### SQL: Create Usage Metrics and Query Performance Tables

Source: https://github.com/microsoft/typeagent-py/blob/main/spec/storage_future_extensions.md

Defines database schemas for tracking usage metrics and query performance. The UsageMetrics table stores named metrics with values and timestamps, while QueryPerformance tracks details about executed queries.

```sql
-- Usage metrics table
CREATE TABLE UsageMetrics (
    metric_id INTEGER PRIMARY KEY AUTOINCREMENT,
    metric_name TEXT NOT NULL,
    metric_value REAL NOT NULL,
    timestamp TEXT NOT NULL,
    metadata JSON
);

CREATE INDEX idx_usage_metrics_name_time ON UsageMetrics(metric_name, timestamp);

-- Query performance tracking
CREATE TABLE QueryPerformance (
    query_id INTEGER PRIMARY KEY AUTOINCREMENT,
    query_type TEXT NOT NULL,
    duration_ms INTEGER NOT NULL,
    result_count INTEGER,
    timestamp TEXT NOT NULL,
    query_params JSON
);
```

--------------------------------

### Parameterizing Storage Provider Tests with Pytest

Source: https://github.com/microsoft/typeagent-py/blob/main/spec/storage_immediate_implementation.md

This fixture demonstrates how to parameterize tests to run against multiple storage provider implementations (Memory and SQLite). It ensures that tests are executed once for each provider, facilitating cross-provider validation. Dependencies include pytest_asyncio and the embedding model.

```python
import pytest_asyncio


@pytest_asyncio.fixture(params=["memory", "sqlite"])
async def storage_provider_type(request, embedding_model, temp_db_path):
    # pytest runs each dependent test once per entry in `params`:
    # once with "memory" and once with "sqlite".
    return request.param
```

--------------------------------

### Build Timestamp Index using Storage Provider (Python)

Source: https://github.com/microsoft/typeagent-py/blob/main/spec/storage_immediate_implementation.md

This asynchronous function fetches the timestamp index from the storage provider and passes it, together with the conversation's messages, to the existing `add_to_timestamp_index` logic. If the conversation has no messages, it returns `ListIndexingResult(0)`.

```python
async def build_timestamp_index(conversation: IConversation) -> ListIndexingResult:
    if conversation.messages:
        # Get timestamp index from storage provider
        storage_provider = conversation.storage_provider
        timestamp_index = await storage_provider.get_timestamp_index(conversation.conversation_id)

        # Use existing logic with storage provider index
        return await add_to_timestamp_index(
            timestamp_index,
            conversation.messages,
            0,
        )
    return ListIndexingResult(0)

```

--------------------------------

### Update Conversation Access Pattern (Python)

Source: https://github.com/microsoft/typeagent-py/blob/main/spec/storage_immediate_implementation.md

Demonstrates the refactoring of how to access conversation indexes, moving from direct property access to using new asynchronous getter methods on the conversation object.

```python
# Before:
# index = conversation.semantic_ref_index

# After:
index = await conversation.get_conversation_index()
```

--------------------------------

### Email Memory Integration in Python

Source: https://context7.com/microsoft/typeagent-py/llms.txt

Illustrates how to set up a specialized conversation for indexing email messages using TypeAgent's EmailMemory. This includes configuring conversation settings, storage providers, and enabling features like noise term filtering and verb synonyms. It requires the `typeagent` library and its email-related submodules.

```python
from typeagent.emails.email_memory import EmailMemory, EmailMemorySettings
from typeagent.emails.email_message import EmailMessage
from typeagent.knowpro.convsettings import ConversationSettings
from typeagent.storage.utils import create_storage_provider
import asyncio

async def create_email_conversation():
    # Create settings
    conversation_settings = ConversationSettings()
    email_settings = EmailMemorySettings(conversation_settings)

    # Create storage provider
    storage_provider = await create_storage_provider(
        message_text_settings=conversation_settings.message_text_index_settings,
        related_terms_settings=conversation_settings.related_term_index_settings,
        dbname="emails.db",
        message_type=EmailMessage
    )

    email_settings.conversation_settings.storage_provider = storage_provider

    # Create email memory (includes noise term filtering and verb synonyms)
    email_memory = await EmailMemory.create(
        settings=email_settings.conversation_settings,
        name="Corporate Inbox",
        tags=["work-email"]
    )

    # Add email messages
    email_messages = [
        EmailMessage(
            text_chunks=["Please review the quarterly report by EOD."],
            metadata={
                'sender': 'boss@company.com',
                'recipients': ['team@company.com'],
                'subject': 'Q1 Report Review'
            },
            timestamp="2025-01-15T09:00:00Z"
        )
    ]

    result = await email_memory.add_messages_with_indexing(email_messages)
    print(f"Indexed {result.messages_added} emails, {result.semrefs_added} semantic refs")

    # Query emails (noise filtering applied automatically)
    answer = await email_memory.query("Who asked for the quarterly report?")
    print(answer)

asyncio.run(create_email_conversation())
```

--------------------------------

### Test All Index Creation (Python)

Source: https://github.com/microsoft/typeagent-py/blob/main/spec/storage_immediate_implementation.md

Tests that `MemoryStorageProvider` lazily creates all six index types for a given conversation and that each getter returns an instance of the expected interface type.

```python
import pytest
from typeagent.storage.memorystore import MemoryStorageProvider
from typeagent.knowpro.interfaces import (
    ITermToSemanticRefIndex, IPropertyToSemanticRefIndex,
    ITimestampToTextRangeIndex, IMessageTextIndex,
    ITermToRelatedTermsIndex, IConversationThreads
)

@pytest.mark.asyncio
async def test_all_index_creation():
    """Test that all 6 index types are created lazily."""
    storage = MemoryStorageProvider()

    # Test all index types
    conv_index = await storage.get_conversation_index("conv1")
    assert isinstance(conv_index, ITermToSemanticRefIndex)

    prop_index = await storage.get_property_index("conv1")
    assert isinstance(prop_index, IPropertyToSemanticRefIndex)

    time_index = await storage.get_timestamp_index("conv1")
    assert isinstance(time_index, ITimestampToTextRangeIndex)

    msg_index = await storage.get_message_text_index("conv1")
    assert isinstance(msg_index, IMessageTextIndex)

    rel_index = await storage.get_related_terms_index("conv1")
    assert isinstance(rel_index, ITermToRelatedTermsIndex)

    threads = await storage.get_conversation_threads("conv1")
    assert isinstance(threads, IConversationThreads)
```

--------------------------------

### Advanced Query with Custom Options in Python

Source: https://context7.com/microsoft/typeagent-py/llms.txt

Demonstrates how to perform advanced queries using TypeAgent with customizable search and answer generation options. This includes configuring search parameters like exact scope, verb scope, and semantic similarity, as well as answer generation parameters such as entity and topic limits. It requires the `typeagent` library and its submodules.

```python
from typeagent import create_conversation
from typeagent.transcripts.transcript import TranscriptMessage
from typeagent.knowpro.searchlang import (
    LanguageSearchOptions,
    LanguageQueryCompileOptions
)
from typeagent.knowpro.answers import AnswerContextOptions
import asyncio

async def advanced_query():
    conversation = await create_conversation("demo.db", TranscriptMessage)

    # Configure search options
    search_options = LanguageSearchOptions(
        compile_options=LanguageQueryCompileOptions(
            exact_scope=False,      # Allow fuzzy entity matching
            verb_scope=True,        # Match action verbs
            term_filter=None,       # No term filtering
            apply_scope=True        # Apply scoping rules
        ),
        exact_match=False,          # Enable semantic similarity
        max_message_matches=50,     # Maximum messages to retrieve
        max_knowledge_matches=100   # Maximum knowledge items
    )

    # Configure answer generation options
    answer_options = AnswerContextOptions(
        entities_top_k=50,          # Top entities to include
        topics_top_k=50,            # Top topics to include
        messages_top_k=None,        # No message limit
        chunking=None               # No text chunking
    )

    question = "What security features were discussed?"
    answer = await conversation.query(
        question=question,
        search_options=search_options,
        answer_options=answer_options
    )

    print(f"Q: {question}")
    print(f"A: {answer}")

asyncio.run(advanced_query())
```

--------------------------------

### Test Index Persistence Per Conversation (Python)

Source: https://github.com/microsoft/typeagent-py/blob/main/spec/storage_immediate_implementation.md

Checks that requesting an index for the same conversation more than once returns the same instance, i.e., the provider caches each index per conversation rather than creating a new one on every call.

```python
import pytest
from typeagent.storage.memorystore import MemoryStorageProvider


@pytest.mark.asyncio
async def test_index_persistence_per_conversation():
    """Test that same index instance is returned for same conversation."""
    storage = MemoryStorageProvider()

    # All index types should return same instance for same conversation
    conv1_1 = await storage.get_conversation_index("conv1")
    conv1_2 = await storage.get_conversation_index("conv1")
    assert conv1_1 is conv1_2

    prop1_1 = await storage.get_property_index("conv1")
    prop1_2 = await storage.get_property_index("conv1")
    assert prop1_1 is prop1_2
```
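A minimal sketch of the caching behavior this test expects, assuming a plain dictionary per index type keyed by conversation ID. `CachingStorageProvider`, `PropertyIndexStub`, and `TimestampIndexStub` are hypothetical stand-ins, not the actual code of `MemoryStorageProvider`.

```python
import asyncio
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


class PropertyIndexStub:
    """Hypothetical placeholder for a property index."""


class TimestampIndexStub:
    """Hypothetical placeholder for a timestamp index."""


class CachingStorageProvider:
    """Hypothetical provider that creates each index once per conversation, on first request."""

    def __init__(self) -> None:
        self._property_indexes: dict[str, PropertyIndexStub] = {}
        self._timestamp_indexes: dict[str, TimestampIndexStub] = {}

    @staticmethod
    def _get_or_create(cache: dict[str, T], conversation_id: str,
                       factory: Callable[[], T]) -> T:
        index = cache.get(conversation_id)
        if index is None:
            index = factory()
            cache[conversation_id] = index
        return index

    async def get_property_index(self, conversation_id: str) -> PropertyIndexStub:
        return self._get_or_create(self._property_indexes, conversation_id, PropertyIndexStub)

    async def get_timestamp_index(self, conversation_id: str) -> TimestampIndexStub:
        return self._get_or_create(self._timestamp_indexes, conversation_id, TimestampIndexStub)


async def demo() -> None:
    storage = CachingStorageProvider()
    first = await storage.get_property_index("conv1")
    again = await storage.get_property_index("conv1")
    other = await storage.get_property_index("conv2")
    print(first is again, first is other)  # True False


if __name__ == "__main__":
    asyncio.run(demo())
```

Because there is no `await` between the cache check and the store, two tasks on the same event loop cannot both miss the cache and create duplicate indexes for one conversation.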

--------------------------------

### Create Conversation with TypeAgent Python

Source: https://context7.com/microsoft/typeagent-py/llms.txt

Initializes a TypeAgent conversation, allowing for message storage and indexing. Supports both persistent SQLite databases and in-memory storage. Requires API keys and model configuration for LLM integration.

```python
from typeagent import create_conversation
from typeagent.transcripts.transcript import TranscriptMessage
import asyncio
import os

# Set up environment variables for OpenAI
os.environ['OPENAI_API_KEY'] = 'your-api-key-here'
os.environ['OPENAI_MODEL'] = 'gpt-4o'

async def main():
    # Create conversation with SQLite storage
    conversation = await create_conversation(
        dbname="my_conversation.db",
        message_type=TranscriptMessage,
        name="Team Meeting",
        tags=["project-discussion", "2025-q1"]
    )

    # Create in-memory conversation (no persistence)
    temp_conversation = await create_conversation(
        dbname=None,
        message_type=TranscriptMessage,
        name="Temporary Session"
    )

    print(f"Conversation created with {await conversation.messages.size()} messages")

asyncio.run(main())
```

--------------------------------

### Build Secondary Indexes (Python)

Source: https://github.com/microsoft/typeagent-py/blob/main/spec/indexes_overview.md

Coordinates the building of all secondary indexes, handling dependencies between them and managing their lifecycle. This function is responsible for initializing the complete set of secondary indexes for a conversation.

```python
async def build_secondary_indexes(
    conversation: IConversation,
    conversation_settings: ConversationSettings,
) -> SecondaryIndexingResults:
    # Controls building of all secondary indexes
    # Handles dependencies between indexes
    ...  # body not shown in the spec
```
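The spec does not say how dependencies between indexes are resolved. One way to sketch it, assuming the dependencies form a directed acyclic graph, is Python's `graphlib.TopologicalSorter`: indexes whose prerequisites are finished are built concurrently, batch by batch. The index names, the single dependency edge, and `build_index` are illustrative only and do not describe TypeAgent-Py's real dependency graph.

```python
import asyncio
from graphlib import TopologicalSorter

# Illustrative graph only (not TypeAgent-Py's real dependencies):
# each index maps to the set of indexes that must be built before it.
INDEX_DEPENDENCIES: dict[str, set[str]] = {
    "property_index": set(),
    "timestamp_index": set(),
    "related_terms_index": set(),
    "message_index": set(),
    "threads": {"message_index"},  # invented edge for the example
}


async def build_index(name: str, built: list[str]) -> None:
    """Hypothetical per-index builder; records the order in which builds finish."""
    await asyncio.sleep(0)  # stand-in for real indexing work
    built.append(name)


async def build_all(dependencies: dict[str, set[str]]) -> list[str]:
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    built: list[str] = []
    while sorter.is_active():
        ready = sorter.get_ready()
        # Everything in `ready` has its prerequisites built, so build them concurrently.
        await asyncio.gather(*(build_index(name, built) for name in ready))
        sorter.done(*ready)
    return built
```

Calling `prepare()` up front surfaces a cyclic configuration as `graphlib.CycleError` before any index work starts.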

--------------------------------

### Python Type Hinting for String Literals

Source: https://github.com/microsoft/typeagent-py/blob/main/AGENTS.md

Use the 'Literal' type from the 'typing' module for unions of string literals in Python type hints. This provides precise type information for string constants.

```python
from typing import Literal

status: Literal['pending', 'completed', 'failed']
```

--------------------------------

### Route ConversationSecondaryIndexes Through Storage Provider (Python)

Source: https://github.com/microsoft/typeagent-py/blob/main/spec/storage_immediate_implementation.md

This snippet shows how the `ConversationSecondaryIndexes` class can route index access through a storage provider. Each index is loaded from the `storage_provider` instance on first access and then cached, so indexes are fetched only when needed and are managed in one place. Because the properties are declared `async`, reading one returns a coroutine, so callers must write, for example, `await indexes.timestamp_index`.

```python
class ConversationSecondaryIndexes[TMessage: IMessage](IConversationSecondaryIndexes[TMessage]):
    def __init__(self, storage_provider: IStorageProvider[TMessage], conversation_id: str):
        self._storage_provider = storage_provider
        self._conversation_id = conversation_id
        # Index slots start empty; each is loaded from the storage provider on first access
        self._property_index: IPropertyToSemanticRefIndex | None = None
        self._timestamp_index: ITimestampToTextRangeIndex | None = None
        self._related_terms_index: ITermToRelatedTermsIndex | None = None
        self._threads: IConversationThreads | None = None
        self._message_index: IMessageTextIndex[TMessage] | None = None

    @property
    async def property_to_semantic_ref_index(self) -> IPropertyToSemanticRefIndex | None:
        if self._property_index is None:
            self._property_index = await self._storage_provider.get_property_index(self._conversation_id)
        return self._property_index

    @property
    async def timestamp_index(self) -> ITimestampToTextRangeIndex | None:
        if self._timestamp_index is None:
            self._timestamp_index = await self._storage_provider.get_timestamp_index(self._conversation_id)
        return self._timestamp_index

    # ... similar async properties for other indexes ...
```
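The behavior of an `async` property is easy to get wrong, so here is a small self-contained sketch. `FakeStorageProvider` and `LazyIndexes` are hypothetical stand-ins for the storage provider and `ConversationSecondaryIndexes`. The demo shows that reading the property without `await` yields a coroutine rather than an index, and that the provider is called only once.

```python
import asyncio


class FakeStorageProvider:
    """Hypothetical provider that counts how often the index is fetched."""

    def __init__(self) -> None:
        self.fetch_count = 0

    async def get_timestamp_index(self, conversation_id: str) -> dict[str, list[int]]:
        self.fetch_count += 1
        await asyncio.sleep(0)  # stand-in for real I/O
        return {}


class LazyIndexes:
    """Hypothetical stand-in for ConversationSecondaryIndexes with one async property."""

    def __init__(self, provider: FakeStorageProvider, conversation_id: str) -> None:
        self._provider = provider
        self._conversation_id = conversation_id
        self._timestamp_index: dict[str, list[int]] | None = None

    @property
    async def timestamp_index(self) -> dict[str, list[int]]:
        if self._timestamp_index is None:
            self._timestamp_index = await self._provider.get_timestamp_index(self._conversation_id)
        return self._timestamp_index


async def demo() -> tuple[bool, bool, int]:
    provider = FakeStorageProvider()
    indexes = LazyIndexes(provider, "conv1")

    unawaited = indexes.timestamp_index  # a coroutine object, not the index
    forgot_await = asyncio.iscoroutine(unawaited)
    unawaited.close()  # discard it without running; avoids a "never awaited" warning

    first = await indexes.timestamp_index
    second = await indexes.timestamp_index
    return forgot_await, first is second, provider.fetch_count
```

If two tasks await the property concurrently before the first load finishes, both can call the provider, because the check and the store are separated by an `await`; guarding the load with an `asyncio.Lock` would prevent that.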

--------------------------------

### Python Type Hinting for Structured Types

Source: https://github.com/microsoft/typeagent-py/blob/main/AGENTS.md

For classes that primarily serve as structured data containers (other than interfaces), use the 'dataclass' decorator from the 'dataclasses' module. This simplifies the creation of data-holding classes.

```python
from dataclasses import dataclass

@dataclass
class UserProfile:
    user_id: str
    display_name: str | None = None

```

--------------------------------

### Storage Provider Interface Definition (Python)

Source: https://github.com/microsoft/typeagent-py/blob/main/spec/storage_immediate_implementation.md

Defines the `IStorageProvider` protocol for managing the different index types of a conversation. It declares async getters for the conversation index, property index, timestamp index, message text index, related terms index, and conversation threads, and serves as the contract that concrete storage implementations must satisfy.

```python
class IStorageProvider[TMessage: IMessage](Protocol):
    # ... existing methods ...

    # Index getters - ALL 6 index types for this conversation
    async def get_conversation_index(self) -> ITermToSemanticRefIndex: ...
    async def get_property_index(self) -> IPropertyToSemanticRefIndex: ...
    async def get_timestamp_index(self) -> ITimestampToTextRangeIndex: ...
    async def get_message_text_index(self) -> IMessageTextIndex[TMessage]: ...
    async def get_related_terms_index(self) -> ITermToRelatedTermsIndex: ...
    async def get_conversation_threads(self) -> IConversationThreads: ...

    # ❌ TODO: Multi-conversation support when needed
    # async def create_indexes_for_conversation(
    #     self, conversation_id: str
    # ) -> None: ...
    # async def drop_indexes_for_conversation(
    #     self, conversation_id: str
    # ) -> None: ...
```

--------------------------------

### Implement Python Interface for Advanced Term Search

Source: https://github.com/microsoft/typeagent-py/blob/main/spec/storage_future_extensions.md

Defines a protocol for advanced term indexing and searching. Features include adding terms with variants, fuzzy searching, semantic similarity searches, and term suggestions.

```python
from typing import Protocol


class IAdvancedTermIndex(Protocol):
    async def add_term_with_variants(
        self,
        term: str,
        semref_id: int,
        relevance_score: float = 1.0
    ) -> None: ...

    async def search_fuzzy(
        self,
        query: str,
        max_distance: int = 2
    ) -> list[tuple[int, float]]: ...  # (semref_id, relevance_score)

    async def search_semantic_similar(
        self,
        term: str,
        threshold: float = 0.8
    ) -> list[tuple[int, float]]: ...

    async def get_term_suggestions(
        self,
        partial_term: str,
        limit: int = 10
    ) -> list[str]: ...
```
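A partial in-memory sketch of this protocol, assuming fuzzy matching by Levenshtein edit distance and a relevance score that shrinks as the distance grows. `search_semantic_similar` is omitted because it needs an embedding model, and variant generation is reduced to lowercasing. `InMemoryTermIndex`, `levenshtein`, and the scoring rule are hypothetical, not TypeAgent-Py's implementation.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


class InMemoryTermIndex:
    """Hypothetical partial implementation of IAdvancedTermIndex."""

    def __init__(self) -> None:
        # term -> {semref_id: relevance_score}
        self._postings: dict[str, dict[int, float]] = defaultdict(dict)

    async def add_term_with_variants(self, term: str, semref_id: int,
                                     relevance_score: float = 1.0) -> None:
        # The only "variant" stored here is the lowercased term.
        self._postings[term.lower()][semref_id] = relevance_score

    async def search_fuzzy(self, query: str, max_distance: int = 2) -> list[tuple[int, float]]:
        q = query.lower()
        best: dict[int, float] = {}
        for term, postings in self._postings.items():
            distance = levenshtein(q, term)
            if distance <= max_distance:
                factor = 1.0 / (1 + distance)  # exact match keeps its full score
                for semref_id, score in postings.items():
                    best[semref_id] = max(best.get(semref_id, 0.0), score * factor)
        # Highest relevance first.
        return sorted(best.items(), key=lambda item: item[1], reverse=True)

    async def get_term_suggestions(self, partial_term: str, limit: int = 10) -> list[str]:
        prefix = partial_term.lower()
        return sorted(t for t in self._postings if t.startswith(prefix))[:limit]
```

Scanning every term costs time proportional to the number of terms times the product of the string lengths; a larger index would prune candidates first, for example with a trigram index or a BK-tree.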

--------------------------------

### SQL: Create SemanticRefIndex Table

Source: https://github.com/microsoft/typeagent-py/blob/main/spec/storage_spec.md

Defines the schema for the SemanticRefIndex table, used for indexing semantic references. It maps a normalized, lowercased term to a semantic reference ID, enabling efficient searching of semantic information. A composite primary key prevents duplicate entries for the same term and semref_id.

```sql
CREATE TABLE SemanticRefIndex (
    term TEXT NOT NULL,             -- normalized and lowercased; one term can map to many semrefs
    semref_id INTEGER NOT NULL,

    PRIMARY KEY (term, semref_id),
    FOREIGN KEY (semref_id) REFERENCES SemanticRefs(semref_id) ON DELETE CASCADE
);

CREATE INDEX idx_semantic_ref_index_term ON SemanticRefIndex(term);
```
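The sketch below loads the table into an in-memory SQLite database with Python's `sqlite3` module. It assumes a hypothetical one-column `SemanticRefs` table so the foreign key has a target; the spec's real table has more columns, and `open_db`, `add_term`, and `lookup` are illustrative helpers. Two details matter in practice: SQLite enforces foreign keys, and therefore `ON DELETE CASCADE`, only after `PRAGMA foreign_keys = ON`, and the composite primary key lets `INSERT OR IGNORE` skip duplicate `(term, semref_id)` pairs.

```python
import sqlite3

SCHEMA = """
CREATE TABLE SemanticRefs (
    semref_id INTEGER PRIMARY KEY  -- hypothetical minimal parent table
);
CREATE TABLE SemanticRefIndex (
    term TEXT NOT NULL,
    semref_id INTEGER NOT NULL,
    PRIMARY KEY (term, semref_id),
    FOREIGN KEY (semref_id) REFERENCES SemanticRefs(semref_id) ON DELETE CASCADE
);
CREATE INDEX idx_semantic_ref_index_term ON SemanticRefIndex(term);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Foreign keys (and cascades) are off by default and enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_term(conn: sqlite3.Connection, term: str, semref_id: int) -> None:
    # The composite primary key makes a repeated (term, semref_id) pair a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO SemanticRefIndex (term, semref_id) VALUES (?, ?)",
        (term.lower(), semref_id),
    )


def lookup(conn: sqlite3.Connection, term: str) -> list[int]:
    rows = conn.execute(
        "SELECT semref_id FROM SemanticRefIndex WHERE term = ? ORDER BY semref_id",
        (term.lower(),),
    )
    return [row[0] for row in rows]
```

Because `term` is the leftmost column of the primary key, SQLite can already use the primary key's automatic index for lookups by `term`, so `idx_semantic_ref_index_term` is probably redundant.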

--------------------------------

### Python Type Hinting for Aliased Types

Source: https://github.com/microsoft/typeagent-py/blob/main/AGENTS.md

For type aliases in Python, use the 'type' keyword. Type aliases should follow PascalCase naming conventions, similar to class names, for consistency.

```python
type UserIdentifier = str  # PEP 695 `type` statement (Python 3.12+)

```

--------------------------------

### Python: Storage Analytics Interface

Source: https://github.com/microsoft/typeagent-py/blob/main/spec/storage_future_extensions.md

Defines a protocol for storage analytics, enabling the recording of query performance and retrieval of usage statistics and performance metrics. It supports flexible querying of historical data.

```python
from typing import Any, Protocol


class IStorageAnalytics(Protocol):
    async def record_query_performance(
        self,
        query_type: str,
        duration_ms: int,
        result_count: int
    ) -> None: ...

    async def get_usage_stats(
        self,
        start_time: str,
        end_time: str
    ) -> dict[str, Any]: ...

    async def get_performance_metrics(
        self,
        query_type: str | None = None
    ) -> dict[str, Any]: ...
```