# pyseekdb

pyseekdb is a unified Python client library for SeekDB and OceanBase vector databases that provides simple, beginner-friendly APIs for AI applications. It abstracts away SQL complexities by treating vector data operations as key-value operations similar to MongoDB, Elasticsearch, and Milvus. The library follows a schema-free interface design where users manage text documents and their vector embeddings without explicitly defining relational table structures. This SDK is particularly valuable for RAG (Retrieval-Augmented Generation) applications, semantic search systems, and AI-powered knowledge bases.

The library supports three connection modes: embedded mode for local development using pylibseekdb, remote SeekDB server mode for dedicated vector database deployments, and OceanBase server mode for enterprise multi-tenant environments. It features automatic embedding generation through configurable embedding functions, HNSW (Hierarchical Navigable Small World) vector indexing, hybrid search that combines full-text and semantic search, and filtering on both metadata and document content. The design emphasizes ease of use: embedding dimensions are detected automatically, the embedding function is optional, and the same CRUD operations work across all deployment modes.

## Client Connection - Embedded Mode

Initialize a local embedded SeekDB instance for development and testing.

```python
import pyseekdb

# Create embedded client with explicit path
client = pyseekdb.Client(
    path="./seekdb",
    database="demo"
)

# Execute raw SQL if needed
rows = client.execute("SELECT COUNT(*) FROM information_schema.tables")
print(f"Tables: {rows}")

# Create collection with auto-generated embeddings
collection = client.create_collection("documents")
collection.add(
    ids=["doc1", "doc2"],
    documents=["Python is popular", "Machine learning transforms AI"],
    metadatas=[{"lang": "python"}, {"topic": "ai"}]
)

# Query by semantic similarity
results = collection.query(
    query_texts=["programming languages"],
    n_results=1
)
print(f"Found: {results['documents'][0][0]}")
# Output: Found: Python is popular
```

## Client Connection - Remote SeekDB Server

Connect to a remote SeekDB server for production vector search workloads.

```python
import pyseekdb

# Connect to SeekDB server
client = pyseekdb.Client(
    host="127.0.0.1",
    port=2881,
    database="production",
    user="root",
    password=""  # Uses SEEKDB_PASSWORD env var if empty
)

# Create collection with custom configuration
from pyseekdb import HNSWConfiguration, DefaultEmbeddingFunction

config = HNSWConfiguration(dimension=384, distance='cosine')
ef = DefaultEmbeddingFunction(model_name='all-MiniLM-L6-v2')
collection = client.create_collection(
    name="knowledge_base",
    configuration=config,
    embedding_function=ef
)

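# Batch insert with automatic embedding; metadata is attached so that
# the "topic" filter in the query below has something to match
docs = [
    "Neural networks mimic brain structure",
    "Vector databases enable semantic search",
    "Python supports machine learning workflows"
]
ids = [f"kb_{i}" for i in range(len(docs))]
metadatas = [{"topic": "ai"}, {"topic": "db"}, {"topic": "ml"}]
collection.add(ids=ids, documents=docs, metadatas=metadatas)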

# Semantic search with metadata filter
results = collection.query(
    query_texts=["AI and deep learning"],
    where={"$or": [{"topic": "ai"}, {"topic": "ml"}]},
    n_results=2
)
for i, doc_id in enumerate(results['ids'][0]):
    print(f"{i+1}. {results['documents'][0][i]} (distance: {results['distances'][0][i]:.4f})")
```

## Client Connection - OceanBase Multi-Tenant

Connect to OceanBase server for enterprise deployments with tenant isolation.

```python
import os

import pyseekdb
from pyseekdb import HNSWConfiguration

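# Set password via environment variable (set inline here for demonstration only;
# in practice, export SEEKDB_PASSWORD in the shell instead of hard-coding it)
os.environ["SEEKDB_PASSWORD"] = "secure_password"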

# Connect to OceanBase with tenant
client = pyseekdb.Client(
    host="oceanbase.example.com",
    port=2881,
    tenant="analytics",  # Tenant isolation
    database="vectors",
    user="analyst",
    password=""  # Automatically reads from SEEKDB_PASSWORD
)

# Verify connection
collections = client.list_collections()
print(f"Available collections: {len(collections)}")

# Create collection with manual embeddings (no auto-embedding)
collection = client.create_collection(
    name="custom_vectors",
    configuration=HNSWConfiguration(dimension=128, distance='l2'),
    embedding_function=None  # Manual embeddings required
)

# Insert with pre-computed embeddings
import random
embeddings = [[random.random() for _ in range(128)] for _ in range(3)]
collection.add(
    ids=["v1", "v2", "v3"],
    embeddings=embeddings,
    documents=["Doc one", "Doc two", "Doc three"],
    metadatas=[{"idx": 1}, {"idx": 2}, {"idx": 3}]
)

# Vector similarity search with manual query embedding
query_embedding = [random.random() for _ in range(128)]
results = collection.query(
    query_embeddings=query_embedding,
    where={"idx": {"$gte": 2}},
    n_results=2
)
print(f"Matched {len(results['ids'][0])} vectors")
```

## Database Management with AdminClient

Manage databases across embedded, SeekDB, and OceanBase deployments.

```python
import pyseekdb

# Embedded mode - database admin
admin = pyseekdb.AdminClient(path="./seekdb")

# Create and list databases
admin.create_database("analytics")
admin.create_database("staging")
databases = admin.list_databases()
for db in databases:
    print(f"Database: {db.name}, Charset: {db.charset}, Collation: {db.collation}")

# Get database metadata
db_info = admin.get_database("analytics")
print(f"Retrieved: {db_info.name}")

# OceanBase mode - tenant-aware database management
admin_ob = pyseekdb.AdminClient(
    host="127.0.0.1",
    port=2881,
    tenant="analytics",
    user="admin",
    password="secure_pass"
)

# List databases in tenant
dbs = admin_ob.list_databases(tenant="analytics", limit=10, offset=0)
print(f"Tenant 'analytics' has {len(dbs)} databases")

# Delete database
admin.delete_database("staging")

# Connect to specific database for collection operations
client = pyseekdb.Client(path="./seekdb", database="analytics")
collection = client.create_collection("reports")
print(f"Created collection '{collection.name}' in database 'analytics'")
```

## Collection Creation and Management

Create and configure collections with vector indexes and embedding functions.

```python
import pyseekdb
from pyseekdb import HNSWConfiguration, DefaultEmbeddingFunction

client = pyseekdb.Client(database="vectors")

# Create with default configuration (384-dim, cosine distance)
collection_basic = client.create_collection("basic_docs")
print(f"Default dimension: {collection_basic.dimension}")
# Output: Default dimension: 384

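# Create with custom HNSW configuration. The default embedding function
# produces 384-dim vectors, so it is disabled here to avoid a dimension
# mismatch with the 768-dim index; embeddings must then be supplied manually.
config = HNSWConfiguration(dimension=768, distance='inner_product')
collection_custom = client.create_collection(
    name="advanced_docs",
    configuration=config,
    embedding_function=None
)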

# Create with custom embedding function
from typing import List, Union

class CustomEmbedding:
    @property
    def dimension(self) -> int:
        return 512

    def __call__(self, input: Union[str, List[str]]) -> List[List[float]]:
        import random
        texts = [input] if isinstance(input, str) else input
        return [[random.random() for _ in range(512)] for _ in texts]

ef = CustomEmbedding()
collection_ef = client.create_collection(
    name="custom_embed",
    configuration=HNSWConfiguration(dimension=512, distance='cosine'),
    embedding_function=ef
)

# Get or create (idempotent)
collection = client.get_or_create_collection("idempotent_docs")

# Check existence
if client.has_collection("advanced_docs"):
    col = client.get_collection("advanced_docs", embedding_function=None)
    print(f"Collection {col.name} has {col.dimension} dimensions")

# List all collections
all_collections = client.list_collections()
for c in all_collections:
    print(f"- {c.name}: {c.dimension}D, {c.distance} metric")

# Count collections
total = client.count_collection()
print(f"Total collections: {total}")

# Delete collection
client.delete_collection("basic_docs")
```
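
The `distance` argument in the examples above takes `'cosine'`, `'l2'`, or `'inner_product'`. As a rough illustration of what these names conventionally mean, here is a plain-Python sketch. It is not pyseekdb's internal implementation: the helper names are hypothetical, and the exact values the server reports (for example squared versus plain L2) are not specified in this document.

```python
import math
from typing import Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product: larger means more similar for comparable-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 when the vectors point the same way."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


a = [1.0, 0.0]
b = [0.0, 1.0]
c = [2.0, 0.0]  # same direction as a, twice the length

print(cosine_distance(a, c))  # 0.0: direction matches, length is ignored
print(cosine_distance(a, b))  # 1.0: orthogonal vectors
print(l2_distance(a, c))      # 1.0: length difference counts
print(inner_product(a, c))    # 2.0
```

Cosine distance ignores vector length, which is why it is a common choice for text embeddings; L2 and inner product are sensitive to magnitude.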

## Data Insertion with Add Operation

Insert new documents with automatic or manual embedding generation.

```python
import pyseekdb

client = pyseekdb.Client(database="content")
collection = client.create_collection("articles")

# Add single item with auto-generated embedding
collection.add(
    ids="art_001",
    documents="Python enables rapid AI development",
    metadatas={"category": "tech", "year": 2024, "rating": 4.5}
)

# Add multiple items with auto-embedding
articles = [
    "Machine learning requires quality training data",
    "Vector databases optimize similarity search",
    "Neural networks process complex patterns",
    "Natural language processing understands text"
]
ids = [f"art_{100+i}" for i in range(len(articles))]
metadatas = [
    {"category": "AI", "year": 2023, "rating": 4.8, "tags": ["ml", "data"]},
    {"category": "DB", "year": 2024, "rating": 4.6, "tags": ["vectors", "search"]},
    {"category": "AI", "year": 2023, "rating": 4.7, "tags": ["neural", "dl"]},
    {"category": "NLP", "year": 2024, "rating": 4.9, "tags": ["text", "ai"]}
]
collection.add(ids=ids, documents=articles, metadatas=metadatas)

# Add with pre-computed embeddings (bypasses embedding function)
import random
manual_embeddings = [[random.random() for _ in range(384)] for _ in range(2)]
collection.add(
    ids=["art_200", "art_201"],
    embeddings=manual_embeddings,
    documents=["Custom embed doc 1", "Custom embed doc 2"],
    metadatas=[{"source": "manual"}, {"source": "manual"}]
)

# Add embeddings only (no documents)
vector_only = [[random.random() for _ in range(384)] for _ in range(3)]
collection.add(
    ids=["vec_1", "vec_2", "vec_3"],
    embeddings=vector_only
)

# Verify insertion
count = collection.count()
print(f"Total articles: {count}")
# Output: Total articles: 10
```
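
For larger corpora it may be worth splitting inserts into fixed-size batches rather than sending everything in one `add` call. The sketch below is one way to do that; `add_in_batches` and `RecordingCollection` are hypothetical helpers (not part of pyseekdb), the batch size of 100 is an arbitrary choice, and the only collection method assumed is `add(ids=..., documents=..., metadatas=...)` as used above.

```python
from typing import Any, Dict, List, Optional, Sequence


def add_in_batches(
    collection: Any,
    ids: Sequence[str],
    documents: Sequence[str],
    metadatas: Optional[Sequence[Dict[str, Any]]] = None,
    batch_size: int = 100,
) -> int:
    """Call collection.add() once per slice and return the number of calls."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if len(ids) != len(documents):
        raise ValueError("ids and documents must have the same length")
    if metadatas is not None and len(metadatas) != len(ids):
        raise ValueError("metadatas must match ids in length")

    calls = 0
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        kwargs: Dict[str, Any] = {
            "ids": list(ids[start:end]),
            "documents": list(documents[start:end]),
        }
        if metadatas is not None:
            kwargs["metadatas"] = list(metadatas[start:end])
        collection.add(**kwargs)
        calls += 1
    return calls


class RecordingCollection:
    """Stand-in for a real collection so the sketch runs without a database."""

    def __init__(self) -> None:
        self.batch_sizes: List[int] = []

    def add(self, ids, documents, metadatas=None) -> None:
        self.batch_sizes.append(len(ids))


recorder = RecordingCollection()
doc_ids = [f"doc_{i}" for i in range(250)]
docs = [f"document {i}" for i in range(250)]
calls = add_in_batches(recorder, doc_ids, docs, batch_size=100)
print(calls, recorder.batch_sizes)  # 3 [100, 100, 50]
```

With a real client, pass the object returned by `client.create_collection(...)` in place of `RecordingCollection()`.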

## Data Update and Upsert Operations

Modify existing records or insert new ones with flexible update semantics.

```python
import pyseekdb

client = pyseekdb.Client(database="content")
collection = client.get_collection("articles")

# Update metadata only
collection.update(
    ids="art_001",
    metadatas={"category": "tech", "year": 2024, "rating": 5.0, "featured": True}
)

# Update document and embedding (auto-generated)
collection.update(
    ids="art_100",
    documents="Deep learning transforms machine learning applications",
    metadatas={"category": "AI", "updated": True}
)

# Update multiple items with new embeddings
collection.update(
    ids=["art_101", "art_102"],
    documents=["Updated vector database content", "Updated neural network content"],
    metadatas=[{"updated": True, "version": 2}, {"updated": True, "version": 2}]
)

# Upsert existing item (updates if exists)
collection.upsert(
    ids="art_100",
    documents="Machine learning revolutionizes data analysis",
    metadatas={"category": "AI", "year": 2024, "upserted": True}
)

# Upsert new item (inserts if not exists)
collection.upsert(
    ids="art_300",
    documents="Transformers enable state-of-the-art NLP",
    metadatas={"category": "NLP", "year": 2024, "new": True}
)

# Batch upsert (mix of existing and new)
import random
upsert_ids = ["art_101", "art_400", "art_401"]  # art_101 exists, others new
upsert_docs = [
    "Updated: Vectors power semantic search",
    "New: Attention mechanisms improve models",
    "New: BERT revolutionized NLP tasks"
]
embeddings = [[random.random() for _ in range(384)] for _ in range(3)]
collection.upsert(
    ids=upsert_ids,
    embeddings=embeddings,
    documents=upsert_docs,
    metadatas=[{"op": "upsert"} for _ in range(3)]
)

# Verify updates
result = collection.get(ids="art_300")
print(f"Upserted doc: {result['documents'][0]}")
```

## Data Deletion with Filters

Remove documents by ID, metadata filters, or document content filters.

```python
import pyseekdb

client = pyseekdb.Client(database="content")
collection = client.get_collection("articles")

# Delete by single ID
collection.delete(ids="art_300")

# Delete by multiple IDs
collection.delete(ids=["vec_1", "vec_2", "vec_3"])

# Delete by metadata filter (equality)
collection.delete(where={"source": {"$eq": "manual"}})

# Delete by comparison operator
collection.delete(where={"rating": {"$lt": 4.5}})

# Delete by $in operator
collection.delete(where={"category": {"$in": ["deprecated", "archived"]}})

# Delete by logical OR
collection.delete(
    where={
        "$or": [
            {"year": {"$lt": 2020}},
            {"rating": {"$lt": 3.0}}
        ]
    }
)

# Delete by document content filter
collection.delete(where_document={"$contains": "obsolete"})

# Delete with combined filters
collection.delete(
    where={"category": {"$eq": "tech"}, "year": {"$lt": 2023}},
    where_document={"$contains": "deprecated"}
)

# Delete all low-rated AI articles from 2023
collection.delete(
    where={
        "$and": [
            {"category": "AI"},
            {"year": {"$eq": 2023}},
            {"rating": {"$lte": 4.0}}
        ]
    }
)

# Verify remaining count
remaining = collection.count()
print(f"Remaining articles: {remaining}")
```
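
The `where` filters above use MongoDB-style operators. To make their reading concrete, the following is a small in-memory evaluator. It is only a sketch: `matches` is a hypothetical helper, not a pyseekdb function, it covers just the operators that appear in this document, and it assumes that several keys in one dict mean an implicit AND and that a missing field never matches. The server's actual semantics may differ in edge cases.

```python
import operator
from typing import Any, Callable, Dict

# Comparison operators used in this document's examples
_COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": operator.eq,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$gte": operator.ge,
    "$in": lambda value, options: value in options,
}


def matches(metadata: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Return True if the metadata dict satisfies the filter dict."""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        else:
            if key not in metadata:
                return False  # assumption: missing fields never match
            if not isinstance(condition, dict):
                condition = {"$eq": condition}  # {"category": "AI"} shorthand
            for op, operand in condition.items():
                if not _COMPARATORS[op](metadata[key], operand):
                    return False
    return True


article = {"category": "AI", "year": 2023, "rating": 3.9}

print(matches(article, {"category": "AI"}))  # True
print(matches(article, {"$and": [
    {"category": "AI"},
    {"year": {"$eq": 2023}},
    {"rating": {"$lte": 4.0}},
]}))  # True
print(matches(article, {"$or": [
    {"year": {"$lt": 2020}},
    {"rating": {"$lt": 3.0}},
]}))  # False
```

Running a filter through a helper like this against a few sample records can be a cheap sanity check before passing it to `delete`, where a mistake is hard to undo.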

## Vector Similarity Search with Query

Perform semantic search using vector embeddings with metadata and document filters.

```python
import pyseekdb

client = pyseekdb.Client(database="knowledge")
collection = client.get_collection("documents")

# Basic semantic search with query text
results = collection.query(
    query_texts="artificial intelligence and deep learning",
    n_results=5
)
for i in range(len(results['ids'][0])):
    doc_id = results['ids'][0][i]
    distance = results['distances'][0][i]
    document = results['documents'][0][i]
    print(f"{i+1}. [{doc_id}] {document} (distance: {distance:.4f})")

# Query with manual embedding vector
import random
query_vector = [random.random() for _ in range(384)]
results = collection.query(
    query_embeddings=query_vector,
    n_results=3
)

# Batch query with multiple texts
results = collection.query(
    query_texts=["machine learning", "natural language processing", "computer vision"],
    n_results=2
)
# results['ids'][0] = top 2 for "machine learning"
# results['ids'][1] = top 2 for "natural language processing"
# results['ids'][2] = top 2 for "computer vision"
for query_idx, query_ids in enumerate(results['ids']):
    print(f"Query {query_idx+1}: {len(query_ids)} results")

# Query with metadata filter (simplified equality)
results = collection.query(
    query_texts="python programming",
    where={"category": "tech"},
    n_results=5
)

# Query with comparison operators
results = collection.query(
    query_texts="advanced AI techniques",
    where={"year": {"$gte": 2023}, "rating": {"$gte": 4.5}},
    n_results=3
)

# Query with $in operator
results = collection.query(
    query_texts="data science tools",
    where={"tags": {"$in": ["ml", "data", "analytics"]}},
    n_results=5
)

# Query with logical OR
results = collection.query(
    query_texts="neural networks",
    where={
        "$or": [
            {"category": "AI"},
            {"category": "ML"}
        ]
    },
    n_results=5
)

# Query with document content filter
results = collection.query(
    query_texts="machine learning",
    where_document={"$contains": "neural network"},
    n_results=3
)

# Query with combined filters
results = collection.query(
    query_texts="AI research",
    where={"category": "AI", "year": {"$gte": 2024}},
    where_document={"$contains": "transformer"},
    include=["documents", "metadatas", "embeddings"],
    n_results=5
)
for i in range(len(results['ids'][0])):
    print(f"ID: {results['ids'][0][i]}")
    print(f"Document: {results['documents'][0][i]}")
    print(f"Metadata: {results['metadatas'][0][i]}")
    print(f"Embedding dim: {len(results['embeddings'][0][i])}")
    print(f"Distance: {results['distances'][0][i]:.4f}\n")
```
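
Query results are nested per query text, so `results['ids'][q][i]` is the i-th hit for the q-th query. When that indexing becomes repetitive, a small generator can flatten one query's hits into records. `iter_hits` below is a hypothetical helper, and the `sample` dict only imitates the result shape used in the examples above; its distances are made-up values for illustration.

```python
from typing import Any, Dict, Iterator


def iter_hits(results: Dict[str, Any], query_index: int = 0) -> Iterator[Dict[str, Any]]:
    """Yield one dict per hit for the chosen query, skipping absent fields."""
    ids = results["ids"][query_index]
    for position, doc_id in enumerate(ids):
        hit: Dict[str, Any] = {"rank": position + 1, "id": doc_id}
        for field in ("documents", "metadatas", "distances"):
            column = results.get(field)
            if column is not None:
                # "documents" -> "document", "metadatas" -> "metadata", ...
                hit[field[:-1]] = column[query_index][position]
        yield hit


# Illustrative data shaped like a single-query result (values are invented)
sample = {
    "ids": [["doc1", "doc2"]],
    "documents": [["Python is popular", "Machine learning transforms AI"]],
    "distances": [[0.21, 0.47]],
}

for hit in iter_hits(sample):
    print(hit["rank"], hit["id"], hit["document"], hit["distance"])
```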

## Data Retrieval with Get Operation

Retrieve documents by ID or filters without vector similarity ranking.

```python
import pyseekdb

client = pyseekdb.Client(database="knowledge")
collection = client.get_collection("documents")

# Get single document by ID
result = collection.get(ids="art_001")
print(f"Document: {result['documents'][0]}")
print(f"Metadata: {result['metadatas'][0]}")

# Get multiple documents by IDs
result = collection.get(ids=["art_001", "art_100", "art_101"])
for i, doc_id in enumerate(result['ids']):
    print(f"{i+1}. [{doc_id}] {result['documents'][i]}")

# Get by metadata filter (simplified equality)
result = collection.get(
    where={"category": "AI"},
    limit=10
)
print(f"Found {len(result['ids'])} AI documents")

# Get by comparison operators
result = collection.get(
    where={"rating": {"$gte": 4.5}, "year": {"$eq": 2024}},
    limit=5
)

# Get by $in operator
result = collection.get(
    where={"category": {"$in": ["AI", "ML", "NLP"]}},
    limit=20
)

# Get by logical OR
result = collection.get(
    where={
        "$or": [
            {"category": "AI"},
            {"rating": {"$gte": 4.8}}
        ]
    },
    limit=15
)

# Get by document content filter
result = collection.get(
    where_document={"$contains": "machine learning"},
    limit=10
)

# Get with pagination
page_1 = collection.get(limit=10, offset=0)
page_2 = collection.get(limit=10, offset=10)
page_3 = collection.get(limit=10, offset=20)
print(f"Page 1: {len(page_1['ids'])} items")
print(f"Page 2: {len(page_2['ids'])} items")

# Get with combined filters
result = collection.get(
    where={"category": "AI", "year": {"$gte": 2023}},
    where_document={"$contains": "neural"},
    include=["documents", "metadatas", "embeddings"],
    limit=5
)

# Get all documents (up to limit)
all_docs = collection.get(limit=1000)
print(f"Total documents retrieved: {len(all_docs['ids'])}")

# Get specific fields only
result = collection.get(
    ids=["art_100", "art_101"],
    include=["documents", "metadatas"]  # Excludes embeddings
)
print(f"Has embeddings: {'embeddings' in result}")
# Output: Has embeddings: False
```
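
The `limit`/`offset` pagination above can be wrapped in a loop that stops once a short or empty page comes back. The sketch below assumes `get` returns a dict with a flat `ids` list, as in the examples, and that the server returns rows in a stable order between calls (this document does not state that guarantee). `iter_pages` and `FakeCollection` are hypothetical names used only for illustration.

```python
from typing import Any, Dict, Iterator, List


def iter_pages(collection: Any, page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield successive result pages until the collection is exhausted."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = collection.get(limit=page_size, offset=offset)
        if not page["ids"]:
            break
        yield page
        if len(page["ids"]) < page_size:
            break  # a short page means there is nothing left to fetch
        offset += page_size


class FakeCollection:
    """Stand-in for a real collection so the sketch runs without a database."""

    def __init__(self, ids: List[str]) -> None:
        self._ids = ids

    def get(self, limit: int, offset: int = 0) -> Dict[str, Any]:
        return {"ids": self._ids[offset:offset + limit]}


fake = FakeCollection([f"art_{i}" for i in range(25)])
sizes = [len(page["ids"]) for page in iter_pages(fake, page_size=10)]
print(sizes)  # [10, 10, 5]
```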

## Hybrid Search - Full-Text + Vector Fusion

Combine full-text search and vector similarity, merging the two result lists with Reciprocal Rank Fusion (RRF).

```python
import pyseekdb

client = pyseekdb.Client(database="knowledge")
collection = client.get_collection("documents")

# Basic hybrid search: full-text keyword + semantic similarity
results = collection.hybrid_search(
    query={
        "where_document": {"$contains": "machine learning"},
        "n_results": 10
    },
    knn={
        "query_texts": ["artificial intelligence research"],
        "n_results": 10
    },
    rank={"rrf": {}},  # Reciprocal Rank Fusion
    n_results=5
)
print("Top 5 hybrid results:")
for i, doc_id in enumerate(results['ids'][0]):
    print(f"{i+1}. [{doc_id}] {results['documents'][0][i]}")

# Hybrid search with independent filters
results = collection.hybrid_search(
    query={
        "where_document": {"$contains": "neural network"},
        "where": {"year": {"$eq": 2024}},  # Filter for full-text search
        "n_results": 10
    },
    knn={
        "query_texts": ["deep learning applications"],
        "where": {"rating": {"$gte": 4.5}},  # Different filter for vector search
        "n_results": 10
    },
    rank={"rrf": {"rank_window_size": 60, "rank_constant": 60}},
    n_results=5,
    include=["documents", "metadatas", "distances"]
)

# Hybrid search with batch queries
results = collection.hybrid_search(
    query={
        "where_document": {"$contains": "AI"},
        "n_results": 10
    },
    knn={
        "query_texts": ["transformers", "computer vision", "reinforcement learning"],
        "n_results": 10
    },
    rank={"rrf": {}},
    n_results=3
)
# Returns combined results from all queries

# Full-text only hybrid search (no vector component)
results = collection.hybrid_search(
    query={
        "where_document": {"$contains": "Python programming"},
        "where": {"category": "tech"},
        "n_results": 10
    },
    rank={"rrf": {}},
    n_results=5
)

# Vector only hybrid search (no full-text component)
import random
query_embedding = [random.random() for _ in range(384)]
results = collection.hybrid_search(
    knn={
        "query_embeddings": [query_embedding],
        "where": {"year": {"$gte": 2023}},
        "n_results": 10
    },
    rank={"rrf": {}},
    n_results=5
)

# Complex multi-criteria hybrid search
results = collection.hybrid_search(
    query={
        "where_document": {
            "$or": [
                {"$contains": "machine learning"},
                {"$contains": "deep learning"}
            ]
        },
        "where": {"category": "AI"},
        "n_results": 15
    },
    knn={
        "query_texts": ["neural network architectures"],
        "where": {
            "$and": [
                {"year": {"$gte": 2023}},
                {"rating": {"$gte": 4.0}}
            ]
        },
        "n_results": 15
    },
    rank={"rrf": {"rank_window_size": 100, "rank_constant": 60}},
    n_results=10,
    include=["documents", "metadatas", "embeddings", "distances"]
)
for i in range(len(results['ids'][0])):
    print(f"\nRank {i+1}:")
    print(f"  ID: {results['ids'][0][i]}")
    print(f"  Doc: {results['documents'][0][i]}")
    print(f"  Meta: {results['metadatas'][0][i]}")
    print(f"  Distance: {results['distances'][0][i]:.4f}")
```
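
The `rank={"rrf": ...}` option selects Reciprocal Rank Fusion. In its usual formulation, each document scores the sum of `1 / (rank_constant + rank)` over every result list it appears in, so documents ranked well by both the full-text and the vector search rise to the top. The sketch below implements that textbook formula in plain Python. It assumes `rank_window_size` caps how many entries of each list are considered; how the server applies these parameters is not confirmed by this document, and `reciprocal_rank_fusion` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]],
    rank_constant: int = 60,
    rank_window_size: int = 60,
    n_results: int = 5,
) -> List[str]:
    """Fuse several ranked ID lists into one list ordered by RRF score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; only the top rank_window_size entries contribute
        for rank, doc_id in enumerate(ranking[:rank_window_size], start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest score first; ties are broken by ID to keep the output deterministic
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:n_results]


full_text_hits = ["a", "b", "c"]  # e.g. IDs ranked by keyword match
vector_hits = ["b", "d", "a"]     # e.g. IDs ranked by embedding distance

fused = reciprocal_rank_fusion([full_text_hits, vector_hits], n_results=3)
print(fused)  # ['b', 'a', 'd']
```

Here `b` wins because it ranks first and second in the two lists, narrowly ahead of `a` (first and third), while `d` and `c` each appear in only one list.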

## Custom Embedding Functions

Implement custom embedding functions for domain-specific vector generation.

```python
import pyseekdb
from typing import List, Union

# Example 1: Sentence-Transformers Custom Embedding
class SentenceTransformerEmbedding:
    def __init__(self, model_name: str = "all-MiniLM-L6-v2", device: str = "cpu"):
        self.model_name = model_name
        self.device = device
        self._model = None
        self._dimension = None

    def _ensure_model_loaded(self):
        if self._model is None:
            from sentence_transformers import SentenceTransformer
            self._model = SentenceTransformer(self.model_name, device=self.device)
            test_embedding = self._model.encode(["test"], convert_to_numpy=True)
            self._dimension = len(test_embedding[0])

    @property
    def dimension(self) -> int:
        self._ensure_model_loaded()
        return self._dimension

    def __call__(self, input: Union[str, List[str]]) -> List[List[float]]:
        self._ensure_model_loaded()
        if isinstance(input, str):
            input = [input]
        if not input:
            return []
        embeddings = self._model.encode(input, convert_to_numpy=True, show_progress_bar=False)
        return [embedding.tolist() for embedding in embeddings]

# Use custom embedding function
ef = SentenceTransformerEmbedding(model_name='all-mpnet-base-v2', device='cpu')
client = pyseekdb.Client(database="custom")
collection = client.create_collection(
    name="research_papers",
    configuration=pyseekdb.HNSWConfiguration(dimension=ef.dimension, distance='cosine'),
    embedding_function=ef
)

# Add documents (automatically embedded with custom function)
collection.add(
    ids=["paper_1", "paper_2"],
    documents=[
        "Attention mechanisms improve neural machine translation",
        "Convolutional neural networks excel at image classification"
    ],
    metadatas=[{"field": "NLP"}, {"field": "CV"}]
)

# Example 2: OpenAI API Embedding Function (uses the openai>=1.0 client interface)
import os
from typing import Optional

from openai import OpenAI

class OpenAIEmbedding:
    def __init__(self, model_name: str = "text-embedding-ada-002", api_key: Optional[str] = None):
        self.model_name = model_name
        self.api_key = api_key or os.environ.get('OPENAI_API_KEY')
        if not self.api_key:
            raise ValueError("OpenAI API key required")
        self._client = OpenAI(api_key=self.api_key)
        # Only the ada-002 output size is hard-coded here
        self._dimension = 1536 if "ada-002" in model_name else None

    @property
    def dimension(self) -> int:
        if self._dimension is None:
            raise ValueError("Dimension not set for this model")
        return self._dimension

    def __call__(self, input: Union[str, List[str]]) -> List[List[float]]:
        if isinstance(input, str):
            input = [input]
        if not input:
            return []
        response = self._client.embeddings.create(
            model=self.model_name,
            input=input
        )
        return [item.embedding for item in response.data]

# Use OpenAI embedding
ef_openai = OpenAIEmbedding(model_name='text-embedding-ada-002')
collection_openai = client.create_collection(
    name="openai_docs",
    configuration=pyseekdb.HNSWConfiguration(dimension=1536, distance='cosine'),
    embedding_function=ef_openai
)

# Query with custom embedding function
results = collection.query(
    query_texts=["machine learning models"],
    n_results=5
)
print(f"Found {len(results['ids'][0])} relevant papers")
```
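
For unit tests, where downloading a model or calling a paid API is undesirable, a deterministic stand-in can follow the same convention as the classes above: a `dimension` property plus a `__call__` that accepts a string or a list of strings. The sketch below derives vectors from SHA-256 hashes. `HashEmbedding` is a hypothetical class, and its vectors carry no semantic meaning, so it is only suitable for exercising plumbing, not for judging search quality.

```python
import hashlib
import math
from typing import List, Union


class HashEmbedding:
    """Deterministic, non-semantic embedding function for tests."""

    def __init__(self, dimension: int = 384) -> None:
        if dimension <= 0:
            raise ValueError("dimension must be positive")
        self._dimension = dimension

    @property
    def dimension(self) -> int:
        return self._dimension

    def _embed_one(self, text: str) -> List[float]:
        values: List[float] = []
        counter = 0
        # Hash the text with an increasing counter until enough bytes exist
        while len(values) < self._dimension:
            digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
            values.extend(byte / 255.0 - 0.5 for byte in digest)
            counter += 1
        values = values[: self._dimension]
        # Normalize to unit length so cosine and inner product behave alike
        norm = math.sqrt(sum(v * v for v in values))
        return [v / norm for v in values]

    def __call__(self, input: Union[str, List[str]]) -> List[List[float]]:
        texts = [input] if isinstance(input, str) else list(input)
        return [self._embed_one(text) for text in texts]


ef_test = HashEmbedding(dimension=16)
vectors = ef_test(["alpha", "beta"])
print(len(vectors), len(vectors[0]))      # 2 16
print(ef_test("alpha")[0] == vectors[0])  # True: same text, same vector
```

Such an object could be passed as `embedding_function=` alongside an `HNSWConfiguration` whose `dimension` matches `ef_test.dimension`, in the same way as the custom classes above.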

## Collection Information and Inspection

Access collection metadata, preview data, and inspect collection properties.

```python
import pyseekdb

client = pyseekdb.Client(database="analytics")
collection = client.get_collection("documents")

# Get item count
count = collection.count()
print(f"Collection contains {count} documents")

# Get collection properties
print(f"Name: {collection.name}")
print(f"ID: {collection.id}")
print(f"Dimension: {collection.dimension}")
print(f"Distance metric: {collection.distance}")
print(f"Has embedding function: {collection.embedding_function is not None}")
print(f"Metadata: {collection.metadata}")

# Peek at first few items (returns all fields by default)
preview = collection.peek(limit=3)
for i in range(len(preview['ids'])):
    print(f"\nItem {i+1}:")
    print(f"  ID: {preview['ids'][i]}")
    print(f"  Document: {preview['documents'][i]}")
    print(f"  Metadata: {preview['metadatas'][i]}")
    print(f"  Embedding: {preview['embeddings'][i][:5]}... (dim={len(preview['embeddings'][i])})")

# Get detailed collection information
info = collection.describe()
print(f"\nCollection Info:")
print(f"  Name: {info['name']}")
print(f"  Dimension: {info['dimension']}")
print(f"  Count: {info.get('count', 'N/A')}")

# Count collections in database
total_collections = client.count_collection()
print(f"\nDatabase has {total_collections} collections")

# List all collections with details
all_collections = client.list_collections()
print("\nAll collections:")
for col in all_collections:
    print(f"  - {col.name}: {col.dimension}D, {col.distance} distance")
    if col.embedding_function:
        print(f"    Embedding: {col.embedding_function}")

# Check if collection exists before operations
if client.has_collection("documents"):
    col = client.get_collection("documents")
    data = col.get(limit=5)
    print(f"\nFound collection with {len(data['ids'])} sample items")
else:
    print("\nCollection does not exist")

# Get collection client reference
print(f"\nClient mode: {collection.client.mode}")
print(f"Client database: {collection.client.database}")
```

pyseekdb provides a production-ready vector database client that simplifies AI application development through intuitive APIs and flexible deployment options. The library is ideal for building RAG systems where documents need to be semantically searchable, knowledge bases that combine keyword and semantic search, recommendation engines powered by vector similarity, and document classification systems using embedding-based retrieval. Its automatic embedding generation reduces boilerplate code while maintaining the flexibility to use custom embedding models for specialized domains.

The unified interface across embedded, SeekDB server, and OceanBase deployments lets the same collection code move from development to production; only the arguments passed to `pyseekdb.Client` change. Whether prototyping locally with embedded mode, deploying to a dedicated SeekDB server, or integrating with enterprise OceanBase clusters for multi-tenant isolation, pyseekdb provides consistent APIs with comprehensive error handling. Its hybrid search combines traditional full-text search with semantic vector search, which makes it particularly effective for retrieval scenarios where both keyword matching and conceptual similarity matter.