### Multi-column List Partition Example

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/partitions.md

Example of creating a multi-column list partition based on 'country' and 'status'.

```python
partition = ObListPartition(
    is_list_columns=True,
    col_name_list=["country", "status"],
    list_part_infos=[
        RangeListPartInfo("p_us_active", ["US", "active"]),
        RangeListPartInfo("p_us_inactive", ["US", "inactive"]),
        RangeListPartInfo("p_eu_active", ["EU", "active"]),
        RangeListPartInfo("p_default", ["DEFAULT", "DEFAULT"])
    ]
)
```

--------------------------------

### Key Subpartitioning Example

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/partitions.md

Demonstrates adding a subpartition strategy to an existing Key partition.

```python
partition = ObKeyPartition(
    col_name_list=["user_id"],
    part_count=4
)
partition.add_subpartition(
    ObSubKeyPartition(
        col_name_list=["region"],
        part_count=2  # 4 * 2 = 8 total partitions
    )
)
```

--------------------------------

### RangeListPartInfo Examples

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/partitions.md

Illustrates creating RangeListPartInfo objects with integer and string bounds.

```python
@dataclass
class RangeListPartInfo:
    part_name: str  # Partition name
    part_upper_bound_expr: list | str | int  # Upper bound value(s)

# Integer bound
RangeListPartInfo("p0", 100)

# String bound
RangeListPartInfo("p2024", "MAXVALUE")
```

--------------------------------

### Single-column List Partition Example

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/partitions.md

Example of creating a single-column list partition based on a 'region' column.
```python
from pyobvector import ObListPartition, RangeListPartInfo

partition = ObListPartition(
    is_list_columns=False,
    list_expr="region",
    list_part_infos=[
        RangeListPartInfo("p_us", ["US", "CA", "MX"]),
        RangeListPartInfo("p_eu", ["UK", "DE", "FR", "IT"]),
        RangeListPartInfo("p_asia", ["CN", "JP", "IN"]),
        RangeListPartInfo("p_default", "DEFAULT")
    ]
)
```

--------------------------------

### Key Partition by Multiple Columns

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/partitions.md

Example of creating a Key partition based on 'user_id' and 'date'.

```python
from pyobvector import ObKeyPartition

partition = ObKeyPartition(
    col_name_list=["user_id", "date"],
    part_count=4  # 4 partitions based on hash of (user_id, date)
)
```

--------------------------------

### List Subpartitioning Example

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/partitions.md

Demonstrates adding a subpartition strategy to an existing List partition.

```python
partition = ObListPartition(
    is_list_columns=False,
    list_expr="region",
    list_part_infos=[
        RangeListPartInfo("p_us", ["US", "CA"]),
        RangeListPartInfo("p_eu", ["UK", "DE"])
    ]
)
partition.add_subpartition(
    ObSubListPartition(
        is_list_columns=False,
        list_expr="status",
        list_part_infos=[
            RangeListPartInfo("p_active", ["active"]),
            RangeListPartInfo("p_inactive", ["inactive"])
        ]
    )
)
```

--------------------------------

### Hash Subpartitioning Example

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/partitions.md

Demonstrates adding a subpartition strategy to an existing Hash partition.
```python
partition = ObHashPartition(
    hash_expr="user_id",
    part_count=4  # 4 main partitions
)
partition.add_subpartition(
    ObSubHashPartition(
        hash_expr="timestamp",
        part_count=3  # 3 subpartitions per main = 12 total
    )
)
```

--------------------------------

### Setup MilvusLikeClient

Source: https://github.com/oceanbase/pyobvector/blob/main/README.md

Initialize a MilvusLikeClient for interacting with OceanBase Vector Store in a Milvus-compatible mode. Requires specifying the connection URI and user.

```python
from pyobvector import *

client = MilvusLikeClient(uri="127.0.0.1:2881", user="test@test")
```

--------------------------------

### Install pyobvector with uv

Source: https://github.com/oceanbase/pyobvector/blob/main/README.md

Install the pyobvector package using uv for dependency management.

```shell
uv sync
```

--------------------------------

### Complete Hybrid Search Example

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/hybridsearch.md

Demonstrates a full hybrid search implementation in Python using pyobvector. Includes table creation with vector and text columns, full-text index setup, data insertion, and executing a hybrid search query combining text and vector search.
```python
from pyobvector import HybridSearch, FtsIndexParam, FtsParser, VECTOR, VectorIndex
from sqlalchemy import Column, Integer, String, VARCHAR

# Setup
client = HybridSearch(uri="127.0.0.1:2881", user="root@test")

# Create table with vector and text columns
client.create_table(
    table_name="articles",
    columns=[
        Column("id", Integer, primary_key=True),
        Column("title", VARCHAR(255)),
        Column("content", VARCHAR(1024)),
        Column("embedding", VECTOR(128))
    ],
    indexes=[
        VectorIndex("vec_idx", "embedding", params="distance=l2, type=hnsw, lib=vsag")
    ]
)

# Create full-text indexes
for col in ["title", "content"]:
    client.create_fts_idx_with_fts_index_param(
        table_name="articles",
        fts_idx_param=FtsIndexParam(
            index_name=f"fts_{col}",
            field_names=[col],
            parser_type=FtsParser.IK
        )
    )

# Insert data
client.insert("articles", [
    {
        "id": 1,
        "title": "Machine Learning Basics",
        "content": "Introduction to ML algorithms",
        "embedding": [0.1, 0.2, ...]
    }
])

# Hybrid search
results = client.search(
    index="articles",
    body={
        "query": {
            "query_string": {
                "fields": ["title^2", "content"],
                "query": "machine learning"
            }
        },
        "knn": {
            "field": "embedding",
            "query_vector": [0.1, 0.2, ...],
            "k": 10,
            "similarity": 0.7
        },
        "size": 20
    }
)

print(f"Found {len(results)} articles")
```

--------------------------------

### Hash Partition with Partition Count

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/partitions.md

Example of creating a Hash partition by specifying the desired number of partitions.

```python
partition = ObHashPartition(
    hash_expr="user_id",
    part_count=8  # Creates 8 partitions automatically
)
```

--------------------------------

### Hash Partition with Explicit Names

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/partitions.md

Example of creating a Hash partition with explicitly defined partition names.
```python
from pyobvector import ObHashPartition

partition = ObHashPartition(
    hash_expr="id",
    hash_part_name_list=["p0", "p1", "p2", "p3"]  # 4 partitions
)
```

--------------------------------

### Install pyobvector with embedded SeekDB support

Source: https://github.com/oceanbase/pyobvector/blob/main/README.md

Install pyobvector with optional dependencies for embedded SeekDB support, enabling local SeekDB usage without a server.

```shell
pip install pyobvector[pyseekdb]
```

--------------------------------

### Single-Column Range Partition Example

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/partitions.md

Demonstrates creating a single-column range partition strategy using ObRangePartition. Suitable for partitioning based on a single column's value ranges.

```python
from pyobvector import ObRangePartition, RangeListPartInfo

partition = ObRangePartition(
    is_range_columns=False,
    range_expr="id",
    range_part_infos=[
        RangeListPartInfo("p0", 100),        # id < 100
        RangeListPartInfo("p1", 1000),       # 100 <= id < 1000
        RangeListPartInfo("p2", "MAXVALUE")  # 1000 <= id (unlimited)
    ]
)
```

--------------------------------

### Range Subpartition Example

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/partitions.md

Shows how to define a main range partition (by year) and add a subpartition strategy (by month) using ObSubRangePartition. This is useful for hierarchical data partitioning.
```python
from pyobvector import ObRangePartition, ObSubRangePartition, RangeListPartInfo

# Main partition by year, subpartition by month
partition = ObRangePartition(
    is_range_columns=False,
    range_expr="year",
    range_part_infos=[
        RangeListPartInfo("p2023", 2024),
        RangeListPartInfo("p2024", 2025)
    ]
)
partition.add_subpartition(
    ObSubRangePartition(
        is_range_columns=False,
        range_expr="month",
        range_part_infos=[
            RangeListPartInfo("p_q1", 4),
            RangeListPartInfo("p_q2", 7),
            RangeListPartInfo("p_q3", 10),
            RangeListPartInfo("p_q4", 13)
        ]
    )
)
```

--------------------------------

### Setup ObVecClient for SQLAlchemy Hybrid Mode

Source: https://github.com/oceanbase/pyobvector/blob/main/README.md

Initialize an ObVecClient for using vector storage functions with SQLAlchemy. Requires specifying the connection URI and user.

```python
from pyobvector import *
from sqlalchemy import Column, Integer, JSON
from sqlalchemy import func

client = ObVecClient(uri="127.0.0.1:2881", user="test@test")
```

--------------------------------

### Multi-Column Range Partition Example

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/partitions.md

Illustrates creating a multi-column range partition strategy using ObRangePartition with `is_range_columns=True`. This is used for partitioning based on ranges across multiple columns.

```python
partition = ObRangePartition(
    is_range_columns=True,
    col_name_list=["year", "month"],
    range_part_infos=[
        RangeListPartInfo("p202301", ["2023", "01"]),
        RangeListPartInfo("p202302", ["2023", "02"]),
        RangeListPartInfo("p202401", ["2024", "01"]),
        RangeListPartInfo("pmax", ["MAXVALUE", "MAXVALUE"])
    ]
)
```

--------------------------------

### Querying Specific Partitions

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/partitions.md

Demonstrates how to retrieve data by specifying partition names in the `get()` operation. This allows for targeted data retrieval, improving query performance.
```python
# Specify partition in get()
result = client.get(
    table_name="events",
    ids=[1, 2, 3],
    partition_names=["p202401"]  # Only search in Jan 2024 partition
)
```

--------------------------------

### Install pyobvector with pip

Source: https://github.com/oceanbase/pyobvector/blob/main/README.md

Install a specific version of the pyobvector package using pip.

```shell
pip install pyobvector==0.2.26
```

--------------------------------

### Keyword Search Example

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/hybridsearch.md

Performs a full-text search using the `query_string` syntax. Useful for text-based retrieval where vector similarity is not required.

```python
# Keyword search only
query = {
    "query": {
        "query_string": {
            "fields": ["title^10", "content"],
            "query": "machine learning",
            "minimum_should_match": "50%"
        }
    }
}

results = client.search(index="documents", body=query)
```

--------------------------------

### Handling PartitionFieldException for Range Partition

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/errors.md

Example demonstrating how to catch a PartitionFieldException when a 'range_expr' is missing for a non-columns range partition.

```python
try:
    from pyobvector import ObRangePartition

    # Missing range_expr for non-columns range partition
    partition = ObRangePartition(
        is_range_columns=False,
        range_part_infos=[...],
        # range_expr missing!
    )
except PartitionFieldException as e:
    print(f"Error {e.code}: {e.message}")
```

--------------------------------

### Configure SeekdbRemoteClient with Environment Variables

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/configuration.md

Connect to a remote SeekDB instance using environment variables for authentication. This example shows how to set the password and then instantiate the client.
```python
import os

os.environ["SEEKDB_PASSWORD"] = "mypassword"

client = SeekdbRemoteClient(
    uri="127.0.0.1:2881",
    user="root@test"
    # Password loaded from environment
)
```

--------------------------------

### Configure Full-Text Search Index Parameters

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/types.md

Example of creating FtsIndexParam with a specified parser type, such as IK for Chinese text, and associating it with relevant fields.

```python
from pyobvector import FtsIndexParam, FtsParser

fts_param = FtsIndexParam(
    index_name="fts_text",
    field_names=["title", "content"],
    parser_type=FtsParser.IK
)

client.create_fts_idx_with_fts_index_param("table", fts_param)
```

--------------------------------

### Complete Example: Embedded SeekDB with Table Operations

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/seekdbremoteclient.md

Demonstrates a full workflow including connecting to an embedded SeekDB, creating a table with a vector index, inserting data, performing an ANN search, and cleaning up. Requires `pyobvector` and `sqlalchemy`.
```python
from pyobvector import SeekdbRemoteClient, VECTOR, VectorIndex, l2_distance
from sqlalchemy import Column, Integer, String, VARCHAR

# Connect to embedded SeekDB
client = SeekdbRemoteClient(path="./data/seekdb", database="test")

# Create table with vector index
client.drop_table_if_exist("documents")
client.create_table(
    table_name="documents",
    columns=[
        Column("id", Integer, primary_key=True),
        Column("title", VARCHAR(255)),
        Column("embedding", VECTOR(128))
    ],
    indexes=[
        VectorIndex("vec_idx", "embedding", params="distance=l2, type=hnsw, lib=vsag")
    ]
)

# Insert data
client.insert("documents", [
    {"id": 1, "title": "Document 1", "embedding": [0.1, 0.2, ...]},
    {"id": 2, "title": "Document 2", "embedding": [0.3, 0.4, ...]}
])

# Perform ANN search
results = client.ann_search(
    table_name="documents",
    vec_data=[0.1, 0.2, ...],
    vec_column_name="embedding",
    distance_func=l2_distance,
    topk=5,
    output_column_names=["id", "title"]
)

rows = results.fetchall()
for row in rows:
    print(f"ID: {row[0]}, Title: {row[1]}")

# Cleanup
client.drop_table_if_exist("documents")
```

--------------------------------

### get

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/obclient.md

Queries rows from a table by primary key or conditions, with options to specify output columns, partitions, and result limits.

```APIDOC
## get

### Description
Query rows from a table by primary key or conditions.
### Method
Not specified (assumed to be a client method call)

### Endpoint
Not applicable (SDK method)

### Parameters
* **table_name** (str) - Required - Table name
* **ids** (list | str | int | None) - Optional - Primary key values to retrieve
* **where_clause** (list | None) - Optional - SQLAlchemy filter conditions
* **output_column_name** (list[str] | None) - Optional - Column names to return
* **partition_names** (list[str] | None) - Optional - Partitions to query
* **n_limits** (int | None) - Optional - Limit number of results

### Returns
SQLAlchemy Result object (call `.fetchall()` for list of tuples)

### Example
```python
result = client.get('users', ids=[1, 2])
rows = result.fetchall()

result = client.get('users', output_column_name=['id', 'text'], n_limits=10)
rows = result.fetchall()
```
```

--------------------------------

### Handling PrimaryKeyException for Invalid Data Type

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/errors.md

Example showing how to catch a PrimaryKeyException when attempting to use an unsupported data type (FLOAT_VECTOR) as a primary key.

```python
try:
    field = FieldSchema(
        name="embedding",
        dtype=DataType.FLOAT_VECTOR,
        is_primary=True,  # Cannot use FLOAT_VECTOR as primary key!
        dim=128
    )
except PrimaryKeyException as e:
    print(f"Invalid primary key: {e.message}")
```

--------------------------------

### Embedded SeekDB Connection using Existing Client

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/configuration.md

Integrate an existing pyseekdb.Client instance with ObVecClient for embedded SeekDB operations. Ensure pyseekdb is installed.
```python
import pyseekdb

seekdb = pyseekdb.Client(path="./data/seekdb", database="test")
client = ObVecClient(pyseekdb_client=seekdb)
```

--------------------------------

### Create Schema with Specified Data Types

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/types.md

Example demonstrating how to use the DataType enum to define fields in a schema, including primary keys, vector embeddings, and variable-length strings.

```python
from pyobvector import DataType, MilvusLikeClient

client = MilvusLikeClient()
schema = client.create_schema()

schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("embedding", DataType.FLOAT_VECTOR, dim=128)
schema.add_field("text", DataType.VARCHAR, max_length=255)
```

--------------------------------

### Vector Similarity Search Example

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/hybridsearch.md

Performs a vector similarity search using the `knn` parameter. Ideal for finding documents with similar vector embeddings.

```python
# Vector similarity search only
query = {
    "knn": {
        "field": "embedding",
        "k": 5,
        "num_candidates": 20,
        "query_vector": [0.1, 0.2, 0.3]
    }
}

results = client.search(index="documents", body=query)
```

--------------------------------

### Post-Filtering ANN Search Example

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/obvecclient.md

Executes an ANN search with post-filtering enabled, returning a specified number of nearest neighbors. This method is suitable when pre-filtering is not required or desired.

```python
result = client.post_ann_search(
    table_name='documents',
    vec_data=[0.1, 0.2, 0.3],
    vec_column_name='embedding',
    distance_func=l2_distance,
    topk=5
)
```

--------------------------------

### Configure Vector Index Parameters

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/types.md

Use IndexParams and VecIndexType to configure and add vector indexes to a specified field.
This example shows setting up an HNSW index with L2 metric.

```python
from pyobvector import IndexParams, VecIndexType

idx_params = IndexParams()
idx_params.add_index(
    field_name="embedding",
    index_type=VecIndexType.HNSW,
    index_name="vec_idx",
    metric_type="l2",
    params={"M": 16, "efConstruction": 256}
)
```

--------------------------------

### Build Documentation Locally

Source: https://github.com/oceanbase/pyobvector/blob/main/README.md

Build the project documentation locally using Sphinx. This involves creating a build directory and running the make html command.

```shell
mkdir build
make html
```

--------------------------------

### Set Up Collection Schema and Index Parameters

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/configuration.md

Define a collection schema with fields and partitioning, then configure index parameters for vector search. This prepares the database for storing and querying vector data.

```python
from pyobvector import CollectionSchema, FieldSchema, DataType, ObRangePartition, RangeListPartInfo

# Create schema
schema = CollectionSchema(
    description="Documents with embeddings"
)

# Add fields
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("embedding", DataType.FLOAT_VECTOR, dim=256)
schema.add_field("text", DataType.VARCHAR, max_length=2048, nullable=True)
schema.add_field("metadata", DataType.JSON, nullable=True)
schema.add_field("tags", DataType.ARRAY, element_type=DataType.VARCHAR)

# Add partition
partition = ObRangePartition(
    is_range_columns=False,
    range_expr="id",
    range_part_infos=[
        RangeListPartInfo("p0", 1000000),
        RangeListPartInfo("p1", "MAXVALUE")
    ]
)
schema.partitions = partition

# Use schema in collection creation
from pyobvector import IndexParams, VecIndexType

idx_params = IndexParams()
idx_params.add_index(
    field_name="embedding",
    index_type=VecIndexType.HNSW,
    index_name="vec_idx",
    metric_type="cosine"
)

client.create_collection(
    collection_name="documents",
    schema=schema,
    index_params=idx_params
)
```

--------------------------------

### Initialize ObVecClient

Source: https://github.com/oceanbase/pyobvector/blob/main/README.md

Instantiate ObVecClient for direct interaction with vector data. Ensure the path and database name are correctly specified.

```python
from pyobvector import ObVecClient

client = ObVecClient(path="./seekdb_data", db_name="test")
assert isinstance(client, ObVecClient)
assert isinstance(client, ObClient)
```

--------------------------------

### Handling VectorFieldParamException for Missing Dimension

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/errors.md

Example demonstrating how to catch a VectorFieldParamException when the 'dim' parameter is missing for a FLOAT_VECTOR field.

```python
try:
    schema = client.create_schema()
    schema.add_field(
        field_name="embedding",
        datatype=DataType.FLOAT_VECTOR,
        # dim missing!
    )
except VectorFieldParamException as e:
    print(f"Vector field error: {e.message}")
```

--------------------------------

### Create SQLAlchemy Engine

Source: https://github.com/oceanbase/pyobvector/blob/main/README.md

Demonstrates how to create a synchronous SQLAlchemy engine for connecting to OceanBase.

```python
import pyobvector
from sqlalchemy.dialects import registry
from sqlalchemy import create_engine

uri: str = "127.0.0.1:2881"
user: str = "root@test"
password: str = ""
db_name: str = "test"

registry.register("mysql.oceanbase", "pyobvector.schema.dialect", "OceanBaseDialect")
connection_str = (
    f"mysql+oceanbase://{user}:{password}@{uri}/{db_name}?charset=utf8mb4"
)
# Extra engine options can be passed to create_engine as keyword arguments
engine = create_engine(connection_str)
```

--------------------------------

### Get Collection Statistics

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/milvuslikeclient.md

Retrieves statistics for a given collection, specifically the row count. This is useful for monitoring collection size.
```python
stats = client.get_collection_stats("documents")
print(f"Collection has {stats['row_count']} rows")
```

--------------------------------

### ObKeyPartition Constructor

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/partitions.md

Defines a Key partition strategy. Use for partitioning based on multiple columns.

```python
class ObKeyPartition(ObPartition):
    def __init__(
        self,
        col_name_list: list[str],
        key_part_name_list: list[str] | None = None,
        part_count: int | None = None,
    )
```

--------------------------------

### Get Rows by IDs

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/obclient.md

Retrieve rows from a table using their primary key values. This method is efficient for fetching specific records.

```python
result = client.get('users', ids=[1, 2])
rows = result.fetchall()
```

--------------------------------

### Get HNSW EF Search Parameter

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/obvecclient.md

Retrieves the current value of the `ob_hnsw_ef_search` system variable, which controls HNSW search performance.

```python
ef_search = client.get_ob_hnsw_ef_search()
print(f"EF search: {ef_search}")
```

--------------------------------

### Create Async SQLAlchemy Engine

Source: https://github.com/oceanbase/pyobvector/blob/main/README.md

Demonstrates how to create an asynchronous SQLAlchemy engine for connecting to OceanBase.
```python
import pyobvector
from sqlalchemy.dialects import registry
from sqlalchemy.ext.asyncio import create_async_engine

uri: str = "127.0.0.1:2881"
user: str = "root@test"
password: str = ""
db_name: str = "test"

registry.register("mysql.aoceanbase", "pyobvector", "AsyncOceanBaseDialect")
connection_str = (
    f"mysql+aoceanbase://{user}:{password}@{uri}/{db_name}?charset=utf8mb4"
)
engine = create_async_engine(connection_str)
```

--------------------------------

### MilvusLikeClient Constructor

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/milvuslikeclient.md

Initializes the MilvusLikeClient. Accepts connection details for the OceanBase server and optional engine parameters.

```python
class MilvusLikeClient(
    uri: str = "127.0.0.1:2881",
    user: str = "root@test",
    password: str = "",
    db_name: str = "test",
    **kwargs,
)
```

--------------------------------

### Initialize HybridSearch Client

Source: https://github.com/oceanbase/pyobvector/blob/main/README.md

Set up a HybridSearch client for performing combined full-text and vector searches. Requires OceanBase version >= 4.4.1.0 or SeekDB.

```python
from pyobvector import *
from pyobvector.client.hybrid_search import HybridSearch
from sqlalchemy import Column, Integer, VARCHAR

client = HybridSearch(uri="127.0.0.1:2881", user="test@test")
```

--------------------------------

### Handle SQL-Level Errors (Access Denied)

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/errors.md

Demonstrates handling 'Access denied' errors due to invalid credentials. Verify the user and password.

```python
# Invalid credentials
# Check user/password
```

--------------------------------

### Handle SQL-Level Errors (Connection Refused)

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/errors.md

Demonstrates handling 'Connection refused' errors. Ensure the URI, host, and port are correct and network connectivity is established.
```python
# Cannot connect to server
# Check URI, host, port, network connectivity
```

--------------------------------

### Pagination Control

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/hybridsearch.md

Control the offset and limit for paginating search results. 'from' specifies the starting offset, and 'size' determines the number of results to return.

```python
{
    "from": 20,  # Offset
    "size": 10   # Limit
}
```

--------------------------------

### Import Core Client Classes

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/README.md

Import essential client classes for interacting with OceanBase vector store functionalities.

```python
from pyobvector import (
    ObVecClient,        # Vector store client
    MilvusLikeClient,   # Milvus-compatible
    HybridSearch,       # Keyword + vector search
    SeekdbRemoteClient  # Factory for embedded/remote
)
```

--------------------------------

### Handle CollectionStatusException

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/errors.md

Raised when a collection does not exist or has an invalid status, for example, when trying to load a non-existent table. This exception helps in managing collection-related operational errors.

```python
try:
    table = client.load_table("nonexistent_collection")
except CollectionStatusException as e:
    print(f"Collection error: {e.message}")
```

--------------------------------

### Import Partitioning Components

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/README.md

Import classes for defining various partitioning strategies, including range, list, hash, and key.
```python
from pyobvector import (
    ObRangePartition, ObSubRangePartition,
    ObListPartition, ObSubListPartition,
    ObHashPartition, ObSubHashPartition,
    ObKeyPartition, ObSubKeyPartition,
    RangeListPartInfo, PartType
)
```

--------------------------------

### Get SQL for Hybrid Search

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/hybridsearch.md

Retrieves the actual SQL query that will be executed for a given hybrid search body. Useful for debugging and understanding query translation.

```python
def get_sql(
    self,
    index: str,
    body: dict[str, Any],
) -> str
```

```python
query = {
    "query": {
        "query_string": {
            "fields": ["title"],
            "query": "database"
        }
    },
    "knn": {
        "field": "embedding",
        "k": 5,
        "query_vector": [0.1, 0.2, 0.3]
    }
}

sql = client.get_sql(index="documents", body=query)
print(sql)  # Prints the actual SQL
```

--------------------------------

### Handle SQL-Level Errors (No Such Table)

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/errors.md

Demonstrates handling the 'No such table' SQL error. Ensure the table exists or check spelling before querying.

```python
# Querying non-existent table
# Create table or check name spelling
```

--------------------------------

### ObListPartition Constructor

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/partitions.md

Defines a List partition strategy. Use for single-column LIST or multi-column LIST COLUMNS partitioning.

```python
class ObListPartition(ObPartition):
    def __init__(
        self,
        is_list_columns: bool,
        list_part_infos: list[RangeListPartInfo],
        list_expr: str | None = None,
        col_name_list: list[str] | None = None,
    )
```

--------------------------------

### Get SQL Query from Search Body

Source: https://github.com/oceanbase/pyobvector/blob/main/README.md

Retrieves the underlying SQL query that will be executed by the client for a given search body. This is useful for debugging or understanding the query translation.
```python
sql = client.get_sql(index=test_table_name, body=body)
print(sql)  # prints the SQL query
```

--------------------------------

### MilvusLikeClient Constructor

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/milvuslikeclient.md

Initializes the MilvusLikeClient. This client provides a Milvus-compatible API for interacting with OceanBase as a vector store. It inherits methods from ObVecClient and ObClient.

```APIDOC
## MilvusLikeClient Constructor

### Description
Initializes the MilvusLikeClient. This client provides a Milvus-compatible API for interacting with OceanBase as a vector store. It inherits methods from ObVecClient and ObClient.

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Parameters
- **uri** (str) - Optional - Default: "127.0.0.1:2881" - OceanBase server address
- **user** (str) - Optional - Default: "root@test" - Database user
- **password** (str) - Optional - Default: "" - Database password
- **db_name** (str) - Optional - Default: "test" - Database name
- **kwargs** (dict) - Optional - Default: {} - Additional engine parameters

### Returns
`MilvusLikeClient` instance

### Example
```python
client = MilvusLikeClient(uri="localhost:2881", user="root", password="password", db_name="my_db")
```
```

--------------------------------

### Get and Set HNSW Search Ef Parameter

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/configuration.md

Retrieve and modify the `ef_search` parameter for HNSW index. Adjusting this value impacts the trade-off between search recall and performance.
```python
# Get current value
ef_search = client.get_ob_hnsw_ef_search()

# Set new value (affects HNSW search quality/performance)
client.set_ob_hnsw_ef_search(100)

# Higher values = better recall but slower search
# Default: 40, Typical range: 40-200
```

--------------------------------

### Define Range Partition with Hash Subpartition

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/configuration.md

Sets up a range partition on 'year' with partitions 'p2023' and 'p2024', and adds a hash subpartition on 'id' with 4 partitions to each range partition.

```python
from pyobvector import ObRangePartition, ObSubHashPartition, RangeListPartInfo

# Range with hash subpartition
partition = ObRangePartition(
    is_range_columns=False,
    range_expr="year",
    range_part_infos=[
        RangeListPartInfo("p2023", 2024),
        RangeListPartInfo("p2024", 2025)
    ]
)
partition.add_subpartition(
    ObSubHashPartition(
        hash_expr="id",
        part_count=4  # 2 range × 4 hash = 8 total
    )
)
```

--------------------------------

### Connect to Embedded SeekDB (Local Path)

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/seekdbremoteclient.md

Use this snippet to connect to an embedded SeekDB instance using a local file path. Ensure pyobvector is installed with pyseekdb support.

```python
from pyobvector import SeekdbRemoteClient

# Option A: Direct path
client = SeekdbRemoteClient(path="./seekdb_data", database="test")

# Option B: Using existing pyseekdb.Client
import pyseekdb

pyseekdb_client = pyseekdb.Client(path="./seekdb_data", database="test")
client = SeekdbRemoteClient(pyseekdb_client=pyseekdb_client)
```

--------------------------------

### Create Table with Index Parameters

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/obvecclient.md

Creates a table with specified columns, regular indexes, vector indexes, and full-text search indexes. Use for defining table structure with advanced indexing.
```python
from sqlalchemy import Column, Integer, String
from pyobvector import VECTOR, IndexParams, VecIndexType

cols = [
    Column('id', Integer, primary_key=True),
    Column('embedding', VECTOR(128)),
    Column('text', String(255))
]

idx_params = IndexParams()
idx_params.add_index(
    field_name='embedding',
    index_type=VecIndexType.HNSW,
    index_name='vec_idx',
    metric_type='l2',
    params={'M': 16, 'efConstruction': 256}
)

client.create_table_with_index_params(
    table_name='documents',
    columns=cols,
    vidxs=idx_params
)
```

--------------------------------

### Create Partitioned Table with Vector Index (SQLAlchemy Mode)

Source: https://github.com/oceanbase/pyobvector/blob/main/README.md

Define and create a table with specified columns, including a vector type, and configure a vector index. Supports range partitioning based on an ID column.

```python
# Imports for the names used below; `client` and `test_collection_name`
# are assumed to be defined earlier
from sqlalchemy import Column, Integer, JSON
from pyobvector import ObRangePartition, RangeListPartInfo, VECTOR

# create partitioned table
range_part = ObRangePartition(False, range_part_infos = [
    RangeListPartInfo('p0', 100),
    RangeListPartInfo('p1', 'maxvalue'),
], range_expr='id')

cols = [
    Column('id', Integer, primary_key=True, autoincrement=False),
    Column('embedding', VECTOR(3)),
    Column('meta', JSON)
]
client.create_table(test_collection_name, columns=cols, partitions=range_part)

# create vector index
client.create_index(
    test_collection_name,
    is_vec_index=True,
    index_name='vidx',
    column_names=['embedding'],
    vidx_params='distance=l2, type=hnsw, lib=vsag',
)
```

--------------------------------

### Get Rows with Column Selection and Limit

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/obclient.md

Query rows from a table, specifying which columns to return and limiting the number of results. Useful for performance optimization and targeted data retrieval.
```python
result = client.get('users', output_column_name=['id', 'text'], n_limits=10)
rows = result.fetchall()
```

--------------------------------

### ObClient Constructor

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/obclient.md

Initializes an ObClient instance for connecting to OceanBase or SeekDB. Supports various connection parameters including URI, user credentials, database name, and options for embedded SeekDB or existing SQLAlchemy engines.

```APIDOC
## ObClient Constructor

### Description
Initializes an ObClient instance for connecting to OceanBase or SeekDB. Supports various connection parameters including URI, user credentials, database name, and options for embedded SeekDB or existing SQLAlchemy engines.

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Parameters
- **uri** (str) - Optional - OceanBase server address (format: host:port). Defaults to "127.0.0.1:2881".
- **user** (str) - Optional - Database user (format: user@tenant for OceanBase). Defaults to "root@test".
- **password** (str) - Optional - Database password. Defaults to "".
- **db_name** (str) - Optional - Database name to connect to. Defaults to "test".
- **path** (str | None) - Optional - Local path for embedded SeekDB (requires pip install pyobvector[pyseekdb]). Defaults to None.
- **engine** (Any | None) - Optional - Provide an existing SQLAlchemy engine instead of creating one. Defaults to None.
- **pyseekdb_client** (Any | None) - Optional - Existing pyseekdb.Client instance for embedded SeekDB. Defaults to None.
- **kwargs** (dict) - Optional - Additional SQLAlchemy engine parameters.

### Returns
`ObClient` instance

### Raises
- Exception: If database connection fails
```

--------------------------------

### Hybrid Search Example

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/hybridsearch.md

Combines keyword search with vector similarity search and filtering.
Use for complex queries requiring both text relevance and vector similarity, with optional data filtering.

```python
# Hybrid search
query = {
    "query": {
        "bool": {
            "must": [
                {
                    "query_string": {
                        "fields": ["title", "content"],
                        "query": "artificial intelligence",
                        "minimum_should_match": "30%"
                    }
                }
            ],
            "filter": [
                {
                    "range": {
                        "date": {"gte": "2023-01-01"}
                    }
                }
            ]
        }
    },
    "knn": {
        "field": "embedding",
        "k": 10,
        "num_candidates": 50,
        "query_vector": [0.1, 0.2, 0.3],
        "similarity": 0.5
    },
    "from": 0,
    "size": 20
}

results = client.search(index="documents", body=query)
```

--------------------------------

### Import Index and Schema Components

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/README.md

Import classes for defining vector indexes, parameters, and collection schemas.

```python
from pyobvector import (
    VectorIndex,       # Vector index definition
    IndexParam,        # Single index config
    IndexParams,       # Multiple indexes
    VecIndexType,      # Index algorithm enum
    FieldSchema,       # Field definition
    CollectionSchema,  # Collection structure
    FtsIndexParam,     # Full-text search index
    FtsParser          # FTS parser enum
)
```

--------------------------------

### Handle SQL-Level Errors (No Such Index)

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/errors.md

Demonstrates handling the 'No such index' SQL error when attempting to drop a non-existent index. Verify the index name and its existence.

```python
# Dropping non-existent index
# Check index name or verify it exists
```

--------------------------------

### Configure Valid Partitioning with Range Expression

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/errors.md

Shows the correct way to configure range partitioning for a collection, including the essential 'range_expr' parameter. Missing this parameter will result in a PartitionFieldException.
```python
# ❌ WRONG
from pyobvector import ObRangePartition, RangeListPartInfo

partition = ObRangePartition(
    is_range_columns=False,
    range_part_infos=[RangeListPartInfo("p0", 100)]
    # Missing range_expr!
)
# Raises: PartitionFieldException

# ✅ CORRECT
partition = ObRangePartition(
    is_range_columns=False,
    range_part_infos=[RangeListPartInfo("p0", 100)],
    range_expr="id"  # Added!
)
```

--------------------------------

### create_table

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/obclient.md

Creates a new table in the database with the specified name, columns, and optional indexes and partitions.

```APIDOC
## create_table

### Description
Creates a new table in the database with the specified name, columns, and optional indexes and partitions.

### Method
`create_table`

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Parameters
- **table_name** (str) - Required - Name of the table to create.
- **columns** (list[Column]) - Required - SQLAlchemy Column objects defining table schema.
- **indexes** (list[Index] | None) - Optional - Optional index objects to create on the table. Defaults to None.
- **partitions** (ObPartition | None) - Optional - Optional partition strategy for the table. Defaults to None.
- **kwargs** (dict) - Optional - Additional table options (e.g., extend_existing=True).

### Returns
None

### Raises
Exception if table creation fails

### Example
```python
from sqlalchemy import Column, Integer, String
from pyobvector import VECTOR

cols = [
    Column('id', Integer, primary_key=True),
    Column('embedding', VECTOR(128)),
    Column('text', String(255))
]
client.create_table('embeddings', columns=cols)
```
```

--------------------------------

### Milvus-Compatible Workflow

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/README.md

Create collections, insert data, and perform searches using a Milvus-like API. Supports defining schemas and index parameters.
```python
from pyobvector import MilvusLikeClient, DataType, IndexParams, VecIndexType

client = MilvusLikeClient()

# Create collection with schema
schema = client.create_schema()
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("embedding", DataType.FLOAT_VECTOR, dim=128)

idx_params = IndexParams()
idx_params.add_index("embedding", VecIndexType.HNSW, "vec_idx")

client.create_collection(
    "documents",
    schema=schema,
    index_params=idx_params
)

# Insert and search (Milvus-like API)
client.insert("documents", [
    {"id": 1, "embedding": [0.1, 0.2, ...]},
])

results = client.search(
    "documents",
    data=[0.1, 0.2, ...],
    anns_field="embedding",
    limit=10
)
```

--------------------------------

### Create Table with Schema

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/obclient.md

Create a new table in the database with a defined schema, including columns, optional indexes, and partitioning. Supports defining primary keys and vector columns.

```python
from sqlalchemy import Column, Integer, String
from pyobvector import VECTOR

cols = [
    Column('id', Integer, primary_key=True),
    Column('embedding', VECTOR(128)),
    Column('text', String(255))
]
client.create_table('embeddings', columns=cols)
```

--------------------------------

### Create Table, Insert Data, and Perform ANN Search

Source: https://github.com/oceanbase/pyobvector/blob/main/README.md

Demonstrates creating a table with a vector index, inserting data, and executing an Approximate Nearest Neighbor (ANN) search. This API mirrors remote client usage.
```python
from sqlalchemy import Column, Integer, VARCHAR
from pyobvector import VECTOR, VectorIndex, l2_distance

client.drop_table_if_exist("vec_table")
client.create_table(
    table_name="vec_table",
    columns=[
        Column("id", Integer, primary_key=True),
        Column("title", VARCHAR(255)),
        Column("vec", VECTOR(3)),
    ],
    indexes=[VectorIndex("vec_idx", "vec", params="distance=l2, type=hnsw, lib=vsag")],
    mysql_organization="heap",
)

client.insert("vec_table", data=[
    {"id": 1, "title": "doc A", "vec": [1.0, 1.0, 1.0]},
    {"id": 2, "title": "doc B", "vec": [1.0, 2.0, 3.0]},
])

res = client.ann_search(
    "vec_table",
    vec_data=[1.0, 2.0, 3.0],
    vec_column_name="vec",
    distance_func=l2_distance,
    with_dist=True,
    topk=5,
    output_column_names=["id", "title"],
)

client.drop_table_if_exist("vec_table")
```

--------------------------------

### ObClient Constructor

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/obclient.md

Instantiate the ObClient for connecting to OceanBase or SeekDB. Supports remote connections via URI or embedded SeekDB using a local path or an existing client instance. Additional SQLAlchemy engine parameters can be passed via kwargs.

```python
class ObClient(
    uri: str = "127.0.0.1:2881",
    user: str = "root@test",
    password: str = "",
    db_name: str = "test",
    path: str | None = None,
    engine: Any | None = None,
    pyseekdb_client: Any | None = None,
    **kwargs: Any,
)
```

--------------------------------

### create_table_with_index_params

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/obvecclient.md

Creates a table with specified columns, and optionally, vector indexes, full-text search indexes, and partition strategies.

```APIDOC
## create_table_with_index_params

### Description
Create a table with vector indexes and full-text search indexes.
### Method
`create_table_with_index_params`

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Parameters Table

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| table_name | str | Yes | - | Table name to create |
| columns | list[Column] | Yes | - | SQLAlchemy Column objects |
| indexes | list[Index] \| None | No | None | Regular database indexes |
| vidxs | IndexParams \| None | No | None | Vector index parameters |
| fts_idxs | list[FtsIndexParam] \| None | No | None | Full-text search indexes |
| partitions | ObPartition \| None | No | None | Partition strategy |
| `**kwargs` | dict | No | {} | Table options (e.g., extend_existing=True, mysql_organization='heap') |

### Returns
None

### Example
```python
from sqlalchemy import Column, Integer, String
from pyobvector import VECTOR, IndexParams, VecIndexType

cols = [
    Column('id', Integer, primary_key=True),
    Column('embedding', VECTOR(128)),
    Column('text', String(255))
]

idx_params = IndexParams()
idx_params.add_index(
    field_name='embedding',
    index_type=VecIndexType.HNSW,
    index_name='vec_idx',
    metric_type='l2',
    params={'M': 16, 'efConstruction': 256}
)

client.create_table_with_index_params(
    table_name='documents',
    columns=cols,
    vidxs=idx_params
)
```
```

--------------------------------

### Create Vector Index with Raw Parameters

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/vector-operations.md

Create a vector index using raw string parameters. This is an alternative to using IndexParam for simpler index creation.
```python
# Or create with raw parameters
client.create_index(
    table_name="documents",
    is_vec_index=True,
    index_name="vec_idx",
    column_names=["embedding"],
    vidx_params="distance=l2, type=hnsw, lib=vsag, m=16, ef_construction=256"
)
```

--------------------------------

### Prepare Index Parameters

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/obclient.md

Create an empty `IndexParams` object, which is a container for building vector indexes. This is typically used with `ObVecClient` or `MilvusLikeClient`.

```python
idx_params = ObClient.prepare_index_params()
# Used with ObVecClient or MilvusLikeClient
```

--------------------------------

### Correctly Create Collection with Primary Key

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/errors.md

Demonstrates the correct way to define a primary key when creating a collection schema. The primary key must be of a type like INT64, not FLOAT_VECTOR.

```python
# ❌ WRONG
schema = client.create_schema()
schema.add_field("embedding", DataType.FLOAT_VECTOR, is_primary=True, dim=128)
# Raises: PrimaryKeyException - FLOAT_VECTOR cannot be primary key

# ✅ CORRECT
schema = client.create_schema()
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("embedding", DataType.FLOAT_VECTOR, dim=128)
```

--------------------------------

### Debug Query Construction with get_sql()

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/errors.md

This snippet illustrates how to retrieve the generated SQL query from the client for debugging purposes. It requires specifying the index and query body.

```python
sql = client.get_sql(index="docs", body=query_body)
print(f"Generated SQL: {sql}")
```

--------------------------------

### prepare_index_params(cls) -> IndexParams

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/api-reference/obclient.md

Create an IndexParams object for building vector indexes.
```APIDOC
## Method: prepare_index_params

### Description
Create an IndexParams object for building vector indexes.

### Signature
```python
@classmethod
def prepare_index_params(cls) -> IndexParams
```

### Returns
`IndexParams` - Empty index parameters container

### Example
```python
idx_params = ObClient.prepare_index_params()
# Used with ObVecClient or MilvusLikeClient
```
```

--------------------------------

### Handle SQL-Level Errors (Index Exists)

Source: https://github.com/oceanbase/pyobvector/blob/main/_autodocs/errors.md

Demonstrates handling the 'Index already exists' SQL error. Drop the index first or verify its name before creation.

```python
# Creating index that already exists
# Drop index first or check name
```
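
The "drop the index first" advice for an 'Index already exists' error can be sketched as a small drop-and-retry wrapper. This is a hypothetical helper, not part of pyobvector: `create_index_idempotent`, its `create`/`drop` callables, the `_StubClient` class, and the substring match on the error message are all assumptions made for illustration. The stub stands in for a real client so the block runs without a database.

```python
from typing import Callable


def create_index_idempotent(
    create: Callable[[], None],
    drop: Callable[[], None],
    marker: str = "already exists",
) -> bool:
    """Run create(); on a duplicate-index error, drop and retry once.

    Returns True if the index had to be dropped and recreated.
    `marker` is an assumed substring of the error text; adjust it to
    whatever message your server actually returns.
    """
    try:
        create()
        return False
    except Exception as exc:  # real code should catch the driver's specific error type
        if marker not in str(exc).lower():
            raise  # unrelated failure: do not mask it
        drop()
        create()
        return True


class _StubClient:
    """Hypothetical in-memory stand-in for a database client (demo only)."""

    def __init__(self):
        self.indexes = set()

    def create_index(self, name: str) -> None:
        if name in self.indexes:
            raise RuntimeError(f"Index already exists: {name}")
        self.indexes.add(name)

    def drop_index(self, name: str) -> None:
        self.indexes.discard(name)


stub = _StubClient()
# First call creates the index; second call hits the duplicate and recreates it.
first = create_index_idempotent(
    lambda: stub.create_index("vec_idx"),
    lambda: stub.drop_index("vec_idx"),
)
second = create_index_idempotent(
    lambda: stub.create_index("vec_idx"),
    lambda: stub.drop_index("vec_idx"),
)
```

With a real client, `create` and `drop` would wrap the client's own index creation and removal calls. Dropping and rebuilding a vector index can be expensive, so checking the index name first may be the better choice on large tables.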