### Run pgvector-python Example

Source: https://github.com/pgvector/pgvector-python/blob/master/README.md

Navigate to the examples directory, install development dependencies, create an example database, and run the example script.

```sh
cd examples/loading
pip install --group dev
createdb pgvector_example
python3 example.py
```

--------------------------------

### Install pgvector-python

Source: https://github.com/pgvector/pgvector-python/blob/master/README.md

Install the pgvector-python library using pip. Follow specific instructions for your database library.

```sh
pip install pgvector
```

--------------------------------

### Clone and Install pgvector-python

Source: https://github.com/pgvector/pgvector-python/blob/master/README.md

Clone the repository, navigate to the directory, install development dependencies, create a test database, and run tests.

```sh
git clone https://github.com/pgvector/pgvector-python.git
cd pgvector-python
pip install --group dev
createdb pgvector_python_test
pytest
```

--------------------------------

### Bit String Text Serialization Example

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/types-reference.md

Example of how a Bit string is represented in text format, showing binary string without padding.

```text
101010
```

--------------------------------

### Vector Text Serialization Example

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/types-reference.md

Example of how a Vector type is represented in human-readable text format.

```text
[1.5,2.3,3.1]
```

--------------------------------

### SQLAlchemy Hybrid Search Example

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/examples.md

This SQLAlchemy example demonstrates a simple reranking strategy for hybrid search. It first retrieves top results by vector similarity and then reranks them using full-text search scores.

```python
from sqlalchemy import func, select
from sqlalchemy.orm import Session

# Simple reranking: vector search, then text filter and rank.
# Item and engine are assumed to be defined elsewhere.
with Session(engine) as session:
    query_vector = [...]  # Placeholder for the actual query vector

    # Get top 100 by vector distance
    subq = select(Item).order_by(
        Item.embedding.l2_distance(query_vector)
    ).limit(100).subquery()

    # Rerank the candidates with the full-text score
    tsvector = func.to_tsvector('english', subq.c.content)
    tsquery = func.plainto_tsquery('english', 'search query')
    stmt = select(subq).filter(
        tsvector.bool_op('@@')(tsquery)
    ).order_by(
        func.ts_rank(tsvector, tsquery).desc()
    ).limit(10)

    # Rows carry the subquery's columns rather than Item instances
    results = session.execute(stmt).all()
```

--------------------------------

### SparseVector Text Serialization Example

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/types-reference.md

Example of how a SparseVector type is represented in human-readable text format, including indices and dimensions.

```text
{1:1.5,3:2.3,5:3.1}/10
```

--------------------------------

### Enable Vector Extension in Django Migration

Source: https://github.com/pgvector/pgvector-python/blob/master/README.md

Create a Django migration to enable the pgvector extension. This is a one-time setup for your database.

```python
from django.db import migrations
from pgvector.django import VectorExtension

class Migration(migrations.Migration):
    operations = [
        VectorExtension()
    ]
```

--------------------------------

### Core Vector Operations in Python

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/examples.md

Demonstrates creating and manipulating various vector types (Vector, HalfVector, SparseVector, Bit) from Python lists, NumPy arrays, and text/binary formats. Includes examples of conversion and equality checks.

```python
from pgvector import Vector, HalfVector, SparseVector, Bit
import numpy as np

# Create from Python list
vec = Vector([1.0, 2.0, 3.0])
print(vec.dimensions())  # 3
print(vec.to_list())  # [1.0, 2.0, 3.0]

# Create from NumPy array
arr = np.array([1.0, 2.0, 3.0], dtype=np.float32)
vec = Vector(arr)

# Convert to NumPy
arr_back = vec.to_numpy()

# Text and binary serialization
text = vec.to_text()  # "[1.0,2.0,3.0]"
binary = vec.to_binary()  # Raw bytes
vec_from_text = Vector.from_text(text)
vec_from_binary = Vector.from_binary(binary)

# Half-precision for memory efficiency
hvec = HalfVector([1.0, 2.0, 3.0])
print(hvec.dimensions())  # 3

# Sparse vectors for high-dimensional data
# Indices are 0-based, so the largest valid index is dimensions - 1
sparse = SparseVector({0: 1.0, 100: 2.0, 4999: 3.0}, 5000)
print(sparse.dimensions())  # 5000
print(sparse.indices())  # [0, 100, 4999]
print(sparse.values())  # [1.0, 2.0, 3.0]

# Convert sparse to dense
dense_list = sparse.to_list()
dense_array = sparse.to_numpy()

# Bit strings
bit = Bit([True, False, True, False])
print(bit.to_text())  # "1010"
bit_from_str = Bit("10101010")

# Equality
vec1 = Vector([1.0, 2.0])
vec2 = Vector([1.0, 2.0])
print(vec1 == vec2)  # True
```

--------------------------------

### pgvector Distance Operators

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/00-README.md

Provides examples of ordering results by L2 distance with pgvector in the Django, SQLAlchemy, and Peewee ORMs.

```python
# Django
from pgvector.django import L2Distance
Item.objects.order_by(L2Distance('embedding', [1, 2, 3]))[:5]

# SQLAlchemy
select(Item).order_by(Item.embedding.l2_distance([1, 2, 3])).limit(5)

# Peewee
Item.select().order_by(Item.embedding.l2_distance([1, 2, 3])).limit(5)
```

--------------------------------

### Create HNSW Index on Existing Table

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/sqlalchemy-integration.md

Create an HNSW index on an existing table's vector column using SQLAlchemy's Index object and the create() method.
This example uses L2 distance.

```python
from pgvector.sqlalchemy import VECTOR
from sqlalchemy import Index, Integer, create_engine
from sqlalchemy.orm import declarative_base, mapped_column

Base = declarative_base()

class Item(Base):
    __tablename__ = 'items'

    id = mapped_column(Integer, primary_key=True)  # mapped classes need a primary key
    embedding = mapped_column(VECTOR(1536))

# Create the index on the existing table
engine = create_engine(...)
index = Index(
    'embedding_hnsw_idx',
    Item.embedding,
    postgresql_using='hnsw',
    postgresql_with={'m': 16, 'ef_construction': 64},
    postgresql_ops={'embedding': 'vector_l2_ops'}
)
index.create(engine)
```

--------------------------------

### Create Table with Vector Column using Psycopg 3

Source: https://github.com/pgvector/pgvector-python/blob/master/README.md

Define a table schema that includes a column of type `vector` with a specified dimension. This example uses a dimension of 3.

```python
conn.execute('CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))')
```

--------------------------------

### Bit Constructor and Methods

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

This section details how to create and use the Bit object, including its constructor and various utility methods for conversion and representation.

```APIDOC
## Bit

Binary bit strings with arbitrary length and support for padding.

**Import:** `from pgvector import Bit`
**Module:** `pgvector/bit.py`

### Constructor

```python
Bit(value: bytes | str | list[bool] | np.ndarray[tuple[int], np.dtype[np.bool | np.uint8]]) -> Bit
```

**Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| value | `bytes` | Raw binary bytes |
| value | `str` | Binary string of '0' and '1' characters; auto-padded to multiple of 8 |
| value | `list[bool]` | List of boolean values; auto-padded to multiple of 8 |
| value | `np.ndarray` | NumPy boolean or uint8 array |

**Raises:**
- `ValueError`: If value is not a supported type or string contains invalid characters

**Examples:**

```python
# From binary string (padded to 8 bits)
bit = Bit("101")  # Padded to "10100000"

# From bytes
bit = Bit(b'\xA0')  # 10100000 in binary

# From boolean list
bit = Bit([True, False, True])  # Padded to 8 bits

# From NumPy array
import numpy as np
arr = np.array([True, False, True, False, True])
bit = Bit(arr)  # Padded to 8 bits
```

### Methods

#### to_text() → str

Returns the bit string in PostgreSQL text format (without padding).

```python
bit = Bit("101")
bit.to_text()  # "101" (not "10100000")
```

#### to_list() → list[bool]

Returns the bit string as a list of booleans (without padding).

```python
bit = Bit("10100")
bit.to_list()  # [True, False, True, False, False]
```

#### to_numpy() → np.ndarray[tuple[int], np.dtype[np.bool]]

Returns the bit string as a NumPy boolean array (without padding).

```python
bit = Bit("10100")
arr = bit.to_numpy()
# Returns np.array([True, False, True, False, False], dtype=bool)
```

#### to_binary() → bytes

Returns the bit string in PostgreSQL binary format.

```python
bit = Bit("101")
binary = bit.to_binary()
```

#### from_text(value: str) → Bit

Class method to construct a Bit from PostgreSQL text format.

```python
bit = Bit.from_text("10101010")
```

#### from_binary(value: bytes) → Bit

Class method to construct a Bit from PostgreSQL binary format.

```python
bit = Bit.from_binary(binary_data)
```

**Raises:**
- `ValueError`: If binary data has invalid length
```

--------------------------------

### Asyncpg Integration for Vector Operations

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/examples.md

Demonstrates how to set up an asyncpg connection to use pgvector, create a table with a VECTOR column, insert vector data, and perform nearest neighbor searches using the `<->` operator.

```python
import asyncio
import asyncpg
from pgvector.asyncpg import register_vector
from pgvector import Vector

async def main():
    # Register the vector codec on every new pooled connection.
    # The vector extension must already exist in the database.
    async def init(conn):
        await register_vector(conn)

    pool = await asyncpg.create_pool(
        'postgresql://localhost/mydb',
        init=init
    )

    async with pool.acquire() as conn:
        # Create table
        await conn.execute('''
            CREATE TABLE IF NOT EXISTS items (
                id SERIAL PRIMARY KEY,
                embedding VECTOR(3)
            )
        ''')

        # Insert data: the registered codec serializes the Vector itself
        vec = Vector([1.0, 2.0, 3.0])
        await conn.execute(
            'INSERT INTO items (embedding) VALUES ($1)',
            vec
        )

        # Query with distance
        results = await conn.fetch(
            'SELECT embedding FROM items ORDER BY embedding <-> $1 LIMIT 5',
            Vector([1.0, 2.0, 3.0])
        )

        for row in results:
            # The codec has already decoded the column value
            print(row['embedding'])

    await pool.close()

asyncio.run(main())
```

--------------------------------

### Index Configuration: HNSW vs. IVFFlat

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/errors-exceptions.md

This snippet shows the correct usage for both HNSW and IVFFlat index configurations. Use `HnswIndex` for HNSW with parameters like `m` and `ef_construction`, and `IvfflatIndex` for IVFFlat with parameters like `lists`.

```python
from pgvector.django import HnswIndex, IvfflatIndex

# For HNSW
HnswIndex('embedding', m=16, ef_construction=64)

# For IVFFlat
IvfflatIndex('embedding', lists=100)
```

--------------------------------

### Get HalfVector Dimensions

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Retrieve the number of dimensions of a HalfVector.

```python
from pgvector import HalfVector

vec = HalfVector([1, 2, 3])
vec.dimensions()  # Returns 3
```

--------------------------------

### Enable pgvector Extension with Psycopg 2

Source: https://github.com/pgvector/pgvector-python/blob/master/README.md

Use a cursor to execute the SQL command for creating the `vector` extension if it's not already present.

```python
cur = conn.cursor()
cur.execute('CREATE EXTENSION IF NOT EXISTS vector')
```

--------------------------------

### Get SparseVector Values

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Retrieve a list of the non-zero values stored in the SparseVector.

```python
from pgvector import SparseVector

vec = SparseVector({0: 1.0, 2: 2.0, 4: 3.0}, 5)
vec.values()  # [1.0, 2.0, 3.0]
```

--------------------------------

### Asynchronous Psycopg 3 Connection with pgvector

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/examples.md

Shows how to set up an asynchronous connection using Psycopg 3's `AsyncConnection`, register the vector type, create a table, insert vector data, and query for nearest neighbors asynchronously.

```python
import asyncio
import numpy as np
import psycopg
from pgvector.psycopg import register_vector_async

async def main():
    # Connect and register types
    conn = await psycopg.AsyncConnection.connect('postgresql://localhost/mydb')
    await register_vector_async(conn)

    # Create table
    await conn.execute('''
        CREATE TABLE IF NOT EXISTS items (
            id SERIAL PRIMARY KEY,
            embedding VECTOR(3)
        )
    ''')

    # Insert data (Psycopg uses %s placeholders, not $1)
    embedding = np.array([1.0, 2.0, 3.0], dtype=np.float32)
    await conn.execute(
        'INSERT INTO items (embedding) VALUES (%s)',
        (embedding,)
    )
    await conn.commit()

    # Query with distance
    cur = await conn.execute(
        'SELECT embedding FROM items ORDER BY embedding <-> %s LIMIT 5',
        (embedding,)
    )
    results = await cur.fetchall()

    await conn.close()

asyncio.run(main())
```

--------------------------------

### Get SparseVector Non-Zero Values

Source: https://github.com/pgvector/pgvector-python/blob/master/README.md

Retrieve a list of the values of the non-zero elements in a SparseVector.

```python
values = vec.values()
```

--------------------------------

### Get Vector Distance

Source: https://github.com/pgvector/pgvector-python/blob/master/README.md

Calculate the L2 distance between a stored vector and a query vector.

```python
session.exec(select(Item.embedding.l2_distance([3, 1, 2])))
```

--------------------------------

### from_binary()

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Constructs a HalfVector object from its PostgreSQL binary representation.

```APIDOC
## from_binary(value: bytes) -> HalfVector

### Description
Class method to construct a HalfVector from PostgreSQL binary format.

### Parameters
#### Value
- **value** (bytes) - Required - The binary representation of the vector.

### Returns
- `HalfVector` - A new HalfVector object.

### Example
```python
vec = HalfVector.from_binary(binary_data)
```
```

--------------------------------

### Get SparseVector Non-Zero Indices

Source: https://github.com/pgvector/pgvector-python/blob/master/README.md

Retrieve a list of indices corresponding to the non-zero elements in a SparseVector.

```python
indices = vec.indices()
```

--------------------------------

### Constructing Bit from Binary

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Use the class method `from_binary` to create a Bit object from PostgreSQL binary data. Raises ValueError for invalid data length.

```python
bit = Bit.from_binary(binary_data)
```

--------------------------------

### Get SparseVector Indices

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Retrieve a list of the 0-based indices corresponding to the non-zero elements in a SparseVector.

```python
from pgvector import SparseVector

vec = SparseVector({0: 1.0, 2: 2.0, 4: 3.0}, 5)
vec.indices()  # [0, 2, 4]
```

--------------------------------

### Get SparseVector Dimensions

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Retrieve the total number of dimensions for a SparseVector. The count includes dimensions whose value is zero, not just the stored non-zero entries.

```python
from pgvector import SparseVector

vec = SparseVector({0: 1.0, 4: 3.0}, 10)
vec.dimensions()  # Returns 10
```

--------------------------------

### Synchronous Psycopg 3 Connection with pgvector

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/examples.md

Demonstrates how to establish a synchronous connection using Psycopg 3, register the vector type, create a table, insert vector data, and perform a nearest neighbor search.

```python
import psycopg
import numpy as np
from pgvector.psycopg import register_vector

# Connect and register types
conn = psycopg.connect('postgresql://localhost/mydb')
register_vector(conn)

# Create table
with conn.cursor() as cur:
    cur.execute('''
        CREATE TABLE IF NOT EXISTS items (
            id SERIAL PRIMARY KEY,
            name VARCHAR(255),
            embedding VECTOR(3)
        )
    ''')
    conn.commit()

# Insert data
with conn.cursor() as cur:
    embedding = np.array([1.0, 2.0, 3.0], dtype=np.float32)
    cur.execute('INSERT INTO items (name, embedding) VALUES (%s, %s)',
                ('Item 1', embedding))
    conn.commit()

# Query with distance
with conn.cursor() as cur:
    cur.execute(
        'SELECT name FROM items ORDER BY embedding <-> %s LIMIT 5',
        (np.array([1.0, 2.0, 3.0], dtype=np.float32),)
    )
    for row in cur:
        print(row)

conn.close()
```

--------------------------------

### Vector.to_numpy()

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Converts the vector into a NumPy array with a float32 data type. Requires the NumPy library to be installed.

```APIDOC
## to_numpy()

### Description
Converts the vector into a NumPy array with a float32 data type. Requires the NumPy library to be installed.

### Parameters
None

### Returns
- `np.ndarray` - The vector represented as a NumPy array with float32 dtype.

### Example
```python
vec = Vector([1.0, 2.0, 3.0])
arr = vec.to_numpy()
# arr is a NumPy array with dtype=float32
```
```

--------------------------------

### Get Nearest Neighbors with Half-Precision

Source: https://github.com/pgvector/pgvector-python/blob/master/README.md

Retrieves nearest neighbors using vectors cast to half-precision for comparison.

```python
from pgvector.sqlalchemy import HALFVEC
from sqlalchemy.sql import func
from sqlalchemy import select

order = func.cast(Item.embedding, HALFVEC(3)).l2_distance([3, 1, 2])
session.scalars(select(Item).order_by(order).limit(5))
```

--------------------------------

### from_binary()

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

A class method to construct a SparseVector from its PostgreSQL binary representation.

```APIDOC
#### from_binary(value: bytes) → SparseVector

Class method to construct a SparseVector from PostgreSQL binary format.

```python
vec = SparseVector.from_binary(binary_data)
```
```

--------------------------------

### Get Vector Distance

Source: https://github.com/pgvector/pgvector-python/blob/master/README.md

Calculates the L2 distance between a stored vector and a query vector using SQLAlchemy.

```python
session.scalars(select(Item.embedding.l2_distance([3, 1, 2])))
```

--------------------------------

### Bit Constructor: Valid Bit String Input

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/errors-exceptions.md

Demonstrates constructing a Bit object from a string containing only '0' and '1' characters. This resolves the 'expected bit string' error.

```python
bit = Bit("10101010")
```

--------------------------------

### Convert HalfVector to NumPy Array

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Convert a HalfVector to a NumPy array with float16 dtype. Requires NumPy to be installed.

```python
from pgvector import HalfVector

vec = HalfVector([1.0, 2.0, 3.0])
arr = vec.to_numpy()
# Returns np.ndarray with dtype=float16
```

--------------------------------

### Index Configuration Error: Correct HNSW Parameters

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/errors-exceptions.md

This snippet demonstrates the correct way to configure an HNSW index with valid integer parameters for 'm' and 'ef_construction'.

```python
HnswIndex(
    'embedding',
    m=16,  # int
    ef_construction=64  # int
)
```

--------------------------------

### Get Nearest Neighbors by Hamming Distance

Source: https://github.com/pgvector/pgvector-python/blob/master/README.md

Retrieves nearest neighbors by calculating Hamming distance on binary-quantized vectors.

```python
from pgvector.sqlalchemy import BIT, VECTOR
from sqlalchemy.sql import func
from sqlalchemy import select

order = func.cast(func.binary_quantize(Item.embedding), BIT(3)).hamming_distance(
    func.binary_quantize(func.cast([3, -1, 2], VECTOR(3)))
)
session.scalars(select(Item).order_by(order).limit(5))
```

--------------------------------

### Construct HalfVector from PostgreSQL Binary

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Create a HalfVector instance from bytes in PostgreSQL binary format.

```python
from pgvector import HalfVector

# Assuming binary_data is a bytes object containing valid HalfVector binary representation
# vec = HalfVector.from_binary(binary_data)
```

--------------------------------

### Converting Bit to Binary

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Convert a Bit object to its PostgreSQL binary format.

```python
bit = Bit("101")
binary = bit.to_binary()
```

--------------------------------

### Vector String Representation

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Get a developer-friendly string representation of a Vector object using repr(). This is useful for debugging.

```python
vec = Vector([1.0, 2.0, 3.0])
repr(vec)  # "Vector([1.0, 2.0, 3.0])"
```

--------------------------------

### Configure Connection Pool with Psycopg 3

Source: https://github.com/pgvector/pgvector-python/blob/master/README.md

For connection pools, define a `configure` function that registers vector types and pass it to the `ConnectionPool` constructor.

```python
from pgvector.psycopg import register_vector
from psycopg_pool import ConnectionPool

def configure(conn):
    register_vector(conn)

pool = ConnectionPool(..., configure=configure)
```

--------------------------------

### Construct HalfVector from NumPy Array

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Create a HalfVector instance from a NumPy array with float16 dtype. Requires NumPy to be installed.

```python
import numpy as np
from pgvector import HalfVector

arr = np.array([1.0, 2.0, 3.0], dtype=np.float16)
vec = HalfVector(arr)
```

--------------------------------

### High-Volume Vector Loading with COPY

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/examples.md

Leverage PostgreSQL's COPY command for faster data loading by preparing CSV data with vectors in text format. Requires `psycopg` and `register_vector`.

```python
import io

import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect('postgresql://localhost/mydb')
register_vector(conn)

# Prepare CSV data with vectors in text format
csv_buffer = io.StringIO()
csv_buffer.write("id,name,embedding\n")

for i in range(10000):
    embedding_text = f"[{i}.0,{i+1}.0,{i+2}.0]"
    csv_buffer.write(f"{i},Item {i},\"{embedding_text}\"\n")

csv_buffer.seek(0)

with conn.cursor() as cur:
    # In Psycopg 3, cursor.copy() is a context manager; data is sent with write()
    with cur.copy("COPY items (id, name, embedding) FROM STDIN (FORMAT csv, HEADER)") as copy:
        copy.write(csv_buffer.read())
    conn.commit()

conn.close()
```

--------------------------------

### Getting Vector Dimensions

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Use the dimensions() method to retrieve the number of elements in a Vector. This is useful for verifying vector size.

```python
vec = Vector([1, 2, 3])
vec.dimensions()  # Returns 3
```

--------------------------------

### Enable pgvector Extension with asyncpg

Source: https://github.com/pgvector/pgvector-python/blob/master/README.md

Use this snippet to enable the pgvector extension on your database connection using asyncpg. Ensure the connection is established before execution.

```python
await conn.execute('CREATE EXTENSION IF NOT EXISTS vector')
```

--------------------------------

### Create Binary Quantization Index with SQLAlchemy

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/sqlalchemy-integration.md

Create an HNSW index on binary-quantized vectors using `BIT` for casting and `bit_hamming_ops` for indexing. This enables fast approximate search.

```python
from sqlalchemy import Index, select
from sqlalchemy.sql import func
from pgvector.sqlalchemy import BIT, VECTOR

# Item, engine, and session are assumed to be defined elsewhere

# Create index on binary-quantized vectors
index = Index(
    'embedding_bq_idx',
    func.cast(func.binary_quantize(Item.embedding), BIT(1536)).label('embedding'),
    postgresql_using='hnsw',
    postgresql_ops={'embedding': 'bit_hamming_ops'}
)
index.create(engine)

# Query with binary-quantized distance
query_vec = func.cast([...], VECTOR(1536))
distance = func.cast(
    func.binary_quantize(Item.embedding), BIT(1536)
).hamming_distance(func.binary_quantize(query_vec))

results = session.scalars(
    select(Item).order_by(distance).limit(20)
)
```

--------------------------------

### Define Item Model with VECTOR Type

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/sqlalchemy-integration.md

Example of defining SQLAlchemy models with fixed and unconstrained dimensionality using the VECTOR type.

```python
from pgvector.sqlalchemy import VECTOR
from sqlalchemy import Integer
from sqlalchemy.orm import declarative_base, mapped_column

Base = declarative_base()

class Item(Base):
    __tablename__ = 'items'

    id = mapped_column(Integer, primary_key=True)  # mapped classes need a primary key

    # Fixed dimensionality
    embedding = mapped_column(VECTOR(1536))

    # Unconstrained dimensionality
    dynamic_embedding = mapped_column(VECTOR())
```

--------------------------------

### Enable pgvector Extension with Psycopg 3

Source: https://github.com/pgvector/pgvector-python/blob/master/README.md

Execute SQL to create the pgvector extension if it does not already exist. This is a prerequisite for using vector types.

```python
conn.execute('CREATE EXTENSION IF NOT EXISTS vector')
```

--------------------------------

### pg8000: Register Vector Type

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/errors-exceptions.md

Register the vector type with a pg8000 connection. This function will raise a RuntimeError if the vector extension is not installed in the database.

```python
import pg8000
from pgvector.pg8000 import register_vector

conn = pg8000.connect(...)
register_vector(conn)  # Raises if vector type not found
```

--------------------------------

### Converting Vector to NumPy Array

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Convert a Vector object to a NumPy array with float32 dtype using the to_numpy() method. Requires NumPy to be installed.

```python
vec = Vector([1.0, 2.0, 3.0])
arr = vec.to_numpy()
# Returns np.ndarray with dtype=float32
```

--------------------------------

### Create a Vector

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/INDEX.md

Instantiate a Vector object with a list of floats.

```python
from pgvector import Vector

vec = Vector([1.0, 2.0, 3.0])
```

--------------------------------

### to_binary()

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Serializes the half-vector into a binary format suitable for PostgreSQL.

```APIDOC
## to_binary()

### Description
Returns the half-vector in PostgreSQL binary format.

### Returns
- `bytes` - The binary representation of the vector.

### Example
```python
vec = HalfVector([1.0, 2.0, 3.0])
binary = vec.to_binary()
# bytes with header and half-precision elements
```
```

--------------------------------

### Psycopg 2: Register Vector Type

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/errors-exceptions.md

Register the vector type with a Psycopg 2 connection. This will raise a ProgrammingError if the vector extension is not installed in the database.

```python
import psycopg2
from pgvector.psycopg2 import register_vector

conn = psycopg2.connect(...)
register_vector(conn)  # Raises if vector type not found
```

--------------------------------

### Register Codecs for asyncpg

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/api-reference-summary.md

Use this function to register codecs for asyncpg connections. Specify the schema if it's not the default 'public'. Import from pgvector.asyncpg.

```python
async def register_vector(conn: Connection, schema: str = 'public') -> None:
    """Register codecs for asyncpg connection"""
```

--------------------------------

### Create HNSW Index on Model

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/sqlalchemy-integration.md

Define an HNSW index for a vector column within a SQLAlchemy model using __table_args__. This example uses L2 distance.

```python
from pgvector.sqlalchemy import VECTOR
from sqlalchemy import Index, Integer
from sqlalchemy.orm import declarative_base, mapped_column

Base = declarative_base()

class Item(Base):
    __tablename__ = 'items'

    id = mapped_column(Integer, primary_key=True)  # mapped classes need a primary key
    embedding = mapped_column(VECTOR(1536))

    # HNSW index
    __table_args__ = (
        Index(
            'embedding_hnsw_idx',
            'embedding',
            postgresql_using='hnsw',
            postgresql_with={'m': 16, 'ef_construction': 64},
            postgresql_ops={'embedding': 'vector_l2_ops'}
        ),
    )
```

--------------------------------

### Add Approximate Nearest Neighbor Index (IVFFlat) with asyncpg

Source: https://github.com/pgvector/pgvector-python/blob/master/README.md

Creates an approximate nearest neighbor index using the Inverted File Flat (IVFFlat) algorithm on the 'embedding' column with L2 distance and specifies 100 lists.

```python
await conn.execute('CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)')
```

--------------------------------

### Get Nearest Neighbors with Half-Precision Vectors in Django

Source: https://github.com/pgvector/pgvector-python/blob/master/README.md

Query for nearest neighbors using half-precision vectors. Cast the embedding field to `HalfVectorField` before calculating the distance.

```python
from django.db.models.functions import Cast
from pgvector.django import HalfVectorField, L2Distance

distance = L2Distance(Cast('embedding', HalfVectorField(dimensions=3)), [3, 1, 2])
Item.objects.order_by(distance)[:5]
```

--------------------------------

### Constructing Bit Objects

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Instantiate Bit objects from bytes, binary strings, lists of booleans, or NumPy arrays. Binary strings and boolean lists are auto-padded to multiples of 8 bits.

```python
# From binary string (padded to 8 bits)
bit = Bit("101")  # Padded to "10100000"
```

```python
# From bytes
bit = Bit(b'\xA0')  # 10100000 in binary
```

```python
# From boolean list
bit = Bit([True, False, True])  # Padded to 8 bits
```

```python
# From NumPy array
import numpy as np
arr = np.array([True, False, True, False, True])
bit = Bit(arr)  # Padded to 8 bits
```

--------------------------------

### Create Table with Vector Column using Psycopg 2

Source: https://github.com/pgvector/pgvector-python/blob/master/README.md

Define a table schema with a `vector` type column. This example creates a column named `embedding` with a dimension of 3.

```python
cur.execute('CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))')
```

--------------------------------

### SparseVector Constructor: Provide Dimensions

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/errors-exceptions.md

Demonstrates constructing a SparseVector from a dictionary with the 'dimensions' parameter specified. This resolves the 'missing dimensions' error.

```python
vec = SparseVector({0: 1.0, 2: 2.0}, 5)
```

--------------------------------

### to_binary()

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Returns the sparse vector encoded in PostgreSQL's binary format.

```APIDOC
#### to_binary() → bytes

Returns the sparse vector in PostgreSQL binary format.

```python
vec = SparseVector({0: 1.0, 2: 2.0}, 5)
binary = vec.to_binary()
```
```

--------------------------------

### Query Django Model by L2 Distance

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/api-reference-summary.md

Order Django model instances by their L2 distance to a specified vector. `L2Distance` is imported from `pgvector.django`.

```python
from pgvector.django import L2Distance

Item.objects.order_by(L2Distance('embedding', [1.0, 2.0, 3.0]))[:5]
```

--------------------------------

### Converting Vector to PostgreSQL Text Format

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Use the to_text() method to get a string representation of the vector in PostgreSQL's bracket-delimited text format. This is useful for debugging or manual inspection.

```python
vec = Vector([1.0, 2.0, 3.0])
text = vec.to_text()  # "[1.0,2.0,3.0]"
```

--------------------------------

### HNSW Index Configuration

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/types-reference.md

Defines the configuration parameters for creating an HNSW index, including connections per layer and the size of the dynamic list.

```APIDOC
## HNSW Index Parameters

```python
class HnswIndexConfig(TypedDict, total=False):
    m: int  # Connections per layer
    ef_construction: int  # Size of dynamic list
```

**Typical Values:**
- `m=16`: Default, good for general use
- `ef_construction=64`: Balance between build time and quality
```

--------------------------------

### HNSW Index Configuration

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/types-reference.md

Defines the configuration options for HNSW indexes, including connections per layer (m) and the size of the dynamic list for construction (ef_construction).

```python
from typing import TypedDict


class HnswIndexConfig(TypedDict, total=False):
    m: int  # Connections per layer
    ef_construction: int  # Size of dynamic list
```

--------------------------------

### Django Hybrid Search Example

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/examples.md

This snippet shows how to perform hybrid search in Django by combining vector similarity with full-text search ranking. It requires Django's `SearchVectorField` and pgvector's `VectorField` and `L2Distance`.

```python
from django.db import models
from django.contrib.postgres.search import SearchVectorField
from pgvector.django import VectorField, L2Distance


class Article(models.Model):
    title = models.CharField(max_length=255)
    content = models.TextField()
    embedding = VectorField(dimensions=1536)
    search_vector = SearchVectorField()


# Hybrid search: combine vector and text similarity
from django.contrib.postgres.search import SearchQuery, SearchRank

query_text = "machine learning"
query_vector = [...]  # 1536-dimensional embedding

results = Article.objects.annotate(
    search_rank=SearchRank('search_vector', SearchQuery(query_text)),
    vector_distance=L2Distance('embedding', query_vector)
).filter(
    search_vector=SearchQuery(query_text)
).order_by(
    # Highest text rank first, then smallest vector distance
    '-search_rank', 'vector_distance'
)[:10]
```

--------------------------------

### IVFFlat Index Configuration

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/types-reference.md

Defines the configuration parameters for creating an IVFFlat index, specifically the number of inverted lists.
```APIDOC
## IVFFlat Index Parameters

```python
class IvfflatIndexConfig(TypedDict, total=False):
    lists: int  # Number of inverted lists
```

**Typical Values:**
- `lists=100`: A starting point for roughly 100K vectors
- `lists=1000`: A starting point for roughly 1M vectors

pgvector's guidance is to start with `rows / 1000` lists for up to 1M rows and `sqrt(rows)` beyond that.
```

--------------------------------

### Insert Data with Different Vector Formats

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/examples.md

Demonstrates inserting data into the Item model using different formats for the embedding columns: Python list, pgvector Vector object, and pgvector SparseVector object.

```python
from sqlalchemy.orm import Session
from pgvector import Vector, SparseVector

with Session(engine) as session:
    # From list
    item = Item(name='Example', embedding=[1.0, 2.0, 3.0])
    session.add(item)

    # From Vector object
    item = Item(name='Example', embedding=Vector([1.0, 2.0, 3.0]))
    session.add(item)

    # From sparse vector
    sparse = SparseVector({0: 1.0, 100: 2.0}, 5000)
    item = Item(name='Example', sparse_embedding=sparse)
    session.add(item)

    session.commit()
```

--------------------------------

### SQL: Create Vector Extension

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/errors-exceptions.md

Enable the pgvector extension in your PostgreSQL database. This is a prerequisite for using vector types.

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```

--------------------------------

### Construct SparseVector from PostgreSQL Binary Format

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Create a SparseVector instance from binary data, typically obtained from PostgreSQL.

```python
from pgvector import SparseVector

vec = SparseVector.from_binary(binary_data)
```

--------------------------------

### Create Approximate Nearest Neighbor (IVFFlat) Index

Source: https://github.com/pgvector/pgvector-python/blob/master/README.md

Define and create an IVFFlat index for approximate nearest neighbor searches. Configure the number of lists.
```python
from sqlmodel import Index

index = Index(
    'my_index',
    Item.embedding,
    postgresql_using='ivfflat',
    postgresql_with={'lists': 100},
    postgresql_ops={'embedding': 'vector_l2_ops'}
)

index.create(engine)
```

--------------------------------

### Vector.to_binary()

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Returns the vector in PostgreSQL's binary format, suitable for binary wire protocol communication.

```APIDOC
## to_binary()

### Description
Returns the vector in PostgreSQL's binary format, suitable for binary wire protocol communication.

### Parameters
None

### Method
`to_binary()`

### Endpoint
None

### Request Example
```python
vec = Vector([1.0, 2.0, 3.0])
binary = vec.to_binary()  # bytes with header and elements
```

### Response
#### Success Response (200)
- **binary_format** (bytes) - The vector represented in PostgreSQL binary format.

#### Response Example
```json
{
  "example": ""
}
```
```

--------------------------------

### Create pgvector Vector from PostgreSQL Binary

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/api-reference-summary.md

Instantiate a Vector object from binary data, often retrieved directly from PostgreSQL.

```python
# From PostgreSQL binary
vec = Vector.from_binary(binary_data)
```

--------------------------------

### Bit Constructor: Boolean List Input

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/errors-exceptions.md

Demonstrates constructing a Bit object from a list of boolean values. This resolves the 'expected list[bool]' error.

```python
bit = Bit([True, False, True])
```

--------------------------------

### Vector Constructor: List or ndarray Input

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/errors-exceptions.md

Demonstrates constructing a Vector using a list or a NumPy array. This resolves the 'expected list or ndarray' error.
```python
import numpy as np
from pgvector import Vector

vec = Vector([1.0, 2.0, 3.0])            # from a Python list
vec = Vector(np.array([1.0, 2.0, 3.0]))  # from a NumPy array
```

--------------------------------

### Vector Constructor: 1-D Array Input

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/errors-exceptions.md

Demonstrates constructing a Vector from a 1-D NumPy array. This resolves the 'expected ndim to be 1' error.

```python
import numpy as np
from pgvector import Vector

arr = np.array([1.0, 2.0, 3.0])  # 1-D array
vec = Vector(arr)
```

--------------------------------

### Peewee Index Support

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/database-drivers.md

Demonstrates how to add approximate nearest neighbor (ANN) indexes to Peewee models using pgvector's indexing capabilities.

```APIDOC
## Peewee Index Support

### Description
Add approximate indexes with Peewee:

### Examples
```python
# Assuming 'Item' is a Peewee model with an 'embedding' VectorField
Item.add_index('embedding vector_l2_ops', using='hnsw')

# or
Item.add_index('embedding vector_l2_ops', using='ivfflat')
```
```

--------------------------------

### Converting Vector to PostgreSQL Binary Format

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Obtain the vector's binary representation using the to_binary() method. This format is used for efficient data transfer with PostgreSQL, especially via the binary wire protocol.

```python
vec = Vector([1.0, 2.0, 3.0])
binary = vec.to_binary()  # bytes with header and elements
```

--------------------------------

### to_list()

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Converts the sparse vector into a dense Python list, including zero elements.

```APIDOC
#### to_list() → list[float]

Returns the sparse vector as a dense Python list.
```python
vec = SparseVector({0: 1.0, 4: 3.0}, 5)
vec.to_list()  # [1.0, 0.0, 0.0, 0.0, 3.0]
```
```

--------------------------------

### Register Vector Codecs for Psycopg 3 Async

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/api-reference-summary.md

Use this function to register codecs for Psycopg 3 asynchronous connections. Import from pgvector.psycopg.

```python
async def register_vector_async(context: AsyncConnection[Any]) -> None:
    """Register codecs for Psycopg 3 async connection"""
```

--------------------------------

### IVFFlat Index Configuration

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/types-reference.md

Defines the configuration option for IVFFlat indexes, specifying the number of inverted lists.

```python
from typing import TypedDict


class IvfflatIndexConfig(TypedDict, total=False):
    lists: int  # Number of inverted lists
```

--------------------------------

### __eq__()

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Compares two HalfVector objects for equality based on their binary representations.

```APIDOC
## __eq__(other: object) -> bool

### Description
Half-vectors are equal if their binary representations match.

### Parameters
#### Other
- **other** (object) - Required - The object to compare with.

### Returns
- `bool` - True if the vectors are equal, False otherwise.

### Example
```python
vec1 = HalfVector([1.0, 2.0])
vec2 = HalfVector([1.0, 2.0])
vec1 == vec2  # True
```
```

--------------------------------

### from_text()

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Constructs a HalfVector object from its PostgreSQL text representation.

```APIDOC
## from_text(value: str) -> HalfVector

### Description
Class method to construct a HalfVector from PostgreSQL text format.

### Parameters
#### Value
- **value** (str) - Required - The string representation of the vector in PostgreSQL text format.
### Returns
- `HalfVector` - A new HalfVector object.

### Example
```python
vec = HalfVector.from_text("[1.0,2.0,3.0]")
```
```

--------------------------------

### Create Approximate Nearest Neighbor (HNSW) Index

Source: https://github.com/pgvector/pgvector-python/blob/master/README.md

Define and create an HNSW index for efficient approximate nearest neighbor searches. Configure parameters like 'm' and 'ef_construction'.

```python
from sqlmodel import Index

index = Index(
    'my_index',
    Item.embedding,
    postgresql_using='hnsw',
    postgresql_with={'m': 16, 'ef_construction': 64},
    postgresql_ops={'embedding': 'vector_l2_ops'}
)

index.create(engine)
```

--------------------------------

### Constructing Bit from Text

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Use the class method `from_text` to create a Bit object from a PostgreSQL text format string.

```python
bit = Bit.from_text("10101010")
```

--------------------------------

### Add Approximate Nearest Neighbor Index (IVFFlat) with pg8000

Source: https://github.com/pgvector/pgvector-python/blob/master/README.md

Creates an approximate nearest neighbor index using the Inverted File Flat (IVFFlat) algorithm on the 'embedding' column with L2 distance and specifies 100 lists using pg8000.

```python
conn.run('CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)')
```

--------------------------------

### from_text()

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

A class method to construct a SparseVector from its PostgreSQL text representation.

```APIDOC
#### from_text(value: str) → SparseVector

Class method to construct a SparseVector from PostgreSQL text format.
```python
# Text format: "{1:1.0,3:2.0,5:3.0}/5"
# Note: indices in text are 1-based
vec = SparseVector.from_text("{1:1.0,3:2.0,5:3.0}/5")
```
```

--------------------------------

### Vector Constructor

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Initializes a Vector object from a list of floats or a NumPy array. It validates the input to ensure it's a 1-dimensional array or list convertible to floats.

```APIDOC
## Vector Constructor

### Description
Initializes a Vector object from a list of floats or a NumPy array. It validates the input to ensure it's a 1-dimensional array or list convertible to floats.

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
* **value** (list[float] | np.ndarray) - Required - List of floats or NumPy float array

### Request Example
```python
# From list
vec = Vector([1.0, 2.0, 3.0])

# From NumPy array
import numpy as np
arr = np.array([1.0, 2.0, 3.0], dtype=np.float32)
vec = Vector(arr)

# In database query
conn.execute('INSERT INTO items (embedding) VALUES (%s)', (Vector([1, 2, 3]),))
```

### Response
#### Success Response (200)
Returns a Vector object.

#### Response Example
```json
{
  "example": "Vector([1.0, 2.0, 3.0])"
}
```

### Raises
* `ValueError`: If value is not a list or ndarray, if ndarray is not 1-dimensional, or if list elements cannot be converted to floats
```

--------------------------------

### Vector.from_binary()

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

A class method to construct a Vector object from bytes representing the vector in PostgreSQL binary format. Typically used internally by database drivers.

```APIDOC
## from_binary(value: bytes)

### Description
A class method to construct a Vector object from bytes representing the vector in PostgreSQL binary format. Typically used internally by database drivers.
### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
* **value** (bytes) - Required - Bytes in PostgreSQL binary format

### Request Example
```python
# Typically used by database drivers
vec = Vector.from_binary(binary_data)
```

### Response
#### Success Response (200)
Returns a Vector object constructed from the binary input.

#### Response Example
```json
{
  "example": "Vector([1.0, 2.0, 3.0])"
}
```

### Raises
* `ValueError`: If the binary data has invalid length or unused header field is non-zero
```

--------------------------------

### Enable pgvector Extension with pg8000

Source: https://github.com/pgvector/pgvector-python/blob/master/README.md

Use this snippet to enable the pgvector extension on your database connection using pg8000. Ensure the connection is established before execution.

```python
conn.run('CREATE EXTENSION IF NOT EXISTS vector')
```

--------------------------------

### Vector Aggregation in Django

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/examples.md

Demonstrates how to perform aggregation operations on vector fields in Django models, specifically calculating the average and sum of embeddings.

```python
from django.db.models import Avg, Sum

# Average embedding across all items
avg_result = Item.objects.aggregate(
    avg_emb=Avg('embedding')
)
avg_vector = avg_result['avg_emb']  # Vector object
if avg_vector:
    print(avg_vector.to_list())

# Sum of embeddings
sum_result = Item.objects.aggregate(
    sum_emb=Sum('embedding')
)
sum_vector = sum_result['sum_emb']  # Vector object
```

--------------------------------

### to_text()

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/core-vector-types.md

Serializes the half-vector into a string format suitable for PostgreSQL.

```APIDOC
## to_text()

### Description
Returns the half-vector in PostgreSQL text format.

### Returns
- `str` - The string representation of the vector.
### Example
```python
vec = HalfVector([1.0, 2.0, 3.0])
text = vec.to_text()  # "[1.0,2.0,3.0]"
```
```

--------------------------------

### Create Half-Precision Index with SQLAlchemy

Source: https://github.com/pgvector/pgvector-python/blob/master/_autodocs/sqlalchemy-integration.md

Create an HNSW index on a cast half-precision version of a full-precision vector column. Use `HALFVEC` for casting and specify `halfvec_l2_ops` for indexing.

```python
from sqlalchemy import Index, select
from sqlalchemy.sql import func
from pgvector.sqlalchemy import HALFVEC

# Create index on cast column
index = Index(
    'embedding_halfvec_idx',
    func.cast(Item.embedding, HALFVEC(1536)).label('embedding'),
    postgresql_using='hnsw',
    postgresql_ops={'embedding': 'halfvec_l2_ops'}
)
index.create(engine)

# Query with the same cast so the index can be used
order = func.cast(Item.embedding, HALFVEC(1536)).l2_distance([...])
results = session.scalars(select(Item).order_by(order).limit(5))
```
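--------------------------------

### Illustrative Sketch: Rounding Introduced by a Half-Precision Cast

The half-precision index above trades some numeric precision for a smaller index. As a rough, standalone illustration (not part of pgvector-python and not taken from its docs), the sketch below rounds Python floats through IEEE 754 binary16 using the standard library's `struct` module. It assumes `halfvec` elements behave like 16-bit IEEE floats; `to_half` and `max_half_error` are hypothetical helper names introduced here, and the sample embedding is made up.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value and back.

    struct's 'e' format is binary16; values too large for it
    raise OverflowError.
    """
    return struct.unpack('<e', struct.pack('<e', x))[0]


def max_half_error(values: list[float]) -> float:
    """Largest absolute rounding error across the elements of a vector."""
    return max(abs(v - to_half(v)) for v in values)


# Made-up embedding used only for illustration
embedding = [1.0, 0.1, -0.333, 2.5]
rounded = [to_half(v) for v in embedding]

print(rounded)
print(max_half_error(embedding))
```

If the error reported for representative embeddings is small relative to the distances you care about, a `halfvec` index is probably a reasonable choice. Treat this as a sanity check rather than a recall measurement.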