### Install Python Libraries for OpenAI and Vecs

Source: https://github.com/supabase/vecs/blob/main/docs/integrations_openai.md

Installs the necessary Python libraries, `vecs` and `openai`, using pip. Ensure you have Python 3.7+ installed.

```sh
pip install vecs openai
```

--------------------------------

### Install Supabase CLI for Windows

Source: https://github.com/supabase/vecs/blob/main/docs/hosting.md

Installs the Supabase command-line interface on Windows using Scoop. This is a prerequisite for managing local Supabase projects.

```shell
scoop bucket add supabase https://github.com/supabase/scoop-bucket.git
scoop install supabase
```

--------------------------------

### Install Supabase CLI for macOS

Source: https://github.com/supabase/vecs/blob/main/docs/hosting.md

Installs the Supabase command-line interface on macOS using Homebrew. This is a prerequisite for managing local Supabase projects.

```shell
brew install supabase/tap/supabase
```

--------------------------------

### Start Supabase Local Development Environment

Source: https://github.com/supabase/vecs/blob/main/docs/hosting.md

Starts the local Supabase development environment, downloading the necessary containers and printing URLs for the various services, including the database. This command is essential for local development.

```shell
supabase start
```

--------------------------------

### Install Vecs Python Package

Source: https://github.com/supabase/vecs/blob/main/docs/index.md

This command installs the Vecs Python package using pip. Ensure you have a compatible Python version (3.7+) installed.

```sh
pip install vecs
```

--------------------------------

### Install Supabase CLI using npm

Source: https://github.com/supabase/vecs/blob/main/docs/hosting.md

Installs the Supabase command-line interface as a development dependency using npm. This is a prerequisite for managing local Supabase projects.
```shell
npm install supabase --save-dev
```

--------------------------------

### Vecs Python Usage Example: Create Client, Collection, Upsert, Index, and Query

Source: https://github.com/supabase/vecs/blob/main/docs/index.md

This Python snippet demonstrates the core functionality of the Vecs library: creating a client connection, getting or creating a vector collection, adding records (vectors with metadata) with `upsert`, creating an index for efficient search, and running a query with metadata filtering. Finally, it disconnects from the database.

```python
import vecs

DB_CONNECTION = "postgresql://<user>:<password>@<host>:<port>/<db_name>"

# create vector store client
vx = vecs.create_client(DB_CONNECTION)

# create a collection of vectors with 3 dimensions
docs = vx.get_or_create_collection(name="docs", dimension=3)

# add records to the *docs* collection
docs.upsert(
    records=[
        (
            "vec0",           # the vector's identifier
            [0.1, 0.2, 0.3],  # the vector. list or np.array
            {"year": 1973}    # associated metadata
        ),
        (
            "vec1",
            [0.7, 0.8, 0.9],
            {"year": 2012}
        )
    ]
)

# index the collection for fast search performance
docs.create_index()

# query the collection filtering metadata for "year" = 2012
docs.query(
    data=[0.4, 0.5, 0.6],             # required
    limit=1,                          # number of records to return
    filters={"year": {"$eq": 2012}},  # metadata filters
)

# Returns: ["vec1"]

# Disconnect from the database
vx.disconnect()
```

--------------------------------

### Initialize Supabase Project Locally

Source: https://github.com/supabase/vecs/blob/main/docs/hosting.md

Initializes a new Supabase project in the current directory by creating the `supabase/` subdirectory. Run this command before starting a local Supabase development environment.

```shell
supabase init
```

--------------------------------

### Install Supabase CLI for Linux

Source: https://github.com/supabase/vecs/blob/main/docs/hosting.md

Installs the Supabase command-line interface on Linux from a downloaded package, using a package manager such as apk, dpkg, or rpm. This is a prerequisite for managing local Supabase projects.

```shell
sudo apk add --allow-untrusted <...>.apk
# or
sudo dpkg -i <...>.deb
# or
sudo rpm -i <...>.rpm
```

--------------------------------

### Install vecs and boto3 Libraries

Source: https://github.com/supabase/vecs/blob/main/docs/integrations_amazon_bedrock.md

Installs the necessary Python libraries, `vecs` and `boto3`, using pip. These are required to work with vecs and Amazon Bedrock.

```sh
pip install vecs boto3
```

--------------------------------

### Start Supabase Postgres Docker Container

Source: https://github.com/supabase/vecs/blob/main/docs/hosting.md

Runs a PostgreSQL 15.1 container named `vecs_hosting_guide` locally from the `supabase/postgres:15.1.0.74` image. It maps host port 5019 to the container's port 5432 and sets the database name and credentials, giving `vecs` a local PostgreSQL instance to connect to.

```shell
docker run --rm -d \
    --name vecs_hosting_guide \
    -p 5019:5432 \
    -e POSTGRES_DB=vecs_db \
    -e POSTGRES_PASSWORD=password \
    -e POSTGRES_USER=postgres \
    supabase/postgres:15.1.0.74
```

--------------------------------

### Install Dependencies for Vecs and Requests

Source: https://github.com/supabase/vecs/blob/main/docs/integrations_huggingface_inference_endpoints.md

Installs the necessary Python packages: `vecs` for vector database operations and `requests` for making HTTP requests to the Hugging Face API. Ensure you have Python 3.7+ installed.

```bash
pip install vecs requests
```

--------------------------------

### Vector Record Example (Python)

Source: https://github.com/supabase/vecs/blob/main/docs/concepts_collections.md

Provides a concrete example of a vector record in Python, showing the expected format: an ID, the vector data, and a metadata dictionary.
```python
("vec1", [0.1, 0.2, 0.3], {"year": 1990})
```

--------------------------------

### Install Vecs with Text Embedding Dependencies

Source: https://github.com/supabase/vecs/blob/main/docs/api.md

Installs the Vecs library along with the optional dependencies required for text embedding. Run this command before using adapters that embed text data.

```sh
pip install "vecs[text_embedding]"
```

--------------------------------

### Markdown Chunking Adapter for Document Analysis

Source: https://context7.com/supabase/vecs/llms.txt

This example demonstrates the Markdown Chunking Adapter, which splits markdown documents into chunks at their headings. Each chunk is then embedded, so queries can target specific sections of the documentation.

```python
import vecs
from vecs.adapter import Adapter, MarkdownChunker, TextEmbedding

vx = vecs.create_client("postgresql://user:password@localhost:5432/dbname")

# Create collection that chunks markdown by headings
docs = vx.get_or_create_collection(
    name="documentation",
    adapter=Adapter([
        MarkdownChunker(skip_during_query=True),
        TextEmbedding(model='all-MiniLM-L6-v2')
    ])
)

# Upsert markdown documentation
docs.upsert([
    (
        "readme",
        """# Getting Started
Welcome to our project!

## Installation
Run pip install to get started.

## Usage
Here's how to use the library.

### Basic Example
Code example here.

### Advanced Example
More complex code here.

## Contributing
We welcome contributions!
""",
        {"type": "guide", "version": "1.0"}
    )
])

# The chunker splits the document at its headings, so this single upsert
# stores several vectors, one per section (e.g. "# Getting Started ...",
# "## Installation ...", "### Basic Example ..."), each with an ID
# derived from the parent ID "readme".

# Query documentation with text
results = docs.query(
    data="how do I install this?",
    limit=3,
    include_metadata=True
)

vx.disconnect()
```

--------------------------------

### Perform Vector Search with Metadata Filtering

Source: https://github.com/supabase/vecs/blob/main/docs/api.md

Performs a vector search query while filtering on the associated metadata. The example filters by a specific year using the `$eq` operator. Refer to the metadata guide for the full list of filtering options.

```python
docs.query(
    data=[0.4, 0.5, 0.6],
    filters={"year": {"$eq": 2012}},  # metadata filters
)
```

--------------------------------

### Create Custom Index with Parameters in Python

Source: https://github.com/supabase/vecs/blob/main/docs/concepts_indexes.md

Illustrates how to create an index with an explicit index method, distance measure, and index arguments, using HNSW as the example. This lets you tune performance for a specific workload.

```python
from vecs import IndexArgsHNSW, IndexMeasure, IndexMethod

docs.create_index(
    method=IndexMethod.hnsw,
    measure=IndexMeasure.cosine_distance,
    index_arguments=IndexArgsHNSW(m=8),
)
```

--------------------------------

### JSON Metadata Query Examples for Supabase Vecs

Source: https://github.com/supabase/vecs/blob/main/docs/concepts_metadata.md

Illustrates various metadata filters in JSON form, including equality, range comparisons, logical AND/OR, list membership, and array containment.

```json
{
    "year": {"$eq": 2020}
}
```

```json
{
    "$or": [
        {"year": {"$eq": 2020}},
        {"gross": {"$gte": 5000.0}}
    ]
}
```

```json
{
    "$and": [
        {"last_name": {"$lt": "Brown"}},
        {"is_priority_customer": {"$gte": 5000.00}}
    ]
}
```

```json
{
    "priority": {"$in": ["enterprise", "pro"]}
}
```

```json
{
    "tags": {"$contains": "important"}
}
```

--------------------------------

### Create Collection with Text Embedding Adapter

Source: https://github.com/supabase/vecs/blob/main/docs/api.md

Creates or retrieves a collection named `docs` and configures it with an adapter. The adapter splits text into paragraphs and then embeds each chunk with the `all-MiniLM-L6-v2` model, so records can be upserted and queried as raw text.

```python
import vecs
from vecs.adapter import Adapter, ParagraphChunker, TextEmbedding

# create vector store client
vx = vecs.Client("postgresql://<user>:<password>@<host>:<port>/<db_name>")

# create a collection with an adapter
docs = vx.get_or_create_collection(
    name="docs",
    adapter=Adapter(
        [
            ParagraphChunker(skip_during_query=True),
            TextEmbedding(model='all-MiniLM-L6-v2'),
        ]
    )
)
```

--------------------------------

### Configure ParagraphChunker Adapter Step (Python)

Source: https://github.com/supabase/vecs/blob/main/docs/concepts_adapters.md

Example of configuring the ParagraphChunker adapter step for a vecs collection. The `skip_during_query=True` argument prevents text chunking when running queries, which is usually what you want.

```python
from vecs.adapter import Adapter, ParagraphChunker

...

vx.get_or_create_collection(
    name="docs",
    adapter=Adapter(
        [
            ParagraphChunker(skip_during_query=True),
            ...
        ]
    )
)
```

--------------------------------

### Create Collection with Text Embedding Adapter (Python)

Source: https://github.com/supabase/vecs/blob/main/docs/concepts_adapters.md

This snippet demonstrates how to create a vecs collection with an adapter that splits text into paragraphs and then converts each paragraph into an embedding vector using the `all-MiniLM-L6-v2` model. It requires the `vecs[text_embedding]` extra to be installed.
```python
import vecs
from vecs.adapter import Adapter, ParagraphChunker, TextEmbedding

# create vector store client
vx = vecs.Client("postgresql://<user>:<password>@<host>:<port>/<db_name>")

# create a collection with an adapter
docs = vx.get_or_create_collection(
    name="docs",
    adapter=Adapter(
        [
            ParagraphChunker(skip_during_query=True),
            TextEmbedding(model='all-MiniLM-L6-v2'),
        ]
    )
)
```

--------------------------------

### Get Existing Collection (Deprecated)

Source: https://github.com/supabase/vecs/blob/main/docs/api.md

Deprecated method for retrieving an existing vector collection by name. Use `get_or_create_collection` instead.

```python
docs = vx.get_collection(name="docs")
```

--------------------------------

### Store Embeddings in PostgreSQL with Vecs (Python)

Source: https://github.com/supabase/vecs/blob/main/docs/integrations_openai.md

Stores the generated embeddings in a PostgreSQL database using the vecs library. It connects to the database, creates or gets a collection named `sentences` with the matching dimension, and upserts the records. An index is then created for efficient querying. Requires database connection details and the `vecs` Python library.

```python
import vecs

DB_CONNECTION = "postgresql://<user>:<password>@<host>:<port>/<db_name>"

# create vector store client
vx = vecs.Client(DB_CONNECTION)

# create a collection named 'sentences' with 1536 dimensional vectors (default dimension for text-embedding-ada-002)
sentences = vx.get_or_create_collection(name="sentences", dimension=1536)

# upsert the embeddings into the 'sentences' collection
sentences.upsert(records=embeddings)

# create an index for the 'sentences' collection
sentences.create_index()
```

--------------------------------

### Get or Create a Collection in Vecs

Source: https://github.com/supabase/vecs/blob/main/docs/api.md

Retrieves an existing vector collection by name, or creates a new one if it does not exist. Requires the collection name and the dimension of the vectors it will store.
```python
docs = vx.get_or_create_collection(name="docs", dimension=3)
```

--------------------------------

### Store Embeddings in PostgreSQL with vecs

Source: https://github.com/supabase/vecs/blob/main/docs/integrations_amazon_bedrock.md

Inserts the generated text embeddings into a PostgreSQL database using the vecs client. It creates or gets a collection, upserts the records, and then creates an index for efficient querying.

```python
import vecs

DB_CONNECTION = "postgresql://<user>:<password>@<host>:<port>/<db_name>"

# create vector store client
vx = vecs.Client(DB_CONNECTION)

# create a collection named 'sentences' with 1536 dimensional vectors
# to match the default dimension of the Titan Embeddings G1 - Text model
sentences = vx.get_or_create_collection(name="sentences", dimension=1536)

# upsert the embeddings into the 'sentences' collection
sentences.upsert(records=embeddings)

# create an index for the 'sentences' collection
sentences.create_index()
```

--------------------------------

### Create and Manage Collections in vecs

Source: https://context7.com/supabase/vecs/llms.txt

Demonstrates how to create or retrieve vector collections in PostgreSQL with the vecs client, list collections along with their dimensions and vector counts, and delete a collection. Requires `vecs` and a database connection.
```python
import vecs

vx = vecs.create_client("postgresql://user:password@localhost:5432/dbname")

# Create or get collection with 384 dimensions (e.g., for all-MiniLM-L6-v2 embeddings)
docs = vx.get_or_create_collection(name="documents", dimension=384)

# List all collections in database
collections = vx.list_collections()
for collection in collections:
    print(f"{collection.name}: {collection.dimension}D, {len(collection)} vectors")

# Delete a collection
vx.delete_collection("old_collection")

# Always disconnect when done
vx.disconnect()
```

--------------------------------

### Custom Adapter Implementation with AdapterStep

Source: https://context7.com/supabase/vecs/llms.txt

This snippet lists only the imports needed to implement a custom adapter step in Vecs. A custom step defines its own transformation of the records flowing through the adapter pipeline, which is useful for specialized preprocessing.

```python
import vecs
from vecs.adapter import Adapter, AdapterStep, AdapterContext
from typing import Iterable, Tuple, Any, Optional, Dict, Generator
```

--------------------------------

### Create Indexes for Vector Search Performance in vecs

Source: https://context7.com/supabase/vecs/llms.txt

Illustrates how to create indexes on vector collections to speed up similarity search queries. You can let vecs select the index method automatically (HNSW or IVFFlat) or configure the index method, distance measure, and index arguments explicitly (e.g. `m` and `ef_construction` for HNSW, `n_lists` for IVFFlat). Requires `vecs` and a database connection.
```python
import vecs
import numpy as np

vx = vecs.create_client("postgresql://user:password@localhost:5432/dbname")
docs = vx.get_or_create_collection(name="articles", dimension=384)

# Insert vectors first
records = [(f"vec{i}", np.random.random(384), {"idx": i}) for i in range(10000)]
docs.upsert(records)

# Auto-select best index method (HNSW if available, otherwise IVFFlat)
docs.create_index()

# Or specify index method and distance measure explicitly.
# replace=True is required here because an index already exists.
docs.create_index(
    measure=vecs.IndexMeasure.cosine_distance,  # Options: cosine_distance, l2_distance, l1_distance, max_inner_product
    method=vecs.IndexMethod.hnsw,  # Options: auto, hnsw, ivfflat
    index_arguments=vecs.IndexArgsHNSW(
        m=16,  # Max connections per layer (higher = better recall, more memory)
        ef_construction=64  # Size of candidate list during construction (higher = better quality, slower build)
    ),
    replace=True  # Replace the index created above
)

# IVFFlat index example (for very large datasets)
docs.create_index(
    measure=vecs.IndexMeasure.cosine_distance,
    method=vecs.IndexMethod.ivfflat,
    index_arguments=vecs.IndexArgsIVFFlat(
        n_lists=100  # Number of inverted lists (pgvector rule of thumb: rows / 1000 for up to ~1M rows)
    ),
    replace=True  # Replace existing index
)
```

--------------------------------

### Python: Initialize Vecs Client, Create Collection, Upsert, Index, and Query

Source: https://github.com/supabase/vecs/blob/main/README.md

This Python code demonstrates the core functionality of the `vecs` client. It connects to a PostgreSQL database with the pgvector extension, creates or retrieves a vector collection, adds records with associated metadata, indexes the collection for search, and runs a similarity search with metadata filtering. The inputs are a database connection string and the data to upsert and query; the query returns a list of record identifiers.
```python
import vecs

DB_CONNECTION = "postgresql://<user>:<password>@<host>:<port>/<db_name>"

# create vector store client
vx = vecs.create_client(DB_CONNECTION)

# create a collection of vectors with 3 dimensions
docs = vx.get_or_create_collection(name="docs", dimension=3)

# add records to the *docs* collection
docs.upsert(
    records=[
        (
            "vec0",           # the vector's identifier
            [0.1, 0.2, 0.3],  # the vector. list or np.array
            {"year": 1973}    # associated metadata
        ),
        (
            "vec1",
            [0.7, 0.8, 0.9],
            {"year": 2012}
        )
    ]
)

# index the collection for fast search performance
docs.create_index()

# query the collection filtering metadata for "year" = 2012
docs.query(
    data=[0.4, 0.5, 0.6],             # required
    limit=1,                          # number of records to return
    filters={"year": {"$eq": 2012}},  # metadata filters
)

# Returns: ["vec1"]
```

--------------------------------

### Connect to PostgreSQL with vecs Client

Source: https://context7.com/supabase/vecs/llms.txt

Connects to a PostgreSQL database with the pgvector extension enabled using a connection string. The client can be used directly or as a context manager for automatic resource cleanup. Requires the `vecs` library.

```python
import vecs

# Connect to database
DB_CONNECTION = "postgresql://user:password@localhost:5432/dbname"
vx = vecs.create_client(DB_CONNECTION)

# Or use context manager for automatic cleanup
with vecs.create_client(DB_CONNECTION) as vx:
    # Work with collections
    docs = vx.get_or_create_collection(name="documents", dimension=384)
# Connection automatically closes on exit
```

--------------------------------

### Create Default Index in Python

Source: https://github.com/supabase/vecs/blob/main/docs/concepts_indexes.md

Demonstrates creating an index with default settings using the collection's `create_index` method. The defaults are suitable for most use cases.
```python
docs.create_index()
```

--------------------------------

### Embed and Search Documents with OpenAI and Vecs

Source: https://context7.com/supabase/vecs/llms.txt

Generates embeddings for documents with OpenAI's `text-embedding-3-small` model and upserts them into a vecs collection. It then runs a similarity search with a query embedding and prints the results. Requires the `vecs` and `openai` libraries.

```python
import vecs
from openai import OpenAI

# Initialize clients
vx = vecs.create_client("postgresql://user:password@localhost:5432/dbname")
openai_client = OpenAI(api_key="your-api-key")

# Create collection for OpenAI embeddings (text-embedding-3-small = 1536 dimensions)
docs = vx.get_or_create_collection(name="openai_docs", dimension=1536)

# Generate embeddings with OpenAI
documents = [
    {"id": "doc1", "text": "Python is a programming language", "category": "tech"},
    {"id": "doc2", "text": "Machine learning is a subset of AI", "category": "ai"},
    {"id": "doc3", "text": "PostgreSQL is a database system", "category": "database"}
]

# Batch embed documents
texts = [doc["text"] for doc in documents]
response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
)

# Prepare records for vecs
records = [
    (
        doc["id"],
        embedding.embedding,  # OpenAI returns list of floats
        {"text": doc["text"], "category": doc["category"]}
    )
    for doc, embedding in zip(documents, response.data)
]

# Upsert to vecs
docs.upsert(records)

# Create index
docs.create_index()

# Query with new text
query_text = "Tell me about databases"
query_response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=[query_text]
)
query_embedding = query_response.data[0].embedding

# Search for similar documents
results = docs.query(
    data=query_embedding,
    limit=3,
    include_value=True,
    include_metadata=True
)

for doc_id, distance, metadata in results:
    print(f"ID: {doc_id}")
    print(f"Text: {metadata['text']}")
    print(f"Distance: {distance:.4f}")
    print()

vx.disconnect()
```

--------------------------------

### Query for Similar Sentences using Vecs (Python)

Source: https://github.com/supabase/vecs/blob/main/docs/integrations_openai.md

Queries the `sentences` collection in the PostgreSQL database for the sentences most similar to a given query sentence. It first creates an embedding for the query sentence with OpenAI's API, then uses vecs to run the similarity search, returning the top 3 records and their distances. It uses the pre-1.0 `openai` SDK interface (`openai.Embedding.create`). Requires the `openai` and `vecs` libraries and a configured database connection.

```python
query_sentence = "A quick animal jumps over a lazy one."

# create an embedding for the query sentence
response = openai.Embedding.create(
    model="text-embedding-ada-002",
    input=[query_sentence]
)

query_embedding = response["data"][0]["embedding"]

# query the 'sentences' collection for the most similar sentences
results = sentences.query(
    data=query_embedding,
    limit=3,
    include_value=True
)

# print the results
for result in results:
    print(result)
```

--------------------------------

### Create Embeddings using OpenAI API (Python)

Source: https://github.com/supabase/vecs/blob/main/docs/integrations_openai.md

Generates embeddings for a list of sentences using OpenAI's `text-embedding-ada-002` model, via the pre-1.0 `openai` SDK interface (`openai.Embedding.create`). Requires an OpenAI API key and the `openai` Python library. Produces a list of tuples, each containing the original sentence (used as the record ID), its embedding, and an empty metadata dictionary.
```python
import openai

openai.api_key = '<OPENAI-API-KEY>'

dataset = [
    "The cat sat on the mat.",
    "The quick brown fox jumps over the lazy dog.",
    "Friends, Romans, countrymen, lend me your ears",
    "To be or not to be, that is the question.",
]

embeddings = []

for sentence in dataset:
    response = openai.Embedding.create(
        model="text-embedding-ada-002",
        input=[sentence]
    )
    embeddings.append((sentence, response["data"][0]["embedding"], {}))
```

--------------------------------

### Direct SQL Access to Vecs Collections

Source: https://context7.com/supabase/vecs/llms.txt

Demonstrates how to access vecs collections directly through SQLAlchemy for advanced SQL queries, such as filtering on metadata, checking vector dimensions, and counting vectors. Requires the `vecs` and `sqlalchemy` libraries.

```python
import vecs
from sqlalchemy import text

vx = vecs.create_client("postgresql://user:password@localhost:5432/dbname")

# Create collection through vecs
docs = vx.get_or_create_collection(name="articles", dimension=384)

# Upsert some data
docs.upsert([
    ("article1", [0.1] * 384, {"title": "First Article", "views": 100}),
    ("article2", [0.2] * 384, {"title": "Second Article", "views": 200})
])

# Access underlying SQLAlchemy session for direct SQL
with vx.Session() as session:
    # Collections are stored in the 'vecs' schema
    # Table structure: (id TEXT PRIMARY KEY, vec VECTOR(dimension), metadata JSONB)

    # Query metadata directly
    result = session.execute(
        text("SELECT id, metadata FROM vecs.articles WHERE (metadata->>'views')::int > 150")
    )
    for row in result:
        print(f"ID: {row.id}, Metadata: {row.metadata}")

    # Get vector dimension
    result = session.execute(
        text("SELECT vector_dims(vec) AS dims FROM vecs.articles LIMIT 1")
    )
    print(f"Vector dimensions: {result.scalar()}")

    # Count vectors
    result = session.execute(
        text("SELECT COUNT(*) FROM vecs.articles")
    )
    print(f"Total vectors: {result.scalar()}")

    session.commit()

# Note: Avoid DDL operations (CREATE, DROP, ALTER) via SQL
# Always use vecs client methods for schema changes

vx.disconnect()
```

--------------------------------

### Configure TextEmbedding Adapter Step (Python)

Source: https://github.com/supabase/vecs/blob/main/docs/concepts_adapters.md

This Python code configures the TextEmbedding adapter step for a vecs collection, specifying the embedding model to use (e.g., `all-MiniLM-L6-v2`). This step converts text input into vector embeddings, so the collection can be searched by text.

```python
from vecs.adapter import Adapter, TextEmbedding

...

docs = vx.get_or_create_collection(
    name="docs",
    adapter=Adapter(
        [
            TextEmbedding(model='all-MiniLM-L6-v2')
        ]
    )
)

# search by text
docs.query(data="foo bar")
```

--------------------------------

### Text Embedding Adapter for Automatic Chunking and Embedding

Source: https://context7.com/supabase/vecs/llms.txt

This snippet shows how to set up and use a text embedding adapter in Vecs. The pipeline splits text into paragraphs and embeds each paragraph with the specified model. With `skip_during_query=True`, chunking is skipped when querying.

```python
import numpy as np
import vecs
from vecs.adapter import Adapter, ParagraphChunker, TextEmbedding

# Install text embedding support first:
# pip install "vecs[text_embedding]"

vx = vecs.create_client("postgresql://user:password@localhost:5432/dbname")

# Create collection with text adapter pipeline
docs = vx.get_or_create_collection(
    name="blog_posts",
    adapter=Adapter([
        ParagraphChunker(skip_during_query=True),  # Split text into paragraphs
        TextEmbedding(model='all-MiniLM-L6-v2')  # Embed text (384 dimensions)
    ])
)

# Upsert text directly (automatically chunked and embedded)
docs.upsert([
    (
        "post_1",
        """This is the first paragraph of my blog post.

This is the second paragraph with more content.

And here's a third paragraph with conclusions.""",
        {"author": "Alice", "date": "2024-01-15"}
    ),
    (
        "post_2",
        "Short post without paragraph breaks.",
        {"author": "Bob", "date": "2024-01-16"}
    )
])

# Note: post_1 is stored as three vectors (one per paragraph, with IDs
# derived from "post_1"); post_2 has no paragraph breaks, so it is stored
# as a single vector.

# Query with text (automatically embedded, no chunking due to skip_during_query=True)
results = docs.query(
    data="tell me about blog conclusions",
    limit=5,
    include_value=True,
    include_metadata=True
)

# Can still query with raw vectors by skipping adapter
results = docs.query(
    data=np.random.random(384),
    limit=5,
    skip_adapter=True  # Bypass adapter pipeline
)

# Available models (from sentence-transformers)
# - 'all-mpnet-base-v2' (768 dim)
# - 'all-MiniLM-L6-v2' (384 dim) - fast and good quality
# - 'multi-qa-mpnet-base-dot-v1' (768 dim)
# - 'all-distilroberta-v1' (768 dim)
# See vecs.adapter.TextEmbeddingModel for full list

vx.disconnect()
```

--------------------------------

### Connect to PostgreSQL Vector Store with Vecs

Source: https://github.com/supabase/vecs/blob/main/docs/api.md

Connects to a PostgreSQL database with the pgvector extension using the vecs client. Requires a valid database connection string.

```python
import vecs

DB_CONNECTION = "postgresql://<user>:<password>@<host>:<port>/<db_name>"

# create vector store client
vx = vecs.create_client(DB_CONNECTION)
```

--------------------------------

### Create Sentence Embeddings using Hugging Face API

Source: https://github.com/supabase/vecs/blob/main/docs/integrations_huggingface_inference_endpoints.md

Generates embeddings for a list of sentences by sending them to a Hugging Face Inference Endpoint. It requires the endpoint URL and an API key for authentication. The output is a list of tuples, each containing the original sentence, its embedding vector, and an empty metadata dictionary.
```python
import requests

huggingface_endpoint_url = '<HUGGINGFACE-ENDPOINT-URL>'
huggingface_api_key = '<HUGGINGFACE-API-KEY>'

dataset = [
    "The cat sat on the mat.",
    "The quick brown fox jumps over the lazy dog.",
    "Friends, Romans, countrymen, lend me your ears",
    "To be or not to be, that is the question.",
]

records = []

for sentence in dataset:
    response = requests.post(
        huggingface_endpoint_url,
        headers={
            "Authorization": f"Bearer {huggingface_api_key}",
            "Content-Type": "application/json"
        },
        json={"inputs": sentence}
    )
    embedding = response.json()["embeddings"]
    # (id, vector, metadata); {} is an empty metadata dict
    records.append((sentence, embedding, {}))
```

--------------------------------

### Metadata Filter Operators for Vector Queries

Source: https://context7.com/supabase/vecs/llms.txt

A reference for the metadata filter operators available in vector queries: the comparison operators `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, and `$contains`, plus the logical operators `$and` and `$or` for combining conditions into complex filters.
```python import vecs vx = vecs.create_client("postgresql://user:password@localhost:5432/dbname") docs = vx.get_or_create_collection(name="articles", dimension=384) query_vec = [0.1] * 384 # Equality: $eq docs.query(data=query_vec, filters={"status": {"$eq": "published"}}) # Not equal: $ne docs.query(data=query_vec, filters={"status": {"$ne": "draft"}}) # Greater than: $gt docs.query(data=query_vec, filters={"score": {"$gt": 8.5}}) # Greater than or equal: $gte docs.query(data=query_vec, filters={"year": {"$gte": 2020}}) # Less than: $lt docs.query(data=query_vec, filters={"priority": {"$lt": 5}}) # Less than or equal: $lte docs.query(data=query_vec, filters={"age": {"$lte": 30}}) # In list: $in docs.query(data=query_vec, filters={"category": {"$in": ["tech", "science", "engineering"]}}) # Array contains scalar: $contains docs.query(data=query_vec, filters={"tags": {"$contains": "important"}}) # Logical AND: $and docs.query( data=query_vec, filters={ "$and": [ {"year": {"$eq": 2023}}, {"category": {"$eq": "tech"}}, {"views": {"$gte": 1000}} ] } ) # Logical OR: $or docs.query( data=query_vec, filters={ "$or": [ {"priority": {"$eq": "high"}}, {"urgent": {"$eq": True}} ] } ) # Nested logical operators docs.query( data=query_vec, filters={ "$and": [ { "$or": [ {"category": {"$eq": "tech"}}, {"category": {"$eq": "science"}} ] }, {"year": {"$gte": 2022}} ] } ) vx.disconnect() ``` -------------------------------- ### Query for Similar Sentences using vecs Source: https://github.com/supabase/vecs/blob/main/docs/integrations_amazon_bedrock.md Queries the 'sentences' collection to find sentences most similar to a given query sentence. It first creates an embedding for the query and then uses the vecs collection's query method to retrieve the nearest neighbors. ```python query_sentence = "A quick animal jumps over a lazy one." 
# create vector store client vx = vecs.Client(DB_CONNECTION) # create an embedding for the query sentence response = client.invoke_model( body= json.dumps({"inputText": query_sentence}), modelId= "amazon.titan-embed-text-v1", accept = "application/json", contentType = "application/json" ) response_body = json.loads(response["body"].read()) query_embedding = response_body.get("embedding") # query the 'sentences' collection for the most similar sentences results = sentences.query( data=query_embedding, limit=3, include_value = True ) # print the results for result in results: print(result) ``` -------------------------------- ### Query Vectors with Similarity Search and Metadata Filtering Source: https://context7.com/supabase/vecs/llms.txt Performs similarity searches for vectors within a collection. Supports basic queries, filtering by metadata (e.g., year, category, views), including different types of results (scores, metadata, vectors), and fine-tuning performance with specific index parameters like `ef_search` for HNSW or `probes` for IVFFlat. ```python import vecs import numpy as np vx = vecs.create_client("postgresql://user:password@localhost:5432/dbname") docs = vx.get_or_create_collection(name="articles", dimension=384) # Basic query: find 5 most similar vectors query_vector = np.random.random(384) results = docs.query( data=query_vector, limit=5 ) # Returns: ["article_1", "article_5", ...] (just IDs by default) # Query with metadata filters results = docs.query( data=query_vector, limit=10, filters={"year": {"$gte": 2023}}, # Only articles from 2023 or later include_value=True, # Include distance/similarity scores include_metadata=True, # Include metadata in results include_vector=False # Don't include vectors (save bandwidth) ) # Returns: [("article_1", 0.234, {"title": "...", "year": 2023}), ...] 

# Query with complex metadata filters
results = docs.query(
    data=query_vector,
    limit=5,
    filters={
        "$and": [
            {"category": {"$eq": "tech"}},
            {"views": {"$gt": 1000}},
            {"year": {"$in": [2022, 2023]}}
        ]
    },
    include_value=True,
    include_metadata=True
)

# Query with OR logic
results = docs.query(
    data=query_vector,
    limit=5,
    filters={
        "$or": [
            {"category": {"$eq": "urgent"}},
            {"views": {"$gte": 5000}}
        ]
    }
)

# Fine-tune query performance (HNSW-specific)
results = docs.query(
    data=query_vector,
    limit=10,
    ef_search=100  # Larger = more accurate but slower (default: 40)
)

# Fine-tune query performance (IVFFlat-specific)
results = docs.query(
    data=query_vector,
    limit=10,
    probes=20  # Number of lists to scan (higher = more accurate but slower)
)

vx.disconnect()
```

--------------------------------

### Store Embeddings in Vecs and Create Index

Source: https://github.com/supabase/vecs/blob/main/docs/integrations_huggingface_inference_endpoints.md

Connects to a PostgreSQL database using a connection string, creates or retrieves a vector collection named 'sentences' with the required dimension, and upserts the generated embeddings into it. Finally, it creates an index on the collection for efficient querying.

```python
import vecs

DB_CONNECTION = "postgresql://:@:/"

# create vector store client
vx = vecs.Client(DB_CONNECTION)

# create a collection named 'sentences' with 384-dimensional vectors
# (the output size of paraphrase-MiniLM-L6-v2)
sentences = vx.get_or_create_collection(name="sentences", dimension=384)

# upsert the embeddings into the 'sentences' collection
sentences.upsert(records=records)

# create an index for the 'sentences' collection
sentences.create_index()
```

--------------------------------

### Create Text Embeddings with Amazon Bedrock

Source: https://github.com/supabase/vecs/blob/main/docs/integrations_amazon_bedrock.md

Generates text embeddings using Amazon's Titan Embeddings G1 - Text v1.2 model.
It iterates through a list of sentences, invokes the Bedrock runtime client for each one, and collects the resulting embeddings.

```python
import boto3
import vecs
import json

client = boto3.client(
    'bedrock-runtime',
    region_name='us-east-1',
    # Credentials from your AWS account
    aws_access_key_id='',
    aws_secret_access_key='',
    aws_session_token='',
)

dataset = [
    "The cat sat on the mat.",
    "The quick brown fox jumps over the lazy dog.",
    "Friends, Romans, countrymen, lend me your ears",
    "To be or not to be, that is the question.",
]

embeddings = []

for sentence in dataset:
    # invoke the embeddings model for each sentence
    response = client.invoke_model(
        body=json.dumps({"inputText": sentence}),
        modelId="amazon.titan-embed-text-v1",
        accept="application/json",
        contentType="application/json"
    )
    # collect the embedding from the response
    response_body = json.loads(response["body"].read())
    # add an (id, vector, metadata) record, using the sentence itself as the ID
    embeddings.append((sentence, response_body.get("embedding"), {}))
```

--------------------------------

### Query Collection Using Text

Source: https://github.com/supabase/vecs/blob/main/docs/api.md

Queries a collection using plain text as the input data. The collection's configured adapter automatically processes the text, generates its vector representation, and performs the search.

```python
# search by text
docs.query(data="foo bar")
```

--------------------------------

### Upsert Vectors with Metadata using vecs

Source: https://context7.com/supabase/vecs/llms.txt

Shows how to insert or update vectors along with their associated metadata in a PostgreSQL collection using the vecs library. Vectors can be given as lists or NumPy arrays. Requires 'vecs', 'numpy', and a database connection.
```python
import vecs
import numpy as np

vx = vecs.create_client("postgresql://user:password@localhost:5432/dbname")
docs = vx.get_or_create_collection(name="articles", dimension=384)

# Prepare records: (id, vector, metadata)
records = [
    (
        "article_1",
        [0.1] * 384,  # 384-dimensional vector as a plain list (np.array also works)
        {"title": "Introduction to AI", "year": 2023, "category": "tech", "views": 1500}
    ),
    (
        "article_2",
        np.random.random(384),
        {"title": "Machine Learning Basics", "year": 2022, "category": "tech", "views": 2300}
    ),
    (
        "article_3",
        np.random.random(384),
        {"title": "Data Science Guide", "year": 2023, "category": "data", "views": 980}
    )
]

# Upsert records (insert new or update existing by ID)
docs.upsert(records)
print(f"Collection now has {len(docs)} vectors")

# Update an existing record: reusing an ID overwrites its vector and replaces its metadata
updated_record = ("article_1", np.random.random(384), {"title": "AI Updated", "year": 2024})
docs.upsert([updated_record])

vx.disconnect()
```

--------------------------------

### Upsert Records with Text Media Type (Python)

Source: https://github.com/supabase/vecs/blob/main/docs/concepts_adapters.md

After a collection has been set up with an adapter, this Python code shows how to upsert records using plain text as the media type. The adapter automatically handles the transformation (e.g., chunking and embedding) before ingestion.

```python
# add records to the collection using text as the media type
docs.upsert(
    records=[
        (
            "vec0",
            "four score and ....",  # <- note that we can now pass text here
            {"year": 1973}
        ),
        (
            "vec1",
            "hello, world!",
            {"year": 2012}
        )
    ]
)
```

--------------------------------

### Create Collection (Deprecated)

Source: https://github.com/supabase/vecs/blob/main/docs/api.md

Deprecated method for creating a new vector collection. It requires a name and the dimensionality of the vectors to be stored. Use `get_or_create_collection` instead.
```python
docs = vx.create_collection(name="docs", dimension=3)
```

--------------------------------

### Query Collection with Text Data (Python)

Source: https://github.com/supabase/vecs/blob/main/docs/concepts_adapters.md

This Python snippet illustrates querying a Supabase Vecs collection using text as the query input. The adapter associated with the collection transforms the text query into the appropriate format (e.g., an embedding vector) for searching.

```python
# search by text
docs.query(data="foo bar")
```

--------------------------------

### Perform Basic Vector Search

Source: https://github.com/supabase/vecs/blob/main/docs/api.md

Executes a basic search query using a provided vector. You can specify the number of results, metadata filters, the distance measure, and whether to include additional data (distance values, metadata, or vectors) in the response. Create an index first for good performance; querying a collection without an index emits a warning.

```python
docs.query(
    data=[0.4, 0.5, 0.6],        # required
    limit=5,                     # number of records to return
    filters={},                  # metadata filters
    measure="cosine_distance",   # distance measure to use
    include_value=False,         # should distance measure values be returned?
    include_metadata=False,      # should record metadata be returned?
    include_vector=False,        # should vectors be returned?
)
```

--------------------------------

### Fetch Specific Vectors by ID

Source: https://context7.com/supabase/vecs/llms.txt

Retrieves vectors from a collection by their unique identifiers. You can fetch several vectors at once from a list of IDs, or a single vector with dictionary-style indexing. `fetch` silently ignores non-existent IDs.
```python
import vecs

vx = vecs.create_client("postgresql://user:password@localhost:5432/dbname")
docs = vx.get_or_create_collection(name="articles", dimension=384)

# Fetch multiple vectors by ID
ids_to_fetch = ["article_1", "article_5", "article_10"]
records = docs.fetch(ids=ids_to_fetch)
# Returns: [("article_1", [0.1, 0.2, ...], {"title": "...", "year": 2023}), ...]

for record_id, vector, metadata in records:
    print(f"ID: {record_id}, Metadata: {metadata}")

# Fetch a single vector using indexing
single_record = docs["article_1"]
# Returns: ("article_1", array([0.1, 0.2, ...]), {"title": "...", "year": 2023})

# Non-existent IDs are silently skipped by fetch (no error)
records = docs.fetch(ids=["article_1", "does_not_exist", "article_5"])
# Returns only 2 records (article_1 and article_5)

vx.disconnect()
```

--------------------------------

### Query for Most Similar Sentences using Vecs

Source: https://github.com/supabase/vecs/blob/main/docs/integrations_huggingface_inference_endpoints.md

Embeds a query sentence using the Hugging Face API and then queries the 'sentences' collection in vecs to find the 3 most similar sentences. The results include the sentence text and its distance to the query.

```python
query_sentence = "A quick animal jumps over a lazy one."

# create an embedding for the query sentence
response = requests.post(
    huggingface_endpoint_url,
    headers={
        "Authorization": f"Bearer {huggingface_api_key}",
        "Content-Type": "application/json"
    },
    json={"inputs": query_sentence}
)
query_embedding = response.json()["embeddings"]

# query the 'sentences' collection for the most similar sentences
results = sentences.query(
    data=query_embedding,
    limit=3,
    include_value=True
)

# print the results
for result in results:
    print(result)
```

--------------------------------

### PostgreSQL Table Schema for Collections (SQL)

Source: https://github.com/supabase/vecs/blob/main/docs/concepts_collections.md

Illustrates the underlying PostgreSQL table schema for a Supabase Vecs collection. Each row holds one vector record, with columns for the ID, the vector data, and the metadata.

```sql
create table (
    id varchar primary key,
    vec vector(),
    metadata jsonb
)
```

--------------------------------

### Disconnect Vecs Client

Source: https://github.com/supabase/vecs/blob/main/docs/api.md

Demonstrates how to explicitly disconnect the Vecs client from the database when it is no longer needed. Alternatively, use the client as a context manager, which disconnects automatically when the block exits.

```python
vx.disconnect()
```

```python
import vecs

DB_CONNECTION = "postgresql://:@:/"

# create vector store client
with vecs.create_client(DB_CONNECTION) as vx:
    # do some work here
    pass

# connections are now closed
```

--------------------------------

### Delete Vectors by IDs, Metadata Filter, or Complex Filter

Source: https://context7.com/supabase/vecs/llms.txt

Demonstrates how to delete vectors from a collection using various criteria: a list of IDs, a metadata filter (e.g., a year earlier than a cutoff), or a complex filter combining multiple conditions. It also shows that attempting to delete non-existent records does not raise an error.
```python
import vecs

vx = vecs.create_client("postgresql://user:password@localhost:5432/dbname")
docs = vx.get_or_create_collection(name="articles", dimension=384)

# Delete by a list of IDs; returns the IDs that were deleted
deleted_ids = docs.delete(ids=["article_1", "article_2", "article_3"])
print(f"Deleted {len(deleted_ids)} vectors: {deleted_ids}")

# Delete by a metadata filter
deleted_ids = docs.delete(filters={"year": {"$lt": 2020}})
print(f"Deleted {len(deleted_ids)} old articles")

# Delete by a compound filter
deleted_ids = docs.delete(
    filters={
        "$and": [
            {"status": {"$eq": "archived"}},
            {"views": {"$lt": 100}}
        ]
    }
)

# Note: Attempting to delete non-existent records does not raise an error
docs.delete(ids=["does_not_exist"])  # Returns []

vx.disconnect()
```

--------------------------------

### Upsert Records with Text Data

Source: https://github.com/supabase/vecs/blob/main/docs/api.md

Upserts records into a collection that has a text embedding adapter configured. Instead of providing vectors, you can pass raw text and its associated metadata directly. The adapter chunks the text and embeds it as vectors.

```python
# add records to the collection using text as the media type
docs.upsert(
    records=[
        (
            "vec0",
            "four score and ....",  # <- note that we can now pass text here
            {"year": 1973}
        ),
        (
            "vec1",
            "hello, world!",
            {"year": 2012}
        )
    ]
)
```
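
--------------------------------

### Evaluate Metadata Filters In Memory (Illustrative Sketch)

The metadata filters used throughout these snippets nest `$and`/`$or` around per-field operators such as `$eq`, `$gte`, `$in`, and `$contains`. To make those semantics concrete without a database, here is a minimal pure-Python sketch that evaluates the same filter shapes against in-memory dicts. `matches` and `select` are hypothetical helpers and are not part of vecs, which translates filters into SQL over the `metadata` jsonb column. How this sketch handles missing keys and mixed value types is an assumption that may differ from Postgres behavior.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical helpers for illustration only: vecs compiles filters to SQL over
# the jsonb metadata column; this sketch only mimics the operator semantics.


def _orderable(a: Any, b: Any) -> bool:
    """True if a and b can be ordered: both numbers (not bools) or both strings."""
    if isinstance(a, bool) or isinstance(b, bool):
        return False
    numbers = (int, float)
    return (isinstance(a, numbers) and isinstance(b, numbers)) or (
        isinstance(a, str) and isinstance(b, str)
    )


def _compare(value: Any, op: str, operand: Any) -> bool:
    """Apply one comparison operator to a stored metadata value."""
    if op == "$eq":
        return value == operand
    if op == "$ne":
        return value != operand
    if op == "$in":
        return value in operand
    if op == "$contains":
        # array-contains-scalar: the stored value must be a list holding operand
        return isinstance(value, list) and operand in value
    if op in ("$gt", "$gte", "$lt", "$lte"):
        if not _orderable(value, operand):
            return False  # this sketch never orders mixed types, e.g. "2012" vs 2000
        if op == "$gt":
            return value > operand
        if op == "$gte":
            return value >= operand
        if op == "$lt":
            return value < operand
        return value <= operand
    raise ValueError(f"unsupported operator: {op}")


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Evaluate a (possibly nested) filter against one metadata dict."""
    if len(flt) != 1:
        raise ValueError("this sketch expects exactly one key per filter level")
    key, cond = next(iter(flt.items()))
    if key == "$and":
        return all(matches(metadata, sub) for sub in cond)
    if key == "$or":
        return any(matches(metadata, sub) for sub in cond)
    if not isinstance(cond, dict) or len(cond) != 1:
        raise ValueError(f"expected exactly one operator for {key!r}")
    if key not in metadata:
        return False  # a missing key never matches in this sketch, even for $ne
    op, operand = next(iter(cond.items()))
    return _compare(metadata[key], op, operand)


records: List[Tuple[str, Dict[str, Any]]] = [
    ("vec0", {"year": 1973}),
    ("vec1", {"year": "2012"}),  # year stored as a string
    ("vec2", {"year": 2012, "tags": ["important"]}),
]


def select(flt: Dict[str, Any]) -> List[str]:
    """Return the IDs of records whose metadata matches the filter."""
    return [rid for rid, meta in records if matches(meta, flt)]


print(select({"year": {"$eq": 2012}}))   # ['vec2']
print(select({"year": {"$gte": 1970}}))  # ['vec0', 'vec2']
print(select({"$or": [{"year": {"$lt": 2000}}, {"tags": {"$contains": "important"}}]}))  # ['vec0', 'vec2']
```

In this sketch, the string `"2012"` in `vec1` never equals or orders against the number 2012. This is one reason to store each metadata field with a consistent type.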