Chroma
https://github.com/chroma-core/chroma
Chroma is the open-source embedding database, offering the fastest way to build LLM apps with memory
# ChromaDB

ChromaDB is the open-source AI-native vector database designed for building AI applications. It provides everything needed to store embeddings, documents, and metadata, enabling efficient similarity search and retrieval. ChromaDB handles tokenization, embedding, and indexing automatically, making it simple to build semantic search, RAG (Retrieval-Augmented Generation), and other AI-powered applications.

The core API consists of four main operations: create a collection, add data, query for similar items, and manage your data. ChromaDB supports multiple client types, including in-memory for prototyping, persistent for local development, HTTP client for client-server deployments, and CloudClient for Chroma Cloud. It offers native support for Python, TypeScript/JavaScript, and Rust, with automatic embedding generation using built-in embedding functions or custom providers like OpenAI, Cohere, and more.

## Client Initialization

ChromaDB provides several client types for different deployment scenarios. The in-memory client is suited to prototyping, PersistentClient saves data to disk, HttpClient connects to a Chroma server, and CloudClient connects to Chroma Cloud.

```python
import chromadb

# The constructors below are alternatives: keep the one that matches your deployment.

# In-memory client (ephemeral, data lost on restart)
client = chromadb.Client()

# Persistent client (data saved to disk)
client = chromadb.PersistentClient(path="./chroma_db")

# HTTP client (connect to a running Chroma server)
client = chromadb.HttpClient(host="localhost", port=8000)

# Cloud client (connect to Chroma Cloud)
client = chromadb.CloudClient(
    tenant="your-tenant-id",
    database="your-database",
    api_key="your-api-key"
)

# Check server connection
heartbeat = client.heartbeat()  # Returns nanosecond timestamp
print(f"Server heartbeat: {heartbeat}")
```

## Create Collection

Collections are the fundamental unit of storage in ChromaDB. They store embeddings, documents, and metadata.
Collection names must be 3-512 characters, start/end with alphanumeric characters, and be unique within a database.

```python
import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

client = chromadb.Client()

# Create a basic collection (uses default sentence-transformer embedding)
collection = client.create_collection(name="my_documents")

# Create collection with custom embedding function
collection = client.create_collection(
    name="openai_collection",
    embedding_function=OpenAIEmbeddingFunction(
        api_key="your-openai-api-key",
        model_name="text-embedding-3-small"
    ),
    metadata={
        "description": "Documents embedded with OpenAI",
        "created": "2024-01-15"
    }
)

# Get or create (idempotent - won't fail if exists)
collection = client.get_or_create_collection(name="my_collection")

# Get existing collection
collection = client.get_collection(name="my_collection")

# List all collections with pagination
collections = client.list_collections(limit=100, offset=0)

# Count collections
count = client.count_collections()

# Delete collection (destructive, cannot be undone)
client.delete_collection(name="old_collection")
```

## Add Data to Collection

Add documents, embeddings, and metadata to a collection. Each record requires a unique string ID. ChromaDB automatically generates embeddings from documents if not provided.

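Because each record needs a unique string ID, one option is to derive the ID from the document text, so that re-ingesting the same text yields the same ID. The sketch below is illustrative only: `make_id` and `batched` are hypothetical helper names, not part of the Chroma API, and the SHA-256 prefix scheme is an assumed convention.

```python
import hashlib
from typing import Iterator, List, Sequence


def make_id(text: str, prefix: str = "doc") -> str:
    """Derive a stable string ID from document text (assumed scheme)."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{prefix}-{digest[:16]}"


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


documents = [
    "The quick brown fox jumps over the lazy dog",
    "Machine learning is transforming industries",
    "Vector databases enable semantic search",
]

for batch in batched(documents, size=2):
    ids = [make_id(doc) for doc in batch]
    # With a real collection, this is where the batch would be written:
    # collection.add(ids=ids, documents=batch)
    print(len(ids), ids)
```

Content-derived IDs of this kind may pair well with `upsert`, since unchanged text maps back to the same record. Whether that fits depends on whether two records are ever allowed to share identical text.
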
```python
import chromadb

client = chromadb.Client()
collection = client.get_or_create_collection(name="books")

# Add documents (embeddings generated automatically)
collection.add(
    ids=["doc1", "doc2", "doc3"],
    documents=[
        "The quick brown fox jumps over the lazy dog",
        "Machine learning is transforming industries",
        "Vector databases enable semantic search"
    ],
    metadatas=[
        {"source": "tutorial", "chapter": 1, "tags": ["animals", "classic"]},
        {"source": "article", "chapter": 2, "year": 2024},
        {"source": "documentation", "chapter": 3, "priority": 1}
    ]
)

# Add with pre-computed embeddings.
# All embeddings in a collection must share one dimensionality, so these
# 4-dimensional vectors go into a separate collection (name chosen for this example).
vectors_4d = client.get_or_create_collection(name="books_vectors_4d")
vectors_4d.add(
    ids=["vec1", "vec2"],
    embeddings=[[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]],
    documents=["First document", "Second document"],
    metadatas=[{"type": "example"}, {"type": "example"}]
)

# Add embeddings only (no documents, useful for external document storage).
# These vectors are 3-dimensional, so they also get their own collection.
vectors_3d = client.get_or_create_collection(name="books_vectors_3d")
vectors_3d.add(
    ids=["emb1", "emb2"],
    embeddings=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    metadatas=[{"external_id": "ext-001"}, {"external_id": "ext-002"}]
)
```

## Query Collection

Query a collection to find the most similar documents using semantic similarity search. Queries support text input, embedding input, metadata filtering, and document content filtering.

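To make "most similar" concrete, the brute-force sketch below ranks a few stored vectors by squared Euclidean distance to a query vector and keeps the closest `n_results`. It illustrates the idea only and is not how Chroma indexes or searches data. `squared_l2` and `nearest` are hypothetical helpers, and the distance metric is an assumption chosen for simplicity.

```python
from typing import Dict, List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(
    query: Sequence[float],
    stored: Dict[str, Sequence[float]],
    n_results: int,
) -> List[Tuple[str, float]]:
    """Return up to n_results (id, distance) pairs, closest first."""
    scored = [(record_id, squared_l2(query, vec)) for record_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:n_results]


# Hand-written example vectors, not real embeddings.
stored = {
    "vec1": [0.1, 0.2, 0.3, 0.4],
    "vec2": [0.5, 0.6, 0.7, 0.8],
    "vec3": [0.9, 0.9, 0.9, 0.9],
}

top = nearest([0.1, 0.2, 0.3, 0.4], stored, n_results=2)
print(top)
```

The dimensionality check mirrors a constraint that applies to real queries as well: a query embedding has to have the same length as the embeddings stored in the collection.
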
```python
import chromadb

client = chromadb.Client()
collection = client.get_collection(name="books")

# Basic text query (embeddings generated automatically)
results = collection.query(
    query_texts=["What is machine learning?"],
    n_results=5
)
print(f"IDs: {results['ids']}")
print(f"Documents: {results['documents']}")
print(f"Distances: {results['distances']}")

# Query with embeddings directly
# (the vector length must match the dimensionality of the collection's embeddings)
results = collection.query(
    query_embeddings=[[0.1, 0.2, 0.3, 0.4]],
    n_results=10
)

# Batch query (multiple queries at once)
results = collection.query(
    query_texts=["first query", "second query"],
    n_results=5
)
# Results are grouped by query: results['ids'][0] for first query, results['ids'][1] for second

# Query with metadata and document filters
results = collection.query(
    query_texts=["search term"],
    n_results=10,
    where={"chapter": {"$gt": 1}},  # Only chapters > 1
    where_document={"$contains": "keyword"}  # Document must contain "keyword"
)

# Choose what fields to return
results = collection.query(
    query_texts=["my query"],
    n_results=5,
    include=["documents", "metadatas", "embeddings", "distances"]
)

# Constrain search to specific IDs
results = collection.query(
    query_texts=["query"],
    n_results=5,
    ids=["doc1", "doc2", "doc3"]  # Only search within these IDs
)
```

## Get Records by ID

Retrieve records by ID or filters without similarity ranking. This is useful for fetching specific documents or paginating through a collection.

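Paginating with `limit` and `offset` can be wrapped in a small generator. The sketch below assumes only that the collection exposes `get(limit=..., offset=...)` and returns a dict with an `ids` list, as in the examples in this document. `iter_pages` is a hypothetical helper, and `FakeCollection` is a stand-in used so the snippet runs without a database.

```python
from typing import Any, Dict, Iterator, List


class FakeCollection:
    """In-memory stand-in that mimics the shape of `collection.get` results."""

    def __init__(self, ids: List[str]) -> None:
        self._ids = ids

    def get(self, limit: int, offset: int = 0) -> Dict[str, Any]:
        window = self._ids[offset:offset + limit]
        return {"ids": window, "documents": [f"text of {i}" for i in window]}


def iter_pages(collection: Any, page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield successive pages until an empty or short page is returned."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    offset = 0
    while True:
        page = collection.get(limit=page_size, offset=offset)
        if not page["ids"]:
            break
        yield page
        if len(page["ids"]) < page_size:
            break  # A short page means there is nothing left to fetch.
        offset += page_size


collection = FakeCollection([f"doc{n}" for n in range(1, 6)])
pages = [page["ids"] for page in iter_pages(collection, page_size=2)]
print(pages)
```

Offset-based paging like this assumes the collection is not modified while iterating. If records are added or deleted mid-scan, pages may skip or repeat entries.
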
```python
import chromadb

client = chromadb.Client()
collection = client.get_collection(name="books")

# Get by specific IDs
results = collection.get(ids=["doc1", "doc2"])
print(f"IDs: {results['ids']}")
print(f"Documents: {results['documents']}")
print(f"Metadatas: {results['metadatas']}")

# Get with pagination
results = collection.get(limit=100, offset=0)

# Get with metadata filter
results = collection.get(
    where={"source": "tutorial"}
)

# Get with document content filter
results = collection.get(
    where_document={"$contains": "machine learning"}
)

# Combine filters
results = collection.get(
    where={"chapter": {"$gte": 2}},
    where_document={"$contains": "search"},
    limit=50
)

# Choose which fields to return
results = collection.get(
    ids=["doc1"],
    include=["documents", "embeddings", "metadatas"]
)

# Convenience methods
count = collection.count()  # Total records in collection
preview = collection.peek(limit=10)  # First 10 records
```

## Update and Upsert Records

Update existing records, or use upsert to update a record if it exists and insert it if it does not.

```python
import chromadb

client = chromadb.Client()
collection = client.get_collection(name="books")

# Update existing records (IDs that are not found are ignored; use upsert to insert)
collection.update(
    ids=["doc1", "doc2"],
    documents=["Updated document 1", "Updated document 2"],
    metadatas=[{"updated": True, "version": 2}, {"updated": True, "version": 2}]
)

# Update embeddings directly
# (the vector length must match the dimensionality of the collection's embeddings)
collection.update(
    ids=["doc1"],
    embeddings=[[0.9, 0.8, 0.7, 0.6]]
)

# Upsert: update if exists, insert if not
collection.upsert(
    ids=["doc1", "new_doc"],
    documents=["Updated or new document 1", "Brand new document"],
    metadatas=[{"status": "upserted"}, {"status": "upserted"}]
)

# Upsert with embeddings (same dimensionality requirement as above)
collection.upsert(
    ids=["vec1", "vec2"],
    embeddings=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    documents=["Doc 1", "Doc 2"],
    metadatas=[{"type": "vector"}, {"type": "vector"}]
)
```

## Delete Records

Delete records from a collection by ID or using filters.
This operation is destructive and cannot be undone.

```python
import chromadb

client = chromadb.Client()
collection = client.get_collection(name="books")

# Delete by specific IDs
collection.delete(ids=["doc1", "doc2", "doc3"])

# Delete by metadata filter (deletes all matching records)
collection.delete(
    where={"chapter": "20"}
)

# Delete by document content filter
collection.delete(
    where_document={"$contains": "deprecated"}
)

# Combine ID and filter (deletes records matching both)
collection.delete(
    ids=["doc1", "doc2"],
    where={"status": "archived"}
)
```

## Metadata Filtering

Filter query and get results using metadata conditions with comparison operators, logical operators, and inclusion operators.

```python
import chromadb

client = chromadb.Client()
collection = client.get_collection(name="products")

# Comparison operators: $eq, $ne, $gt, $gte, $lt, $lte
results = collection.query(
    query_texts=["laptop"],
    where={"price": {"$lt": 1000}}  # price less than 1000
)

results = collection.query(
    query_texts=["laptop"],
    where={"rating": {"$gte": 4.5}}  # rating >= 4.5
)

# Equality (shorthand)
results = collection.get(where={"category": "electronics"})
# Equivalent to:
results = collection.get(where={"category": {"$eq": "electronics"}})

# Logical operators: $and, $or
results = collection.query(
    query_texts=["phone"],
    where={
        "$and": [
            {"price": {"$gte": 500}},
            {"price": {"$lte": 1000}},
            {"brand": "Apple"}
        ]
    }
)

results = collection.get(
    where={
        "$or": [
            {"color": "red"},
            {"color": "blue"}
        ]
    }
)

# Inclusion operators: $in, $nin
results = collection.get(
    where={"author": {"$in": ["Rowling", "Tolkien", "Martin"]}}
)

results = collection.get(
    where={"status": {"$nin": ["deleted", "archived"]}}
)

# Array metadata with $contains and $not_contains
collection.add(
    ids=["movie1", "movie2"],
    documents=["Action movie", "Drama movie"],
    metadatas=[
        {"genres": ["action", "thriller"], "year": 2023},
        {"genres": ["drama", "romance"], "year": 2024}
    ]
)

results = collection.get(
    where={"genres": {"$contains": "action"}}
)

results = collection.get(
    where={"genres": {"$not_contains": "horror"}}
)

# Combine metadata and document filters
results = collection.query(
    query_texts=["exciting story"],
    where={"year": {"$gte": 2020}},
    where_document={"$contains": "adventure"}
)
```

## Document Content Filtering

Filter documents by their text content using contains, regex, and logical operators.

```python
import chromadb

client = chromadb.Client()
collection = client.get_collection(name="articles")

# Contains filter
results = collection.get(
    where_document={"$contains": "machine learning"}
)

# Not contains filter
results = collection.get(
    where_document={"$not_contains": "deprecated"}
)

# Regex filter
results = collection.get(
    where_document={"$regex": "chapter \\d+"}  # Matches "chapter 1", "chapter 2", etc.
)

# Not regex filter
results = collection.get(
    where_document={"$not_regex": "^DRAFT:"}  # Exclude documents starting with "DRAFT:"
)

# Combine with logical operators
results = collection.get(
    where_document={
        "$and": [
            {"$contains": "python"},
            {"$not_contains": "deprecated"}
        ]
    }
)

# Combine with query and metadata filter
results = collection.query(
    query_texts=["programming tutorials"],
    n_results=10,
    where={"category": "tech"},
    where_document={
        "$or": [
            {"$contains": "beginner"},
            {"$contains": "tutorial"}
        ]
    }
)
```

## Modify Collection

Update a collection's name or metadata after creation.

```python
import chromadb

client = chromadb.Client()
collection = client.get_collection(name="my_collection")

# Modify collection name
collection.modify(name="renamed_collection")

# Modify collection metadata
collection.modify(
    metadata={
        "description": "Updated description",
        "last_modified": "2024-01-15"
    }
)

# Modify both name and metadata
collection.modify(
    name="new_name",
    metadata={"version": "2.0"}
)
```

## TypeScript Client

ChromaDB provides a TypeScript/JavaScript client that connects to a running Chroma server.

```typescript
import { ChromaClient, CloudClient } from "chromadb";

// Connect to local server
const client = new ChromaClient({
  host: "localhost",
  port: 8000,
});

// Or connect to Chroma Cloud
const cloudClient = new CloudClient({
  tenant: "your-tenant",
  database: "your-database",
  apiKey: "your-api-key",
});

// Create collection
const collection = await client.createCollection({
  name: "my_collection",
  metadata: { description: "My documents" },
});

// Add documents
await collection.add({
  ids: ["id1", "id2"],
  documents: ["Document about cats", "Document about dogs"],
  metadatas: [{ animal: "cat" }, { animal: "dog" }],
});

// Query
const results = await collection.query({
  queryTexts: ["pets"],
  nResults: 5,
  where: { animal: "cat" },
});

// Iterate over results
for (const batch of results.rows()) {
  for (const row of batch) {
    console.log(row.id, row.document, row.metadata, row.distance);
  }
}

// Get by ID
const docs = await collection.get({
  ids: ["id1"],
  include: ["documents", "metadatas"],
});

// Update
await collection.update({
  ids: ["id1"],
  documents: ["Updated document about cats"],
  metadatas: [{ animal: "cat", updated: true }],
});

// Upsert
await collection.upsert({
  ids: ["id1", "id3"],
  documents: ["Upserted doc 1", "New doc 3"],
  metadatas: [{ status: "upserted" }, { status: "new" }],
});

// Delete
await collection.delete({ ids: ["id1"] });

// List collections
const collections = await client.listCollections({ limit: 100 });

// Delete collection
await client.deleteCollection({ name: "my_collection" });
```

## Rust Client

The Rust client connects to a running Chroma server and requires embeddings to be provided directly.

```rust
use chroma::{ChromaHttpClient, ChromaHttpClientOptions};
use chroma_types::{
    IncludeList, Include, Where, MetadataExpression, MetadataComparison,
    PrimitiveOperator, MetadataValue,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Connect to local server
    let client = ChromaHttpClient::new(Default::default());

    // Or connect to Chroma Cloud
    let options = ChromaHttpClientOptions::cloud("api-key", "database-name")?;
    let cloud_client = ChromaHttpClient::new(options);

    // Create collection
    let collection = client
        .create_collection("my_collection", None, None)
        .await?;

    // Add documents with embeddings (must provide embeddings directly)
    collection.add(
        vec!["id1".to_string(), "id2".to_string()],
        vec![vec![0.1, 0.2, 0.3], vec![0.4, 0.5, 0.6]],
        Some(vec![
            Some("Document about cats".to_string()),
            Some("Document about dogs".to_string()),
        ]),
        None, // uris
        None, // metadatas
    ).await?;

    // Query with embeddings
    let results = collection
        .query(
            vec![vec![0.1, 0.2, 0.3]], // query embeddings
            Some(5),                   // n_results
            None,                      // where
            None,                      // ids
            None,                      // include
        )
        .await?;

    // Query with filter
    let where_clause = Where::Metadata(MetadataExpression {
        key: "animal".to_string(),
        comparison: MetadataComparison::Primitive(
            PrimitiveOperator::Equal,
            MetadataValue::Str("cat".to_string()),
        ),
    });

    let filtered_results = collection
        .query(
            vec![vec![0.1, 0.2, 0.3]],
            Some(10),
            Some(where_clause),
            None,
            Some(IncludeList(vec![Include::Document, Include::Metadata])),
        )
        .await?;

    // Get by IDs
    let docs = collection
        .get(
            Some(vec!["id1".to_string()]),
            None,
            Some(10),
            Some(0),
            Some(IncludeList::default_get()),
        )
        .await?;

    // Delete
    collection.delete(
        Some(vec!["id1".to_string()]),
        None,
    ).await?;

    Ok(())
}
```

## Running Chroma Server

Run a Chroma server for client-server deployments.

```bash
# Install chromadb
pip install chromadb

# Run server with persistent storage
chroma run --path /path/to/db --host 0.0.0.0 --port 8000

# Run with Docker
docker pull chromadb/chroma
docker run -p 8000:8000 -v /path/to/db:/chroma/chroma chromadb/chroma

# Using npm/npx (for TypeScript projects)
npx chroma run --path ./chroma_data
```

## Summary

ChromaDB excels at powering AI applications that require semantic search and retrieval. Its primary use cases include:

- Retrieval-Augmented Generation (RAG) systems, where relevant context is retrieved from a knowledge base to augment LLM prompts
- Semantic search applications that find similar documents based on meaning rather than keywords
- Recommendation systems that suggest similar items based on embeddings
- Document Q&A systems that answer questions using a corpus of documents

Integration with ChromaDB follows straightforward patterns: initialize a client based on your deployment (in-memory for development, persistent for local production, HTTP/Cloud for distributed systems), create collections with optional embedding functions, add your data with documents and metadata, and query using text or embeddings with optional filters. The column-major result format allows efficient batch processing of query results. For production deployments, Chroma Cloud provides a fully managed, serverless solution with automatic scaling and $5 in free credits to get started.
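
The column-major layout mentioned above means each field of a query result is a list of lists, indexed first by query and then by hit. A small helper can transpose that layout into per-hit rows. This is a sketch under stated assumptions: `iter_rows` is a hypothetical name, and the sample dict is hand-written to mimic the result shape shown in the Query Collection examples, not real output.

```python
from typing import Any, Dict, Iterator, List, Sequence


def iter_rows(
    results: Dict[str, Any],
    fields: Sequence[str] = ("ids", "documents", "distances"),
) -> Iterator[List[Dict[str, Any]]]:
    """Yield one list of row dicts per query, skipping fields that are absent or None."""
    present = [name for name in fields if results.get(name) is not None]
    for query_index in range(len(results["ids"])):
        columns = [results[name][query_index] for name in present]
        yield [dict(zip(present, values)) for values in zip(*columns)]


# Made-up sample with two queries: the first has two hits, the second has one.
sample = {
    "ids": [["doc2", "doc3"], ["doc1"]],
    "documents": [["Machine learning text", "Vector database text"], ["Fox text"]],
    "distances": [[0.21, 0.58], [0.33]],
    "metadatas": None,  # Not requested in this hypothetical query
}

rows = list(iter_rows(sample))
for query_rows in rows:
    for row in query_rows:
        print(row["ids"], row["distances"])
```

Keeping the data column-major until the last step lets a whole field, for example every distance for one query, be processed as a single list. Rows are often more convenient when each hit is handled on its own.
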