# Pgvector Scout

Pgvector Scout is a Laravel Scout driver that enables vector similarity search using PostgreSQL's pgvector extension. It allows Laravel applications to store and search vector embeddings directly in the database, automatically generating embeddings when models are created or updated through Scout's model observers. The package supports multiple embedding providers including OpenAI, Google Gemini, and Ollama with a configurable handler system.

The package follows a handler-based architecture where each embedding provider has its own configuration for table name, vector dimensions, model, and API settings. It uses content hashing to prevent unnecessary embedding regeneration, supports soft deletes through database joins, and provides the familiar Scout interface while leveraging PostgreSQL's native vector capabilities for cosine similarity searches.

## Installation

Install the package via Composer and publish the required configuration files.

```bash
# Install the package
composer require benbjurstrom/pgvector-scout

# Publish Scout and package configurations
php artisan vendor:publish --tag="scout-config"
php artisan vendor:publish --tag="pgvector-scout-config"

# Create migration for embeddings table
php artisan scout:index openai
php artisan migrate
```

## Configuration

The package configuration file defines embedding indexes for different providers. Each index specifies the handler class, embedding model, vector dimensions, API endpoint, and table name.

```php
// config/pgvector-scout.php
return [
    'indexes' => [
        'openai' => [
            'handler' => BenBjurstrom\PgvectorScout\Handlers\OpenAiHandler::class,
            'model' => 'text-embedding-3-small',
            'dimensions' => 256,
            'url' => 'https://api.openai.com/v1',
            'api_key' => env('OPENAI_API_KEY'),
            'table' => 'openai_embeddings',
        ],
        'gemini' => [
            'handler' => BenBjurstrom\PgvectorScout\Handlers\GeminiHandler::class,
            'model' => 'text-embedding-004',
            'dimensions' => 256,
            'url' => 'https://generativelanguage.googleapis.com/v1beta',
            'api_key' => env('GEMINI_API_KEY'),
            'table' => 'gemini_embeddings',
            'task' => 'SEMANTIC_SIMILARITY',
        ],
        'ollama' => [
            'handler' => BenBjurstrom\PgvectorScout\Handlers\OllamaHandler::class,
            'model' => 'nomic-embed-text',
            'dimensions' => 768,
            'url' => 'http://localhost:11434/api/embeddings',
            'api_key' => 'none',
            'table' => 'ollama_embeddings',
        ],
    ],
];
```

## Environment Configuration

Set the Scout driver and API keys in your environment file.

```env
SCOUT_DRIVER=pgvector
OPENAI_API_KEY=your-api-key
GEMINI_API_KEY=your-gemini-key
```

## Making Models Searchable

Add the `HasEmbeddings` and `Searchable` traits to your model, implement the `searchableAs()` method to specify the index name, and `toSearchableArray()` to define which content should be converted into embeddings.

```php
<?php

namespace App\Models;

// Trait namespace assumed here; check it against the installed package.
use BenBjurstrom\PgvectorScout\Models\Concerns\HasEmbeddings;
use Illuminate\Database\Eloquent\Model;
use Laravel\Scout\Searchable;

class Article extends Model
{
    use HasEmbeddings, Searchable;

    // Name of an index defined in config/pgvector-scout.php ('openai' assumed)
    public function searchableAs(): string
    {
        return 'openai';
    }

    // Content that is converted into an embedding
    public function toSearchableArray(): array
    {
        return [
            'title' => $this->title,
            'content' => $this->content,
        ];
    }
}

// Embeddings are automatically created when models are saved
$article = Article::create([
    'title' => 'Introduction to Vector Search',
    'content' => 'Vector search enables semantic similarity matching...',
    'category' => 'technology',
]);

// Import existing models
// php artisan scout:import "App\Models\Article"
```

## Basic Vector Search

Use the standard Scout search syntax to perform vector similarity searches. The search query is converted to an embedding vector using the configured handler, then matched against stored embeddings using cosine similarity.

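
To make the cosine step concrete, here is a minimal plain-PHP sketch of cosine distance, the quantity pgvector's `<=>` operator computes (1 minus cosine similarity). It assumes the `neighbor_distance` values returned by the package are distances of this kind, so smaller means more similar. The function name `cosineDistance` and the three-dimensional vectors are hypothetical and only for illustration; in the package the calculation runs inside PostgreSQL, not in PHP.

```php
<?php

declare(strict_types=1);

/**
 * Cosine distance between two equal-length vectors: 1 - cos(theta).
 * 0.0 means same direction, 1.0 orthogonal, 2.0 opposite.
 * Hypothetical helper for illustration; not part of the package.
 */
function cosineDistance(array $a, array $b): float
{
    if ($a === [] || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    if ($normA == 0.0 || $normB == 0.0) {
        throw new InvalidArgumentException('Cosine distance is undefined for zero vectors.');
    }

    return 1.0 - $dot / (sqrt($normA) * sqrt($normB));
}

// Made-up 3-dimensional "embeddings"; real ones have hundreds of dimensions.
$query = [1.0, 0.0, 0.0];
$stored = [
    'same direction' => [2.0, 0.0, 0.0],
    'orthogonal'     => [0.0, 1.0, 0.0],
    'opposite'       => [-1.0, 0.0, 0.0],
];

// Compute a distance per stored vector, then sort closest first (keys preserved).
$distances = array_map(fn (array $v): float => cosineDistance($query, $v), $stored);
asort($distances);

foreach ($distances as $label => $distance) {
    printf("%s: %.1f\n", $label, $distance);
}
```

Vector length does not affect the result, only direction does, which is why `[2.0, 0.0, 0.0]` is at distance 0 from the query. With the driver installed none of this arithmetic is written by hand; the search call itself looks like this:
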
```php
<?php

use App\Models\Article;

// Basic search (the query text is illustrative)
$results = Article::search('machine learning')->get();

// Access search results with their embeddings
foreach ($results as $article) {
    echo $article->title . "\n";
    echo "Similarity distance: " . $article->embedding->neighbor_distance . "\n";
}

// Limit the number of results
$results = Article::search('natural language processing')
    ->take(5)
    ->get();

// Get only the model IDs
$ids = Article::search('deep learning')->keys();
// Returns: Collection [1, 5, 3, 8, ...]
```

## Search with Existing Vector

Pass an existing embedding vector as the search parameter to find similar models. This is useful for finding related content based on a model's existing embedding.

```php
<?php

use App\Models\Article;
use Pgvector\Laravel\Vector;

// Get the vector from an existing article
$article = Article::find(1);
$vector = $article->embedding->vector;

// Find similar articles using the vector
$similarArticles = Article::search($vector)->get();

// Create a custom vector for search
$customVector = new Vector([0.1, 0.2, 0.3, /* ... 256 dimensions */]);
$results = Article::search($customVector)->take(10)->get();
```

## Filtering Search Results

Apply Eloquent-style where constraints to filter results before or after the vector similarity calculation. Standard Scout where methods filter on model properties.

```php
<?php

use App\Models\Article;

// Filter with where (the query text is illustrative)
$results = Article::search('artificial intelligence')
    ->where('category', 'technology')
    ->get();

// Filter with whereIn
$results = Article::search('programming tutorials')
    ->whereIn('category', ['technology', 'education'])
    ->get();

// Filter with whereNotIn
$results = Article::search('business strategies')
    ->whereNotIn('status', ['draft', 'archived'])
    ->get();

// Combine multiple filters
$results = Article::search('data analysis')
    ->where('published', true)
    ->where('author_id', 5)
    ->take(20)
    ->get();
```

## Advanced Filtering with whereSearchable

Use the `whereSearchable()` macro to apply Eloquent query constraints before the vector similarity search. This improves efficiency by filtering the dataset before computing expensive vector distances.

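
The saving from filtering first can be illustrated with an in-memory analogue. This is only a sketch of the idea, not how the driver works internally: in the package the filter becomes part of the SQL query. The helper `filterThenRank`, the row shape, and the use of squared Euclidean distance as a stand-in metric are all assumptions made for this example.

```php
<?php

declare(strict_types=1);

/**
 * Filter candidate rows first, then rank only the survivors by distance.
 * Returns the ranked rows and the number of distance computations performed.
 * Hypothetical helper for illustration; not part of the package.
 */
function filterThenRank(array $rows, callable $filter, array $query, int $limit): array
{
    $distanceCalls = 0;

    // Squared Euclidean distance stands in for the real metric.
    $distance = function (array $a, array $b) use (&$distanceCalls): float {
        $distanceCalls++;
        $sum = 0.0;
        foreach ($a as $i => $value) {
            $sum += ($value - $b[$i]) ** 2;
        }

        return $sum;
    };

    // Step 1: cheap filter on ordinary columns.
    $candidates = array_values(array_filter($rows, $filter));

    // Step 2: expensive distance, computed only for rows that passed the filter.
    foreach ($candidates as &$row) {
        $row['distance'] = $distance($query, $row['vector']);
    }
    unset($row);

    usort($candidates, fn (array $x, array $y): int => $x['distance'] <=> $y['distance']);

    return [
        'results' => array_slice($candidates, 0, $limit),
        'distance_calls' => $distanceCalls,
    ];
}

$rows = [
    ['id' => 1, 'user_id' => 7, 'vector' => [0.9, 0.1]],
    ['id' => 2, 'user_id' => 8, 'vector' => [1.0, 0.0]],
    ['id' => 3, 'user_id' => 7, 'vector' => [0.0, 1.0]],
    ['id' => 4, 'user_id' => 9, 'vector' => [0.5, 0.5]],
];

$outcome = filterThenRank($rows, fn (array $row): bool => $row['user_id'] === 7, [1.0, 0.0], 10);

echo implode(',', array_column($outcome['results'], 'id')), "\n"; // ids owned by user 7, closest first
echo $outcome['distance_calls'], "\n";                              // distances computed after filtering
```

Only the two rows owned by user 7 reach the distance step, so two distances are computed instead of four. In the package itself the filter is written as Eloquent constraints:
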
```php
<?php

use App\Models\DocumentChunk;

// Filter through a relationship (the query text is illustrative)
$results = DocumentChunk::search('payment terms')
    ->whereSearchable(fn ($query) =>
        $query->whereHas('document', fn ($doc) =>
            $doc->where('client_id', $clientId)
                ->where('type', 'contract')
        )
    )
    ->get();

// Filter with nested relationships and tags
$results = DocumentChunk::search('quarterly results')
    ->whereSearchable(fn ($query) =>
        $query->whereHas('document', fn ($doc) =>
            $doc->where('status', 'active')
                ->where('user_id', $userId)
                ->whereHas('tags', fn ($tag) =>
                    $tag->where('slug', 'financial-reports')
                )
        )
    )
    ->get();

// Chain multiple whereSearchable calls
$results = DocumentChunk::search('meeting notes')
    ->whereSearchable(fn ($query) =>
        $query->whereHas('document', fn ($d) => $d->where('user_id', $userId))
    )
    ->whereSearchable(fn ($query) =>
        $query->whereHas('document', fn ($d) => $d->where('status', 'active'))
    )
    ->get();

// Combine with standard where constraints
$results = DocumentChunk::search('project requirements')
    ->whereSearchable(fn ($query) =>
        $query->whereHas('document', fn ($d) => $d->where('user_id', $userId))
    )
    ->where('chunk_number', 1)
    ->get();

// Use with joins for complex queries
$results = DocumentChunk::search('test results')
    ->whereSearchable(fn ($query) =>
        $query->join('documents', 'document_chunks.document_id', '=', 'documents.id')
            ->where('documents.user_id', $userId)
            ->where('documents.status', 'active')
    )
    ->get();
```

## Pagination

Paginate search results using Scout's built-in pagination support. The paginator includes total count, current page, and navigation metadata.

```php
<?php

use App\Models\Article;
use App\Models\DocumentChunk;

// Basic pagination (the query text is illustrative)
$results = Article::search('laravel')->paginate(perPage: 10, page: 1);

echo "Total results: " . $results->total();
echo "Current page: " . $results->currentPage();
echo "Last page: " . $results->lastPage();
echo "Has more pages: " . ($results->hasMorePages() ? 'Yes' : 'No');

// Paginate with filters
$results = Article::search('web frameworks')
    ->where('category', 'technology')
    ->paginate(15, page: 2);

// Paginate with whereSearchable
$results = DocumentChunk::search('user authentication')
    ->whereSearchable(fn ($query) =>
        $query->whereHas('document', fn ($d) => $d->where('user_id', $userId))
    )
    ->paginate(5, page: 1);

// Iterate through paginated results
foreach ($results as $article) {
    echo $article->title . "\n";
}
```

## Lazy Collections with Cursor

Use cursor-based iteration for memory-efficient processing of large result sets.

```php
<?php

use App\Models\Article;
use App\Models\DocumentChunk;

// Get a lazy collection (the query text is illustrative)
$results = Article::search('machine learning')->cursor();

// Process results one at a time without loading all into memory
$results->each(function ($article) {
    processArticle($article);
});

// With filters
$results = Article::search('data science')
    ->where('published', true)
    ->cursor();

// With whereSearchable
$results = DocumentChunk::search('api documentation')
    ->whereSearchable(fn ($query) =>
        $query->whereHas('document', fn ($d) => $d->where('user_id', $userId))
    )
    ->cursor();

// Count results from cursor
$count = Article::search('neural networks')->cursor()->count();
```

## Soft Delete Support

The package integrates with Laravel Scout's soft delete functionality, allowing you to include or exclude soft-deleted models from search results.

```php
<?php

// Enable soft delete support in config/scout.php
'soft_delete' => true,

// By default, soft-deleted models are excluded
$results = Article::search('archived content')->get();

// Include soft-deleted models in results
$results = Article::search('all content')
    ->withTrashed()
    ->get();

// Search only soft-deleted models
$results = Article::search('deleted items')
    ->onlyTrashed()
    ->get();

// Combine with where constraints
$results = Article::search('old articles')
    ->where('category', 'news')
    ->withTrashed()
    ->get();

// Soft deleting a model updates its embedding's __soft_deleted flag
$article = Article::find(1);
$article->delete(); // Embedding retained with __soft_deleted = true

// Force delete removes the embedding entirely
$article->forceDelete();
```

## Listening to Embedding Events

Subscribe to the `EmbeddingSaved` event to monitor embedding creation and updates. This is useful for logging, analytics, or triggering follow-up actions.

```php
<?php

// app/Listeners/LogEmbeddingSaved.php
namespace App\Listeners;

use BenBjurstrom\PgvectorScout\Events\EmbeddingSaved;
use Illuminate\Support\Facades\Log;

class LogEmbeddingSaved
{
    public function handle(EmbeddingSaved $event): void
    {
        // Assumes the event exposes wasRecentlyCreated directly
        $action = $event->wasRecentlyCreated ? 'created' : 'updated';

        Log::info("Embedding {$action}", [
            'model' => $event->modelName, // e.g., "App\Models\Article"
            'id' => $event->modelId,      // e.g., 123
            'handler' => $event->handler, // e.g., "OpenAiHandler"
        ]);
    }
}

// Register in EventServiceProvider
// app/Providers/EventServiceProvider.php
use BenBjurstrom\PgvectorScout\Events\EmbeddingSaved;
use App\Listeners\LogEmbeddingSaved;

protected $listen = [
    EmbeddingSaved::class => [
        LogEmbeddingSaved::class,
    ],
];
```

## Custom Embedding Handlers

Create custom handlers to integrate with different embedding providers. Implement the `HandlerContract` interface and define the `handle()` method to convert text to a vector.

```php
<?php

namespace App\Handlers;

// Package namespaces below are assumed; verify them against the installed version.
use BenBjurstrom\PgvectorScout\HandlerContract;
use BenBjurstrom\PgvectorScout\IndexConfig;
use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\Http;
use Pgvector\Laravel\Vector;
use RuntimeException;

class CohereHandler implements HandlerContract
{
    public static function handle(string $input, IndexConfig $config): Vector
    {
        // Cache by index, model, and input so identical text is embedded only once
        $cacheKey = $config->name . ':' . $config->model . ':' . sha1($input);

        $embedding = Cache::rememberForever($cacheKey, function () use ($input, $config) {
            $response = Http::withHeaders([
                'Authorization' => 'Bearer ' . $config->apiKey,
                'Content-Type' => 'application/json',
            ])->post($config->url . '/embed', [
                'texts' => [$input],
                'model' => $config->model,
                'input_type' => 'search_document',
                'truncate' => 'END',
            ]);

            if (!$response->successful()) {
                throw new RuntimeException(
                    'Cohere API request failed: ' . $response->body()
                );
            }

            $embedding = $response->json('embeddings.0');

            if (empty($embedding)) {
                throw new RuntimeException('No embedding in Cohere response');
            }

            return $embedding;
        });

        return new Vector($embedding);
    }
}

// Add to config/pgvector-scout.php
'cohere' => [
    'handler' => App\Handlers\CohereHandler::class,
    'model' => 'embed-english-v3.0',
    'dimensions' => 1024,
    'url' => 'https://api.cohere.ai/v1',
    'api_key' => env('COHERE_API_KEY'),
    'table' => 'cohere_embeddings',
],
```

## Embedding Model

The `Embedding` model represents stored vector embeddings and provides polymorphic relationships to searchable models. It uses the pgvector Laravel package for vector operations.

```php
<?php

use App\Models\Article;
use BenBjurstrom\PgvectorScout\Models\Embedding; // namespace assumed

// Access a model's embedding (eager loading assumed)
$article = Article::with('embedding')->find(1);
$embedding = $article->embedding;

echo $embedding->embeddable_type;   // "App\Models\Article"
echo $embedding->embeddable_id;     // 1
echo $embedding->embedding_model;   // "text-embedding-3-small"
echo $embedding->content_hash;      // UUID hash of content
echo $embedding->vector;            // Pgvector\Laravel\Vector instance
echo $embedding->neighbor_distance; // Distance from search (after search)

// Query embeddings directly for a specific index
$embedding = (new Embedding)->forIndex('openai');
$allEmbeddings = $embedding->where('embeddable_type', Article::class)->get();

// Access the parent model from an embedding
$parentModel = $embedding->embeddable;
```

## IndexConfig

The `IndexConfig` class encapsulates embedding index configuration and validates settings. It is used internally to resolve handler configuration.

```php
<?php

use App\Models\Article;
use BenBjurstrom\PgvectorScout\IndexConfig; // namespace assumed

// Create config from a searchable model
$article = new Article;
$config = IndexConfig::fromModel($article);

echo $config->name;       // "openai"
echo $config->handler;    // "BenBjurstrom\PgvectorScout\Handlers\OpenAiHandler"
echo $config->model;      // "text-embedding-3-small"
echo $config->dimensions; // 256
echo $config->table;      // "openai_embeddings"
echo $config->url;        // "https://api.openai.com/v1"
echo $config->apiKey;     // API key from config
echo $config->task;       // null (or task type for Gemini)

// Use the handler to generate embeddings manually
$vector = $config->handler::handle('Some text to embed', $config);
```

## Flush and Remove from Search

Remove embeddings for models using Scout's built-in methods.

```php
<?php

use App\Models\Article;

// Remove all embeddings for a model
Article::removeAllFromSearch();

// Or use the artisan command
// php artisan scout:flush "App\Models\Article"

// Delete a specific model (embedding removed automatically)
$article = Article::find(1);
$article->delete();

// Unsearch specific models without deleting them
$articles = Article::whereIn('id', [1, 2, 3])->get();
$articles->unsearchable();
```

## Summary

Pgvector Scout enables semantic search capabilities in Laravel applications by bridging Laravel Scout with PostgreSQL's pgvector extension. The primary use cases include implementing AI-powered search features, building recommendation systems, finding semantically similar content, and creating RAG (Retrieval-Augmented Generation) pipelines. The package excels in scenarios where traditional keyword search falls short, such as finding documents that are conceptually similar even when they don't share exact keywords.

Integration follows standard Laravel Scout patterns, making adoption straightforward for developers familiar with the Scout ecosystem. The handler-based architecture allows easy switching between embedding providers (OpenAI, Gemini, Ollama) or implementing custom handlers. For large content, the recommended pattern is chunking data into separate models (e.g., `DocumentChunk`) for optimal search granularity. The `whereSearchable()` macro enables efficient pre-filtering of results before vector computation, which is essential for multi-tenant applications or complex access control requirements.
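
The chunking pattern mentioned in the summary could be sketched as follows. This is one possible approach under stated assumptions, not something the package provides: the helper name `chunkText`, the word-based splitting, and the default sizes are all hypothetical, and a real application might split on sentences or tokens instead.

```php
<?php

declare(strict_types=1);

/**
 * Split text into chunks of at most $size words, with $overlap words shared
 * between consecutive chunks. Hypothetical helper, not part of the package.
 *
 * @return list<string>
 */
function chunkText(string $text, int $size = 200, int $overlap = 20): array
{
    if ($size < 1 || $overlap < 0 || $overlap >= $size) {
        throw new InvalidArgumentException('Require size >= 1 and 0 <= overlap < size.');
    }

    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $chunks = [];
    $step = $size - $overlap;

    for ($start = 0; $start < count($words); $start += $step) {
        $chunks[] = implode(' ', array_slice($words, $start, $size));

        // Stop once a chunk has reached the final word.
        if ($start + $size >= count($words)) {
            break;
        }
    }

    return $chunks;
}

$chunks = chunkText('one two three four five six seven', size: 3, overlap: 1);

foreach ($chunks as $number => $chunk) {
    echo ($number + 1) . ': ' . $chunk . "\n";
}
// 1: one two three
// 2: three four five
// 3: five six seven
```

Each string could then be saved as one `DocumentChunk` row, with its position stored in a column such as `chunk_number`, so that every chunk gets its own embedding. The overlap keeps a sentence that straddles a boundary searchable from both neighbouring chunks.
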