Pgvector Scout
https://github.com/benbjurstrom/pgvector-scout
# Pgvector Scout

Pgvector Scout is a Laravel Scout driver that enables vector similarity search using PostgreSQL's pgvector extension. It allows Laravel applications to store and search vector embeddings directly in the database, automatically generating embeddings when models are created or updated through Scout's model observers. The package supports multiple embedding providers, including OpenAI, Google Gemini, and Ollama, through a configurable handler system.

The package follows a handler-based architecture where each embedding provider has its own configuration for table name, vector dimensions, model, and API settings. It uses content hashing to prevent unnecessary embedding regeneration, supports soft deletes through database joins, and provides the familiar Scout interface while leveraging PostgreSQL's native vector capabilities for cosine similarity searches.

## Installation

Install the package via Composer and publish the required configuration files.

```bash
# Install the package
composer require benbjurstrom/pgvector-scout

# Publish Scout and package configurations
php artisan vendor:publish --tag="scout-config"
php artisan vendor:publish --tag="pgvector-scout-config"

# Create migration for embeddings table
php artisan scout:index openai
php artisan migrate
```

## Configuration

The package configuration file defines embedding indexes for different providers. Each index specifies the handler class, embedding model, vector dimensions, API endpoint, and table name.

```php
// config/pgvector-scout.php
return [
    'indexes' => [
        'openai' => [
            'handler' => BenBjurstrom\PgvectorScout\Handlers\OpenAiHandler::class,
            'model' => 'text-embedding-3-small',
            'dimensions' => 256,
            'url' => 'https://api.openai.com/v1',
            'api_key' => env('OPENAI_API_KEY'),
            'table' => 'openai_embeddings',
        ],
        'gemini' => [
            'handler' => BenBjurstrom\PgvectorScout\Handlers\GeminiHandler::class,
            'model' => 'text-embedding-004',
            'dimensions' => 256,
            'url' => 'https://generativelanguage.googleapis.com/v1beta',
            'api_key' => env('GEMINI_API_KEY'),
            'table' => 'gemini_embeddings',
            'task' => 'SEMANTIC_SIMILARITY',
        ],
        'ollama' => [
            'handler' => BenBjurstrom\PgvectorScout\Handlers\OllamaHandler::class,
            'model' => 'nomic-embed-text',
            'dimensions' => 768,
            'url' => 'http://localhost:11434/api/embeddings',
            'api_key' => 'none',
            'table' => 'ollama_embeddings',
        ],
    ],
];
```

## Environment Configuration

Set the Scout driver and API keys in your environment file.

```env
SCOUT_DRIVER=pgvector
OPENAI_API_KEY=your-api-key
GEMINI_API_KEY=your-gemini-key
```

## Making Models Searchable

Add the `HasEmbeddings` and `Searchable` traits to your model, implement `searchableAs()` to specify the index name, and implement `toSearchableArray()` to define which content should be converted into embeddings.

```php
<?php

namespace App\Models;

use BenBjurstrom\PgvectorScout\Models\Concerns\HasEmbeddings;
use Illuminate\Database\Eloquent\Model;
use Laravel\Scout\Searchable;

class Article extends Model
{
    use HasEmbeddings, Searchable;

    protected $fillable = ['title', 'content', 'category'];

    /**
     * Get the name of the index associated with the model.
     */
    public function searchableAs(): string
    {
        return 'openai';
    }

    /**
     * Get the indexable data array for the model.
     */
    public function toSearchableArray(): array
    {
        return [
            'title' => $this->title,
            'content' => $this->content,
        ];
    }
}

// Embeddings are automatically created when models are saved
$article = Article::create([
    'title' => 'Introduction to Vector Search',
    'content' => 'Vector search enables semantic similarity matching...',
    'category' => 'technology',
]);

// Import existing models
// php artisan scout:import "App\Models\Article"
```

## Basic Vector Search

Use the standard Scout search syntax to perform vector similarity searches. The search query is converted to an embedding vector using the configured handler, then matched against stored embeddings using cosine similarity.

```php
<?php

use App\Models\Article;

// Basic text search - converts query to vector and finds similar embeddings
$results = Article::search('machine learning algorithms')->get();

// Access search results with their embeddings
foreach ($results as $article) {
    echo $article->title . "\n";
    echo "Similarity distance: " . $article->embedding->neighbor_distance . "\n";
}

// Limit the number of results
$results = Article::search('natural language processing')
    ->take(5)
    ->get();

// Get only the model IDs
$ids = Article::search('deep learning')->keys();
// Returns: Collection [1, 5, 3, 8, ...]
```

## Search with Existing Vector

Pass an existing embedding vector as the search parameter to find similar models. This is useful for finding related content based on a model's existing embedding.

```php
<?php

use App\Models\Article;
use Pgvector\Laravel\Vector;

// Get an existing model's embedding vector
$sourceArticle = Article::find(1);
$vector = $sourceArticle->embedding->vector;

// Find similar articles using the vector
$similarArticles = Article::search($vector)->get();

// Create a custom vector for search
$customVector = new Vector([0.1, 0.2, 0.3, /* ... 256 dimensions */]);
$results = Article::search($customVector)->take(10)->get();
```

## Filtering Search Results

Apply Eloquent-style where constraints to filter results before or after the vector similarity calculation. Standard Scout where methods filter on model properties.

```php
<?php

use App\Models\Article;

// Filter by a single field
$results = Article::search('cloud computing')
    ->where('category', 'technology')
    ->get();

// Filter with whereIn
$results = Article::search('programming tutorials')
    ->whereIn('category', ['technology', 'education'])
    ->get();

// Filter with whereNotIn
$results = Article::search('business strategies')
    ->whereNotIn('status', ['draft', 'archived'])
    ->get();

// Combine multiple filters
$results = Article::search('data analysis')
    ->where('published', true)
    ->where('author_id', 5)
    ->take(20)
    ->get();
```

## Advanced Filtering with whereSearchable

Use the `whereSearchable()` macro to apply Eloquent query constraints before the vector similarity search. This improves efficiency by filtering the dataset before computing expensive vector distances.

```php
<?php

use App\Models\DocumentChunk;

// Filter by parent relationship before vector search
$results = DocumentChunk::search('payment terms')
    ->whereSearchable(fn ($query) =>
        $query->whereHas('document', fn ($doc) =>
            $doc->where('client_id', $clientId)
                ->where('type', 'contract')
        )
    )
    ->get();

// Filter with nested relationships and tags
$results = DocumentChunk::search('quarterly results')
    ->whereSearchable(fn ($query) =>
        $query->whereHas('document', fn ($doc) =>
            $doc->where('status', 'active')
                ->where('user_id', $userId)
                ->whereHas('tags', fn ($tag) =>
                    $tag->where('slug', 'financial-reports')
                )
        )
    )
    ->get();

// Chain multiple whereSearchable calls
$results = DocumentChunk::search('meeting notes')
    ->whereSearchable(fn ($query) =>
        $query->whereHas('document', fn ($d) => $d->where('user_id', $userId))
    )
    ->whereSearchable(fn ($query) =>
        $query->whereHas('document', fn ($d) => $d->where('status', 'active'))
    )
    ->get();

// Combine with standard where constraints
$results = DocumentChunk::search('project requirements')
    ->whereSearchable(fn ($query) =>
        $query->whereHas('document', fn ($d) => $d->where('user_id', $userId))
    )
    ->where('chunk_number', 1)
    ->get();

// Use with joins for complex queries
$results = DocumentChunk::search('test results')
    ->whereSearchable(fn ($query) =>
        $query->join('documents', 'document_chunks.document_id', '=', 'documents.id')
            ->where('documents.user_id', $userId)
            ->where('documents.status', 'active')
    )
    ->get();
```

## Pagination

Paginate search results using Scout's built-in pagination support. The paginator includes total count, current page, and navigation metadata.

```php
<?php

use App\Models\Article;

// Basic pagination
$results = Article::search('software development')
    ->paginate(perPage: 10, page: 1);

echo "Total results: " . $results->total();
echo "Current page: " . $results->currentPage();
echo "Last page: " . $results->lastPage();
echo "Has more pages: " . ($results->hasMorePages() ?
    'Yes' : 'No');

// Paginate with filters
$results = Article::search('web frameworks')
    ->where('category', 'technology')
    ->paginate(15, page: 2);

// Paginate with whereSearchable
$results = DocumentChunk::search('user authentication')
    ->whereSearchable(fn ($query) =>
        $query->whereHas('document', fn ($d) => $d->where('user_id', $userId))
    )
    ->paginate(5, page: 1);

// Iterate through paginated results
foreach ($results as $article) {
    echo $article->title . "\n";
}
```

## Lazy Collections with Cursor

Use cursor-based iteration for memory-efficient processing of large result sets.

```php
<?php

use App\Models\Article;

// Get results as a lazy collection
$results = Article::search('machine learning')->cursor();

// Process results one at a time without loading all into memory
$results->each(function ($article) {
    processArticle($article);
});

// With filters
$results = Article::search('data science')
    ->where('published', true)
    ->cursor();

// With whereSearchable
$results = DocumentChunk::search('api documentation')
    ->whereSearchable(fn ($query) =>
        $query->whereHas('document', fn ($d) => $d->where('user_id', $userId))
    )
    ->cursor();

// Count results from cursor
$count = Article::search('neural networks')->cursor()->count();
```

## Soft Delete Support

The package integrates with Laravel Scout's soft delete functionality, allowing you to include or exclude soft-deleted models from search results.

```php
<?php

use App\Models\Article;

// Enable soft delete support in config/scout.php
// 'soft_delete' => true,

// By default, soft-deleted models are excluded
$results = Article::search('archived content')->get();

// Include soft-deleted models in results
$results = Article::search('all content')
    ->withTrashed()
    ->get();

// Search only soft-deleted models
$results = Article::search('deleted items')
    ->onlyTrashed()
    ->get();

// Combine with where constraints
$results = Article::search('old articles')
    ->where('category', 'news')
    ->withTrashed()
    ->get();

// Soft deleting a model updates its embedding's __soft_deleted flag
$article = Article::find(1);
$article->delete(); // Embedding retained with __soft_deleted = true

// Force delete removes the embedding entirely
$article->forceDelete();
```

## Listening to Embedding Events

Subscribe to the `EmbeddingSaved` event to monitor embedding creation and updates. This is useful for logging, analytics, or triggering follow-up actions.

```php
<?php

namespace App\Listeners;

use BenBjurstrom\PgvectorScout\Events\EmbeddingSaved;
use Illuminate\Support\Facades\Log;

class LogEmbeddingSaved
{
    public function handle(EmbeddingSaved $event): void
    {
        $action = $event->wasRecentlyCreated ? 'created' : 'updated';

        Log::info("Embedding {$action}", [
            'model' => $event->modelName,   // e.g., "App\Models\Article"
            'id' => $event->modelId,        // e.g., 123
            'handler' => $event->handler,   // e.g., "OpenAiHandler"
        ]);
    }
}

// Register in EventServiceProvider
// app/Providers/EventServiceProvider.php
use BenBjurstrom\PgvectorScout\Events\EmbeddingSaved;
use App\Listeners\LogEmbeddingSaved;

protected $listen = [
    EmbeddingSaved::class => [
        LogEmbeddingSaved::class,
    ],
];
```

## Custom Embedding Handlers

Create custom handlers to integrate with different embedding providers. Implement the `HandlerContract` interface and define the `handle()` method to convert text to a vector.

```php
<?php

namespace App\Handlers;

use BenBjurstrom\PgvectorScout\HandlerContract;
use BenBjurstrom\PgvectorScout\IndexConfig;
use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\Http;
use Pgvector\Laravel\Vector;
use RuntimeException;

class CohereHandler implements HandlerContract
{
    public static function handle(string $input, IndexConfig $config): Vector
    {
        $cacheKey = $config->name . ':' . $config->model . ':' . sha1($input);

        $embedding = Cache::rememberForever($cacheKey, function () use ($input, $config) {
            $response = Http::withHeaders([
                'Authorization' => 'Bearer ' . $config->apiKey,
                'Content-Type' => 'application/json',
            ])->post($config->url . '/embed', [
                'texts' => [$input],
                'model' => $config->model,
                'input_type' => 'search_document',
                'truncate' => 'END',
            ]);

            if (!$response->successful()) {
                throw new RuntimeException(
                    'Cohere API request failed: ' . $response->body()
                );
            }

            $embedding = $response->json('embeddings.0');

            if (empty($embedding)) {
                throw new RuntimeException('No embedding in Cohere response');
            }

            return $embedding;
        });

        return new Vector($embedding);
    }
}

// Add to config/pgvector-scout.php
'cohere' => [
    'handler' => App\Handlers\CohereHandler::class,
    'model' => 'embed-english-v3.0',
    'dimensions' => 1024,
    'url' => 'https://api.cohere.ai/v1',
    'api_key' => env('COHERE_API_KEY'),
    'table' => 'cohere_embeddings',
],
```

## Embedding Model

The `Embedding` model represents stored vector embeddings and provides polymorphic relationships to searchable models. It uses the pgvector Laravel package for vector operations.

```php
<?php

use BenBjurstrom\PgvectorScout\Models\Embedding;
use App\Models\Article;

// Access embedding through the model relationship
$article = Article::with('embedding')->find(1);
$embedding = $article->embedding;

echo $embedding->embeddable_type;   // "App\Models\Article"
echo $embedding->embeddable_id;     // 1
echo $embedding->embedding_model;   // "text-embedding-3-small"
echo $embedding->content_hash;      // UUID hash of content
echo $embedding->vector;            // Pgvector\Laravel\Vector instance
echo $embedding->neighbor_distance; // Distance from search (after search)

// Query embeddings directly for a specific index
$embedding = (new Embedding)->forIndex('openai');
$allEmbeddings = $embedding->where('embeddable_type', Article::class)->get();

// Access the parent model from a stored embedding
// (the unsaved $embedding instance above has no parent, so use a fetched row)
$parentModel = $allEmbeddings->first()?->embeddable;
```

## IndexConfig

The `IndexConfig` class encapsulates embedding index configuration and validates settings. It is used internally to resolve handler configuration.

```php
<?php

use BenBjurstrom\PgvectorScout\IndexConfig;
use App\Models\Article;

// Create config from index name
$config = IndexConfig::from('openai');

echo $config->name;       // "openai"
echo $config->handler;    // "BenBjurstrom\PgvectorScout\Handlers\OpenAiHandler"
echo $config->model;      // "text-embedding-3-small"
echo $config->dimensions; // 256
echo $config->table;      // "openai_embeddings"
echo $config->url;        // "https://api.openai.com/v1"
echo $config->apiKey;     // API key from config
echo $config->task;       // null (or task type for Gemini)

// Create config from a searchable model
$article = new Article;
$config = IndexConfig::fromModel($article);

// Use the handler to generate embeddings manually
$vector = $config->handler::handle('Some text to embed', $config);
```

## Flush and Remove from Search

Remove embeddings for models using Scout's built-in methods.

```php
<?php

use App\Models\Article;

// Remove all embeddings for a model type
Article::removeAllFromSearch();

// Or use the artisan command
// php artisan scout:flush "App\Models\Article"

// Delete a specific model (embedding removed automatically)
$article = Article::find(1);
$article->delete();

// Unsearch specific models without deleting them
$articles = Article::whereIn('id', [1, 2, 3])->get();
$articles->unsearchable();
```

## Summary

Pgvector Scout enables semantic search capabilities in Laravel applications by bridging Laravel Scout with PostgreSQL's pgvector extension. The primary use cases include implementing AI-powered search features, building recommendation systems, finding semantically similar content, and creating RAG (Retrieval-Augmented Generation) pipelines. The package excels in scenarios where traditional keyword search falls short, such as finding documents that are conceptually similar even when they don't share exact keywords.

Integration follows standard Laravel Scout patterns, making adoption straightforward for developers familiar with the Scout ecosystem. The handler-based architecture allows easy switching between embedding providers (OpenAI, Gemini, Ollama) or implementing custom handlers. For large content, the recommended pattern is chunking data into separate models (e.g., `DocumentChunk`) for optimal search granularity. The `whereSearchable()` macro enables efficient pre-filtering of results before vector computation, which is essential for multi-tenant applications or complex access control requirements.
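
The chunking pattern mentioned in the summary can be sketched in plain PHP. This is a minimal illustration, not part of the package: the `chunkText()` helper, its word-based splitting, and the default chunk size and overlap are all assumptions chosen for the example.

```php
<?php

/**
 * Hypothetical helper (not provided by Pgvector Scout): split text into
 * overlapping, word-based chunks so each chunk can be embedded separately.
 *
 * @return list<string>
 */
function chunkText(string $text, int $chunkSize = 200, int $overlap = 20): array
{
    if ($chunkSize < 1 || $overlap < 0 || $overlap >= $chunkSize) {
        throw new InvalidArgumentException(
            'Require chunkSize >= 1 and 0 <= overlap < chunkSize.'
        );
    }

    // Split on any whitespace and drop empty fragments.
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $total = count($words);
    $step = $chunkSize - $overlap; // how far the window advances each time
    $chunks = [];

    for ($start = 0; $start < $total; $start += $step) {
        $chunks[] = implode(' ', array_slice($words, $start, $chunkSize));

        // Stop once the current window already reaches the end of the text.
        if ($start + $chunkSize >= $total) {
            break;
        }
    }

    return $chunks;
}
```

Each returned string could then be saved as its own searchable row, for example a `DocumentChunk` with a `document_id`, a `chunk_number`, and a text column (the text column name is an assumption here), so that Scout generates one embedding per chunk. Token-aware or sentence-aware splitting may suit a given embedding model better than the word-based window used in this sketch.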