# Pgvector Scout

Pgvector Scout is a Laravel Scout driver that enables vector similarity search using PostgreSQL's pgvector extension. It allows Laravel applications to store and search vector embeddings directly in the database, automatically generating embeddings when models are created or updated through Scout's model observers. The package supports multiple embedding providers including OpenAI, Google Gemini, and Ollama with a configurable handler system.

The package follows a handler-based architecture where each embedding provider has its own configuration for table name, vector dimensions, model, and API settings. It uses content hashing to prevent unnecessary embedding regeneration, supports soft deletes through database joins, and provides the familiar Scout interface while leveraging PostgreSQL's native vector capabilities for cosine similarity searches.
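
The reason content hashing saves API calls can be shown in a few lines of plain PHP. This is a sketch under stated assumptions: the searchable array is serialized with `json_encode` and hashed with SHA-256, and the hypothetical `contentHash()` and `needsEmbedding()` helpers compare that hash to a previously stored one. The package's actual hashing scheme and column format may differ.

```php
<?php

/**
 * Hypothetical helper: hash the searchable content deterministically.
 * Assumption: JSON encoding + SHA-256; the package's real scheme may differ.
 * Note that key order matters: ['a' => 1, 'b' => 2] and ['b' => 2, 'a' => 1] hash differently.
 */
function contentHash(array $searchable): string
{
    return hash('sha256', json_encode($searchable, JSON_THROW_ON_ERROR));
}

/**
 * Hypothetical helper: regenerate only when nothing is stored yet
 * or the content changed since the last embedding was generated.
 */
function needsEmbedding(array $searchable, ?string $storedHash): bool
{
    return $storedHash === null || !hash_equals($storedHash, contentHash($searchable));
}

$article = ['title' => 'Introduction to Vector Search', 'content' => 'Vector search enables...'];
$stored = contentHash($article);

var_dump(needsEmbedding($article, null));    // bool(true): never embedded
var_dump(needsEmbedding($article, $stored)); // bool(false): unchanged, skip the API call

$article['content'] = 'Updated body text';
var_dump(needsEmbedding($article, $stored)); // bool(true): content changed
```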

## Installation

Install the package via Composer and publish the required configuration files. The PostgreSQL server must have the pgvector extension available.

```bash
# Install the package
composer require benbjurstrom/pgvector-scout

# Publish Scout and package configurations
php artisan vendor:publish --tag="scout-config"
php artisan vendor:publish --tag="pgvector-scout-config"

# Generate the migration for the 'openai' index's embeddings table, then run it
php artisan scout:index openai
php artisan migrate
```

## Configuration

The package configuration file defines one embedding index per provider. Each index specifies the handler class, embedding model, vector dimensions, API endpoint, API key, and table name; the Gemini index also sets a task type.

```php
// config/pgvector-scout.php
return [
    'indexes' => [
        'openai' => [
            'handler' => BenBjurstrom\PgvectorScout\Handlers\OpenAiHandler::class,
            'model' => 'text-embedding-3-small',
            'dimensions' => 256,
            'url' => 'https://api.openai.com/v1',
            'api_key' => env('OPENAI_API_KEY'),
            'table' => 'openai_embeddings',
        ],
        'gemini' => [
            'handler' => BenBjurstrom\PgvectorScout\Handlers\GeminiHandler::class,
            'model' => 'text-embedding-004',
            'dimensions' => 256,
            'url' => 'https://generativelanguage.googleapis.com/v1beta',
            'api_key' => env('GEMINI_API_KEY'),
            'table' => 'gemini_embeddings',
            'task' => 'SEMANTIC_SIMILARITY',
        ],
        'ollama' => [
            'handler' => BenBjurstrom\PgvectorScout\Handlers\OllamaHandler::class,
            'model' => 'nomic-embed-text',
            'dimensions' => 768,
            'url' => 'http://localhost:11434/api/embeddings',
            'api_key' => 'none',
            'table' => 'ollama_embeddings',
        ],
    ],
];
```

## Environment Configuration

Set the Scout driver and API keys in your environment file.

```env
SCOUT_DRIVER=pgvector
OPENAI_API_KEY=your-api-key
GEMINI_API_KEY=your-gemini-key
```

## Making Models Searchable

Add the `HasEmbeddings` and `Searchable` traits to your model, implement the `searchableAs()` method to specify the index name, and `toSearchableArray()` to define which content should be converted into embeddings.

```php
<?php

namespace App\Models;

use BenBjurstrom\PgvectorScout\Models\Concerns\HasEmbeddings;
use Illuminate\Database\Eloquent\Model;
use Laravel\Scout\Searchable;

class Article extends Model
{
    use HasEmbeddings, Searchable;

    protected $fillable = ['title', 'content', 'category'];

    /**
     * Get the name of the index associated with the model.
     */
    public function searchableAs(): string
    {
        return 'openai';
    }

    /**
     * Get the indexable data array for the model.
     */
    public function toSearchableArray(): array
    {
        return [
            'title' => $this->title,
            'content' => $this->content,
        ];
    }
}

// Embeddings are automatically created when models are saved
$article = Article::create([
    'title' => 'Introduction to Vector Search',
    'content' => 'Vector search enables semantic similarity matching...',
    'category' => 'technology',
]);

// Import existing models
// php artisan scout:import "App\Models\Article"
```

## Basic Vector Search

Use the standard Scout search syntax to perform vector similarity searches. The query string is converted to an embedding with the index's configured handler, and results are ordered by cosine distance to the stored embeddings, closest first.

```php
<?php

use App\Models\Article;

// Basic text search - converts query to vector and finds similar embeddings
$results = Article::search('machine learning algorithms')->get();

// Access search results with their embeddings
foreach ($results as $article) {
    echo $article->title . "\n";
    echo "Cosine distance (lower is closer): " . $article->embedding->neighbor_distance . "\n";
}

// Limit the number of results
$results = Article::search('natural language processing')
    ->take(5)
    ->get();

// Get only the model IDs
$ids = Article::search('deep learning')->keys();
// Returns: Collection [1, 5, 3, 8, ...]
```
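
The ranking can be pictured in plain PHP. Assuming the driver orders results with pgvector's cosine distance operator (`<=>`), distance is defined as 1 minus cosine similarity, so 0 means same direction and lower is more similar. The sketch below computes that in memory for a few hypothetical 3-dimensional vectors; `cosineDistance()` and `nearest()` are illustrative names, and the real ordering happens inside PostgreSQL.

```php
<?php

/**
 * Cosine distance as defined by pgvector's `<=>` operator:
 * 1 - (a · b) / (|a| * |b|). Lower values mean "more similar".
 */
function cosineDistance(array $a, array $b): float
{
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same number of dimensions.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    if ($normA === 0.0 || $normB === 0.0) {
        throw new InvalidArgumentException('Cosine distance is undefined for zero vectors.');
    }

    return 1.0 - $dot / (sqrt($normA) * sqrt($normB));
}

/**
 * Hypothetical in-memory equivalent of "ORDER BY vector <=> :query LIMIT :k".
 * Returns [id => distance] for the $k closest vectors.
 */
function nearest(array $query, array $vectors, int $k): array
{
    $distances = array_map(fn (array $v) => cosineDistance($query, $v), $vectors); // keys preserved
    asort($distances); // ascending: most similar first
    return array_slice($distances, 0, $k, true);
}

$stored = [
    1 => [1.0, 0.0, 0.0],
    2 => [0.7, 0.7, 0.0],
    3 => [0.0, 0.0, 1.0],
];

print_r(nearest([1.0, 0.1, 0.0], $stored, 2)); // ids 1 and 2, in that order
```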

## Search with Existing Vector

Pass an existing embedding vector as the search parameter to find similar models. This is useful for finding related content based on a model's existing embedding.

```php
<?php

use App\Models\Article;
use Pgvector\Laravel\Vector;

// Get an existing model's embedding vector
$sourceArticle = Article::find(1);
$vector = $sourceArticle->embedding->vector;

// Find similar articles using the vector
$similarArticles = Article::search($vector)->get();

// Build a query vector by hand; it must have as many dimensions as the index (256 for 'openai')
$customVector = new Vector([0.1, 0.2, 0.3, /* ... 256 dimensions */]);
$results = Article::search($customVector)->take(10)->get();
```

## Filtering Search Results

Chain Scout's standard `where`, `whereIn`, and `whereNotIn` methods to restrict results by the model's own columns. Scout's `where` only supports equality comparisons; for relationship or other complex constraints, use `whereSearchable()`.

```php
<?php

use App\Models\Article;

// Filter by a single field
$results = Article::search('cloud computing')
    ->where('category', 'technology')
    ->get();

// Filter with whereIn
$results = Article::search('programming tutorials')
    ->whereIn('category', ['technology', 'education'])
    ->get();

// Filter with whereNotIn
$results = Article::search('business strategies')
    ->whereNotIn('status', ['draft', 'archived'])
    ->get();

// Combine multiple filters
$results = Article::search('data analysis')
    ->where('published', true)
    ->where('author_id', 5)
    ->take(20)
    ->get();
```

## Advanced Filtering with whereSearchable

Use the `whereSearchable()` macro to apply Eloquent query constraints before the vector similarity search. This improves efficiency by filtering the dataset before computing expensive vector distances.

```php
<?php

use App\Models\DocumentChunk;

// $clientId and $userId are assumed to be defined in the surrounding scope

// Filter by parent relationship before vector search
$results = DocumentChunk::search('payment terms')
    ->whereSearchable(fn ($query) =>
        $query->whereHas('document', fn ($doc) =>
            $doc->where('client_id', $clientId)
                ->where('type', 'contract')
        )
    )
    ->get();

// Filter with nested relationships and tags
$results = DocumentChunk::search('quarterly results')
    ->whereSearchable(fn ($query) =>
        $query->whereHas('document', fn ($doc) =>
            $doc->where('status', 'active')
                ->where('user_id', $userId)
                ->whereHas('tags', fn ($tag) =>
                    $tag->where('slug', 'financial-reports')
                )
        )
    )
    ->get();

// Chain multiple whereSearchable calls
$results = DocumentChunk::search('meeting notes')
    ->whereSearchable(fn ($query) =>
        $query->whereHas('document', fn ($d) => $d->where('user_id', $userId))
    )
    ->whereSearchable(fn ($query) =>
        $query->whereHas('document', fn ($d) => $d->where('status', 'active'))
    )
    ->get();

// Combine with standard where constraints
$results = DocumentChunk::search('project requirements')
    ->whereSearchable(fn ($query) =>
        $query->whereHas('document', fn ($d) => $d->where('user_id', $userId))
    )
    ->where('chunk_number', 1)
    ->get();

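// Use with joins for complex queries; selecting only document_chunks.*
// keeps joined columns such as documents.id from overwriting the chunk's own
$results = DocumentChunk::search('test results')
    ->whereSearchable(fn ($query) =>
        $query->select('document_chunks.*')
            ->join('documents', 'document_chunks.document_id', '=', 'documents.id')
            ->where('documents.user_id', $userId)
            ->where('documents.status', 'active')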
    )
    ->get();
```
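
Filtering first also changes which rows end up in the top results, not just the cost. The plain-PHP sketch below uses hypothetical rows with precomputed distances and a hypothetical `topK()` helper; it is not the package's implementation. It contrasts filtering before ranking (the idea behind `whereSearchable()`) with ranking first and filtering afterwards, which can return fewer rows than requested when the nearest neighbours belong to someone else.

```php
<?php

// Hypothetical rows: each has an owner and a precomputed distance to the query vector.
$rows = [
    ['id' => 1, 'user_id' => 7, 'distance' => 0.05],
    ['id' => 2, 'user_id' => 7, 'distance' => 0.10],
    ['id' => 3, 'user_id' => 9, 'distance' => 0.20],
    ['id' => 4, 'user_id' => 9, 'distance' => 0.30],
    ['id' => 5, 'user_id' => 9, 'distance' => 0.40],
];

/** Return the $k rows with the smallest distance, closest first. */
function topK(array $rows, int $k): array
{
    usort($rows, fn (array $a, array $b) => $a['distance'] <=> $b['distance']);
    return array_slice($rows, 0, $k);
}

$belongsToUser9 = fn (array $row) => $row['user_id'] === 9;

// Pre-filter: restrict candidates to the user's rows, then rank.
$pre = topK(array_filter($rows, $belongsToUser9), 2);

// Post-filter: rank everything, then discard rows the user may not see.
$post = array_filter(topK($rows, 2), $belongsToUser9);

echo count($pre), PHP_EOL;  // 2 (ids 3 and 4)
echo count($post), PHP_EOL; // 0: both nearest rows belonged to user 7
```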

## Pagination

Paginate search results using Scout's built-in pagination support. The paginator includes total count, current page, and navigation metadata.

```php
<?php

use App\Models\Article;
use App\Models\DocumentChunk;

// Basic pagination
$results = Article::search('software development')
    ->paginate(perPage: 10, page: 1);

echo "Total results: " . $results->total() . "\n";
echo "Current page: " . $results->currentPage() . "\n";
echo "Last page: " . $results->lastPage() . "\n";
echo "Has more pages: " . ($results->hasMorePages() ? 'Yes' : 'No') . "\n";

// Paginate with filters
$results = Article::search('web frameworks')
    ->where('category', 'technology')
    ->paginate(15, page: 2);

// Paginate with whereSearchable
$results = DocumentChunk::search('user authentication')
    ->whereSearchable(fn ($query) =>
        $query->whereHas('document', fn ($d) => $d->where('user_id', $userId))
    )
    ->paginate(5, page: 1);

// Iterate through the current page of results
foreach ($results as $chunk) {
    echo $chunk->getKey() . "\n";
}
```
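
The paginator metadata follows from the total count, the page size, and the current page. This plain-PHP sketch mirrors how Laravel's `LengthAwarePaginator` derives the last page (at least 1) and `hasMorePages()`; the `pageMeta()` helper is hypothetical and only for illustration.

```php
<?php

/**
 * Hypothetical helper: derive pagination metadata from a total count.
 * Mirrors LengthAwarePaginator: lastPage = max(ceil(total / perPage), 1).
 */
function pageMeta(int $total, int $perPage, int $page): array
{
    if ($perPage < 1) {
        throw new InvalidArgumentException('perPage must be at least 1.');
    }

    $lastPage = max((int) ceil($total / $perPage), 1);

    return [
        'offset' => ($page - 1) * $perPage, // rows skipped before this page
        'lastPage' => $lastPage,
        'hasMorePages' => $page < $lastPage,
    ];
}

print_r(pageMeta(total: 23, perPage: 10, page: 1));
// offset 0, lastPage 3, hasMorePages true
```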

## Lazy Collections with Cursor

Use cursor-based iteration for memory-efficient processing of large result sets.

```php
<?php

use App\Models\Article;
use App\Models\DocumentChunk;

// Get results as a lazy collection
$results = Article::search('machine learning')->cursor();

// Process results one at a time without loading all into memory
$results->each(function ($article) {
    processArticle($article);
});

// With filters
$results = Article::search('data science')
    ->where('published', true)
    ->cursor();

// With whereSearchable
$results = DocumentChunk::search('api documentation')
    ->whereSearchable(fn ($query) =>
        $query->whereHas('document', fn ($d) => $d->where('user_id', $userId))
    )
    ->cursor();

// Count results from cursor
$count = Article::search('neural networks')->cursor()->count();
```
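
The memory saving comes from yielding one item at a time instead of building a full array. The sketch below illustrates that idea with a plain PHP generator and a hypothetical `fetchPage()` data source; it does not reproduce how Scout or this driver actually fetches rows behind `cursor()`.

```php
<?php

/** Hypothetical data source: returns one page of ids (generated in memory here). */
function fetchPage(int $page, int $perPage, int $total): array
{
    $start = ($page - 1) * $perPage;
    $end = min($start + $perPage, $total);

    return $start < $end ? range($start + 1, $end) : [];
}

/**
 * Yield ids one by one, fetching the next page only when the previous one is used up,
 * so at most one page is held in memory at a time.
 */
function lazyRows(int $perPage, int $total): Generator
{
    for ($page = 1; ($rows = fetchPage($page, $perPage, $total)) !== []; $page++) {
        yield from $rows;
    }
}

$sum = 0;
foreach (lazyRows(perPage: 100, total: 1000) as $id) {
    $sum += $id; // process each row as it arrives
}
echo $sum, PHP_EOL; // 500500
```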

## Soft Delete Support

The package integrates with Laravel Scout's soft delete functionality, allowing you to include or exclude soft-deleted models from search results.

```php
<?php

use App\Models\Article;

// Enable soft delete support in config/scout.php ('soft_delete' => true)
// and add the Illuminate\Database\Eloquent\SoftDeletes trait to the model

// By default, soft-deleted models are excluded
$results = Article::search('archived content')->get();

// Include soft-deleted models in results
$results = Article::search('all content')
    ->withTrashed()
    ->get();

// Search only soft-deleted models
$results = Article::search('deleted items')
    ->onlyTrashed()
    ->get();

// Combine with where constraints
$results = Article::search('old articles')
    ->where('category', 'news')
    ->withTrashed()
    ->get();

// Soft deleting a model updates its embedding's __soft_deleted flag
$article = Article::find(1);
$article->delete(); // Embedding retained with __soft_deleted = true

// Force delete removes the embedding entirely
$article->forceDelete();
```
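
The three modes behave like a filter on a per-embedding soft-delete flag. This sketch assumes each stored row carries a boolean `__soft_deleted` value, as the comments above describe; the `applyTrashedMode()` helper and its mode strings are hypothetical, not the package's API.

```php
<?php

/**
 * Hypothetical filter mirroring Scout's soft-delete modes:
 * 'exclude' (default) drops trashed rows, 'with' keeps all, 'only' keeps trashed rows.
 */
function applyTrashedMode(array $rows, string $mode = 'exclude'): array
{
    $kept = match ($mode) {
        'exclude' => array_filter($rows, fn (array $r) => !$r['__soft_deleted']),
        'with' => $rows,
        'only' => array_filter($rows, fn (array $r) => $r['__soft_deleted']),
        default => throw new InvalidArgumentException("Unknown mode: {$mode}"),
    };

    return array_values($kept); // reindex after filtering
}

$rows = [
    ['id' => 1, '__soft_deleted' => false],
    ['id' => 2, '__soft_deleted' => true],
    ['id' => 3, '__soft_deleted' => false],
];

echo implode(',', array_column(applyTrashedMode($rows), 'id')), PHP_EOL;         // 1,3
echo implode(',', array_column(applyTrashedMode($rows, 'with'), 'id')), PHP_EOL; // 1,2,3
echo implode(',', array_column(applyTrashedMode($rows, 'only'), 'id')), PHP_EOL; // 2
```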

## Listening to Embedding Events

Subscribe to the `EmbeddingSaved` event to monitor embedding creation and updates. This is useful for logging, analytics, or triggering follow-up actions.

```php
<?php

namespace App\Listeners;

use BenBjurstrom\PgvectorScout\Events\EmbeddingSaved;
use Illuminate\Support\Facades\Log;

class LogEmbeddingSaved
{
    public function handle(EmbeddingSaved $event): void
    {
        $action = $event->wasRecentlyCreated ? 'created' : 'updated';

        Log::info("Embedding {$action}", [
            'model' => $event->modelName,      // e.g., "App\Models\Article"
            'id' => $event->modelId,           // e.g., 123
            'handler' => $event->handler,      // e.g., "OpenAiHandler"
        ]);
    }
}

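// Register the listener in app/Providers/EventServiceProvider.php:
//
//     use App\Listeners\LogEmbeddingSaved;
//     use BenBjurstrom\PgvectorScout\Events\EmbeddingSaved;
//
//     protected $listen = [
//         EmbeddingSaved::class => [
//             LogEmbeddingSaved::class,
//         ],
//     ];
//
// On Laravel 11 and later, listeners in app/Listeners are discovered automatically,
// or you can call Event::listen(EmbeddingSaved::class, LogEmbeddingSaved::class)
// in AppServiceProvider::boot().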
```

## Custom Embedding Handlers

Create custom handlers to integrate with other embedding providers. Implement the `HandlerContract` interface and its static `handle()` method, which receives the input text and the index's `IndexConfig` and returns a `Pgvector\Laravel\Vector`.

```php
<?php

namespace App\Handlers;

use BenBjurstrom\PgvectorScout\HandlerContract;
use BenBjurstrom\PgvectorScout\IndexConfig;
use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\Http;
use Pgvector\Laravel\Vector;
use RuntimeException;

class CohereHandler implements HandlerContract
{
    public static function handle(string $input, IndexConfig $config): Vector
    {
        $cacheKey = $config->name . ':' . $config->model . ':' . sha1($input);

        $embedding = Cache::rememberForever($cacheKey, function () use ($input, $config) {
            $response = Http::withHeaders([
                'Authorization' => 'Bearer ' . $config->apiKey,
                'Content-Type' => 'application/json',
            ])->post($config->url . '/embed', [
                'texts' => [$input],
                'model' => $config->model,
                // Cohere v3 models expect 'search_query' for queries; one handler serves both here
                'input_type' => 'search_document',
                'truncate' => 'END',
            ]);

            if (!$response->successful()) {
                throw new RuntimeException(
                    'Cohere API request failed: ' . $response->body()
                );
            }

            $embedding = $response->json('embeddings.0');
            if (empty($embedding)) {
                throw new RuntimeException('No embedding in Cohere response');
            }

            return $embedding;
        });

        return new Vector($embedding);
    }
}

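// Add this entry to the 'indexes' array in config/pgvector-scout.php,
// then generate and run its migration: php artisan scout:index cohere && php artisan migrate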
'cohere' => [
    'handler' => App\Handlers\CohereHandler::class,
    'model' => 'embed-english-v3.0',
    'dimensions' => 1024,
    'url' => 'https://api.cohere.ai/v1',
    'api_key' => env('COHERE_API_KEY'),
    'table' => 'cohere_embeddings',
],
```
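
For local development or tests you may not want to call a paid API. One option, sketched here under assumptions, is a deterministic fake that derives a unit-length vector from a hash of the input. The `fakeEmbedding()` function is hypothetical and returns a plain array; to use it as a handler you would wrap the result in `new Vector(...)` inside a class implementing `HandlerContract`, as the Cohere example does. Fake vectors carry no semantic meaning, so they only exercise the plumbing, not search quality.

```php
<?php

/**
 * Hypothetical deterministic "embedding" for tests: same input, same vector.
 * Values come from chained SHA-256 digests and are normalized to unit length.
 */
function fakeEmbedding(string $input, int $dimensions): array
{
    $values = [];
    $seed = $input;
    while (count($values) < $dimensions) {
        $seed = hash('sha256', $seed, true); // 32 raw bytes per round
        foreach (unpack('C*', $seed) as $byte) {
            $values[] = $byte / 127.5 - 1.0; // map 0..255 to -1..1
            if (count($values) === $dimensions) {
                break;
            }
        }
    }

    $norm = sqrt(array_sum(array_map(fn (float $v) => $v * $v, $values)));

    return array_map(fn (float $v) => $norm > 0 ? $v / $norm : 0.0, $values);
}

$vector = fakeEmbedding('Introduction to Vector Search', 256);
echo count($vector), PHP_EOL; // 256
```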

## Embedding Model

The `Embedding` model represents stored vector embeddings and provides polymorphic relationships to searchable models. It uses the pgvector Laravel package for vector operations.

```php
<?php

use BenBjurstrom\PgvectorScout\Models\Embedding;
use App\Models\Article;

// Access embedding through the model relationship
$article = Article::with('embedding')->find(1);
$embedding = $article->embedding;

echo $embedding->embeddable_type;    // "App\Models\Article"
echo $embedding->embeddable_id;      // 1
echo $embedding->embedding_model;    // "text-embedding-3-small"
echo $embedding->content_hash;       // UUID hash of content
echo $embedding->vector;             // Pgvector\Laravel\Vector instance
echo $embedding->neighbor_distance;  // Distance from search (after search)

// Query embeddings directly for a specific index
$embeddings = (new Embedding)->forIndex('openai');
$allEmbeddings = $embeddings->where('embeddable_type', Article::class)->get();

// Access the parent model from a stored embedding
$parentModel = $allEmbeddings->first()?->embeddable;
```

## IndexConfig

The `IndexConfig` class encapsulates embedding index configuration and validates settings. It is used internally to resolve handler configuration.

```php
<?php

use BenBjurstrom\PgvectorScout\IndexConfig;
use App\Models\Article;

// Create config from index name
$config = IndexConfig::from('openai');

echo $config->name;       // "openai"
echo $config->handler;    // "BenBjurstrom\PgvectorScout\Handlers\OpenAiHandler"
echo $config->model;      // "text-embedding-3-small"
echo $config->dimensions; // 256
echo $config->table;      // "openai_embeddings"
echo $config->url;        // "https://api.openai.com/v1"
echo $config->apiKey;     // API key from config
echo $config->task;       // null (or task type for Gemini)

// Create config from a searchable model
$article = new Article;
$config = IndexConfig::fromModel($article);

// Use the handler to generate embeddings manually
$vector = $config->handler::handle('Some text to embed', $config);
```
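
The kind of checks a config object like `IndexConfig` can perform are easy to sketch in plain PHP. The rules below (required keys present and non-empty, an integer dimension count of at least 1, an identifier-like table name) are assumptions chosen for illustration, not taken from the package source, and `validateIndexConfig()` is a hypothetical name.

```php
<?php

/**
 * Hypothetical validator for one entry of the 'indexes' array.
 * Returns a list of error messages; an empty list means the entry looks usable.
 */
function validateIndexConfig(string $name, array $config): array
{
    $errors = [];

    foreach (['handler', 'model', 'dimensions', 'url', 'api_key', 'table'] as $key) {
        if (!array_key_exists($key, $config) || $config[$key] === null || $config[$key] === '') {
            $errors[] = "Index [{$name}] is missing '{$key}'.";
        }
    }

    if (isset($config['dimensions']) && (!is_int($config['dimensions']) || $config['dimensions'] < 1)) {
        $errors[] = "Index [{$name}] 'dimensions' must be a positive integer.";
    }

    // The table name ends up in SQL, so only allow a plain identifier.
    if (isset($config['table']) && !preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', (string) $config['table'])) {
        $errors[] = "Index [{$name}] 'table' is not a valid identifier.";
    }

    return $errors;
}

$config = [
    'handler' => 'BenBjurstrom\\PgvectorScout\\Handlers\\OpenAiHandler',
    'model' => 'text-embedding-3-small',
    'dimensions' => '256', // wrong type: a string, e.g. read straight from env()
    'url' => 'https://api.openai.com/v1',
    'api_key' => null,     // OPENAI_API_KEY not set
    'table' => 'openai_embeddings',
];

print_r(validateIndexConfig('openai', $config));
// two errors: missing 'api_key', then non-integer 'dimensions'
```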

## Flush and Remove from Search

Remove embeddings for models using Scout's built-in methods.

```php
<?php

use App\Models\Article;

// Remove all embeddings for a model type
Article::removeAllFromSearch();

// Or use the artisan command
// php artisan scout:flush "App\Models\Article"

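// Delete a specific model; its embedding is removed automatically
// (with soft deletes enabled, the embedding is kept and flagged instead, as described above)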
$article = Article::find(1);
$article->delete();

// Unsearch specific models without deleting them
$articles = Article::whereIn('id', [1, 2, 3])->get();
$articles->unsearchable();
```

## Summary

Pgvector Scout enables semantic search capabilities in Laravel applications by bridging Laravel Scout with PostgreSQL's pgvector extension. The primary use cases include implementing AI-powered search features, building recommendation systems, finding semantically similar content, and creating RAG (Retrieval-Augmented Generation) pipelines. The package excels in scenarios where traditional keyword search falls short, such as finding documents that are conceptually similar even when they don't share exact keywords.

Integration follows standard Laravel Scout patterns, making adoption straightforward for developers familiar with the Scout ecosystem. The handler-based architecture allows easy switching between embedding providers (OpenAI, Gemini, Ollama) or implementing custom handlers. For large content, the recommended pattern is chunking data into separate models (e.g., `DocumentChunk`) for optimal search granularity. The `whereSearchable()` macro enables efficient pre-filtering of results before vector computation, which is essential for multi-tenant applications or complex access control requirements.
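
The chunking pattern can be sketched as a word-window splitter with overlap, so that a sentence straddling a boundary appears in two chunks. The window size, the overlap, and the `chunkText()` name are assumptions for illustration; each returned string would become one `DocumentChunk` row with its own embedding.

```php
<?php

/**
 * Hypothetical helper: split text into windows of $size words,
 * each sharing $overlap words with the previous window.
 */
function chunkText(string $text, int $size = 200, int $overlap = 40): array
{
    if ($size < 1 || $overlap < 0 || $overlap >= $size) {
        throw new InvalidArgumentException('Require size >= 1 and 0 <= overlap < size.');
    }

    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $chunks = [];
    $step = $size - $overlap;

    for ($start = 0; $start < count($words); $start += $step) {
        $chunks[] = implode(' ', array_slice($words, $start, $size));
        if ($start + $size >= count($words)) {
            break; // this window already reaches the end of the text
        }
    }

    return $chunks;
}

print_r(chunkText('one two three four five six seven', size: 3, overlap: 1));
// ["one two three", "three four five", "five six seven"]
```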