### Install VectorVFS with pip

Source: https://github.com/perone/vectorvfs/blob/main/docs/source/installation.md

Use this command to install the VectorVFS package from PyPI. Ensure pip is available in your environment.

```bash
pip install vectorvfs
```

--------------------------------

### Display vfs Script Help

Source: https://github.com/perone/vectorvfs/blob/main/docs/source/installation.md

After installation, you can use the `vfs` command-line tool to access its functionality. This command displays the help message.

```bash
vfs --help
```

--------------------------------

### Run commands with uv

Source: https://github.com/perone/vectorvfs/blob/main/codex.md

Use this command to execute tasks within the project's virtual environment.

```bash
uv run [command]
```

--------------------------------

### Store and Retrieve Tensors with VFSStore

Source: https://context7.com/perone/vectorvfs/llms.txt

Serialize PyTorch tensors and store them in a file's extended attributes using VFSStore.

```python
import torch
from pathlib import Path
from vectorvfs.vfsstore import VFSStore, XAttrFile

# Initialize VFSStore for a file
file_path = Path("/path/to/image.jpg")
xattr_file = XAttrFile(file_path)
vfs_store = VFSStore(xattr_file)

# Create a sample embedding tensor (1024 dimensions, half precision)
embedding = torch.randn(1, 1024, dtype=torch.float16)

# Write tensor to the file's extended attributes
bytes_written = vfs_store.write_tensor(embedding)
print(f"Stored embedding: {bytes_written} bytes")
# Output: Stored embedding: 2114 bytes

# Read the tensor back from extended attributes
retrieved_embedding = vfs_store.read_tensor()
print(f"Retrieved tensor shape: {retrieved_embedding.shape}")
print(f"Tensor dtype: {retrieved_embedding.dtype}")
# Output: Retrieved tensor shape: torch.Size([1, 1024])
# Output: Tensor dtype: torch.float16

# Verify the stored and retrieved tensors match
assert torch.allclose(embedding, retrieved_embedding)
```

--------------------------------

### Search Files via CLI
Source: https://context7.com/perone/vectorvfs/llms.txt

Perform semantic searches across directories using the `vfs search` command.

```bash
# Basic search - find images matching "cat" in a folder
vfs search cat /my_folder

# Recursive search through all subdirectories
vfs search -r "orange tabby cat" /photos

# Limit results to top 3 matches
vfs search -n 3 "sunset over ocean" /vacation_photos

# Force re-indexing of all files (ignores cached embeddings)
vfs search -f "mountain landscape" /nature_photos

# Combined options: recursive search, top 5 results, force reindex
vfs search -r -n 5 -f "happy dog playing" /pet_photos
```

--------------------------------

### VFSStore Class

Source: https://context7.com/perone/vectorvfs/llms.txt

Handles high-level tensor storage and retrieval using the `user.vectorvfs` extended attribute.

```APIDOC
## VFSStore Class

### Description
Provides high-level tensor storage and retrieval operations, handling serialization of PyTorch tensors to/from bytes.

### Methods
- **write_tensor(tensor)**: Serializes and stores a PyTorch tensor in the 'user.vectorvfs' attribute.
- **read_tensor()**: Retrieves and deserializes the tensor from the 'user.vectorvfs' attribute.
```

--------------------------------

### Perform Semantic Search on Images

Source: https://context7.com/perone/vectorvfs/llms.txt

Indexes images in a directory and performs semantic search using text queries. Requires PerceptionEncoder and VFSStore. Can force re-indexing if needed.
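The ranking step in this search is cosine similarity followed by a bounded min-heap. A minimal stdlib sketch of just that step, with plain Python lists standing in for the PyTorch embeddings and file names as illustrative keys (the helpers `normalize`, `cosine`, and `top_k` are hypothetical, not part of VectorVFS):

```python
import heapq
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm (like F.normalize on a single row)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity = dot product of the two normalized vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def top_k(query: list[float], items: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Return the k items most similar to `query`, highest score first."""
    heap: list[tuple[float, str]] = []  # min-heap: heap[0] is the weakest kept result
    for name, vec in items.items():
        entry = (cosine(query, vec), name)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        else:
            heapq.heappushpop(heap, entry)  # push, then drop the smallest score
    return [(name, score) for score, name in sorted(heap, reverse=True)]


# Example: three toy 2-D "embeddings" ranked against the query [1, 0]
items = {"a.jpg": [1.0, 0.0], "b.jpg": [0.0, 1.0], "c.jpg": [1.0, 1.0]}
print(top_k([1.0, 0.0], items, 2))
```

Keeping only k entries in the heap means each file costs O(log k) to rank and memory stays bounded no matter how many files the directory holds; that is what the push-until-full, then `heappushpop` pattern achieves.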
```python
import torch
import torch.nn.functional as F
from pathlib import Path
from heapq import heappush, heappushpop
from dataclasses import dataclass, field
from vectorvfs.encoders import PerceptionEncoder
from vectorvfs.vfsstore import VFSStore, XAttrFile
from vectorvfs.utils import pillow_image_extensions


@dataclass(order=True)
class SearchResult:
    similarity: float
    path: Path = field(compare=False)


def semantic_search(query: str, directory: Path, top_k: int = 5,
                    recursive: bool = False, force_reindex: bool = False):
    """
    Perform semantic search across images in a directory.

    Args:
        query: Text query to search for
        directory: Directory to search in
        top_k: Number of results to return
        recursive: Whether to search subdirectories
        force_reindex: Whether to force re-embedding of all files

    Returns:
        List of (path, similarity_score) tuples, highest similarity first
    """
    # Initialize encoder
    encoder = PerceptionEncoder()

    # Encode and normalize the search query; keep it on the CPU in float32 so it
    # can be compared with embeddings loaded back from extended attributes
    query_features = encoder.encode_text(query)
    query_features = F.normalize(query_features).to(device="cpu", dtype=torch.float32)

    # Get supported image extensions
    supported_extensions = pillow_image_extensions()

    # Iterate over files
    files = directory.rglob("*") if recursive else directory.iterdir()

    # Min-heap holding the top-k results seen so far
    results_heap = []

    for file_path in files:
        if not file_path.is_file():
            continue
        if file_path.suffix.lower() not in supported_extensions:
            continue

        # Set up VFS store for this file
        xattr_file = XAttrFile(file_path)
        vfs_store = VFSStore(xattr_file)

        # Check if already indexed
        existing_attrs = xattr_file.list()
        needs_indexing = "user.vectorvfs" not in existing_attrs or force_reindex

        if needs_indexing:
            try:
                # Generate and store embedding
                features = encoder.encode_vision(file_path)
                features = F.normalize(features)
                features = features.to(torch.float16)
                vfs_store.write_tensor(features)
                print(f"Indexed: {file_path.name}")
            except Exception as e:
                print(f"Failed to index {file_path.name}: {e}")
                continue
        else:
            # Load cached embedding
            features = vfs_store.read_tensor()

        # Compute similarity (cosine, since both vectors are normalized);
        # move to CPU float32 so both operands share device and dtype
        features = features.to(device="cpu", dtype=torch.float32)
        similarity = (features @ query_features.T).item()

        # Maintain top-k heap: once full, heappushpop drops the lowest score
        result = SearchResult(similarity=similarity, path=file_path)
        if len(results_heap) < top_k:
            heappush(results_heap, result)
        else:
            heappushpop(results_heap, result)

    # Return sorted results (highest similarity first)
    return [(r.path, r.similarity) for r in sorted(results_heap, reverse=True)]


# Example usage
if __name__ == "__main__":
    search_dir = Path("/home/user/photos")
    results = semantic_search(
        query="happy golden retriever playing in park",
        directory=search_dir,
        top_k=5,
        recursive=True,
        force_reindex=False,
    )

    print("\nSearch Results:")
    for path, score in results:
        print(f"  {path.name}: {score:.4f}")
```

--------------------------------

### Encode and compute similarity with DualEncoder

Source: https://context7.com/perone/vectorvfs/llms.txt

Demonstrates how to encode images and text, normalize the resulting features, and compute a cosine similarity score.

```python
import torch.nn.functional as F
from pathlib import Path
from vectorvfs.encoders import PerceptionEncoder

# PerceptionEncoder provides encode_vision, encode_text, and logit_scale
encoder = PerceptionEncoder()

# Encode an image
image_path = Path("/path/to/cat.jpg")
image_features = encoder.encode_vision(image_path)
print(f"Image embedding shape: {image_features.shape}")
# Output: Image embedding shape: torch.Size([1, 1024])

# Encode text query
text_features = encoder.encode_text("a cute orange cat")
print(f"Text embedding shape: {text_features.shape}")
# Output: Text embedding shape: torch.Size([1, 1024])

# Normalize features for cosine similarity
image_features = F.normalize(image_features)
text_features = F.normalize(text_features)

# Compute similarity score
similarity = (image_features @ text_features.T).item()
print(f"Similarity score: {similarity:.4f}")
# Output: Similarity score: 0.3245

# Get logit scale for softmax computation
logit_scale = encoder.logit_scale()
print(f"Logit scale: {logit_scale.item():.2f}")
```

--------------------------------

### XAttrFile Class

Source: https://context7.com/perone/vectorvfs/llms.txt

Provides low-level access to Linux extended attributes on individual files.
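To see what this layer does underneath, here is a stdlib-only sketch that packs a vector of half-precision floats into bytes and stores it under a `user.*` extended attribute with Python's `os.setxattr`, `os.getxattr`, and `os.listxattr` (Linux only). It is an illustration, not VectorVFS's code: the byte layout (a 4-byte length header followed by IEEE 754 half floats) and the key `user.vectorvfs_demo` are assumptions, and VFSStore serializes full PyTorch tensors rather than plain lists.

```python
import os
import struct
from pathlib import Path

# Hypothetical attribute key for this sketch; VFSStore itself uses "user.vectorvfs".
DEMO_KEY = "user.vectorvfs_demo"


def pack_vector(values: list[float]) -> bytes:
    """Serialize as a little-endian uint32 length header plus half floats ('e')."""
    return struct.pack(f"<I{len(values)}e", len(values), *values)


def unpack_vector(payload: bytes) -> list[float]:
    """Inverse of pack_vector: read the length header, then the half floats."""
    (n,) = struct.unpack_from("<I", payload, 0)
    return list(struct.unpack_from(f"<{n}e", payload, 4))


def write_vector(path: Path, values: list[float], key: str = DEMO_KEY) -> int:
    """Store the packed vector as an extended attribute; returns bytes written."""
    payload = pack_vector(values)
    os.setxattr(path, key, payload)  # raises OSError if the filesystem lacks xattr support
    return len(payload)


def read_vector(path: Path, key: str = DEMO_KEY) -> list[float]:
    """Read the vector back; raises OSError if the attribute does not exist."""
    return unpack_vector(os.getxattr(path, key))


def has_vector(path: Path, key: str = DEMO_KEY) -> bool:
    """Mirror the 'already indexed?' check: is the key among the file's attributes?"""
    return key in os.listxattr(path)
```

At 2 bytes per element, the raw data for a 1024-dimensional half-precision embedding is 2,048 bytes; a tensor serializer adds its own framing on top of that. The `user.` namespace is the one ordinary, unprivileged processes can write on Linux, which matches the `user.vectorvfs` key VFSStore uses.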
```APIDOC
## XAttrFile Class

### Description
Wraps OS-level xattr operations for reading, writing, listing, and removing extended attributes on a file.

### Methods
- **list()**: Returns a list of existing extended attribute keys.
- **write(key, value)**: Writes binary data to a specific extended attribute key.
- **read(key)**: Reads binary data from a specific extended attribute key.
- **remove(key)**: Removes an extended attribute key from the file.
```

--------------------------------

### Implement a custom DualEncoder

Source: https://context7.com/perone/vectorvfs/llms.txt

Provides a template for creating custom encoders by inheriting from the DualEncoder abstract base class.

```python
from pathlib import Path

import torch
from vectorvfs.encoders import DualEncoder


class CustomEncoder(DualEncoder):
    """Example custom encoder implementation."""

    def __init__(self, model_path: str):
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        # Load your custom model here
        self.model = self._load_model(model_path)

    def _load_model(self, path):
        # Custom model loading logic goes here (placeholder returns None)
        return None

    def encode_vision(self, file: Path) -> torch.Tensor:
        """Encode image file to tensor."""
        # Custom image encoding logic goes here
        # Must return shape [1, embedding_dim]; random values are a placeholder
        return torch.randn(1, 1024)

    def encode_text(self, text: str) -> torch.Tensor:
        """Encode text string to tensor."""
        # Custom text encoding logic goes here
        # Must return shape [1, embedding_dim]; random values are a placeholder
        return torch.randn(1, 1024)

    def logit_scale(self) -> torch.Tensor:
        """Return logit scale for similarity computation."""
        return torch.tensor(100.0)


# Use the custom encoder
encoder = CustomEncoder("/path/to/model")
features = encoder.encode_vision(Path("/path/to/image.jpg"))
```

--------------------------------

### Basic vfs search command

Source: https://github.com/perone/vectorvfs/blob/main/docs/source/usage.md

Use this command to search a given folder for files that semantically match a query.
The tool automatically embeds supported files that have no embedding yet and loads the existing embeddings of the rest.

```bash
$ vfs search cat /my_folder
```

--------------------------------

### Encode Data with PerceptionEncoder

Source: https://context7.com/perone/vectorvfs/llms.txt

Initialize the PerceptionEncoder to transform images and text into a shared embedding space.

```python
import torch
import torch.nn.functional as F
from pathlib import Path
from vectorvfs.encoders import PerceptionEncoder

# Initialize the encoder (uses PE-Core-L14-336 model by default)
encoder = PerceptionEncoder()
print(f"Model: {encoder.model_name}")
print(f"Device: {encoder.device}")
# Output: Model: PE-Core-L14-336
# Output: Device: cuda (or cpu)
```

--------------------------------

### Manage Extended Attributes with XAttrFile

Source: https://context7.com/perone/vectorvfs/llms.txt

Use XAttrFile to list, read, write, and remove Linux extended attributes on individual files.

```python
from pathlib import Path
from vectorvfs.vfsstore import XAttrFile

# Initialize XAttrFile for a specific file
file_path = Path("/path/to/image.jpg")
xattr_file = XAttrFile(file_path)

# List all extended attributes on the file
attributes = xattr_file.list()
print(f"Existing attributes: {attributes}")
# Output: ['user.vectorvfs', 'user.custom']

# Write custom data as an extended attribute
custom_data = b"my custom metadata"
xattr_file.write("user.my_custom_key", custom_data)

# Read an extended attribute value
data = xattr_file.read("user.my_custom_key")
print(f"Retrieved data: {data}")
# Output: b'my custom metadata'

# Remove an extended attribute
xattr_file.remove("user.my_custom_key")

# Handle non-existent attributes with error handling
try:
    xattr_file.read("user.nonexistent")
except OSError as e:
    print(f"Attribute not found: {e}")
```

--------------------------------

### Measure Operation Time with PerfCounter

Source: https://context7.com/perone/vectorvfs/llms.txt

Utility for measuring elapsed time.
Can be used as a context manager or a decorator. Import from `vectorvfs.utils`.

```python
from vectorvfs.utils import PerfCounter

# Use as a context manager
with PerfCounter() as timer:
    # Perform some operation
    result = sum(range(1000000))
print(f"Operation took {timer.elapsed:.4f} seconds")
# Output: Operation took 0.0234 seconds

# Use as a decorator
@PerfCounter()
def expensive_operation():
    return [x**2 for x in range(100000)]

expensive_operation()
```

--------------------------------

### Limit search results with vfs search

Source: https://github.com/perone/vectorvfs/blob/main/docs/source/usage.md

To display only the top N most similar files, use the `-n` flag followed by the desired number. This limits the output to the specified quantity of results.

```bash
$ vfs search -n 3 cat /my_folder
```

--------------------------------

### PerceptionEncoder Class

Source: https://context7.com/perone/vectorvfs/llms.txt

Implements Meta's Perception Encoder for encoding images and text into a shared 1024-dimensional embedding space.

```APIDOC
## PerceptionEncoder Class

### Description
Provides methods for encoding both vision (images) and text inputs for semantic similarity computation using the PE-Core-L14-336 model.
```

--------------------------------

### CLI: vfs search

Source: https://context7.com/perone/vectorvfs/llms.txt

Performs semantic search across files in a directory using cached embeddings or indexing new files.

```APIDOC
## CLI: vfs search

### Description
Performs semantic search across files in a directory, automatically indexing files that haven't been processed and using cached embeddings for previously indexed files.

### Usage
`vfs search [options] `

### Parameters
#### Options
- **-r** (flag) - Optional - Recursive search through all subdirectories.
- **-n** (integer) - Optional - Limit results to the top N matches.
- **-f** (flag) - Optional - Force re-indexing of all files, ignoring cached embeddings.
```

--------------------------------

### Force re-indexing with vfs search

Source: https://github.com/perone/vectorvfs/blob/main/docs/source/usage.md

To make VectorVFS re-index files, pass the `-f` flag to the search command. This is useful when files have changed and their existing embeddings may be outdated.

```bash
$ vfs search -f cat /my_folder
```
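The documented `vfs search` options and the re-indexing rule can be mirrored in a few lines of `argparse`. This is a hypothetical re-creation for illustration only, not VectorVFS's actual CLI code: the positional argument names (`query`, `directory`) and the `-n` default of `None` (meaning no limit) are assumptions, while the indexing rule follows the `semantic_search` example (embed when `user.vectorvfs` is missing or `-f` is given).

```python
import argparse

VECTORVFS_KEY = "user.vectorvfs"  # attribute name documented for VFSStore


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented `vfs search` flags."""
    parser = argparse.ArgumentParser(prog="vfs")
    sub = parser.add_subparsers(dest="command", required=True)
    search = sub.add_parser("search", help="semantic search over files")
    search.add_argument("query", help="text query, e.g. 'cat'")  # assumed name
    search.add_argument("directory", help="folder to search")  # assumed name
    search.add_argument("-r", action="store_true", help="recurse into subdirectories")
    search.add_argument("-n", type=int, default=None, help="show only the top N matches")
    search.add_argument("-f", action="store_true", help="force re-indexing")
    return parser


def needs_indexing(existing_attrs: list[str], force: bool) -> bool:
    """Embed a file if it has no cached vector yet or if -f was given."""
    return force or VECTORVFS_KEY not in existing_attrs


args = build_parser().parse_args(["search", "-r", "-n", "5", "-f", "happy dog playing", "/pet_photos"])
print(args.query, args.directory, args.r, args.n, args.f)
```

With `-f`, every supported file is re-embedded even when it already carries a cached vector, which is why forcing a re-index on a large folder costs a full encoding pass.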