TNTSearch
https://github.com/teamtnt/tntsearch
# TNTSearch

TNTSearch is a full-text search (FTS) engine written entirely in PHP that allows you to add powerful search capabilities to any application with minimal configuration. It stores its indexes in SQLite databases, making it lightweight and easily deployable without external dependencies like Elasticsearch or Solr. The engine uses the BM25 ranking algorithm for relevance scoring and supports dynamic index updates without requiring complete reindexing.

The library provides a comprehensive set of features including fuzzy search with configurable Levenshtein distance, boolean search operators, geo-spatial search capabilities, and text classification using Naive Bayes. It includes built-in stemming support for multiple languages (English, Arabic, Croatian, German, Italian, Portuguese, Russian, Ukrainian, and more), customizable tokenizers including n-gram tokenizers, result highlighting, and keyword extraction using the RAKE algorithm.

## Creating an Index

Creates a searchable index from a database query result. The index is stored as a SQLite database file in the configured storage directory. Supports MySQL, PostgreSQL, SQLite, SQL Server, and Oracle as data sources.

```php
use TeamTNT\TNTSearch\TNTSearch;

$tnt = new TNTSearch;

$tnt->loadConfig([
    'driver'   => 'mysql',
    'host'     => 'localhost',
    'database' => 'myapp',
    'username' => 'root',
    'password' => 'secret',
    'storage'  => '/var/www/app/storage/indexes/',
    'stemmer'  => \TeamTNT\TNTSearch\Stemmer\PorterStemmer::class
]);

$indexer = $tnt->createIndex('articles.index');
$indexer->query('SELECT id, title, body FROM articles;');
$indexer->run();

// Output:
// Total rows 1000
// Index created successfully in 2.35s
```

## Performing a Basic Search

Searches the index for a phrase and returns document IDs ranked by relevance using the BM25 algorithm. The second parameter limits the number of results returned. Results include document IDs, hit count, relevance scores, and execution time.
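
To give a feel for what BM25 ranking means, here is a minimal standalone sketch of the scoring formula. It is not TNTSearch's internal implementation: the function name `bm25Score` is hypothetical, the parameters `k1 = 1.2` and `b = 0.75` are commonly used defaults chosen here as an assumption, and documents are passed in as pre-tokenized arrays.

```php
<?php
/**
 * Illustrative BM25 scorer (hypothetical helper, not part of TNTSearch).
 *
 * @param string[] $queryTerms Tokens of the query.
 * @param array    $docs       Map of docId => list of tokens.
 * @return float[] Map of docId => score, highest score first.
 */
function bm25Score(array $queryTerms, array $docs, float $k1 = 1.2, float $b = 0.75): array
{
    $docCount = count($docs);
    if ($docCount === 0) {
        return [];
    }

    // Average document length, used to normalise long documents.
    $avgLength = array_sum(array_map('count', $docs)) / $docCount;

    // Document frequency: in how many documents does each query term occur?
    $docFreq = [];
    foreach (array_unique($queryTerms) as $term) {
        $docFreq[$term] = 0;
        foreach ($docs as $tokens) {
            if (in_array($term, $tokens, true)) {
                $docFreq[$term]++;
            }
        }
    }

    $scores = [];
    foreach ($docs as $id => $tokens) {
        $termFreq = array_count_values($tokens);
        $length   = count($tokens);
        $score    = 0.0;

        foreach ($queryTerms as $term) {
            if (!isset($termFreq[$term])) {
                continue;
            }
            $freq = $termFreq[$term];
            // Rare terms get a higher inverse document frequency.
            $idf  = log(1 + ($docCount - $docFreq[$term] + 0.5) / ($docFreq[$term] + 0.5));
            // Term frequency saturates and is damped for long documents.
            $score += $idf * ($freq * ($k1 + 1))
                / ($freq + $k1 * (1 - $b + $b * $length / $avgLength));
        }

        if ($score > 0) {
            $scores[$id] = $score;
        }
    }

    arsort($scores); // highest score first, keys (document IDs) preserved
    return $scores;
}

$docs = [
    1 => ['php', 'search', 'engine'],
    2 => ['java', 'search'],
    3 => ['php', 'php', 'tutorial', 'for', 'beginners'],
];

// Document 3 ranks above document 1; document 2 does not match at all.
$ranked = bm25Score(['php'], $docs);
```

The sketch recomputes everything per query for clarity; a real engine would read term and document frequencies from its index instead.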

```php
use TeamTNT\TNTSearch\TNTSearch;

$tnt = new TNTSearch;

$tnt->loadConfig([
    'driver'   => 'mysql',
    'host'     => 'localhost',
    'database' => 'myapp',
    'username' => 'root',
    'password' => 'secret',
    'storage'  => '/var/www/app/storage/indexes/'
]);

$tnt->selectIndex('articles.index');

$results = $tnt->search('machine learning algorithms', 12);

print_r($results);
// Array
// (
//     [ids] => Array ( [0] => 42 [1] => 78 [2] => 156 [3] => 23 )
//     [hits] => 4
//     [docScores] => Array ( [42] => 8.234 [78] => 6.891 [156] => 5.442 [23] => 4.123 )
//     [execution_time] => 0.235 ms
// )

// Retrieve actual records from database.
// $pdo is assumed to be an existing PDO connection to the same MySQL database.
// Skip the query when there are no hits, since "IN ()" is invalid SQL.
if (!empty($results['ids'])) {
    $ids = implode(',', $results['ids']);
    // FIELD() is MySQL-specific and keeps the rows in relevance order
    $articles = $pdo->query("SELECT * FROM articles WHERE id IN ($ids) ORDER BY FIELD(id, $ids)");
}
```

## Boolean Search

Performs searches using boolean operators for complex queries. Supports AND (implicit), OR, and NOT (-) operators, along with parentheses for grouping. Returns documents matching the specified boolean expression.

```php
use TeamTNT\TNTSearch\TNTSearch;

$tnt = new TNTSearch;
$tnt->loadConfig($config);
$tnt->selectIndex('articles.index');

// Find documents with "php" but NOT "java"
$results = $tnt->searchBoolean('php -java');
// Returns all documents containing "php" that don't contain "java"

// Find documents with "php" OR "python"
$results = $tnt->searchBoolean('php or python');
// Returns all documents containing either "php" or "python"

// Complex boolean query with grouping
$results = $tnt->searchBoolean('(php mysql) or (python postgresql)');
// Returns documents that have both "php" AND "mysql", OR both "python" AND "postgresql"

print_r($results);
// Array
// (
//     [ids] => Array ( [0] => 15 [1] => 28 [2] => 45 )
//     [hits] => 3
//     [execution_time] => 0.312 ms
// )
```

## Fuzzy Search

Enables typo-tolerant search using Levenshtein distance. When fuzziness is enabled, the search finds documents containing words similar to the search terms.
Configurable parameters control prefix length, maximum expansions, and edit distance tolerance.

```php
use TeamTNT\TNTSearch\TNTSearch;

$tnt = new TNTSearch;
$tnt->loadConfig($config);
$tnt->selectIndex('articles.index');

// Enable fuzzy search
$tnt->fuzziness(true);

// Configure fuzzy search parameters
$tnt->setFuzzyDistance(2);       // Maximum Levenshtein distance (default: 2)
$tnt->setFuzzyPrefixLength(2);   // Characters that must match exactly at start (default: 2)
$tnt->setFuzzyMaxExpansions(50); // Maximum fuzzy term expansions (default: 50)

// Search with typos - "shakespere" will match "shakespeare"
$results = $tnt->search('shakespere');

print_r($results);
// Array
// (
//     [ids] => Array ( [0] => 234 [1] => 567 [2] => 891 )
//     [hits] => 3
//     [execution_time] => 1.234 ms
// )

// Search with multiple typos - "algorythm" will match "algorithm"
$results = $tnt->search('machine lerning algorythm');
```

## Updating the Index Dynamically

Allows inserting, updating, and deleting documents in an existing index without full reindexing. This is essential for keeping search results current in applications with frequently changing data.

```php
use TeamTNT\TNTSearch\TNTSearch;

$tnt = new TNTSearch;
$tnt->loadConfig($config);
$tnt->selectIndex('articles.index');

$index = $tnt->getIndex();

// Insert a new document
$index->insert([
    'id'    => 1001,
    'title' => 'Introduction to Machine Learning',
    'body'  => 'Machine learning is a subset of artificial intelligence...'
]);

// Update an existing document (re-indexes the content)
$index->update(1001, [
    'id'    => 1001,
    'title' => 'Complete Guide to Machine Learning',
    'body'  => 'Machine learning is a powerful subset of artificial intelligence...'
]);

// Delete a document from the index
$index->delete(1001);

// The index is now updated - no need to rebuild
$results = $tnt->search('machine learning');
```

## Geo-Spatial Search

Finds documents within a specified distance from a geographic location.
Creates a separate geo-index storing latitude/longitude coordinates and uses efficient spatial queries to find nearby items sorted by distance.

```php
use TeamTNT\TNTSearch\Indexer\TNTGeoIndexer;
use TeamTNT\TNTSearch\TNTGeoSearch;

// Create a geo index
$geoIndexer = new TNTGeoIndexer;
$geoIndexer->loadConfig($config);
$geoIndexer->createIndex('restaurants.index');
$geoIndexer->query('SELECT id, longitude, latitude FROM restaurants;');
$geoIndexer->run();

// Search for nearby restaurants
$geoSearch = new TNTGeoSearch;
$geoSearch->loadConfig($config);
$geoSearch->selectIndex('restaurants.index');

$currentLocation = [
    'latitude'  => 48.137154, // Munich, Germany
    'longitude' => 11.576124
];

$radius = 5;  // kilometers
$limit  = 10; // max results

$results = $geoSearch->findNearest($currentLocation, $radius, $limit);

print_r($results);
// Array
// (
//     [ids] => Array ( [0] => 45 [1] => 78 [2] => 23 )
//     [distances] => Array ( [0] => 0.5 [1] => 1.2 [2] => 2.8 )
//     [hits] => 3
//     [execution_time] => 0.456 ms
// )
```

## Text Classification

Implements Naive Bayes text classification for categorizing documents. Train the classifier with labeled examples, then predict the category of new text. The classifier can be saved and loaded for persistent use.
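
The learn/predict cycle can be made concrete with a minimal multinomial Naive Bayes sketch using Laplace (add-one) smoothing. This is an illustration of the technique only, not the internals of `TNTClassifier`: the class name `TinyNaiveBayes` is hypothetical, and the simple lowercase word split is an assumption. It also shows why a likelihood value is negative: it is a sum of log-probabilities.

```php
<?php
// Hypothetical illustration class; not part of TNTSearch.
final class TinyNaiveBayes
{
    private array $docCounts  = []; // label => number of training texts
    private array $wordCounts = []; // label => [word => occurrences]
    private array $vocabulary = []; // word => true

    private function tokenize(string $text): array
    {
        // ASCII lowercasing is enough for this sketch.
        return preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    }

    public function learn(string $text, string $label): void
    {
        $this->docCounts[$label] = ($this->docCounts[$label] ?? 0) + 1;
        foreach ($this->tokenize($text) as $word) {
            $this->wordCounts[$label][$word] = ($this->wordCounts[$label][$word] ?? 0) + 1;
            $this->vocabulary[$word] = true;
        }
    }

    public function predict(string $text): array
    {
        $totalDocs = array_sum($this->docCounts);
        $vocabSize = count($this->vocabulary);
        $tokens    = $this->tokenize($text);
        $best      = ['label' => null, 'likelihood' => -INF];

        foreach ($this->docCounts as $label => $count) {
            $labelWords = $this->wordCounts[$label] ?? [];
            $totalWords = array_sum($labelWords);

            // Start from the log prior: how common is this label overall?
            $logProb = log($count / $totalDocs);

            // Add the log-probability of each word given the label.
            // The +1 (Laplace smoothing) keeps unseen words from zeroing the result.
            foreach ($tokens as $word) {
                $logProb += log((($labelWords[$word] ?? 0) + 1) / ($totalWords + $vocabSize));
            }

            if ($logProb > $best['likelihood']) {
                $best = ['label' => $label, 'likelihood' => $logProb];
            }
        }

        return $best;
    }
}

$nb = new TinyNaiveBayes();
$nb->learn('Great save by the goalkeeper', 'Sports');
$nb->learn('The match ended in a draw', 'Sports');
$nb->learn('Interest rates expected to rise', 'Finance');

$guess = $nb->predict('The goalkeeper made an incredible save');
// $guess['label'] is 'Sports'; $guess['likelihood'] is a negative log-probability
```

Working in log space avoids the numeric underflow that multiplying many small probabilities would cause.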

```php
use TeamTNT\TNTSearch\Classifier\TNTClassifier;

$classifier = new TNTClassifier();

// Train the classifier with labeled examples
$classifier->learn('The team scored a goal in the final minute', 'Sports');
$classifier->learn('Great save by the goalkeeper', 'Sports');
$classifier->learn('The match ended in a draw', 'Sports');
$classifier->learn('New smartphone released with better camera', 'Technology');
$classifier->learn('AI breakthrough in natural language processing', 'Technology');
$classifier->learn('Stock market reaches all-time high', 'Finance');
$classifier->learn('Interest rates expected to rise', 'Finance');

// Predict category for new text
$prediction = $classifier->predict('The goalkeeper made an incredible save');

print_r($prediction);
// Array
// (
//     [likelihood] => -12.345
//     [label] => Sports
// )

// Save classifier for later use
$classifier->save('/var/www/app/storage/news-classifier.cache');

// Load classifier in another request
$loadedClassifier = new TNTClassifier();
$loadedClassifier->load('/var/www/app/storage/news-classifier.cache');

$result = $loadedClassifier->predict('Bitcoin price surges');
// [label] => Finance
```

## Result Highlighting

Highlights search terms in text results for better user experience. Wraps matched terms in customizable HTML tags with optional attributes like CSS classes. Supports whole-word matching and case sensitivity options.

```php
use TeamTNT\TNTSearch\TNTSearch;

$tnt = new TNTSearch;
$tnt->loadConfig($config);
$tnt->selectIndex('articles.index');

$text = 'PHP is a popular programming language for web development. Many developers choose PHP for its simplicity.';
$searchTerms = 'php programming';

// Basic highlighting with default <em> tag
$highlighted = $tnt->highlight($text, $searchTerms);
// Output: <em>PHP</em> is a popular <em>programming</em> language for web development. Many developers choose <em>PHP</em> for its simplicity.

// Custom tag with CSS class
$highlighted = $tnt->highlight($text, $searchTerms, 'mark', [
    'tagOptions' => ['class' => 'search-highlight']
]);
// Output: <mark class="search-highlight">PHP</mark> is a popular <mark class="search-highlight">programming</mark> language...

// Extract relevant snippet around search terms
$longText = 'Lorem ipsum... [very long text] ...PHP is great for web development... [more text]';
$snippet = $tnt->snippet($searchTerms, $longText, 150, 30, '...');
// Output: ...text before PHP is great for web development and more context...
```

## Custom Tokenizer

Allows creating custom tokenizers for specialized text processing needs. Extend AbstractTokenizer and implement TokenizerInterface to define how text is split into searchable tokens. Useful for handling special characters, domain-specific formats, or language-specific requirements.

```php
use TeamTNT\TNTSearch\TNTSearch;
use TeamTNT\TNTSearch\Tokenizer\AbstractTokenizer;
use TeamTNT\TNTSearch\Tokenizer\TokenizerInterface;

// Create a custom tokenizer for code/technical content
class CodeTokenizer extends AbstractTokenizer implements TokenizerInterface
{
    // Split on whitespace and common punctuation
    protected static $pattern = '/[\s,\.\(\)\{\}\[\]]+/';

    public function tokenize($text, $stopwords = [])
    {
        $text = (string) $text;

        // Split camelCase: getUserName -> get User Name.
        // This must run before lowercasing, otherwise no uppercase letters remain to match.
        $text = preg_replace('/([a-z])([A-Z])/', '$1 $2', $text);

        $text = mb_strtolower($text);

        // Split snake_case: get_user_name -> get user name
        $text = str_replace('_', ' ', $text);

        $tokens = preg_split($this->getPattern(), $text, -1, PREG_SPLIT_NO_EMPTY);

        return array_diff($tokens, $stopwords);
    }
}

// Use custom tokenizer via config
$tnt = new TNTSearch;
$tnt->loadConfig([
    'driver'    => 'mysql',
    'host'      => 'localhost',
    'database'  => 'codebase',
    'username'  => 'root',
    'password'  => 'secret',
    'storage'   => '/var/www/app/storage/indexes/',
    'tokenizer' => CodeTokenizer::class
]);

$indexer = $tnt->createIndex('codebase.index');
$indexer->query('SELECT id, function_name, docblock FROM functions;');
$indexer->run();
```

## N-Gram Tokenizer

Provides n-gram tokenization for improved partial matching and typo tolerance. Breaks text into overlapping character sequences of configurable length, enabling substring matching without explicit fuzzy search.

```php
use TeamTNT\TNTSearch\TNTSearch;
use TeamTNT\TNTSearch\Tokenizer\NGramTokenizer;

$tnt = new TNTSearch;
$tnt->loadConfig($config);

// Create tokenizer with trigrams (3-character sequences)
$ngramTokenizer = new NGramTokenizer(3, 3);

// Example: "hello" -> ["hel", "ell", "llo"]
$tokens = $ngramTokenizer->tokenize('hello world');
print_r($tokens);
// Array ( [0] => hel [1] => ell [2] => llo [3] => wor [4] => orl [5] => rld )

// Create index with n-gram tokenizer
$indexer = $tnt->createIndex('cities.index');
$indexer->setTokenizer($ngramTokenizer);
$indexer->query('SELECT id, city_name FROM cities;');
$indexer->run();

// Now partial matches work better
// Searching "york" will match "New York" even with typos
```

## Keyword Extraction with RAKE

Extracts important keywords and phrases from text using the RAKE (Rapid Automatic Keyword Extraction) algorithm. Useful for auto-tagging content, generating search suggestions, or building topic summaries.

```php
use TeamTNT\TNTSearch\KeywordExtraction\Rake;

// Initialize RAKE with English stopwords
$rake = new Rake('english');

$text = "Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.";

// Extract keywords with scores
$keywords = $rake->extractKeywords($text, true);

print_r($keywords);
// Array
// (
//     [analytical model building] => 9.0
//     [machine learning] => 4.0
//     [artificial intelligence] => 4.0
//     [data analysis] => 4.0
//     [minimal human intervention] => 9.0
// )

// Extract keywords without scores
$keywordList = $rake->extractKeywords($text, false);
// Array ( [0] => analytical model building [1] => machine learning ... )
```

## Fuzzy Matching Helper

Provides fuzzy string matching using vector similarity calculations. Finds strings that contain a common subsequence with the search pattern and ranks them by similarity. Ideal for autocomplete features and typo-tolerant lookups.

```php
use TeamTNT\TNTSearch\TNTFuzzyMatch;

$fuzzy = new TNTFuzzyMatch();

// Match against an array of items
$items = [
    'JavaScript',
    'TypeScript',
    'CoffeeScript',
    'Java',
    'Python',
    'Ruby'
];

$matches = $fuzzy->fuzzyMatch('java', $items);

print_r($matches);
// Array
// (
//     [Java] => 1.2         // Best match - contains "java" exactly
//     [JavaScript] => 0.89  // Contains "java" as prefix
// )

// Match against a file (useful for large dictionaries)
$matches = $fuzzy->fuzzyMatchFromFile('shakesp', '/var/www/app/data/authors.txt');
// Returns matches like "Shakespeare", "Shakespearean" ranked by similarity
```

## Setting Custom Primary Key

Configures a custom primary key field when your database table doesn't use 'id' as the identifier. Also allows making the primary key searchable if needed.

```php
use TeamTNT\TNTSearch\TNTSearch;

$tnt = new TNTSearch;
$tnt->loadConfig($config);

$indexer = $tnt->createIndex('products.index');

// Set custom primary key
$indexer->setPrimaryKey('product_uuid');

// Make primary key searchable (disabled by default)
$indexer->includePrimaryKey();

// Use custom stopwords
$indexer->setStopWords(['the', 'a', 'an', 'is', 'are']);

// Set language for stemming
$indexer->setLanguage('german');

$indexer->query('SELECT product_uuid, name, description FROM products;');
$indexer->run();

// Select the freshly built index before searching it
$tnt->selectIndex('products.index');

// Search will now return product_uuid values instead of numeric ids
$results = $tnt->search('laptop computer');
// [ids] => ['uuid-abc-123', 'uuid-def-456', ...]
```

## Using Different Stemmers

Configures language-specific stemmers for better search accuracy in non-English content. TNTSearch includes built-in stemmers for Arabic, Croatian, French, German, Italian, Latvian, Polish, Portuguese, Russian, and Ukrainian.

```php
use TeamTNT\TNTSearch\TNTSearch;
use TeamTNT\TNTSearch\Stemmer\GermanStemmer;
use TeamTNT\TNTSearch\Stemmer\PorterStemmer;
use TeamTNT\TNTSearch\Stemmer\RussianStemmer;

$tnt = new TNTSearch;

// German content
$tnt->loadConfig([
    'driver'   => 'mysql',
    'host'     => 'localhost',
    'database' => 'german_docs',
    'username' => 'root',
    'password' => 'secret',
    'storage'  => '/var/www/app/storage/indexes/',
    'stemmer'  => GermanStemmer::class
]);

$indexer = $tnt->createIndex('german_articles.index');
$indexer->query('SELECT id, titel, inhalt FROM artikel;');
$indexer->run();
// Now searching "Häuser" will also find "Haus", "Hauses", etc.

// Or set stemmer programmatically
$indexer = $tnt->createIndex('russian_docs.index');
$indexer->setStemmer(new RussianStemmer());
$indexer->query('SELECT id, title, content FROM documents;');
$indexer->run();
```

## Summary

TNTSearch is ideal for applications that need powerful full-text search without the complexity of dedicated search servers.
Common use cases include e-commerce product search with fuzzy matching for typo tolerance, content management systems with multi-language support through pluggable stemmers, location-based applications using geo-search capabilities, and document classification systems using the built-in Naive Bayes classifier. The library excels in scenarios where you need search-as-you-type functionality, autocomplete features, or need to keep indexes synchronized with frequently changing data. Integration with PHP frameworks is straightforward, with official Laravel Scout driver support available. For standalone applications, simply instantiate TNTSearch, configure your database connection and storage path, create indexes from SQL queries, and perform searches. The modular architecture allows customization at every level - from tokenizers and stemmers to storage engines - making it adaptable to diverse search requirements while maintaining the simplicity of a pure PHP solution that requires only SQLite and PDO extensions.
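
As a closing illustration of the typo tolerance mentioned above, the following sketch shows how the three fuzzy search parameters (edit distance, prefix length, maximum expansions) could interact when a query term is expanded against a vocabulary. It relies only on PHP's built-in `levenshtein()` function; the helper name `expandFuzzyTerm` is hypothetical, and the logic is an assumption about the general technique rather than TNTSearch's actual code. It compares bytes, so it is only accurate for ASCII terms.

```php
<?php
/**
 * Hypothetical helper: find vocabulary words close to a (possibly misspelled) term.
 *
 * @param string   $term          The query term as typed.
 * @param string[] $vocabulary    Known indexed words.
 * @param int      $maxDistance   Maximum Levenshtein distance allowed.
 * @param int      $prefixLength  Leading characters that must match exactly.
 * @param int      $maxExpansions Upper bound on returned candidates.
 * @return int[] Map of word => edit distance, closest first.
 */
function expandFuzzyTerm(
    string $term,
    array $vocabulary,
    int $maxDistance = 2,
    int $prefixLength = 2,
    int $maxExpansions = 50
): array {
    $prefix     = substr($term, 0, $prefixLength);
    $candidates = [];

    foreach ($vocabulary as $word) {
        // Cheap filter first: the word must share the required prefix.
        if (strncmp($word, $prefix, strlen($prefix)) !== 0) {
            continue;
        }
        // Only then pay for the edit-distance computation.
        $distance = levenshtein($term, $word);
        if ($distance <= $maxDistance) {
            $candidates[$word] = $distance;
        }
    }

    asort($candidates); // closest matches first
    return array_slice($candidates, 0, $maxExpansions, true);
}

$vocabulary = ['shakespeare', 'shakespearean', 'shake', 'sharp', 'hamlet'];

// "shakespere" is one insertion away from "shakespeare"
$expansions = expandFuzzyTerm('shakespere', $vocabulary);
```

The prefix check keeps the candidate set small, which is why a longer required prefix generally trades recall for speed.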