### Quick Start: Basic Chunking Setup

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/SemanticChunking.md

A quick start guide to setting up the SemanticTextChunker. It shows how to initialize the token counter (either for GPT-4 or embedding models) and an embedding provider, then create and use the chunker.

```csharp
using AiGeekSquad.AIContext.Chunking;

// Setup dependencies with specific tokenizer for your use case
var tokenCounter = MLTokenCounter.CreateGpt4(); // For GPT-4 applications
// OR for embedding applications:
// var tokenCounter = MLTokenCounter.CreateTextEmbedding3Small(); // Align with your embedding model

var embeddingGenerator = new YourEmbeddingProvider(); // Implement IEmbeddingGenerator

// Create chunker
var chunker = SemanticTextChunker.Create(tokenCounter, embeddingGenerator);

// Chunk text
var text = "Your document text here...";
await foreach (var chunk in chunker.ChunkAsync(text))
{
    Console.WriteLine($"Chunk: {chunk.Text}");
    Console.WriteLine($"Tokens: {chunk.Metadata["TokenCount"]}");
}
```

--------------------------------

### Dependency Injection Setup

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/README.md

Guide on setting up Dependency Injection for AiGeekSquad.AIContext components, facilitating modularity and testability.

```C#
using Microsoft.Extensions.DependencyInjection;

public static class ServiceCollectionExtensions
{
    public static IServiceCollection AddAIContext(this IServiceCollection services)
    {
        // Registering core services
        services.AddSingleton(); // Example implementation
        services.AddSingleton();
        services.AddSingleton();
        // Add other services as needed
        return services;
    }
}
```

--------------------------------

### Install AiGeekSquad.AIContext.MEAI

Source: https://github.com/aigeeksquad/aicontext/blob/main/src/AiGeekSquad.AIContext.MEAI/README.md

Installs the AiGeekSquad.AIContext.MEAI package using the .NET CLI.
```bash
dotnet add package AiGeekSquad.AIContext.MEAI
```

--------------------------------

### Run AIContext Examples

Source: https://github.com/aigeeksquad/aicontext/blob/main/README.md

Instructions to execute example applications for basic text chunking and MMR demonstrations, including setting up a new console application.

```Bash
dotnet run --project examples/ --configuration Release BasicChunking
dotnet run --project examples/ --configuration Release MMR
dotnet new console -n MyAIContextTest
cd MyAIContextTest
dotnet add package AiGeekSquad.AIContext
```

--------------------------------

### OpenAI Integration Example

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/README.md

Example demonstrating integration with OpenAI's embedding models using the IEmbeddingGenerator interface.

```C#
using OpenAI;

public class OpenAIEmbeddingGenerator : IEmbeddingGenerator
{
    private readonly OpenAIClient _client;

    public OpenAIEmbeddingGenerator(string apiKey)
    {
        _client = new OpenAIClient(new OpenAIClientOptions { ApiKey = apiKey });
    }

    public async Task GenerateEmbeddingAsync(string text)
    {
        var response = await _client.Embeddings.CreateAsync(
            model: "text-embedding-3-small",
            input: text
        );
        return response.Data[0].Embedding.ToArray();
    }
}
```

--------------------------------

### Basic Weighted Sum Ranking Example

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/RankingAPI_Usage.md

Demonstrates how to use the RankingEngine with WeightedSum strategy in C#. It defines custom scoring functions for relevance and popularity, assigns weights, and ranks a collection of documents.
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using AiGeekSquad.AIContext.Ranking;
using AiGeekSquad.AIContext.Ranking.Normalizers;

// Define your item type
public class Document
{
    public string Title { get; set; }
    public string Content { get; set; }
    public double RelevanceScore { get; set; }
    public int PopularityRank { get; set; }
}

// Create scoring functions
public class RelevanceScorer : IScoringFunction<Document>
{
    public string Name => "Relevance";

    public double ComputeScore(Document doc) => doc.RelevanceScore;

    public double[] ComputeScores(IReadOnlyList<Document> docs) =>
        docs.Select(d => d.RelevanceScore).ToArray();
}

public class PopularityScorer : IScoringFunction<Document>
{
    public string Name => "Popularity";

    public double ComputeScore(Document doc) => 1.0 / doc.PopularityRank;

    public double[] ComputeScores(IReadOnlyList<Document> docs) =>
        docs.Select(d => 1.0 / d.PopularityRank).ToArray();
}

// Use the ranking engine
var engine = new RankingEngine<Document>();
var documents = GetDocuments(); // Your document collection

var scoringFunctions = new[]
{
    new WeightedScoringFunction<Document>(new RelevanceScorer(), 0.7),
    new WeightedScoringFunction<Document>(new PopularityScorer(), 0.3)
};

var results = engine.Rank(documents, scoringFunctions);

// Process results
foreach (var result in results)
{
    Console.WriteLine($"Rank {result.Rank}: {result.Item.Title}");
    Console.WriteLine($"  Final Score: {result.FinalScore:F3}");
    Console.WriteLine($"  Relevance: {result.IndividualScores["Relevance"]:F3}");
    Console.WriteLine($"  Popularity: {result.IndividualScores["Popularity"]:F3}");
}
```

--------------------------------

### Project Setup for Contributors

Source: https://github.com/aigeeksquad/aicontext/blob/main/README.md

Steps for contributors to set up the project locally, including forking, cloning, creating branches, restoring dependencies, building, and testing.
```Bash
git clone https://github.com/YOUR-USERNAME/AIContext.git
cd AIContext
git checkout -b feature/your-feature-name
dotnet restore
dotnet build
dotnet test
```

--------------------------------

### Install AiGeekSquad.AIContext Package

Source: https://github.com/aigeeksquad/aicontext/blob/main/src/AiGeekSquad.AIContext/README.md

Installs the AiGeekSquad.AIContext NuGet package using the .NET CLI. This is the first step to using the library in your C# project.

```PowerShell
dotnet add package AiGeekSquad.AIContext
```

--------------------------------

### Similarity and Dissimilarity Ranking Example

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/RankingAPI_Usage.md

Illustrates combining similarity (relevance, recency) and dissimilarity (popularity rank) scoring functions using the RankingEngine in C#. It shows how to use positive and negative weights to influence the final ranking.

```csharp
// Scoring function for recency (newer is better)
public class RecencyScorer : IScoringFunction<Document>
{
    private readonly DateTime _referenceDate;

    public string Name => "Recency";

    public RecencyScorer(DateTime referenceDate) => _referenceDate = referenceDate;

    public double ComputeScore(Document doc)
    {
        var daysDiff = (_referenceDate - doc.PublishedDate).TotalDays;
        return Math.Max(0, 365 - daysDiff) / 365.0;
    }

    public double[] ComputeScores(IReadOnlyList<Document> docs) =>
        docs.Select(ComputeScore).ToArray();
}

// Combine similarity and dissimilarity
var scoringFunctions = new[]
{
    // Positive weight: reward high relevance
    new WeightedScoringFunction<Document>(new RelevanceScorer(), 1.0),

    // Negative weight: penalize high popularity rank (lower rank = more popular)
    new WeightedScoringFunction<Document>(new PopularityScorer(), -0.5),

    // Positive weight: reward recent documents
    new WeightedScoringFunction<Document>(new RecencyScorer(DateTime.Now), 0.3)
};

var results = engine.Rank(documents, scoringFunctions);
```

--------------------------------

### Using Normalization Strategies

Source:
https://github.com/aigeeksquad/aicontext/blob/main/docs/RankingAPI_Usage.md

Demonstrates how to initialize the RankingEngine with different normalization strategies like MinMax, ZScore, and Percentile. Also shows how to apply per-function normalization.

```csharp
using AiGeekSquad.AIContext.Ranking.Normalizers;

// MinMax normalization (default)
var minMaxEngine = new RankingEngine<Document>(
    defaultNormalizer: new MinMaxNormalizer()
);

// ZScore normalization
var zScoreEngine = new RankingEngine<Document>(
    defaultNormalizer: new ZScoreNormalizer()
);

// Percentile normalization
var percentileEngine = new RankingEngine<Document>(
    defaultNormalizer: new PercentileNormalizer()
);

// Per-function normalization
var scoringFunctions = new[]
{
    new WeightedScoringFunction<Document>(new RelevanceScorer(), 0.7)
    {
        Normalizer = new MinMaxNormalizer() // Use MinMax for relevance
    },
    new WeightedScoringFunction<Document>(new PopularityScorer(), 0.3)
    {
        Normalizer = new ZScoreNormalizer() // Use ZScore for popularity
    }
};

var results = engine.Rank(documents, scoringFunctions);
```

--------------------------------

### Creating Custom Ranking Strategies

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/RankingAPI_Usage.md

Provides an example of a custom ranking strategy by implementing the IRankingStrategy interface. This includes defining a name and implementing the CombineScores method with custom logic, such as exponential weighting.

```csharp
public class CustomStrategy : IRankingStrategy
{
    public string Name => "Custom";

    public double CombineScores(
        IReadOnlyList<double> scores,
        IReadOnlyList<double> weights,
        RankingContext?
        context = null)
    {
        // Custom combination logic
        double result = 0;
        for (int i = 0; i < scores.Count; i++)
        {
            // Example: Apply exponential weighting
            result += Math.Pow(scores[i], weights[i]);
        }
        return result;
    }
}
```

--------------------------------

### Top-K Ranking

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/RankingAPI_Usage.md

Demonstrates how to retrieve only the top K ranked results from the engine. Shows examples of specifying K and optionally providing a custom ranking strategy for the top-K operation.

```csharp
// Get only top 10 results
var top10Results = engine.RankTopK(documents, scoringFunctions, k: 10);

// With custom strategy
var top5RRF = engine.RankTopK(
    documents,
    scoringFunctions,
    k: 5,
    strategy: new ReciprocalRankFusionStrategy(k: 30)
);
```

--------------------------------

### Semantic Text Chunking Guide

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/README.md

A comprehensive guide to intelligent text splitting, covering architecture, configuration, custom splitters, embedding providers, and performance optimization.

```C#
/// <summary>
/// Provides functionality for splitting text into meaningful chunks based on semantic understanding.
/// </summary>
public interface ITextSplitter
{
    /// <summary>
    /// Splits the given text into chunks.
    /// </summary>
    /// <param name="text">The text to split.</param>
    /// <returns>A list of text chunks.</returns>
    Task> SplitTextAsync(string text);
}

/// <summary>
/// Represents a generator for text embeddings.
/// </summary>
public interface IEmbeddingGenerator
{
    /// <summary>
    /// Generates an embedding for the given text.
    /// </summary>
    /// <param name="text">The text to generate an embedding for.</param>
    /// <returns>The embedding vector.</returns>
    Task GenerateEmbeddingAsync(string text);
}
```

--------------------------------

### Install NuGet Packages for .NET RAG

Source: https://github.com/aigeeksquad/aicontext/blob/main/examples/notebooks/beyond-basic-rag-mmr-complete-demo.ipynb

Installs necessary NuGet packages for building RAG systems in .NET, including libraries for numerical computation, AI context, Ollama integration, and Microsoft's AI extensions. This setup is crucial for leveraging MMR and other AI features.

```C#
// Install required NuGet packages
#r "nuget: MathNet.Numerics, 5.0.0"
#r "nuget: AiGeekSquad.AIContext, *-*"
#r "nuget: AiGeekSquad.AIContext.MEAI, *-*"
#r "nuget: OllamaSharp, *-*"
#r "nuget: Microsoft.Extensions.AI.Abstractions, *-*"
#r "nuget: Microsoft.Extensions.AI, *-*"
#r "nuget: Microsoft.Extensions.DependencyInjection, *-*"
#r "nuget: Microsoft.Extensions.Logging, *-*"
#r "nuget: Microsoft.Extensions.Logging.Console, *-*"
#r "nuget: Microsoft.Extensions.Configuration, *-*"
#r "nuget: Microsoft.Extensions.Caching.Memory, *-*"

using System;
using OllamaSharp;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Caching.Memory;
using MathNet.Numerics.LinearAlgebra;
using MathNet.Numerics;
using AiGeekSquad.AIContext.Ranking;
using AiGeekSquad.AIContext.Chunking;
using AiGeekSquad.AIContext.MEAI;
using IEmbeddingGenerator = Microsoft.Extensions.AI.IEmbeddingGenerator;

Console.WriteLine("✅ Packages loaded successfully!");
Console.WriteLine($"📦 MathNet.Numerics: {typeof(MathNet.Numerics.Control).Assembly.GetName().Version}");
```

--------------------------------

### Search Platforms Integration Example (Azure Cognitive Search)

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/README.md

Demonstrates integration with search
platforms like Azure Cognitive Search for advanced search capabilities.

```C#
// Assuming Azure Cognitive Search SDK is used
// using Azure.Search.Documents;
// using Azure.Search.Documents.Models;

public class AzureCognitiveSearchIntegration
{
    // private readonly SearchClient _searchClient;

    // public AzureCognitiveSearchIntegration(string endpoint, string apiKey, string indexName)
    // {
    //     var credential = new AzureKeyCredential(apiKey);
    //     _searchClient = new SearchClient(new Uri(endpoint), indexName, credential);
    // }

    // public async Task IndexDocumentAsync(string content, float[] embedding, string id)
    // {
    //     var document = new SearchDocument
    //     {
    //         { "id", id },
    //         { "content", content },
    //         { "contentVector", new SearchVector { Value = embedding, K = 50, Fields = "content" } } // Example vector configuration
    //     };
    //     await _searchClient.IndexDocumentsAsync(IndexDocumentsAction.Upload(document));
    // }
}
```

--------------------------------

### Vector Databases Integration Example (Pinecone)

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/README.md

Illustrates how to integrate AiGeekSquad.AIContext with vector databases like Pinecone for efficient similarity search.
```C#
// Assuming a Pinecone client library is available
// using Pinecone;

public class PineconeIntegration
{
    // private readonly PineconeClient _pineconeClient;

    // public PineconeIntegration(string apiKey, string environment)
    // {
    //     _pineconeClient = new PineconeClient(apiKey, environment);
    // }

    // public async Task AddDocumentToPinecone(string indexName, string id, float[] embedding, Dictionary metadata)
    // {
    //     var vector = new Pinecone.Vector { Id = id, Values = embedding, Metadata = metadata };
    //     await _pineconeClient.UpsertAsync(indexName, new List { vector });
    // }
}
```

--------------------------------

### Azure Cognitive Services Integration Example

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/README.md

Example showing integration with Azure Cognitive Services for embedding generation.

```C#
using Azure; // AzureKeyCredential lives in the Azure namespace
using Azure.AI.OpenAI;
using Azure.Core;

public class AzureOpenAIEmbeddingGenerator : IEmbeddingGenerator
{
    private readonly OpenAIClient _client;

    public AzureOpenAIEmbeddingGenerator(string endpoint, string apiKey)
    {
        var credential = new AzureKeyCredential(apiKey);
        _client = new OpenAIClient(new Uri(endpoint), credential);
    }

    public async Task GenerateEmbeddingAsync(string text)
    {
        var response = await _client.GetEmbeddingsAsync(
            deployment: "your-embedding-deployment-name",
            input: text
        );
        return response.Value.Data[0].Embedding.ToArray();
    }
}
```

--------------------------------

### Creating Custom Scoring Functions

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/RankingAPI_Usage.md

Provides an example of creating a custom scoring function by implementing the IScoringFunction interface. This includes defining a name, injecting dependencies (like IExternalService), and implementing methods for single item and batch scoring.
```csharp
public class CustomScorer : IScoringFunction<MyItem>
{
    private readonly IExternalService _service;

    public string Name => "CustomScore";

    public CustomScorer(IExternalService service) => _service = service;

    public double ComputeScore(MyItem item)
    {
        // Custom scoring logic
        var features = ExtractFeatures(item);
        return _service.CalculateScore(features);
    }

    public double[] ComputeScores(IReadOnlyList<MyItem> items)
    {
        // Batch scoring for efficiency
        return _service.CalculateScoresBatch(items);
    }
}
```

--------------------------------

### Maximum Marginal Relevance (MMR) Algorithm

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/README.md

A guide to the MMR implementation, including its mathematical foundation, performance benchmarks, API reference, and parameter tuning for various use cases.

```C#
using MathNet.Numerics.LinearAlgebra;

public class MMR
{
    /// <summary>
    /// Selects a subset of items using the MMR algorithm to balance relevance and diversity.
    /// </summary>
    /// <param name="queryEmbedding">The embedding of the query.</param>
    /// <param name="candidateEmbeddings">A list of embeddings for candidate items.</param>
    /// <param name="k">The number of items to select.</param>
    /// <param name="lambda">The diversity parameter (controls trade-off between relevance and diversity).</param>
    /// <returns>A list of indices of the selected items.</returns>
    public List<int> SelectItems(Vector queryEmbedding, List> candidateEmbeddings, int k, float lambda)
    {
        // Implementation details for MMR algorithm...
        return new List<int>();
    }

    // Example of similarity calculation
    public float CalculateCosineSimilarity(Vector vec1, Vector vec2)
    {
        // Norm(2) returns double, so cast the whole quotient rather than only the dot product
        return (float)(vec1.DotProduct(vec2) / (vec1.Norm(2) * vec2.Norm(2)));
    }
}
```

--------------------------------

### Setup .NET Project with NuGet Packages

Source: https://github.com/aigeeksquad/aicontext/blob/main/blogs/beyond-basic-rag-mmr-transforms-dotnet-apps/beyond-basic-rag-mmr-transforms-dotnet-apps.md

Commands to add necessary NuGet packages to a .NET console application for implementing MMR and AI context features.
```bash
dotnet add package AiGeekSquad.AIContext
dotnet add package AiGeekSquad.AIContext.MEAI
dotnet add package OllamaSharp
dotnet add package Microsoft.Extensions.AI
dotnet add package MathNet.Numerics
```

--------------------------------

### Complete RAG Pipeline Example

Source: https://github.com/aigeeksquad/aicontext/blob/main/README.md

Demonstrates a full Retrieval Augmented Generation (RAG) pipeline. It includes generating query embeddings, retrieving relevant document chunks from a vector database, using MMR to select diverse context, and finally generating a response with the LLM.

```csharp
public async Task ProcessUserQuery(string question)
{
    // 1. Generate query embedding
    var queryEmbedding = await embeddingGenerator.GenerateEmbeddingAsync(question);

    // 2. Retrieve candidate chunks from vector database
    var candidates = await vectorDb.SearchSimilarAsync(queryEmbedding, topK: 20);

    // 3. Use MMR to select diverse, relevant context
    var selectedContext = MaximumMarginalRelevance.ComputeMMR(
        vectors: candidates.Select(c => c.Embedding).ToList(),
        query: queryEmbedding,
        lambda: 0.8, // Balance relevance vs diversity
        topK: 5      // Limit for LLM context window
    );

    // 4. Generate response with selected context
    var contextText = string.Join("\n", selectedContext.Select(s => candidates[s.Index].Text));
    return await llm.GenerateResponseAsync(question, contextText);
}
```

--------------------------------

### MMR Example Usage

Source: https://github.com/aigeeksquad/aicontext/blob/main/mmr csharp optimised 1.ipynb

Demonstrates how to use the ComputeMMR and ComputeMMRVerbose functions with a sample dataset of vectors and a query vector. It shows the output for both the clean and verbose versions of the algorithm.
```C#
// Example usage with the optimized algorithm
var vectors = new List<Vector<double>>
{
    Vector<double>.Build.DenseOfArray(new double[] { 1, 0, 0 }),
    Vector<double>.Build.DenseOfArray(new double[] { 1, 0, 0 }),
    Vector<double>.Build.DenseOfArray(new double[] { 0, 1, 0 }),
    Vector<double>.Build.DenseOfArray(new double[] { 0, 0, 1 }),
    Vector<double>.Build.DenseOfArray(new double[] { 1, 1, 0 }),
    Vector<double>.Build.DenseOfArray(new double[] { 1, 0, 1 })
};

var query = Vector<double>.Build.DenseOfArray(new double[] { 1, 0, 0 });

"=== Optimized MMR (Clean Output) ===".Display();
var result = ComputeMMR(vectors, query, 0.5, 3);
result.Display();

"=== Verbose MMR (With Debug Info) ===".Display();
var resultVerbose = ComputeMMRVerbose(vectors, query, 0.5, 3);
resultVerbose.Display();
```

--------------------------------

### Generic Ranking Engine Example (C#)

Source: https://github.com/aigeeksquad/aicontext/blob/main/src/AiGeekSquad.AIContext/README.md

Demonstrates how to use the Generic Ranking Engine to rank search results based on multiple criteria like relevance and popularity. It shows configuration of scoring functions, weights, normalizers, and ranking strategies.
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using AiGeekSquad.AIContext.Ranking;
using AiGeekSquad.AIContext.Ranking.Normalizers;
using AiGeekSquad.AIContext.Ranking.Strategies;

// Example: Ranking search results with multiple criteria
public class SearchResult
{
    public string Title { get; set; }
    public double RelevanceScore { get; set; }
    public int PopularityRank { get; set; }
    public DateTime PublishedDate { get; set; }
}

// Custom scoring functions
public class RelevanceScorer : IScoringFunction<SearchResult>
{
    public string Name => "Relevance";

    public double ComputeScore(SearchResult item) => item.RelevanceScore;

    public double[] ComputeScores(IReadOnlyList<SearchResult> items) =>
        items.Select(ComputeScore).ToArray();
}

public class PopularityScorer : IScoringFunction<SearchResult>
{
    public string Name => "Popularity";

    public double ComputeScore(SearchResult item) => 1.0 / item.PopularityRank;

    public double[] ComputeScores(IReadOnlyList<SearchResult> items) =>
        items.Select(ComputeScore).ToArray();
}

// Create search results
var results = new List<SearchResult>
{
    new() { Title = "AI Guide", RelevanceScore = 0.9, PopularityRank = 5 },
    new() { Title = "ML Tutorial", RelevanceScore = 0.7, PopularityRank = 1 },
    new() { Title = "Data Science", RelevanceScore = 0.8, PopularityRank = 3 }
};

// Configure scoring functions with weights and normalization
var scoringFunctions = new List<WeightedScoringFunction<SearchResult>>
{
    new(new RelevanceScorer(), weight: 0.7) { Normalizer = new MinMaxNormalizer() },
    new(new PopularityScorer(), weight: 0.3) { Normalizer = new ZScoreNormalizer() }
};

// Rank using WeightedSum strategy
var engine = new RankingEngine<SearchResult>();
var rankedResults = engine.Rank(results, scoringFunctions, new WeightedSumStrategy());

foreach (var result in rankedResults)
{
    Console.WriteLine($"Rank {result.Rank}: {result.Item.Title} (Score: {result.FinalScore:F3})");
}
```

--------------------------------

### Dependency Injection Setup for MEAI Adapter and Semantic Chunking

Source:
https://github.com/aigeeksquad/aicontext/blob/main/src/AiGeekSquad.AIContext.MEAI/README.md

Illustrates how to configure dependency injection for the AiGeekSquad.AIContext.MEAI adapter and the SemanticTextChunker. This includes registering the Microsoft Extensions AI embedding generator, AIContext components, the adapter, and the chunker itself.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using AiGeekSquad.AIContext.MEAI;
using AiGeekSquad.AIContext.Chunking;
using Microsoft.Extensions.AI;

var builder = Host.CreateApplicationBuilder(args);

// Register your Microsoft Extensions AI embedding generator
// Example: Register OpenAI embedding generator
builder.Services.AddSingleton>>(provider =>
{
    // Your specific embedding generator implementation
    return CreateYourEmbeddingGenerator(); // Replace with actual implementation
});

// Register AIContext dependencies
builder.Services.AddSingleton();
builder.Services.AddSingleton();
builder.Services.AddSingleton();

// Register the adapter
builder.Services.AddSingleton();

// Register semantic chunker with all dependencies
builder.Services.AddSingleton();

var app = builder.Build();

// Use the chunker
var chunker = app.Services.GetRequiredService();
var chunks = await chunker.ChunkTextAsync("Your document text...");
```

--------------------------------

### Using RRF and Hybrid Ranking Strategies

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/RankingAPI_Usage.md

Illustrates the implementation of Reciprocal Rank Fusion (RRF) and Hybrid ranking strategies. Shows how to configure these strategies and apply them either by default or on a per-ranking call basis.
```csharp
using AiGeekSquad.AIContext.Ranking.Strategies;

// Reciprocal Rank Fusion
var rrfStrategy = new ReciprocalRankFusionStrategy(k: 60);
var rrfEngine = new RankingEngine<Document>(defaultStrategy: rrfStrategy);
var rrfResults = rrfEngine.Rank(documents, scoringFunctions);

// Hybrid strategy (70% WeightedSum, 30% RRF)
var hybridStrategy = new HybridStrategy(alpha: 0.7, rrfK: 60);
var hybridEngine = new RankingEngine<Document>(defaultStrategy: hybridStrategy);
var hybridResults = hybridEngine.Rank(documents, scoringFunctions);

// Pass strategy per ranking call
var customResults = engine.Rank(documents, scoringFunctions,
    strategy: new ReciprocalRankFusionStrategy(k: 100));
```

--------------------------------

### Install Microsoft.Extensions.AI NuGet Package

Source: https://github.com/aigeeksquad/aicontext/blob/main/mmr csharp optimised 1.ipynb

This snippet shows how to reference the Microsoft.Extensions.AI NuGet package using the '#r' directive, commonly used in environments like .NET Interactive Notebooks.

```csharp
#r "nuget: Microsoft.Extensions.AI"
```

--------------------------------

### IRankingEngine Interface

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/RankingAPI_Usage.md

The main interface for the Ranking Engine, responsible for orchestrating the ranking process. It provides methods for ranking a list of items and retrieving the top K ranked items.

```C#
public interface IRankingEngine
{
    IList> Rank(
        IReadOnlyList items,
        IReadOnlyList> scoringFunctions,
        IRankingStrategy? strategy = null);

    IList> RankTopK(
        IReadOnlyList items,
        IReadOnlyList> scoringFunctions,
        int k,
        IRankingStrategy?
        strategy = null);
}
```

--------------------------------

### Custom Text Splitter Configurations

Source: https://github.com/aigeeksquad/aicontext/blob/main/src/AiGeekSquad.AIContext/README.md

Provides examples of creating custom SentenceTextSplitter instances, including a default splitter, a splitter with a custom pattern for numbered sections, and markdown-aware splitters.

```csharp
// Default splitter - handles English titles automatically
var defaultSplitter = SentenceTextSplitter.Default;

// Custom pattern for numbered sections (e.g., legal documents)
var customSplitter = SentenceTextSplitter.WithPattern(@"(?<=\.)\s+(?=\d+\.)");

// NEW: Markdown-aware splitters
var markdownSplitter = SentenceTextSplitter.ForMarkdown();
var customMarkdownSplitter = SentenceTextSplitter.WithPatternForMarkdown(@"(?<=\.)\s+(?=[A-Z])");

// Use with semantic chunker
var chunker = SemanticTextChunker.Create(tokenCounter, embeddingGenerator, customSplitter);

// Use markdown splitter for documentation processing
var markdownChunker = SemanticTextChunker.Create(tokenCounter, embeddingGenerator, markdownSplitter);
```

--------------------------------

### Implement MMR with MEAI for Customer Support

Source: https://github.com/aigeeksquad/aicontext/blob/main/examples/notebooks/beyond-basic-rag-mmr-complete-demo.ipynb

This C# code demonstrates how to use Microsoft.Extensions.AI (MEAI) to generate embeddings for customer support solutions and a specific query ('app crashes on startup'). It then performs both traditional similarity search and Maximal Marginal Relevance (MMR) search to find the most relevant and diverse solutions. The example highlights the benefits of MMR in providing a broader range of troubleshooting steps compared to traditional methods.
```C#
// Customer support: "app crashes on startup"
// This example demonstrates MMR solving a customer support scenario

// Generate embeddings for solution categories using MEAI
var solutionTexts = new[]
{
    "Clear app cache and data to resolve startup issues",
    "Restart the application to fix temporary glitches",
    "Reinstall the app to fix corrupted installation",
    "Check system requirements and compatibility",
    "Update device drivers for hardware compatibility",
    "Contact technical support for advanced troubleshooting"
};

Console.WriteLine("🔄 Generating embeddings using MEAI...");

// Generate embeddings for all solutions
var solutionEmbeddings = new List<(string solution, Vector<double> embedding)>();
foreach (var solution in solutionTexts)
{
    var embeddingResult = await embeddingGenerator.GenerateVectorAsync(solution);
    var embedding = Vector<double>.Build.DenseOfArray(embeddingResult.ToArray().Select(f => (double)f).ToArray());
    solutionEmbeddings.Add((solution, embedding));
}

// Generate query embedding
var queryText = "app crashes on startup";
var queryEmbeddingResult = await embeddingGenerator.GenerateVectorAsync(queryText);
var queryEmbedding = Vector<double>.Build.DenseOfArray(queryEmbeddingResult.ToArray().Select(f => (double)f).ToArray());

Console.WriteLine($"✅ Generated embeddings for {solutionEmbeddings.Count} solutions");
Console.WriteLine($"📊 Embedding dimensions: {queryEmbedding.Count}");

Console.WriteLine("\n=== Support Ticket: 'App won\'t start' ===");

// Traditional search (most similar)
Console.WriteLine("\n🔍 BEFORE - Traditional Search (top 3):");
var traditionalSupport = solutionEmbeddings
    .Select((sol, idx) => new
    {
        Index = idx,
        Solution = sol.solution,
        Similarity = 1.0 - Distance.Cosine(queryEmbedding.ToArray(), sol.embedding.ToArray())
    })
    .OrderByDescending(x => x.Similarity)
    .Take(3);

foreach (var result in traditionalSupport)
{
    Console.WriteLine($"• {result.Solution} (similarity: {result.Similarity:F3})");
}

// MMR search (balanced relevance and diversity)
Console.WriteLine("\n✨ AFTER - MMR Search (λ = 0.7):");
var mmrResults = MaximumMarginalRelevance.ComputeMMR(
    vectors: solutionEmbeddings.Select(s => s.embedding).ToList(),
    query: queryEmbedding,
    lambda: 0.7,
    topK: 3
);

foreach (var (index, score) in mmrResults)
{
    var solution = solutionEmbeddings[index].solution;
    var similarity = 1.0 - Distance.Cosine(queryEmbedding.ToArray(), solutionEmbeddings[index].embedding.ToArray());
    Console.WriteLine($"• {solution} (similarity: {similarity:F3}, MMR score: {score:F3})");
}

Console.WriteLine("\n✅ MMR provides diverse troubleshooting approaches instead of repetitive similar solutions!");
```

--------------------------------

### Creating Custom Normalizers

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/RankingAPI_Usage.md

Shows how to implement a custom score normalizer by inheriting from IScoreNormalizer. This example implements logarithmic scaling, handling edge cases like empty or zero scores.

```csharp
public class LogNormalizer : IScoreNormalizer
{
    public string Name => "Logarithmic";

    public double[] Normalize(double[] scores)
    {
        if (scores == null || scores.Length == 0)
            return scores;

        // Apply logarithmic scaling
        var minScore = scores.Where(s => s > 0).DefaultIfEmpty(1).Min();
        return scores.Select(s => s > 0 ? Math.Log(s / minScore + 1) : 0).ToArray();
    }
}
```

--------------------------------

### Running MMR Benchmarks with .NET CLI

Source: https://github.com/aigeeksquad/aicontext/blob/main/src/AiGeekSquad.AIContext.Benchmarks/README.md

Commands to build and run MMR algorithm benchmarks using the .NET CLI. Includes options for running all benchmarks or filtering specific ones.
```bash
# Build the project
dotnet build src/AiGeekSquad.AIContext.Benchmarks/

# Run all benchmarks
dotnet run --project src/AiGeekSquad.AIContext.Benchmarks/ --configuration Release

# Run specific benchmark method
dotnet run --project src/AiGeekSquad.AIContext.Benchmarks/ --configuration Release -- --filter "*ComputeMMR_Balanced*"
```

--------------------------------

### Benchmark Command Line Options

Source: https://github.com/aigeeksquad/aicontext/blob/main/README.md

Commands to run specific benchmarks (MMR, semantic chunking) or all benchmarks, with options for filtering.

```Bash
dotnet run --project src/AiGeekSquad.AIContext.Benchmarks/ --configuration Release mmr
dotnet run --project src/AiGeekSquad.AIContext.Benchmarks/ --configuration Release semantic
dotnet run --project src/AiGeekSquad.AIContext.Benchmarks/ --configuration Release all
dotnet run --project src/AiGeekSquad.AIContext.Benchmarks/ --configuration Release -- --filter "*MMR*"
dotnet run --project src/AiGeekSquad.AIContext.Benchmarks/ --configuration Release -- --filter "*Chunking*"
```

--------------------------------

### Benchmark Export Options

Source: https://github.com/aigeeksquad/aicontext/blob/main/README.md

Commands to export benchmark results to JSON, HTML, or multiple formats.

```Bash
dotnet run --project src/AiGeekSquad.AIContext.Benchmarks/ --configuration Release -- --exporters json
dotnet run --project src/AiGeekSquad.AIContext.Benchmarks/ --configuration Release -- --exporters html
dotnet run --project src/AiGeekSquad.AIContext.Benchmarks/ --configuration Release -- --exporters json html
```

--------------------------------

### Build and Test AIContext Project

Source: https://github.com/aigeeksquad/aicontext/blob/main/README.md

Commands to restore dependencies, build the project in debug and release modes, and run unit tests, including those with code coverage.
```Bash
git clone https://github.com/AiGeekSquad/AIContext.git
cd AIContext

dotnet restore
dotnet build
dotnet build --configuration Release

dotnet test
dotnet test --collect:"XPlat Code Coverage"
dotnet test --filter "SemanticChunkingTests"
dotnet test --filter "MaximumMarginalRelevanceTests"
```

--------------------------------

### Running MMR Benchmarks in Interactive Mode

Source: https://github.com/aigeeksquad/aicontext/blob/main/src/AiGeekSquad.AIContext.Benchmarks/README.md

Instructions to run the compiled benchmark executable directly in interactive mode after building the project.

```bash
# Run the executable directly
cd src/AiGeekSquad.AIContext.Benchmarks/bin/Release/net9.0/
./AiGeekSquad.AIContext.Benchmarks.exe
```

--------------------------------

### BenchmarkDotNet Configuration for MMR Benchmarks

Source: https://github.com/aigeeksquad/aicontext/blob/main/src/AiGeekSquad.AIContext.Benchmarks/README.md

Illustrates the configuration aspects for BenchmarkDotNet, including GC modes, memory diagnostics, export formats, and statistical analysis, as used in the MMR algorithm benchmarks.
```csharp
// Example configuration within BenchmarkConfig.cs (conceptual)
// using BenchmarkDotNet.Configs;
// using BenchmarkDotNet.Jobs;
// using BenchmarkDotNet.Toolchains.Discover;

// public class BenchmarkConfig : ManualConfig
// {
//     public BenchmarkConfig()
//     {
//         // Multiple GC modes
//         AddJob(Job.Default.WithGcServer(true));
//         AddJob(Job.Default.WithGcServer(false));

//         // Memory diagnostics
//         // AddDiagnoser(new MemoryDiagnoser());

//         // Multiple export formats
//         // AddExporter(new ConsoleExporter());
//         // AddExporter(new MarkdownExporter());
//         // AddExporter(new HtmlExporter());

//         // Statistical analysis
//         // SummaryStyle = SummaryStyle.Default.WithRatioStyle(RatioStyle.Percentage);
//     }
// }
```

--------------------------------

### Run and Export AIContext Benchmarks

Source: https://github.com/aigeeksquad/aicontext/blob/main/README.md

Commands to execute all or specific benchmarks within the AIContext project and export the results in JSON and HTML formats.

```Bash
dotnet run --project src/AiGeekSquad.AIContext.Benchmarks/ --configuration Release
dotnet run --project src/AiGeekSquad.AIContext.Benchmarks/ --configuration Release -- --filter "*MMR*"
dotnet run --project src/AiGeekSquad.AIContext.Benchmarks/ --configuration Release -- --filter "*Chunking*"
dotnet run --project src/AiGeekSquad.AIContext.Benchmarks/ --configuration Release -- --exporters json html
```

--------------------------------

### Proper Tokenizer Alignment Example

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/SemanticChunking.md

Provides an example of correctly aligning the MLTokenCounter with an embedding model (text-embedding-3-small) and setting up the SemanticTextChunker. This ensures chunks do not exceed embedding model limits.
```csharp
// For text-embedding-3-small model
var tokenCounter = MLTokenCounter.CreateTextEmbedding3Small();
var embeddingGenerator = new OpenAIEmbeddingGenerator("text-embedding-3-small");
var chunker = SemanticTextChunker.Create(tokenCounter, embeddingGenerator);
```

--------------------------------

### Batch Scoring with ComputeScores

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/RankingAPI_Usage.md

Implements batch scoring: all items are passed to the scorer in a single call instead of one call per item, which improves performance.

```C#
public double[] ComputeScores(IReadOnlyList<T> items)
{
    // Process all items at once for better performance
    // (BatchProcess stands in for your own batch implementation)
    return BatchProcess(items);
}
```

--------------------------------

### Caching Scores with CachedScorer

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/RankingAPI_Usage.md

Demonstrates how to cache scores for expensive computations using an in-memory cache to avoid redundant calculations.

```C#
using System;
using Microsoft.Extensions.Caching.Memory;

// Wraps another scorer and caches its results.
// The constructor, GetCacheKey, and any remaining interface members are omitted here.
public class CachedScorer<T> : IScoringFunction<T>
{
    private readonly IScoringFunction<T> _innerScorer;
    private readonly IMemoryCache _cache;

    public double ComputeScore(T item)
    {
        var key = GetCacheKey(item);
        return _cache.GetOrCreate(key, entry =>
        {
            // Entries expire after 5 minutes without access
            entry.SlidingExpiration = TimeSpan.FromMinutes(5);
            return _innerScorer.ComputeScore(item);
        });
    }
}
```

--------------------------------

### Custom Score Normalization Interface

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/RankingAPI_Architecture.md

Provides an interface for custom score normalization. The LogNormalizer class is an example that applies a logarithmic transformation to the scores.
```csharp
public interface IScoreNormalizer
{
    string Name { get; }
    double[] Normalize(double[] scores);
}

public class LogNormalizer : IScoreNormalizer
{
    public string Name => "Logarithmic";

    public double[] Normalize(double[] scores)
    {
        // log(1 + s) keeps a score of 0 at 0
        return scores.Select(s => Math.Log(1 + s)).ToArray();
    }
}
```

--------------------------------

### Azure Cognitive Search Integration

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/SemanticChunking.md

Example of indexing text chunks into Azure Cognitive Search, mapping chunk content and metadata to search document fields.

```csharp
// Azure Cognitive Search integration
// (assumes searchClient is an Azure.Search.Documents SearchClient)
public async Task IndexDocumentAsync(Document document)
{
    var searchDocuments = new List<SearchDocument>();

    await foreach (var chunk in chunker.ChunkAsync(document.Content, document.Metadata))
    {
        var searchDoc = new SearchDocument
        {
            ["id"] = $"{document.Id}_{chunk.StartIndex}",
            ["content"] = chunk.Text,
            ["tokens"] = chunk.Metadata["TokenCount"],
            ["source"] = chunk.Metadata["Source"]
        };
        searchDocuments.Add(searchDoc);
    }

    // IndexDocumentsAsync expects a batch, so wrap the documents in an upload batch
    await searchClient.IndexDocumentsAsync(IndexDocumentsBatch.Upload(searchDocuments));
}
```

--------------------------------

### Chunk Quality Monitoring

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/SemanticChunking.md

Example of monitoring chunk quality during the chunking process, logging warnings for chunks below minimum token thresholds or when fallback chunking is used.
```csharp
// Monitor chunk quality
await foreach (var chunk in chunker.ChunkAsync(text, options))
{
    var tokenCount = (int)chunk.Metadata["TokenCount"];
    var segmentCount = (int)chunk.Metadata["SegmentCount"];

    // Log potential issues
    if (tokenCount < options.MinTokensPerChunk * 0.8)
        logger.LogWarning("Chunk below minimum threshold: {TokenCount} tokens", tokenCount);

    // Assumes a Microsoft.Extensions.Logging ILogger, which exposes LogInformation
    if (chunk.Metadata.ContainsKey("IsFallback"))
        logger.LogInformation("Fallback chunking used");
}
```

--------------------------------

### Parallel Scoring for Large Datasets

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/RankingAPI_Usage.md

Utilizes parallel processing to compute scores for large datasets, enhancing throughput by distributing the workload across multiple threads.

```C#
public double[] ComputeScores(IReadOnlyList<T> items)
{
    var scores = new double[items.Count];

    // Each iteration writes to its own slot, so no locking is needed
    Parallel.For(0, items.Count, i =>
    {
        scores[i] = ComputeScore(items[i]);
    });

    return scores;
}
```

--------------------------------

### RankedResult Class

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/RankingAPI_Usage.md

Represents the result of a ranking operation for a single item. It includes the item itself, its final computed score, individual scores from each function, its rank, and associated metadata.

```C#
public class RankedResult<T>
{
    public T Item { get; }
    public double FinalScore { get; }
    // Dictionary type arguments below are assumed (function name -> score, key -> value)
    public IReadOnlyDictionary<string, double> IndividualScores { get; }
    public int Rank { get; set; }
    public Dictionary<string, object> Metadata { get; }
}
```

--------------------------------

### Performance Recommendations by Use Case

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/BenchmarkResults.md

Offers specific performance tuning recommendations for MMR, Ranking, and Chunking based on different use cases like real-time applications, batch processing, and memory-constrained environments.
```APIDOC
Performance Recommendations by Use Case:

Real-time Applications:
  MMR: Smaller vector counts, moderate dimensions
  Ranking: WeightedSum, MinMax normalization
  Chunking: 256-512 tokens/chunk, buffer size 1-2

Batch Processing:
  MMR: Larger datasets, enable caching
  Ranking: Any strategy, Hybrid for flexibility
  Chunking: 512-1024 tokens/chunk, buffer size 2-3, caching enabled

Memory-Constrained Environments:
  MMR: Limit dimensions, memory-focused variants
  Ranking: Single functions, avoid expensive scoring
  Chunking: Streaming processing, smaller cache sizes
```

--------------------------------

### WeightedScoringFunction Class

Source: https://github.com/aigeeksquad/aicontext/blob/main/docs/RankingAPI_Usage.md

A class that encapsulates a scoring function along with its associated weight and an optional normalizer. This allows for flexible configuration of how each scoring function contributes to the final ranking.

```C#
public class WeightedScoringFunction<T>
{
    public IScoringFunction<T> Function { get; }
    public double Weight { get; }
    public IScoreNormalizer? Normalizer { get; set; }
}
```

--------------------------------

### MMR Algorithm Benchmark Scenarios

Source: https://github.com/aigeeksquad/aicontext/blob/main/README.md

Demonstrates various benchmark methods for the MMR algorithm, highlighting configurations for pure relevance, pure diversity, balanced selection, and memory allocation analysis. Includes performance characteristics like processing time, memory usage, and complexity.
```csharp
// MMR Benchmark Scenarios

// Main benchmark with parameter combinations
// Uses [Params] for comprehensive testing
public void ComputeMMR()

// Pure relevance selection
// Lambda = 1.0 (relevance only)
public void ComputeMMR_PureRelevance()

// Pure diversity selection
// Lambda = 0.0 (diversity only)
public void ComputeMMR_PureDiversity()

// Balanced selection
// Lambda = 0.5 (balanced approach)
public void ComputeMMR_Balanced()

// Memory allocation analysis
// Includes forced GC for accurate measurement
public void ComputeMMR_MemoryFocused()
```

--------------------------------

### Semantic Chunking Benchmark Scenarios

Source: https://github.com/aigeeksquad/aicontext/blob/main/README.md

Outlines different benchmark methods for semantic text chunking, covering baseline performance, default configurations, speed and quality optimizations, buffer size impacts, and caching effectiveness. Includes parameter details for document size and chunking options.

```csharp
// Semantic Chunking Benchmark Scenarios

// Baseline benchmark
// Uses parameterized configurations
public void SemanticChunking_Complete()

// Default configuration performance
// Standard SemanticChunkingOptions.Default
public void SemanticChunking_DefaultOptions()

// Speed-optimized configuration
// Buffer=1, Threshold=0.75, Cache=true
public void SemanticChunking_OptimizedForSpeed()

// Quality-optimized configuration
// Buffer=3, Threshold=0.90, MaxTokens=1024
public void SemanticChunking_OptimizedForQuality()

// Buffer size impact (small)
// BufferSize=1
public void SemanticChunking_SmallBuffer()

// Buffer size impact (large)
// BufferSize=4
public void SemanticChunking_LargeBuffer()

// Cache miss performance
// Fresh chunker instance
public void SemanticChunking_CachingFirstPass()

// No caching baseline
// Caching disabled
public void SemanticChunking_NoCaching()
```
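
--------------------------------

### Sketch: How Lambda Trades Relevance Against Diversity

The MMR benchmark scenarios listed earlier vary lambda between 1.0 (relevance only), 0.5 (balanced), and 0.0 (diversity only). The following is a minimal standalone sketch of greedy MMR selection that shows what that parameter does. It is not the library's `MaximumMarginalRelevance.ComputeMMR` implementation; `MmrSketch`, `Cosine`, and `Select` are hypothetical names introduced for illustration, and plain `double[]` vectors are assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of AiGeekSquad.AIContext.
public static class MmrSketch
{
    // Cosine similarity between two equal-length vectors (0 if either is all zeros).
    public static double Cosine(double[] a, double[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return na == 0 || nb == 0 ? 0 : dot / (Math.Sqrt(na) * Math.Sqrt(nb));
    }

    // Greedy MMR: at each step pick the candidate that maximizes
    //   lambda * sim(query, d) - (1 - lambda) * max over selected s of sim(d, s).
    // Returns the indices of the selected vectors in selection order.
    public static List<int> Select(IReadOnlyList<double[]> vectors, double[] query, double lambda, int topK)
    {
        var selected = new List<int>();
        var remaining = Enumerable.Range(0, vectors.Count).ToList();

        while (selected.Count < topK && remaining.Count > 0)
        {
            var best = remaining[0];
            var bestScore = double.NegativeInfinity;

            foreach (var i in remaining)
            {
                var relevance = Cosine(query, vectors[i]);
                // Redundancy is 0 until something has been selected
                var redundancy = selected.Count == 0
                    ? 0
                    : selected.Max(j => Cosine(vectors[i], vectors[j]));
                var score = lambda * relevance - (1 - lambda) * redundancy;

                // Strict comparison: ties keep the earlier candidate
                if (score > bestScore)
                {
                    bestScore = score;
                    best = i;
                }
            }

            selected.Add(best);
            remaining.Remove(best); // removes by value, not by position
        }

        return selected;
    }
}
```

With candidates (1, 0), (1, 0.3), (0, 1) and query (1, 0.1), calling `MmrSketch.Select` with lambda 1.0 and topK 2 selects indices 0 and 1, the two near-duplicates. With lambda 0.5 it selects indices 0 and 2, swapping the near-duplicate for the dissimilar vector.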