### VLLMChatGenerator with Reasoning Models

Source: https://docs.haystack.deepset.ai/docs/vllmchatgenerator

Integrate reasoning capabilities by starting the vLLM server with `--reasoning-parser`. This example demonstrates how to get step-by-step reasoning for a mathematical problem.

```python
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.vllm import VLLMChatGenerator

generator = VLLMChatGenerator(model="Qwen/Qwen3-0.6B")

messages = [ChatMessage.from_user("Solve step by step: what is 15 * 37?")]
response = generator.run(messages=messages)

reply = response["replies"][0]
if reply.reasoning:
    print("Reasoning:", reply.reasoning.reasoning_text)
print("Answer:", reply.text)
```

--------------------------------

### Build and Run a Query Pipeline with OpenSearchEmbeddingRetriever

Source: https://docs.haystack.deepset.ai/docs/opensearchembeddingretriever

Demonstrates how to set up an OpenSearchDocumentStore, embed documents, and then use the OpenSearchEmbeddingRetriever within a Haystack query pipeline. This example requires a running OpenSearch instance and the opensearch-haystack integration installed.
```python
from haystack_integrations.components.retrievers.opensearch import (
    OpenSearchEmbeddingRetriever,
)
from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore
from haystack.document_stores.types import DuplicatePolicy
from haystack import Document
from haystack import Pipeline
from haystack.components.embedders import (
    SentenceTransformersTextEmbedder,
    SentenceTransformersDocumentEmbedder,
)

document_store = OpenSearchDocumentStore(
    hosts="http://localhost:9200",
    use_ssl=True,
    verify_certs=False,
    http_auth=("admin", "admin"),
)

model = "sentence-transformers/all-mpnet-base-v2"

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors.",
    ),
    Document(
        content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.",
    ),
]

document_embedder = SentenceTransformersDocumentEmbedder(model=model)
documents_with_embeddings = document_embedder.run(documents)

document_store.write_documents(
    documents_with_embeddings.get("documents"),
    policy=DuplicatePolicy.SKIP,
)

query_pipeline = Pipeline()
query_pipeline.add_component(
    "text_embedder",
    SentenceTransformersTextEmbedder(model=model),
)
query_pipeline.add_component(
    "retriever",
    OpenSearchEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result["retriever"]["documents"][0])
```

--------------------------------

### Install OpenTelemetry SDK

Source: https://docs.haystack.deepset.ai/docs/tracing

Install the necessary OpenTelemetry SDK packages for tracing.
```shell
pip install opentelemetry-sdk
```

```shell
pip install opentelemetry-exporter-otlp
```

--------------------------------

### Initialize and Run ChromaQueryTextRetriever

Source: https://docs.haystack.deepset.ai/docs/chromaqueryretriever

Demonstrates how to initialize the ChromaDocumentStore and ChromaQueryTextRetriever, then run a query.

```python
from haystack_integrations.document_stores.chroma import ChromaDocumentStore
from haystack_integrations.components.retrievers.chroma import ChromaQueryTextRetriever

document_store = ChromaDocumentStore()

retriever = ChromaQueryTextRetriever(document_store=document_store)

## example run query
retriever.run(query="How does Chroma Retriever work?")
```

--------------------------------

### Initialize ValkeyDocumentStore

Source: https://docs.haystack.deepset.ai/docs/valkeydocumentstore

Connect to a Valkey server and initialize the document store. Ensure the Valkey server has the search module running.

```python
from haystack_integrations.document_stores.valkey import ValkeyDocumentStore

document_store = ValkeyDocumentStore(
    nodes_list=[("localhost", 6379)],
    index_name="my_documents",
    embedding_dim=768,
    distance_metric="cosine"
)
```

--------------------------------

### Install sentence-transformers

Source: https://docs.haystack.deepset.ai/docs/astraretriever

Optionally install sentence-transformers if you need to run embedding generation examples locally. This library is used for creating text embeddings.

```shell
pip install sentence-transformers
```

--------------------------------

### Basic OpenRouterChatGenerator Usage

Source: https://docs.haystack.deepset.ai/docs/openrouterchatgenerator

A simple example of initializing and running the OpenRouterChatGenerator to get a chat response.
```python
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.openrouter import (
    OpenRouterChatGenerator,
)

client = OpenRouterChatGenerator()
response = client.run([ChatMessage.from_user("What are Agentic Pipelines? Be brief.")])
print(response["replies"][0].text)
```

--------------------------------

### Initialize and Use QdrantSparseEmbeddingRetriever

Source: https://docs.haystack.deepset.ai/docs/qdrantsparseembeddingretriever

Demonstrates initializing a QdrantDocumentStore with sparse embedding support, writing a document, and then initializing and running the QdrantSparseEmbeddingRetriever with a query sparse embedding.

```python
from haystack_integrations.components.retrievers.qdrant import (
    QdrantSparseEmbeddingRetriever,
)
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack.dataclasses import Document, SparseEmbedding

document_store = QdrantDocumentStore(
    ":memory:",
    use_sparse_embeddings=True,
    recreate_index=True,
    return_embedding=True,
)

doc = Document(
    content="test",
    sparse_embedding=SparseEmbedding(indices=[0, 3, 5], values=[0.1, 0.5, 0.12]),
)
document_store.write_documents([doc])

retriever = QdrantSparseEmbeddingRetriever(document_store=document_store)
sparse_embedding = SparseEmbedding(indices=[0, 1, 2, 3], values=[0.1, 0.8, 0.05, 0.33])
retriever.run(query_sparse_embedding=sparse_embedding)
```

--------------------------------

### Run OpenSearch locally with Docker

Source: https://docs.haystack.deepset.ai/docs/opensearchhybridretriever

Start an OpenSearch instance using Docker for local development. This example disables the security plugin for simplicity.
```docker
docker run -d \
  --name opensearch-nosec \
  -p 9200:9200 \
  -p 9600:9600 \
  -e "discovery.type=single-node" \
  -e "DISABLE_SECURITY_PLUGIN=true" \
  opensearchproject/opensearch:2.12.0
```

--------------------------------

### Initialize and Use QdrantHybridRetriever

Source: https://docs.haystack.deepset.ai/docs/qdranthybridretriever

Demonstrates initializing a QdrantDocumentStore with sparse embedding support, writing a document with both dense and sparse embeddings, and then initializing and running the QdrantHybridRetriever with query embeddings.

```python
from haystack_integrations.components.retrievers.qdrant import QdrantHybridRetriever
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack.dataclasses import Document, SparseEmbedding

document_store = QdrantDocumentStore(
    ":memory:",
    use_sparse_embeddings=True,
    recreate_index=True,
    return_embedding=True,
    wait_result_from_api=True,
)

doc = Document(
    content="test",
    embedding=[0.5] * 768,
    sparse_embedding=SparseEmbedding(indices=[0, 3, 5], values=[0.1, 0.5, 0.12]),
)

document_store.write_documents([doc])

retriever = QdrantHybridRetriever(document_store=document_store)
embedding = [0.1] * 768
sparse_embedding = SparseEmbedding(indices=[0, 1, 2, 3], values=[0.1, 0.8, 0.05, 0.33])
retriever.run(query_embedding=embedding, query_sparse_embedding=sparse_embedding)
```

--------------------------------

### OpenAIChatGenerator in a Pipeline

Source: https://docs.haystack.deepset.ai/docs/openaichatgenerator

Provides a starting point for integrating OpenAIChatGenerator into a Haystack Pipeline. This example shows the necessary imports for building a chat pipeline.
```python
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack import Pipeline
from haystack.utils import Secret
```

--------------------------------

### Initialize InMemoryDocumentStore and Write Documents

Source: https://docs.haystack.deepset.ai/docs/metallamachatgenerator

Sets up an in-memory document store and populates it with sample documents. Ensure documents are properly formatted as Document objects.

```python
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()
document_store.write_documents(
    [
        Document(content="My name is Jean and I live in Paris."),
        Document(content="My name is Mark and I live in Berlin."),
        Document(content="My name is Giorgio and I live in Rome."),
    ],
)
```

--------------------------------

### Create a HybridRetriever SuperComponent

Source: https://docs.haystack.deepset.ai/docs/supercomponents

This example demonstrates wrapping a pipeline that combines BM25 and embedding-based retrieval into a single SuperComponent. It requires installing haystack-ai, datasets, and sentence-transformers.
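
To make the "combine" step concrete, here is a minimal plain-Python sketch of one simple way to merge two ranked result lists: concatenate them and keep the highest score for each duplicate. The helper `join_ranked_results` and the sample IDs and scores are hypothetical illustrations, and the joiner in the actual pipeline may be configured to behave differently.

```python
def join_ranked_results(*result_lists: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Merge ranked (doc_id, score) lists, keeping the best score per document."""
    best: dict[str, float] = {}
    for results in result_lists:
        for doc_id, score in results:
            # Keep only the highest score seen for each document ID.
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    # Highest score first.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Hypothetical hits from a keyword retriever and an embedding retriever.
bm25_hits = [("doc-a", 0.9), ("doc-b", 0.4)]
embedding_hits = [("doc-b", 0.7), ("doc-c", 0.6)]

joined = join_ranked_results(bm25_hits, embedding_hits)
print(joined)
```

Keyword and embedding scores usually live on different scales, so comparing them directly, as this sketch does, is only for illustration. The Haystack example for this snippet follows.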
```python
from haystack import Document, Pipeline, super_component
from haystack.components.joiners import DocumentJoiner
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.retrievers import (
    InMemoryBM25Retriever,
    InMemoryEmbeddingRetriever,
)
from haystack.document_stores.in_memory import InMemoryDocumentStore

from datasets import load_dataset


@super_component
class HybridRetriever:
    def __init__(
        self,
        document_store: InMemoryDocumentStore,
        embedder_model: str = "BAAI/bge-small-en-v1.5",
    ):
        embedding_retriever = InMemoryEmbeddingRetriever(document_store)
        bm25_retriever = InMemoryBM25Retriever(document_store)
        text_embedder = SentenceTransformersTextEmbedder(embedder_model)
        document_joiner = DocumentJoiner()

        self.pipeline = Pipeline()
        self.pipeline.add_component("text_embedder", text_embedder)
        self.pipeline.add_component("embedding_retriever", embedding_retriever)
        self.pipeline.add_component("bm25_retriever", bm25_retriever)
        self.pipeline.add_component("document_joiner", document_joiner)

        self.pipeline.connect("text_embedder", "embedding_retriever")
        self.pipeline.connect("bm25_retriever", "document_joiner")
        self.pipeline.connect("embedding_retriever", "document_joiner")


dataset = load_dataset("HaystackBot/medrag-pubmed-chunk-with-embeddings", split="train")
docs = [
    Document(content=doc["contents"], embedding=doc["embedding"]) for doc in dataset
]
document_store = InMemoryDocumentStore()
document_store.write_documents(docs)

query = "What treatments are available for chronic bronchitis?"

result = HybridRetriever(document_store).run(text=query, query=query)
print(result)
```

--------------------------------

### Initialize and Run DocumentPreprocessor

Source: https://docs.haystack.deepset.ai/docs/documentpreprocessor

Demonstrates how to use the DocumentPreprocessor on its own. Instantiate the component and run it with a list of documents.
```python
from haystack import Document
from haystack.components.preprocessors import DocumentPreprocessor

doc = Document(content="I love pizza!")
preprocessor = DocumentPreprocessor()
result = preprocessor.run(documents=[doc])
print(result["documents"])
```

--------------------------------

### Integrate LLMDocumentContentExtractor into a Haystack Pipeline

Source: https://docs.haystack.deepset.ai/docs/llmdocumentcontentextractor

Example of a Haystack pipeline using LLMDocumentContentExtractor to process image-based documents, followed by a DocumentSplitter and DocumentWriter. This setup extracts text, splits it, and stores it in an InMemoryDocumentStore.

```python
from haystack import Pipeline
from haystack.components.extractors.image import LLMDocumentContentExtractor
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.dataclasses import Document

## Create document store
document_store = InMemoryDocumentStore()

## Create pipeline
p = Pipeline()
p.add_component(
    instance=LLMDocumentContentExtractor(
        chat_generator=OpenAIChatGenerator(model="gpt-4o-mini"),
        file_path_meta_field="file_path",
    ),
    name="content_extractor",
)
p.add_component(instance=DocumentSplitter(), name="splitter")
p.add_component(instance=DocumentWriter(document_store=document_store), name="writer")

## Connect components
p.connect("content_extractor.documents", "splitter.documents")
p.connect("splitter.documents", "writer.documents")

## Create test documents
docs = [
    Document(content="", meta={"file_path": "scanned_document.pdf"}),
    Document(content="", meta={"file_path": "image_with_text.jpg"}),
]

## Run pipeline
result = p.run({"content_extractor": {"documents": docs}})

## Check results
print(f"Successfully processed: {len(result['content_extractor']['documents'])}")
print(f"Failed documents: {len(result['content_extractor']['failed_documents'])}")

## Access documents in the store
stored_docs = document_store.filter_documents()
print(f"Documents in store: {len(stored_docs)}")
```

--------------------------------

### Basic Human-in-the-Loop Setup with SimpleConsoleUI

Source: https://docs.haystack.deepset.ai/docs/human-in-the-loop

Demonstrates a basic HITL setup using BlockingConfirmationStrategy, AlwaysAskPolicy, and SimpleConsoleUI. This configuration will prompt the user in the console before executing the 'send_email' tool.

```python
from typing import Annotated

from haystack.components.agents import Agent
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.human_in_the_loop import (
    AlwaysAskPolicy,
    BlockingConfirmationStrategy,
    SimpleConsoleUI,
)
from haystack.tools import tool


@tool
def send_email(
    to: Annotated[str, "The recipient email address"],
    subject: Annotated[str, "The email subject line"],
    body: Annotated[str, "The email body"],
) -> str:
    """Send an email to a recipient."""
    return f"Email sent to {to}."


strategy = BlockingConfirmationStrategy(
    confirmation_policy=AlwaysAskPolicy(),
    confirmation_ui=SimpleConsoleUI(),
)

agent = Agent(
    chat_generator=OpenAIChatGenerator(model="gpt-5.4-mini"),
    tools=[send_email],
    confirmation_strategies={"send_email": strategy},
)

result = agent.run(
    messages=[ChatMessage.from_user("Send a welcome email to alice@example.com")],
)
```

--------------------------------

### Install OpenSearch with Docker

Source: https://docs.haystack.deepset.ai/docs/opensearchbm25retriever

Installs and runs an OpenSearch instance using Docker. Ensure Docker is set up before running these commands.
```shell
docker pull opensearchproject/opensearch:2.11.0
```

```shell
docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "ES_JAVA_OPTS=-Xms1024m -Xmx1024m" opensearchproject/opensearch:2.11.0
```

--------------------------------

### Create and Use InMemoryBM25Retriever in a Pipeline

Source: https://docs.haystack.deepset.ai/docs/inmemorybm25retriever

This example shows the complete setup for using InMemoryBM25Retriever in a Haystack pipeline. It includes creating the document store, retriever, adding documents, and running the pipeline.

```python
from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()
retriever = InMemoryBM25Retriever(document_store=document_store)

pipeline = Pipeline()
pipeline.add_component(instance=retriever, name="retriever")

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors.",
    ),
    Document(
        content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.",
    ),
]
document_store.write_documents(documents)

result = pipeline.run(data={"retriever": {"query": "How many languages are there?"}})

print(result["retriever"]["documents"][0])
```

--------------------------------

### Define a function for tool calls

Source: https://docs.haystack.deepset.ai/docs/vertexaigeminichatgenerator

Defines a Python function and converts it into a Haystack Tool for use with Gemini models. This example shows how to define a function to get current weather information.
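
As background, the `Annotated` hints on a function carry parameter descriptions that can be read back with the standard library alone. The sketch below illustrates that mechanism only; `describe_parameters` is a hypothetical helper, not part of Haystack, and it makes no claim about how `create_tool_from_function` works internally.

```python
import inspect
from typing import Annotated, get_args, get_origin, get_type_hints


def describe_parameters(func) -> dict[str, dict[str, object]]:
    """Collect type name, description, and default for each parameter of func."""
    hints = get_type_hints(func, include_extras=True)  # keep Annotated metadata
    described: dict[str, dict[str, object]] = {}
    for name, parameter in inspect.signature(func).parameters.items():
        hint = hints.get(name)
        base_type, description = hint, None
        if get_origin(hint) is Annotated:
            # Annotated[T, meta...] unpacks to (T, meta...).
            base_type, *metadata = get_args(hint)
            description = next((m for m in metadata if isinstance(m, str)), None)
        has_default = parameter.default is not inspect.Parameter.empty
        described[name] = {
            "type": getattr(base_type, "__name__", str(base_type)),
            "description": description,
            "default": parameter.default if has_default else None,
        }
    return described


def get_current_weather(
    location: Annotated[str, "The city for which to get the weather"] = "Munich",
    unit: Annotated[str, "The unit for the temperature"] = "celsius",
) -> str:
    return f"The weather in {location} is sunny. The temperature is 20 {unit}."


schema = describe_parameters(get_current_weather)
print(schema)
```

The Haystack version of this example follows.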
```python
from typing import Annotated
from haystack.tools import create_tool_from_function


## example function to get the current weather
def get_current_weather(
    location: Annotated[
        str,
        "The city for which to get the weather, e.g. 'San Francisco'",
    ] = "Munich",
    unit: Annotated[str, "The unit for the temperature, e.g. 'celsius'"] = "celsius",
) -> str:
    return f"The weather in {location} is sunny. The temperature is 20 {unit}."


tool = create_tool_from_function(get_current_weather)
```

--------------------------------

### Pipeline Setup with Qdrant

Source: https://docs.haystack.deepset.ai/docs/sentencetransformerssparsetextembedder

Demonstrates setting up a pipeline that uses SentenceTransformersSparseTextEmbedder and SentenceTransformersSparseDocumentEmbedder with a QdrantDocumentStore configured for sparse embeddings. This setup is necessary for sparse embedding retrieval.

```python
from haystack import Document, Pipeline
from haystack.components.embedders import (
    SentenceTransformersSparseDocumentEmbedder,
    SentenceTransformersSparseTextEmbedder,
)
from haystack_integrations.components.retrievers.qdrant import (
    QdrantSparseEmbeddingRetriever,
)
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore

document_store = QdrantDocumentStore(
    ":memory:",
    recreate_index=True,
    use_sparse_embeddings=True,
)

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
    Document(content="Sentence Transformers provides sparse embedding models."),
]
```

--------------------------------

### Pipeline with FileTypeRouter for Text Files

Source: https://docs.haystack.deepset.ai/docs/filetyperouter

Example of a pipeline that uses FileTypeRouter to forward only plain text files to a DocumentSplitter and then a DocumentWriter. Only the content of plain text files gets added to the InMemoryDocumentStore.
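
The routing idea itself can be sketched with the standard library: guess each source's MIME type from its file name and group sources under the accepted types, with everything else set aside. This is a stdlib-only illustration; `route_by_mime_type` and the `"unclassified"` bucket name are assumptions made for the sketch, not a description of the component's internals.

```python
import mimetypes
from collections import defaultdict


def route_by_mime_type(sources: list[str], mime_types: list[str]) -> dict[str, list[str]]:
    """Group file names by guessed MIME type; unknown or unlisted types are set aside."""
    routed: dict[str, list[str]] = defaultdict(list)
    for source in sources:
        guessed, _encoding = mimetypes.guess_type(source)
        # Only listed MIME types get their own bucket.
        key = guessed if guessed in mime_types else "unclassified"
        routed[key].append(source)
    return dict(routed)


routed = route_by_mime_type(
    ["text-file-will-be-added.txt", "pdf-will-not-be-added.pdf"],
    mime_types=["text/plain"],
)
print(routed)
```

Only the `.txt` file lands under `text/plain`, which is why the PDF never reaches the converter in a pipeline wired this way. The Haystack example follows.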
```python
from haystack import Pipeline
from haystack.components.routers import FileTypeRouter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.converters import TextFileToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter

document_store = InMemoryDocumentStore()
p = Pipeline()
p.add_component(
    instance=FileTypeRouter(mime_types=["text/plain"]),
    name="file_type_router",
)
p.add_component(instance=TextFileToDocument(), name="text_file_converter")
p.add_component(instance=DocumentSplitter(), name="splitter")
p.add_component(instance=DocumentWriter(document_store=document_store), name="writer")
p.connect("file_type_router.text/plain", "text_file_converter.sources")
p.connect("text_file_converter.documents", "splitter.documents")
p.connect("splitter.documents", "writer.documents")
p.run(
    {
        "file_type_router": {
            "sources": ["text-file-will-be-added.txt", "pdf-will-not-be-added.pdf"],
        },
    },
)
```

--------------------------------

### Set up generators and inspect metadata

Source: https://docs.haystack.deepset.ai/docs/fallbackchatgenerator

This example demonstrates how to set up primary and backup generators using OpenAIChatGenerator and then wrap them in a FallbackChatGenerator. It also shows how to run the generator and inspect the metadata returned, including the index and class of the successful generator, the total number of attempts, and a list of failed generators.
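
The fallback behavior described here can be sketched in plain Python: try each candidate in order, record the failures, and return the first successful reply with metadata. This is a simplified illustration under stated assumptions, not the library's implementation; `run_with_fallback`, `flaky_primary`, and `steady_backup` are hypothetical, and only the metadata key names are taken from this snippet.

```python
from typing import Any, Callable


def run_with_fallback(generators: list[Callable[[str], str]], prompt: str) -> dict[str, Any]:
    """Call each generator in order and return the first successful reply."""
    failed: list[str] = []
    for index, generate in enumerate(generators):
        try:
            reply = generate(prompt)
        except Exception:
            # Record the failure and move on to the next candidate.
            failed.append(generate.__name__)
            continue
        return {
            "replies": [reply],
            "meta": {
                "successful_chat_generator_index": index,
                "successful_chat_generator_class": generate.__name__,
                "total_attempts": index + 1,
                "failed_chat_generators": failed,
            },
        }
    raise RuntimeError(f"All {len(generators)} generators failed: {failed}")


def flaky_primary(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def steady_backup(prompt: str) -> str:
    return f"echo: {prompt}"


result = run_with_fallback([flaky_primary, steady_backup], "Hello")
print(result["meta"])
```

The Haystack example for this snippet follows.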
```python
from haystack.components.generators.chat import (
    FallbackChatGenerator,
    OpenAIChatGenerator,
)
from haystack.dataclasses import ChatMessage

## Set up generators
primary = OpenAIChatGenerator(model="gpt-4o")
backup = OpenAIChatGenerator(model="gpt-4o-mini")
generator = FallbackChatGenerator(chat_generators=[primary, backup])

## Run and inspect metadata
result = generator.run(messages=[ChatMessage.from_user("Hello")])
meta = result["meta"]

print(
    f"Successful generator index: {meta['successful_chat_generator_index']}",
)  # 0 for first, 1 for second, etc.
print(
    f"Successful generator class: {meta['successful_chat_generator_class']}",
)  # e.g., "OpenAIChatGenerator"
print(
    f"Total attempts made: {meta['total_attempts']}",
)  # How many Generators were tried
print(
    f"Failed generators: {meta['failed_chat_generators']}",
)  # List of failed Generator names
```

--------------------------------

### Basic Usage of Searchable Toolset with an Agent

Source: https://docs.haystack.deepset.ai/docs/searchabletoolset

Demonstrates how to initialize a Searchable Toolset with a catalog of tools and integrate it with an Agent. The agent initially only has access to `search_tools` and discovers other tools as needed.

```python
from typing import Annotated

from haystack.components.agents import Agent
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.tools import create_tool_from_function, SearchableToolset


def get_weather(city: Annotated[str, "The city to get the weather for"]) -> str:
    """Get current weather for a city."""
    return f"Sunny, 22°C in {city}"


def search_web(query: Annotated[str, "The search query"]) -> str:
    """Search the web for information."""
    return f"Results for: {query}"


# Build a catalog from tools
catalog = [
    create_tool_from_function(get_weather),
    create_tool_from_function(search_web),
    # ... many more tools
]

toolset = SearchableToolset(catalog=catalog)

agent = Agent(
    chat_generator=OpenAIChatGenerator(),
    tools=toolset,
)

# The agent initially sees only `search_tools`. It will call it to find relevant tools,
# then use the discovered tools to answer the question.
result = agent.run(messages=[ChatMessage.from_user("What's the weather in Milan?")])
print(result["messages"][-1].text)
```

--------------------------------

### RAG Pipeline with SerperDevWebSearch

Source: https://docs.haystack.deepset.ai/docs/serperdevwebsearch

Example of a RAG pipeline using SerperDevWebSearch to find relevant documents, LinkContentFetcher to get their content, and an LLM to generate an answer. Set the `top_k` parameter to limit the number of search results if needed.

```python
from haystack import Pipeline
from haystack.utils import Secret
from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.websearch import SerperDevWebSearch
from haystack.dataclasses import ChatMessage

web_search = SerperDevWebSearch(api_key=Secret.from_token(""), top_k=2)
link_content = LinkContentFetcher()
html_converter = HTMLToDocument()

prompt_template = [
    ChatMessage.from_system("You are a helpful assistant."),
    ChatMessage.from_user(
        "Given the information below:\n"
        "{% for document in documents %}{{ document.content }}{% endfor %}\n"
        "Answer question: {{ query }}.\nAnswer:",
    ),
]

prompt_builder = ChatPromptBuilder(
    template=prompt_template,
    required_variables={"query", "documents"},
)
llm = OpenAIChatGenerator(
    api_key=Secret.from_token(""),
)

pipe = Pipeline()
pipe.add_component("search", web_search)
pipe.add_component("fetcher", link_content)
pipe.add_component("converter", html_converter)
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)

pipe.connect("search.links", "fetcher.urls")
pipe.connect("fetcher.streams", "converter.sources")
pipe.connect("converter.documents", "prompt_builder.documents")
# ChatPromptBuilder exposes its rendered messages on the `prompt` output.
pipe.connect("prompt_builder.prompt", "llm.messages")

query = "What is the most famous landmark in Berlin?"

pipe.run(data={"search": {"query": query}, "prompt_builder": {"query": query}})
```

--------------------------------

### Set up and populate the document store

Source: https://docs.haystack.deepset.ai/docs/fastembedlateinteractionranker

Initializes an InMemoryDocumentStore and indexes documents using FastembedDocumentEmbedder. Ensure FastembedDocumentEmbedder is correctly configured.

```python
from haystack import Document, Pipeline
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.fastembed import (
    FastembedDocumentEmbedder,
)

document_store = InMemoryDocumentStore()
docs = [
    Document(content="Paris is the capital of France."),
    Document(content="Berlin is the capital of Germany."),
    Document(content="Madrid is the capital of Spain."),
]

indexing = Pipeline()
indexing.add_component("embedder", FastembedDocumentEmbedder())
indexing.add_component("writer", DocumentWriter(document_store=document_store))
indexing.connect("embedder", "writer")
indexing.run({"embedder": {"documents": docs}})
```

--------------------------------

### Initialize and Run LinkContentFetcher

Source: https://docs.haystack.deepset.ai/docs/linkcontentfetcher

This example shows how to initialize the LinkContentFetcher component with default settings and then use it to fetch content from a given URL. Ensure the `urls` parameter is a list of strings.
```python
from haystack.components.fetchers import LinkContentFetcher

fetcher = LinkContentFetcher()

fetcher.run(urls=["https://haystack.deepset.ai"])
```

--------------------------------

### Use GoogleGenAITextEmbedder in a Haystack Pipeline

Source: https://docs.haystack.deepset.ai/docs/googlegenaitextembedder

This example demonstrates how to set up a Haystack pipeline that uses GoogleGenAITextEmbedder for query embedding and an InMemoryEmbeddingRetriever for retrieving relevant documents. Ensure you have the necessary Haystack and Google GenAI integrations installed.

```python
from haystack import Document
from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.google_genai import (
    GoogleGenAITextEmbedder,
)
from haystack_integrations.components.embedders.google_genai import (
    GoogleGenAIDocumentEmbedder,
)
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
]

document_embedder = GoogleGenAIDocumentEmbedder()
documents_with_embeddings = document_embedder.run(documents)["documents"]
document_store.write_documents(documents_with_embeddings)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", GoogleGenAITextEmbedder())
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who lives in Berlin?"
result = query_pipeline.run({"text_embedder": {"text": query}})

print(result["retriever"]["documents"][0])
```

--------------------------------

### Indexing Pipeline with LinkContentFetcher

Source: https://docs.haystack.deepset.ai/docs/linkcontentfetcher

This example demonstrates how to set up an indexing pipeline using LinkContentFetcher to retrieve content from URLs and then convert and write it to a document store. Ensure all necessary components are imported and connected correctly.

```python
from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from haystack.components.writers import DocumentWriter

document_store = InMemoryDocumentStore()
fetcher = LinkContentFetcher()
converter = HTMLToDocument()
writer = DocumentWriter(document_store=document_store)

indexing_pipeline = Pipeline()
indexing_pipeline.add_component(instance=fetcher, name="fetcher")
indexing_pipeline.add_component(instance=converter, name="converter")
indexing_pipeline.add_component(instance=writer, name="writer")

indexing_pipeline.connect("fetcher.streams", "converter.sources")
indexing_pipeline.connect("converter.documents", "writer.documents")

indexing_pipeline.run(
    data={
        "fetcher": {
            "urls": [
                "https://haystack.deepset.ai/blog/guide-to-using-zephyr-with-haystack2",
            ],
        },
    },
)
```

--------------------------------

### Integrate Amazon Bedrock Chat Generator in a RAG Pipeline

Source: https://docs.haystack.deepset.ai/docs/amazonbedrockchatgenerator

Shows how to incorporate the AmazonBedrockChatGenerator within a Haystack pipeline, specifically for a Retrieval-Augmented Generation (RAG) setup. This example uses ChatPromptBuilder to dynamically create messages.
```python
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.amazon_bedrock import (
    AmazonBedrockChatGenerator,
)

pipe = Pipeline()
pipe.add_component("prompt_builder", ChatPromptBuilder())
pipe.add_component("llm", AmazonBedrockChatGenerator(model="meta.llama2-70b-chat-v1"))
pipe.connect("prompt_builder", "llm")

country = "Germany"
system_message = ChatMessage.from_system(
    "You are an assistant giving out valuable information to language learners.",
)
messages = [
    system_message,
    ChatMessage.from_user("What's the official language of {{ country }}?"),
]

res = pipe.run(
    data={
        "prompt_builder": {
            "template_variables": {"country": country},
            "template": messages,
        },
    },
)
print(res)
```

--------------------------------

### Tool Support Example

Source: https://docs.haystack.deepset.ai/docs/nvidiachatgenerator

Demonstrates how to configure the NvidiaChatGenerator with individual tools, toolsets, or a mix of both for function calling capabilities.

```APIDOC
## Tool Support

`NvidiaChatGenerator` supports function calling through the `tools` parameter, which accepts flexible tool configurations:

* **A list of Tool objects**: Pass individual tools as a list
* **A single Toolset**: Pass an entire Toolset directly
* **Mixed Tools and Toolsets**: Combine multiple Toolsets with standalone tools in a single list

This allows you to organize related tools into logical groups while also including standalone tools as needed.

```python
from haystack.tools import Tool, Toolset
from haystack_integrations.components.generators.nvidia import NvidiaChatGenerator

# Create individual tools
weather_tool = Tool(name="weather", description="Get weather info", ...)
news_tool = Tool(name="news", description="Get latest news", ...)

# Group related tools into a toolset
math_toolset = Toolset([add_tool, subtract_tool, multiply_tool])

# Pass mixed tools and toolsets to the generator
generator = NvidiaChatGenerator(
    tools=[math_toolset, weather_tool, news_tool]  # Mix of Toolset and Tool objects
)
```

For more details on working with tools, refer to the Tool and Toolset documentation.
```

--------------------------------

### VLLMChatGenerator with Tool Calling

Source: https://docs.haystack.deepset.ai/docs/vllmchatgenerator

Use this snippet to enable tool calling with the VLLMChatGenerator. Ensure the vLLM server is started with `--enable-auto-tool-choice` and `--tool-call-parser`. This example defines a 'weather' tool and uses it to answer a user's query.

```python
from haystack.dataclasses import ChatMessage
from haystack.tools import tool
from haystack_integrations.components.generators.vllm import VLLMChatGenerator


@tool
def weather(city: str) -> str:
    """Get the weather in a given city."""
    return f"The weather in {city} is sunny"


generator = VLLMChatGenerator(model="Qwen/Qwen3-0.6B", tools=[weather])

messages = [ChatMessage.from_user("What is the weather in Paris?")]
response = generator.run(messages=messages)
print(response["replies"][0].tool_calls)
```

--------------------------------

### Route messages using OpenAI for topic classification

Source: https://docs.haystack.deepset.ai/docs/llmmessagesrouter

This example demonstrates routing messages based on topic classification using an OpenAI chat model (`gpt-4.1-mini` here). A system prompt guides the LLM to classify messages as 'animals' or 'politics'.

```python
from haystack.components.generators.chat.openai import OpenAIChatGenerator
from haystack.components.routers.llm_messages_router import LLMMessagesRouter
from haystack.dataclasses import ChatMessage

system_prompt = """Classify the given message into one of the following labels:
- animals
- politics

Respond with the label only, no other text.
"""

chat_generator = OpenAIChatGenerator(model="gpt-4.1-mini")

router = LLMMessagesRouter(
    chat_generator=chat_generator,
    system_prompt=system_prompt,
    output_names=["animals", "politics"],
    output_patterns=["animals", "politics"],
)

messages = [ChatMessage.from_user("You are a crazy gorilla!")]

print(router.run(messages))

## {
##     'chat_generator_text': 'animals',
##     'animals': [
##         ChatMessage(
##             _role=<ChatRole.USER: 'user'>,
##             _content=[TextContent(text='You are a crazy gorilla!')],
##             _name=None,
##             _meta={}
##         )
##     ]
## }
```

--------------------------------

### Start Basic vLLM Server

Source: https://docs.haystack.deepset.ai/docs/vllmchatgenerator

Start a vLLM server with a specified model. Ensure the server is running before using the VLLMChatGenerator.

```bash
vllm serve Qwen/Qwen3-4B-Instruct-2507
```

--------------------------------

### Create and Run a SuperComponent with Custom Mappings

Source: https://docs.haystack.deepset.ai/docs/supercomponents

This example demonstrates creating a SuperComponent that wraps a pipeline for document retrieval, prompt building, and answer generation. It includes custom input and output mappings to simplify the interface. The pipeline is then run with a query.
```python
from haystack import Pipeline, SuperComponent
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.builders import ChatPromptBuilder
from haystack.components.retrievers import InMemoryBM25Retriever
from haystack.dataclasses.chat_message import ChatMessage
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.dataclasses import Document

document_store = InMemoryDocumentStore()
documents = [
    Document(content="Paris is the capital of France."),
    Document(content="London is the capital of England."),
]
document_store.write_documents(documents)

prompt_template = [
    ChatMessage.from_user(
        '''
    According to the following documents:
    {% for document in documents %}
    {{document.content}}
    {% endfor %}
    Answer the given question: {{query}}
    Answer:
    '''
    )
]
prompt_builder = ChatPromptBuilder(template=prompt_template, required_variables="*")

pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
pipeline.add_component("prompt_builder", prompt_builder)
pipeline.add_component("llm", OpenAIChatGenerator())
pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "llm.messages")

# Create a super component with simplified input/output mapping
wrapper = SuperComponent(
    pipeline=pipeline,
    input_mapping={
        "query": ["retriever.query", "prompt_builder.query"],
    },
    output_mapping={
        "llm.replies": "replies",
        "retriever.documents": "documents"
    }
)

# Run the pipeline with simplified interface
result = wrapper.run(query="What is the capital of France?")
print(result)
```

--------------------------------

### RAG Pipeline with MistralChatGenerator

Source: https://docs.haystack.deepset.ai/docs/mistralchatgenerator

Example of setting up a RAG pipeline that fetches content from a URL, converts it to documents, builds a prompt, and then uses MistralChatGenerator to answer a question based on the provided article.
Ensure all necessary Haystack components and the MistralChatGenerator integration are installed.

```python
from haystack import Document
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.utils import print_streaming_chunk
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from haystack.dataclasses import ChatMessage

from haystack_integrations.components.generators.mistral import MistralChatGenerator

fetcher = LinkContentFetcher()
converter = HTMLToDocument()
prompt_builder = ChatPromptBuilder(variables=["documents"])
llm = MistralChatGenerator(
    streaming_callback=print_streaming_chunk,
    model="mistral-small",
)

message_template = """Answer the following question based on the contents of the article: {{query}}
Article: {{documents[0].content}}
"""
messages = [ChatMessage.from_user(message_template)]

rag_pipeline = Pipeline()
rag_pipeline.add_component(name="fetcher", instance=fetcher)
rag_pipeline.add_component(name="converter", instance=converter)
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("llm", llm)

rag_pipeline.connect("fetcher.streams", "converter.sources")
rag_pipeline.connect("converter.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "llm.messages")

question = "What are the capabilities of Mixtral?"

result = rag_pipeline.run(
    {
        "fetcher": {"urls": ["https://mistral.ai/news/mixtral-of-experts"]},
        "prompt_builder": {
            "template_variables": {"query": question},
            "template": messages,
        },
        "llm": {"generation_kwargs": {"max_tokens": 165}},
    },
)
```

--------------------------------

### Using AnswerBuilder in a Haystack Pipeline

Source: https://docs.haystack.deepset.ai/docs/answerbuilder

This example demonstrates how to integrate the AnswerBuilder into a Haystack pipeline.
It shows the setup of a RAG pipeline with a retriever, prompt builder, and a chat generator, culminating in the use of AnswerBuilder to process the generator's replies and associated documents.

```python
from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder
from haystack.components.builders.answer_builder import AnswerBuilder
from haystack.utils import Secret
from haystack.dataclasses import ChatMessage
from haystack.dataclasses import Document

prompt_template = [
    ChatMessage.from_system("You are a helpful assistant."),
    ChatMessage.from_user(
        "Given these documents, answer the question.\nDocuments:\n"
        "{% for doc in documents %}{{ doc.content }}{% endfor %}\n"
        "Question: {{query}}\nAnswer:"
    ),
]

docs = [
    Document(content="The capital of France is Paris"),
    Document(content="The capital of England is London"),
]
document_store = InMemoryDocumentStore()
document_store.write_documents(docs)

p = Pipeline()
p.add_component(
    instance=InMemoryBM25Retriever(document_store=document_store),
    name="retriever",
)
p.add_component(
    instance=ChatPromptBuilder(
        template=prompt_template,
        required_variables={"query", "documents"},
    ),
    name="prompt_builder",
)
p.add_component(
    instance=OpenAIChatGenerator(api_key=Secret.from_env_var("OPENAI_API_KEY")),
    name="llm",
)
p.add_component(instance=AnswerBuilder(), name="answer_builder")

p.connect("retriever", "prompt_builder.documents")
p.connect("prompt_builder", "llm.messages")
p.connect("llm.replies", "answer_builder.replies")
p.connect("retriever", "answer_builder.documents")

query = "What is the capital of France?"
result = p.run(
    {
        "retriever": {"query": query},
        "prompt_builder": {"query": query},
        "answer_builder": {"query": query},
    },
)
print(result)
```

--------------------------------

### Langfuse Connector with Agent and Tools

Source: https://docs.haystack.deepset.ai/docs/langfuseconnector

Demonstrates setting up an agent with tools and integrating the LangfuseConnector to trace its execution. Ensure Langfuse environment variables are set.

```python
import os

os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"
os.environ["HAYSTACK_CONTENT_TRACING_ENABLED"] = "true"

from typing import Annotated

from haystack.components.agents import Agent
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.tools import tool
from haystack import Pipeline

from haystack_integrations.components.connectors.langfuse import LangfuseConnector


@tool
def get_weather(city: Annotated[str, "The city to get weather for"]) -> str:
    """Get current weather information for a city."""
    weather_data = {
        "Berlin": "18°C, partly cloudy",
        "New York": "22°C, sunny",
        "Tokyo": "25°C, clear skies",
    }
    return weather_data.get(city, f"Weather information for {city} not available")


@tool
def calculate(
    operation: Annotated[
        str,
        "Mathematical operation: add, subtract, multiply, divide",
    ],
    a: Annotated[float, "First number"],
    b: Annotated[float, "Second number"],
) -> str:
    """Perform basic mathematical calculations."""
    if operation == "add":
        result = a + b
    elif operation == "subtract":
        result = a - b
    elif operation == "multiply":
        result = a * b
    elif operation == "divide":
        if b == 0:
            return "Error: Division by zero"
        else:
            result = a / b
    else:
        return f"Error: Unknown operation '{operation}'"

    return f"The result of {a} {operation} {b} is {result}"


if __name__ == "__main__":
    # Create components
    chat_generator = OpenAIChatGenerator()

    agent = Agent(
        chat_generator=chat_generator,
        tools=[get_weather, calculate],
        system_prompt="You are a helpful assistant with access to weather and calculator tools. Use them when needed.",
        exit_conditions=["text"],
    )

    langfuse_connector = LangfuseConnector("Agent Example")

    # Create and run pipeline
    pipe = Pipeline()
    pipe.add_component("tracer", langfuse_connector)
    pipe.add_component("agent", agent)

    response = pipe.run(
        data={
            "agent": {
                "messages": [
                    ChatMessage.from_user(
                        "What's the weather in Berlin and calculate 15 + 27?",
                    ),
                ],
            },
            "tracer": {"invocation_context": {"test": "agent_with_tools"}},
        },
    )

    print(response["agent"]["last_message"].text)
    print(response["tracer"]["trace_url"])
```

--------------------------------

### Start vLLM Server for Reasoning Models

Source: https://docs.haystack.deepset.ai/docs/vllmchatgenerator

Start a vLLM server with a reasoning parser enabled for models that support reasoning capabilities.

```bash
vllm serve Qwen/Qwen3-0.6B --reasoning-parser qwen3
```

--------------------------------

### Install Haystack with uv

Source: https://docs.haystack.deepset.ai/docs/installation

Install Haystack using the uv package installer. uv is a fast Python package installer and dependency resolver.

```shell
uv pip install haystack-ai
```

--------------------------------

### Initialize and Use InMemoryBM25Retriever

Source: https://docs.haystack.deepset.ai/docs/inmemorybm25retriever

Demonstrates how to initialize an InMemoryDocumentStore, write documents to it, and then use InMemoryBM25Retriever to run a query.
```python
from haystack import Document
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()
documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors.",
    ),
    Document(
        content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.",
    ),
]
document_store.write_documents(documents=documents)

retriever = InMemoryBM25Retriever(document_store=document_store)
retriever.run(query="How many languages are spoken around the world today?")
```

--------------------------------

### Use RecursiveDocumentSplitter in a Haystack Pipeline

Source: https://docs.haystack.deepset.ai/docs/recursivesplitter

This example demonstrates how to integrate the RecursiveDocumentSplitter into a Haystack pipeline for processing text files. It shows the setup of the pipeline with components for converting text files to documents, cleaning them, splitting them using RecursiveDocumentSplitter, and writing them to a document store. Ensure the 'path/to/your/files' is replaced with the actual directory containing your .md files.
```python
from pathlib import Path

from haystack import Document
from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.converters.txt import TextFileToDocument
from haystack.components.preprocessors import DocumentCleaner
from haystack.components.preprocessors import RecursiveDocumentSplitter
from haystack.components.writers import DocumentWriter

document_store = InMemoryDocumentStore()
p = Pipeline()
p.add_component(instance=TextFileToDocument(), name="text_file_converter")
p.add_component(instance=DocumentCleaner(), name="cleaner")
p.add_component(
    instance=RecursiveDocumentSplitter(
        split_length=400,
        split_overlap=0,
        split_unit="char",
        separators=["\n\n", "\n", "sentence", " "],
        sentence_splitter_params={
            "language": "en",
            "use_split_rules": True,
            "keep_white_spaces": False,
        },
    ),
    name="recursive_splitter",
)
p.add_component(instance=DocumentWriter(document_store=document_store), name="writer")

# Connection endpoints must use the component names registered above
p.connect("text_file_converter.documents", "cleaner.documents")
p.connect("cleaner.documents", "recursive_splitter.documents")
p.connect("recursive_splitter.documents", "writer.documents")

path = "path/to/your/files"
files = list(Path(path).glob("*.md"))
p.run({"text_file_converter": {"sources": files}})
```
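
To make the idea behind the `separators` list more concrete, here is a minimal plain-Python sketch of recursive splitting. It is an illustration only and not Haystack's implementation: the function name `recursive_split` and its `max_len` parameter are hypothetical, it ignores the special `"sentence"` separator and overlap, and it drops separators at chunk boundaries. It tries the coarsest separator first and falls back to finer ones only for pieces that are still too long.

```python
def recursive_split(text, separators, max_len):
    """Split text into chunks of at most max_len characters.

    Hypothetical helper for illustration; separators are tried in order,
    from coarsest to finest.
    """
    if len(text) <= max_len:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to fixed-size slices.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    sep, finer = separators[0], separators[1:]
    parts = text.split(sep)
    if len(parts) == 1:
        # This separator does not occur; try the next, finer one.
        return recursive_split(text, finer, max_len)

    chunks = []
    current = ""
    for part in parts:
        candidate = part if not current else current + sep + part
        if len(candidate) <= max_len:
            # Greedily merge neighbouring parts while they still fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= max_len:
            current = part
        else:
            # The part alone is too long: recurse with finer separators.
            chunks.extend(recursive_split(part, finer, max_len))
    if current:
        chunks.append(current)
    return chunks


sample = "alpha beta\n\ngamma delta epsilon\nzeta"
chunks = recursive_split(sample, ["\n\n", "\n", " "], max_len=12)
print(chunks)
```

With this sample, the paragraph break is used first, and only the second paragraph, which is longer than 12 characters, is split further on newlines and then on spaces.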