### Install Haystack

Source: https://docs.haystack.deepset.ai/docs/get_started

Installs the minimal version of the Haystack library. Ensure you are using a clean Python environment, especially if you have previously installed `farm-haystack`.

```shell
pip install haystack-ai
```

--------------------------------

### Uninstall and Install Haystack

Source: https://docs.haystack.deepset.ai/docs/get_started

Removes both `farm-haystack` and `haystack-ai` to prevent environment conflicts, then installs `haystack-ai`. This is recommended for users migrating from or encountering issues with previous versions.

```bash
pip uninstall -y farm-haystack haystack-ai
pip install haystack-ai
```

--------------------------------

### Initialize InMemoryDocumentStore (Python)

Source: https://docs.haystack.deepset.ai/docs/inmemorydocumentstore

This snippet shows how to initialize the InMemoryDocumentStore, which requires no external setup and has no dependencies. It's a straightforward way to get started with Haystack for experimentation.

```python
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()
```

--------------------------------

### Install openapi3 Dependency

Source: https://docs.haystack.deepset.ai/docs/openapiserviceconnector

Install the optional `openapi3` dependency required for the OpenAPIServiceConnector. This is a one-time setup step before using the connector.

```shell
pip install openapi3
```

--------------------------------

### Haystack 2.x Pipeline Definition

Source: https://docs.haystack.deepset.ai/docs/migration

Example demonstrating pipeline construction in Haystack 2.x. Components are first added using `add_component` and then explicitly connected using the `connect` method to define the graph structure.
```python
pipeline = Pipeline()

component_1 = SomeComponent()
component_2 = AnotherComponent()

pipeline.add_component("Comp_1", component_1)
pipeline.add_component("Comp_2", component_2)

pipeline.connect("Comp_1", "Comp_2")
```

--------------------------------

### Build a RAG Pipeline with Haystack

Source: https://docs.haystack.deepset.ai/docs/get_started

Demonstrates building a Retrieval Augmented Generation (RAG) pipeline using Haystack. It initializes an in-memory document store, adds documents, sets up a retriever and a chat generator, and connects them in a pipeline to answer a question.

```python
from haystack import Pipeline, Document
from haystack.utils import Secret
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder
from haystack.dataclasses import ChatMessage

# Write documents to InMemoryDocumentStore
document_store = InMemoryDocumentStore()
document_store.write_documents([
    Document(content="My name is Jean and I live in Paris."),
    Document(content="My name is Mark and I live in Berlin."),
    Document(content="My name is Giorgio and I live in Rome.")
])

# Build a RAG pipeline
prompt_template = [
    ChatMessage.from_system("You are a helpful assistant."),
    ChatMessage.from_user(
        "Given these documents, answer the question.\n"
        "Documents:\n{% for doc in documents %}{{ doc.content }}{% endfor %}\n"
        "Question: {{question}}\n"
        "Answer:"
    )
]

# Define required variables explicitly
prompt_builder = ChatPromptBuilder(template=prompt_template, required_variables={"question", "documents"})

retriever = InMemoryBM25Retriever(document_store=document_store)
llm = OpenAIChatGenerator(api_key=Secret.from_env_var("OPENAI_API_KEY"))

rag_pipeline = Pipeline()
rag_pipeline.add_component("retriever", retriever)
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("llm", llm)
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm.messages")

# Ask a question
question = "Who lives in Paris?"
results = rag_pipeline.run(
    {
        "retriever": {"query": question},
        "prompt_builder": {"question": question},
    }
)

print(results["llm"]["replies"])
```

--------------------------------

### Haystack 1.x RAG Pipeline Setup

Source: https://docs.haystack.deepset.ai/docs/migration

This snippet demonstrates setting up a RAG pipeline in Haystack 1.x. It involves initializing an InMemoryDocumentStore, loading and writing documents, configuring an EmbeddingRetriever, and defining a PromptTemplate for the PromptNode. The pipeline connects the retriever to the prompt node for query processing.

```python
from datasets import load_dataset
from haystack.pipelines import Pipeline
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import EmbeddingRetriever, PromptNode, PromptTemplate, AnswerParser

document_store = InMemoryDocumentStore(embedding_dim=384)
dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
document_store.write_documents(dataset)

retriever = EmbeddingRetriever(embedding_model="sentence-transformers/all-MiniLM-L6-v2", document_store=document_store, top_k=2)
document_store.update_embeddings(retriever)

rag_prompt = PromptTemplate(
    prompt="""Synthesize a comprehensive answer from the following text for the given question.
Provide a clear and concise response that summarizes the key points and information presented in the text.
Your answer should be in your own words and be no longer than 50 words.
\n\n Related text: {join(documents)} \n\n Question: {query} \n\n Answer:""",
    output_parser=AnswerParser(),
)

# OPENAI_API_KEY must hold your OpenAI API key (for example, read it from the environment)
prompt_node = PromptNode(model_name_or_path="gpt-3.5-turbo", api_key=OPENAI_API_KEY, default_prompt_template=rag_prompt)

pipe = Pipeline()
pipe.add_node(component=retriever, name="retriever", inputs=["Query"])
pipe.add_node(component=prompt_node, name="prompt_node", inputs=["retriever"])

output = pipe.run(query="What does Rhodes Statue look like?")
```

--------------------------------

### Haystack 1.x Pipeline Definition

Source: https://docs.haystack.deepset.ai/docs/migration

Example of how to define a pipeline in Haystack 1.x by sequentially adding nodes. Edges are automatically created in the order nodes are added.

```python
pipeline = Pipeline()

node_1 = SomeNode()
node_2 = AnotherNode()

pipeline.add_node(node_1, name="Node_1", inputs=["Query"])
pipeline.add_node(node_2, name="Node_2", inputs=["Node_1"])
```

--------------------------------

### LLMEvaluator Example Data Format

Source: https://docs.haystack.deepset.ai/docs/llmevaluator

Illustrates the expected format for providing few-shot examples to the LLMEvaluator. Each example must include 'inputs' and 'outputs' dictionaries, with nested structures for questions, contexts, statements, and their scores.

```python
[
    {
        "inputs": {
            "questions": "What is the capital of Italy?",
            "contexts": ["Rome is the capital of Italy."]
        },
        "outputs": {
            "statements": ["Rome is the capital of Italy.", "Rome has more than 4 million inhabitants."],
            "statement_scores": [1, 0]
        }
    }
]
```

--------------------------------

### Extractive QA Pipeline - Haystack 1.x vs 2.x

Source: https://docs.haystack.deepset.ai/docs/migration

This snippet showcases the implementation of an extractive question-answering pipeline using Haystack. It includes setting up an in-memory document store, adding documents, configuring a retriever and a reader, and running a query.
The example is provided for both Haystack 1.x and 2.x, highlighting the differences in API usage and component connection.

```python
from haystack.document_stores import InMemoryDocumentStore
from haystack.pipelines import ExtractiveQAPipeline
from haystack import Document
from haystack.nodes import BM25Retriever
from haystack.nodes import FARMReader

document_store = InMemoryDocumentStore(use_bm25=True)
document_store.write_documents([
    Document(content="Paris is the capital of France."),
    Document(content="Berlin is the capital of Germany."),
    Document(content="Rome is the capital of Italy."),
    Document(content="Madrid is the capital of Spain."),
])

retriever = BM25Retriever(document_store=document_store)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")

extractive_qa_pipeline = ExtractiveQAPipeline(reader, retriever)

query = "What is the capital of France?"
result = extractive_qa_pipeline.run(
    query=query,
    params={
        "Retriever": {"top_k": 10},
        "Reader": {"top_k": 5}
    }
)
```

```python
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.readers import ExtractiveReader

document_store = InMemoryDocumentStore()
document_store.write_documents([
    Document(content="Paris is the capital of France."),
    Document(content="Berlin is the capital of Germany."),
    Document(content="Rome is the capital of Italy."),
    Document(content="Madrid is the capital of Spain."),
])

retriever = InMemoryBM25Retriever(document_store)
reader = ExtractiveReader(model="deepset/roberta-base-squad2")

extractive_qa_pipeline = Pipeline()
extractive_qa_pipeline.add_component("retriever", retriever)
extractive_qa_pipeline.add_component("reader", reader)
extractive_qa_pipeline.connect("retriever", "reader")

query = "What is the capital of France?"
result = extractive_qa_pipeline.run(data={
    "retriever": {"query": query, "top_k": 3},
    "reader": {"query": query, "top_k": 2}
})
```

--------------------------------

### Haystack 2.x RAG Pipeline Setup

Source: https://docs.haystack.deepset.ai/docs/migration

This snippet illustrates setting up a RAG pipeline in Haystack 2.x. It initializes an InMemoryDocumentStore, uses SentenceTransformers embedders for documents and queries, and configures an InMemoryEmbeddingRetriever. A PromptBuilder and OpenAIGenerator are used for text generation, with components connected to form the pipeline.

```python
from datasets import load_dataset
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.retrievers import InMemoryEmbeddingRetriever

document_store = InMemoryDocumentStore()
dataset = load_dataset("bilgeyucel/seven-wonders", split="train")

embedder = SentenceTransformersDocumentEmbedder("sentence-transformers/all-MiniLM-L6-v2")
embedder.warm_up()
output = embedder.run([Document(**ds) for ds in dataset])
document_store.write_documents(output.get("documents"))

template = """
Given the following information, answer the question.
Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{question}}
Answer:
"""

prompt_builder = PromptBuilder(template=template)

retriever = InMemoryEmbeddingRetriever(document_store=document_store, top_k=2)
generator = OpenAIGenerator(model="gpt-3.5-turbo")
query_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")

basic_rag_pipeline = Pipeline()
basic_rag_pipeline.add_component("text_embedder", query_embedder)
basic_rag_pipeline.add_component("retriever", retriever)
basic_rag_pipeline.add_component("prompt_builder", prompt_builder)
basic_rag_pipeline.add_component("llm", generator)

basic_rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
basic_rag_pipeline.connect("retriever", "prompt_builder.documents")
basic_rag_pipeline.connect("prompt_builder", "llm")

query = "What does Rhodes Statue look like?"
output = basic_rag_pipeline.run({"text_embedder": {"text": query}, "prompt_builder": {"question": query}})
```

--------------------------------

### ContextRelevanceEvaluator Few-Shot Examples Format

Source: https://docs.haystack.deepset.ai/docs/contextrelevanceevaluator

Defines the expected format for few-shot examples used with the ContextRelevanceEvaluator. Each example includes 'inputs' (questions and contexts) and 'outputs' (statements and their scores), which are used to guide the LLM's evaluation process.

```python
[
    {
        "inputs": {
            "questions": "What is the capital of Italy?",
            "contexts": ["Rome is the capital of Italy."],
        },
        "outputs": {
            "statements": ["Rome is the capital of Italy.", "Rome has more than 4 million inhabitants."],
            "statement_scores": [1, 0],
        },
    },
]
```

--------------------------------

### QdrantHybridRetriever Usage Example

Source: https://docs.haystack.deepset.ai/docs/qdranthybridretriever

Demonstrates how to initialize and use the QdrantHybridRetriever.
It involves setting up an in-memory QdrantDocumentStore with sparse embedding support, writing a document with both dense and sparse embeddings, and then running the retriever with query embeddings.

```python
from haystack_integrations.components.retrievers.qdrant import QdrantHybridRetriever
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack.dataclasses import Document, SparseEmbedding

document_store = QdrantDocumentStore(
    ":memory:",
    use_sparse_embeddings=True,
    recreate_index=True,
    return_embedding=True,
    wait_result_from_api=True,
)

doc = Document(
    content="test",
    embedding=[0.5]*768,
    sparse_embedding=SparseEmbedding(indices=[0, 3, 5], values=[0.1, 0.5, 0.12]),
)
document_store.write_documents([doc])

retriever = QdrantHybridRetriever(document_store=document_store)

embedding = [0.1]*768
sparse_embedding = SparseEmbedding(indices=[0, 1, 2, 3], values=[0.1, 0.8, 0.05, 0.33])
retriever.run(query_embedding=embedding, query_sparse_embedding=sparse_embedding)
```

--------------------------------

### Define Haystack Pipeline Wrapper for MCP

Source: https://docs.haystack.deepset.ai/docs/hayhooks

Provides a Python example of creating a `PipelineWrapper` class to expose a Haystack pipeline as an MCP Tool. The wrapper requires `name`, `description`, and `inputSchema`. Hayhooks uses these properties and the `run_api` method's docstring and arguments to define the MCP Tool. The `setup` method loads the pipeline, and `run_api` executes it.

```python
from pathlib import Path
from typing import List

from haystack import Pipeline
from hayhooks import BasePipelineWrapper


class PipelineWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        pipeline_yaml = (Path(__file__).parent / "chat_with_website.yml").read_text()
        self.pipeline = Pipeline.loads(pipeline_yaml)

    def run_api(self, urls: List[str], question: str) -> str:
        """
        Ask a question about one or more websites using a Haystack pipeline.
        """
        result = self.pipeline.run({"fetcher": {"urls": urls}, "prompt": {"query": question}})
        return result["llm"]["replies"][0]
```

--------------------------------

### Install GitHub Integration

Source: https://docs.haystack.deepset.ai/docs/ready-made-tools

Installs the necessary package for the GitHub integration. This is a prerequisite for using the GitHubFileEditorTool.

```shell
pip install github-haystack
```

--------------------------------

### Initialize LlamaCppChatGenerator with Generation Parameters

Source: https://docs.haystack.deepset.ai/docs/llamacppchatgenerator

This example shows initializing the LlamaCppChatGenerator with default model parameters and specifying generation arguments like 'max_tokens' and 'temperature' during initialization. It then warms up the generator and runs it with a user message to get a response.

```python
from haystack_integrations.components.generators.llama_cpp import LlamaCppChatGenerator
from haystack.dataclasses import ChatMessage

generator = LlamaCppChatGenerator(
    model="/content/openchat-3.5-1210.Q3_K_S.gguf",
    n_ctx=512,
    n_batch=128,
    generation_kwargs={"max_tokens": 128, "temperature": 0.1},
)
generator.warm_up()

messages = [ChatMessage.from_user("Who is the best American actor?")]
result = generator.run(messages)
```

--------------------------------

### Install Custom Haystack Document Store from PyPI

Source: https://docs.haystack.deepset.ai/docs/creating-custom-document-stores

Installs a custom Document Store package from the Python Package Index (PyPI) using pip. This is the recommended method for distributing and installing stable versions.

```shell
pip install example-haystack
```

--------------------------------

### Pipeline Wrapper Setup and Run API

Source: https://docs.haystack.deepset.ai/docs/hayhooks

Defines a `PipelineWrapper` class that inherits from `BasePipelineWrapper`.
The `setup` method loads a Haystack pipeline from a YAML file, and the `run_api` method processes input text using the loaded pipeline and returns the result. This wrapper is essential for exposing Haystack pipelines through Hayhooks.

```python
from pathlib import Path

from haystack import Pipeline
from hayhooks import BasePipelineWrapper


class PipelineWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        pipeline_yaml = (Path(__file__).parent / "pipeline.yml").read_text()
        self.pipeline = Pipeline.loads(pipeline_yaml)

    def run_api(self, input_text: str) -> str:
        result = self.pipeline.run({"input": {"text": input_text}})
        return result["output"]["text"]
```

--------------------------------

### Install Optimum Haystack Integration

Source: https://docs.haystack.deepset.ai/docs/optimumdocumentembedder

Installs the necessary library to use the OptimumDocumentEmbedder with Haystack. This is a prerequisite for using the embedder.

```shell
pip install optimum-haystack
```

--------------------------------

### Install openrouter-haystack Integration

Source: https://docs.haystack.deepset.ai/docs/openrouterchatgenerator

Installs the necessary library for the OpenRouterChatGenerator integration. This is a prerequisite for using the component.

```shell
pip install openrouter-haystack
```

--------------------------------

### MongoDBAtlasEmbeddingRetriever Initialization and Run Example

Source: https://docs.haystack.deepset.ai/docs/mongodbatlasembeddingretriever

Demonstrates how to initialize the MongoDBAtlasEmbeddingRetriever with a MongoDBAtlasDocumentStore and perform a basic retrieval using a query embedding. It assumes documents have already been indexed into the store.
```python
from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore
from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasEmbeddingRetriever

document_store = MongoDBAtlasDocumentStore()
retriever = MongoDBAtlasEmbeddingRetriever(document_store=document_store)

# example run query
retriever.run(query_embedding=[0.1]*384)
```

--------------------------------

### Install qdrant-haystack Package

Source: https://docs.haystack.deepset.ai/docs/sentencetransformerssparsetextembedder

Install the necessary package for Qdrant integration with Haystack. This command uses pip to download and install the qdrant-haystack library.

```shell
pip install qdrant-haystack
```

--------------------------------

### Install Custom Haystack Document Store from Git

Source: https://docs.haystack.deepset.ai/docs/creating-custom-document-stores

Installs a custom Document Store package directly from a Git repository using pip. This is useful for installing prototypes or unreleased versions.

```shell
pip install git+https://github.com/your-org/example-haystack.git
```

--------------------------------

### Install PyPDF Package

Source: https://docs.haystack.deepset.ai/docs/pypdftodocument

This command installs the necessary `pypdf` library required for the `PyPDFToDocument` component to function. Ensure you have pip installed.

```shell
pip install pypdf
```

--------------------------------

### Install Pinecone Haystack Integration

Source: https://docs.haystack.deepset.ai/docs/pinecone-document-store

Installs the necessary package for using Pinecone with Haystack. This is a prerequisite for initializing the PineconeDocumentStore.

```shell
pip install pinecone-haystack
```

--------------------------------

### Install Ollama Haystack Integration

Source: https://docs.haystack.deepset.ai/docs/ollamatextembedder

Install the necessary package to use the Ollama integration with Haystack.
This command fetches and installs the 'ollama-haystack' library, making its components available for use.

```shell
pip install ollama-haystack
```

--------------------------------

### Install Cohere Haystack Package

Source: https://docs.haystack.deepset.ai/docs/coherechatgenerator

This command installs the necessary package to use the CohereChatGenerator within your Haystack project. Ensure you have pip installed.

```shell
pip install cohere-haystack
```

--------------------------------

### Install nvidia-haystack Package

Source: https://docs.haystack.deepset.ai/docs/nvidiatextembedder

Installs the necessary package for using Nvidia's integration with Haystack.

```shell
pip install nvidia-haystack
```

--------------------------------

### Install llama-cpp-haystack

Source: https://docs.haystack.deepset.ai/docs/llamacppgenerator

This command installs the llama-cpp-haystack package, which provides the LlamaCppGenerator for Haystack.

```bash
pip install llama-cpp-haystack
```

--------------------------------

### Install pgvector and pgvector-haystack

Source: https://docs.haystack.deepset.ai/docs/pgvectorkeywordretriever

Instructions for setting up a PostgreSQL database with pgvector using Docker and installing the necessary Python package for Haystack integration.

```shell
docker run -d -p 5432:5432 -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=postgres ankane/pgvector
pip install pgvector-haystack
```

--------------------------------

### Initialize and Run WeaviateHybridRetriever

Source: https://docs.haystack.deepset.ai/docs/weaviatehybridretriever

Demonstrates how to initialize the WeaviateHybridRetriever with a WeaviateDocumentStore and run a search query. It takes a query string and its embedding as input, returning a list of documents.
```python
from haystack_integrations.document_stores.weaviate.document_store import WeaviateDocumentStore
from haystack_integrations.components.retrievers.weaviate import WeaviateHybridRetriever

document_store = WeaviateDocumentStore(url="http://localhost:8080")
retriever = WeaviateHybridRetriever(document_store=document_store)

# using a fake vector to keep the example simple
retriever.run(query="How many languages are there?", query_embedding=[0.1]*768)
```

--------------------------------

### Install MCP-Haystack Integration

Source: https://docs.haystack.deepset.ai/docs/mcptool

Installs the necessary package for using the MCP-Haystack integration. This is a prerequisite for utilizing the MCPTool.

```shell
pip install mcp-haystack
```

--------------------------------

### Initialize and Use AzureAISearchBM25Retriever

Source: https://docs.haystack.deepset.ai/docs/azureaisearchbm25retriever

Demonstrates how to initialize an AzureAISearchDocumentStore, write documents to it, and then set up and run the AzureAISearchBM25Retriever. This snippet shows a basic keyword retrieval setup.
```python
from haystack import Document
from haystack_integrations.components.retrievers.azure_ai_search import AzureAISearchBM25Retriever
from haystack_integrations.document_stores.azure_ai_search import AzureAISearchDocumentStore

document_store = AzureAISearchDocumentStore(index_name="haystack_docs")

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors."),
    Document(content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves."),
]
document_store.write_documents(documents=documents)

retriever = AzureAISearchBM25Retriever(document_store=document_store)
retriever.run(query="How many languages are spoken around the world today?")
```

--------------------------------

### Install llama-stack-haystack Package

Source: https://docs.haystack.deepset.ai/docs/googlegenaichatgenerator-copy

This command installs the necessary package for using the LlamaStackChatGenerator integration with Haystack. Ensure you have pip installed.

```shell
pip install llama-stack-haystack
```

--------------------------------

### Install OpenTelemetry SDK and Exporter

Source: https://docs.haystack.deepset.ai/docs/tracing

Installs the necessary OpenTelemetry SDK and an OTLP exporter for sending traces. This is a prerequisite for configuring OpenTelemetry as a tracing backend.

```shell
pip install opentelemetry-sdk
pip install opentelemetry-exporter-otlp
```

--------------------------------

### Install Chroma Integration for Haystack

Source: https://docs.haystack.deepset.ai/docs/chromadocumentstore

Installs the necessary Chroma integration package for Haystack. This command will also install Haystack and Chroma if they are not already present on your system.
```shell
pip install chroma-haystack
```

--------------------------------

### Setup OpenSearchDocumentStore and Hybrid Retriever in Python

Source: https://docs.haystack.deepset.ai/docs/opensearchhybridretriever

Shows the Python code to set up an OpenSearchDocumentStore, embed documents using SentenceTransformers, write them to the store, and initialize an OpenSearchHybridRetriever. Assumes an OpenSearch instance is running.

```python
from haystack import Document
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack_integrations.components.retrievers.opensearch import OpenSearchHybridRetriever
from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore

# Initialize the document store
doc_store = OpenSearchDocumentStore(
    hosts=[""],  # replace with your OpenSearch host
    index="document_store",
    embedding_dim=384,
)

# Create some sample documents
docs = [
    Document(content="Machine learning is a subset of artificial intelligence."),
    Document(content="Deep learning is a subset of machine learning."),
    Document(content="Natural language processing is a field of AI."),
    Document(content="Reinforcement learning is a type of machine learning."),
    Document(content="Supervised learning is a type of machine learning."),
]

# Embed the documents and add them to the document store
doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()
docs = doc_embedder.run(docs)
doc_store.write_documents(docs['documents'])
```

--------------------------------

### GitHub Issue Viewer Output Example

Source: https://docs.haystack.deepset.ai/docs/githubissueviewer

An example of the output structure generated by the GitHubIssueViewer component, showing 'documents' containing issue and comment data with associated metadata.
```python
{
    'documents': [
        Document(
            id=3989459bbd8c2a8420a9ba7f3cd3cf79bb41d78bd0738882e57d509e1293c67a,
            content='sentence-transformers = 0.2.6.1 haystack = latest farm = 0.4.3 latest branch In the call to Emb...',
            meta={'type': 'issue', 'title': "SentenceTransformer no longer accepts 'gpu' as argument", 'number': 123, 'state': 'closed', 'created_at': '2020-05-28T04:49:31Z', 'updated_at': '2020-05-28T07:11:43Z', 'author': 'predoctech', 'url': 'https://github.com/deepset-ai/haystack/issues/123'}
        ),
        Document(
            id=a8a56b9ad119244678804d5873b13da0784587773d8f839e07f644c4d02c167a,
            content='Thanks for reporting! Fixed with #124 ',
            meta={'type': 'comment', 'issue_number': 123, 'created_at': '2020-05-28T07:11:42Z', 'updated_at': '2020-05-28T07:11:42Z', 'author': 'tholor', 'url': 'https://github.com/deepset-ai/haystack/issues/123#issuecomment-635153940'}
        )
    ]
}
```

--------------------------------

### Connecting Components in a Haystack Pipeline

Source: https://docs.haystack.deepset.ai/docs/creating-pipelines

Illustrates various syntaxes for connecting components in a Haystack pipeline, from explicit input/output specification to simpler connections when components have single inputs/outputs. It also shows an example for a semantic document search pipeline.
```python
# Explicit connection: output1 of component1 to input1 of component2
pipeline.connect("component1.output1", "component2.input1")

# Simplified connection when components have single input/output
pipeline.connect("component1", "component2")

# Connecting a single output to a specific input of a multi-input component
pipeline.connect("component1", "component2.input1")

# Example for semantic search pipeline
pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

# Simplified version if retriever has only one input
pipeline.connect("text_embedder.embedding", "retriever")

# Example connecting multiple components in sequence
query_pipeline.connect("text_embedder.embedding", "retriever")
query_pipeline.connect("retriever", "prompt_builder.documents")
query_pipeline.connect("prompt_builder", "llm")
```

--------------------------------

### Initialize and Use Tool Instance

Source: https://docs.haystack.deepset.ai/docs/tool

Demonstrates how to initialize a `Tool` instance by providing a function, its name, description, and a JSON schema for its parameters. It also shows how to access the tool's specification and invoke the underlying function.

```python
from haystack.tools import Tool

def add(a: int, b: int) -> int:
    return a + b

parameters = {
    "type": "object",
    "properties": {
        "a": {"type": "integer"},
        "b": {"type": "integer"}
    },
    "required": ["a", "b"]
}

add_tool = Tool(
    name="addition_tool",
    description="This tool adds two numbers",
    parameters=parameters,
    function=add,
)

print(add_tool.tool_spec)
print(add_tool.invoke(a=15, b=10))
```

--------------------------------

### Example Usage of StreamingChunk (Python)

Source: https://docs.haystack.deepset.ai/docs/data-classes

Demonstrates how to create instances of the StreamingChunk class for basic text content and for tool call scenarios. This includes setting the 'content', 'start', 'meta' attributes for a basic chunk, and 'tool_calls', 'index', 'start', and 'finish_reason' for a tool call chunk.
```python
from haystack.dataclasses.streaming_chunk import StreamingChunk, ComponentInfo, ToolCallDelta

# Basic text chunk
chunk = StreamingChunk(
    content="Hello world",
    start=True,
    meta={"model": "gpt-3.5-turbo"}
)

# Tool call chunk
tool_chunk = StreamingChunk(
    tool_calls=[ToolCallDelta(index=0, tool_name="calculator", arguments='{"operation": "add", "a": 2, "b": 3}')],
    index=0,
    start=False,
    finish_reason="tool_calls"
)
```

--------------------------------

### Start Remote Chroma Server

Source: https://docs.haystack.deepset.ai/docs/chromadocumentstore

Commands to start a remote Chroma server, either directly using the Chroma CLI or via Docker. This server can then be connected to by the ChromaDocumentStore.

```shell
chroma run --path /db_path
```

```shell
docker run -p 8000:8000 chromadb/chroma
```

--------------------------------

### OpenSearchHybridRetriever Setup

Source: https://docs.haystack.deepset.ai/docs/opensearchhybridretriever

Demonstrates how to initialize the OpenSearchHybridRetriever with a document store and an embedder, including optional parameters for underlying retrievers.

```APIDOC
## OpenSearchHybridRetriever Initialization

### Description
Initializes the `OpenSearchHybridRetriever` component.

### Method
`__init__`

### Parameters

#### Path Parameters
None

#### Query Parameters
None

#### Request Body
- **document_store** (OpenSearchDocumentStore) - Required - An instance of `OpenSearchDocumentStore` to use for retrieval.
- **embedder** (Embedder) - Required - Any [Embedder](doc:embedders) implementing the `TextEmbedder` protocol.
- **bm25_retriever** (dict) - Optional - Parameters for the BM25 retriever. Example: `{"raise_on_failure": True}`.
- **embedding_retriever** (dict) - Optional - Parameters for the embedding retriever. Example: `{"raise_on_failure": False}`.
### Request Example
```python
from haystack_integrations.components.retrievers.opensearch import OpenSearchHybridRetriever
from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore
from haystack.components.embedders import SentenceTransformersTextEmbedder

document_store = OpenSearchDocumentStore(hosts="", index="document_store", embedding_dim=384)
embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")

retriever = OpenSearchHybridRetriever(
    document_store=document_store,
    embedder=embedder,
    bm25_retriever={"raise_on_failure": True},
    embedding_retriever={"raise_on_failure": False}
)
```

### Response

#### Success Response (200)
Initializes the `OpenSearchHybridRetriever` object.

#### Response Example
```json
{
  "instance": ""
}
```
```

--------------------------------

### STACKITChatGenerator Usage Example

Source: https://docs.haystack.deepset.ai/docs/stackitchatgenerator

Demonstrates how to initialize and use the STACKITChatGenerator component to get a chat completion. It requires specifying the model and providing a list of ChatMessage objects.

```python
from haystack_integrations.components.generators.stackit import STACKITChatGenerator
from haystack.dataclasses import ChatMessage

generator = STACKITChatGenerator(model="neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8")

result = generator.run([ChatMessage.from_user("Tell me a joke.")])
print(result)
```

--------------------------------

### Install LlamaStack Chat Generator

Source: https://docs.haystack.deepset.ai/docs/llamastackchatgenerator

Install the necessary package to use the LlamaStackChatGenerator integration with Haystack. This command fetches and installs the llama-stack-haystack package from PyPI.
```shell
pip install llama-stack-haystack
```

--------------------------------

### Initialize and Run InMemoryBM25Retriever

Source: https://docs.haystack.deepset.ai/docs/inmemorybm25retriever

Demonstrates how to initialize an InMemoryDocumentStore, add documents, create an InMemoryBM25Retriever, and run a query against it. This snippet shows the basic standalone usage of the retriever.

```python
from haystack import Document
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()
documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors."),
    Document(content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves."),
]
document_store.write_documents(documents=documents)

retriever = InMemoryBM25Retriever(document_store=document_store)
retriever.run(query="How many languages are spoken around the world today?")
```

--------------------------------

### Install Weaviate Haystack Integration

Source: https://docs.haystack.deepset.ai/docs/weaviatedocumentstore

Install the Weaviate Haystack integration using pip. This command fetches and installs the necessary package for using Weaviate with Haystack.

```shell
pip install weaviate-haystack
```

--------------------------------

### Embed Documents and Use in a Pipeline

Source: https://docs.haystack.deepset.ai/docs/ollamatextembedder

Shows how to embed a list of documents using OllamaDocumentEmbedder and then integrate them into a Haystack pipeline with an InMemoryDocumentStore and InMemoryEmbeddingRetriever. This example illustrates a complete RAG pipeline setup.
```python
from haystack import Document
from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.ollama import OllamaTextEmbedder
from haystack_integrations.components.embedders.ollama import OllamaDocumentEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
]

document_embedder = OllamaDocumentEmbedder()
documents_with_embeddings = document_embedder.run(documents)['documents']
document_store.write_documents(documents_with_embeddings)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", OllamaTextEmbedder())
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who lives in Berlin?"
result = query_pipeline.run({"text_embedder": {"text": query}})

print(result['retriever']['documents'][0])
```

--------------------------------

### Install Datadog Tracing Library

Source: https://docs.haystack.deepset.ai/docs/tracing

Installs the `ddtrace` library, which is required for integrating Datadog tracing with Haystack applications.

```shell
pip install ddtrace
```

--------------------------------

### Example Telemetry Event JSON

Source: https://docs.haystack.deepset.ai/docs/telemetry

This JSON object represents an exemplary telemetry event sent when tutorial 1 is executed. It includes an anonymized user ID, execution environment details, and versions of installed libraries.
```json
{
  "event": "tutorial 1 executed",
  "distinct_id": "9baab867-3bc8-438c-9974-a192c9d53cd1",
  "properties": {
    "os_family": "Darwin",
    "os_machine": "arm64",
    "os_version": "21.3.0",
    "haystack_version": "1.0.0",
    "python_version": "3.9.6",
    "torch_version": "1.9.0",
    "transformers_version": "4.13.0",
    "execution_env": "script",
    "n_gpu": 0
  }
}
```

--------------------------------

### Install amazon-bedrock-haystack Package

Source: https://docs.haystack.deepset.ai/docs/amazonbedrockdocumentimageembedder

Installs the necessary package for using the Amazon Bedrock integration with Haystack.

```shell
pip install amazon-bedrock-haystack
```

--------------------------------

### Run AnthropicChatGenerator (Python)

Source: https://docs.haystack.deepset.ai/docs/anthropicchatgenerator

Shows a basic example of using the AnthropicChatGenerator component to get a chat completion. It initializes the generator and sends a single user message, printing the result. Requires the 'anthropic-haystack' package.

```python
from haystack_integrations.components.generators.anthropic import AnthropicChatGenerator
from haystack.dataclasses import ChatMessage

generator = AnthropicChatGenerator()
message = ChatMessage.from_user("What's Natural Language Processing? Be brief.")
print(generator.run([message]))
```

--------------------------------

### Install jq package

Source: https://docs.haystack.deepset.ai/docs/jsonconverter

Installs the 'jq' package, which is a dependency for the JSONConverter.

```shell
pip install jq
```

--------------------------------

### Install MarkdownToDocument Dependencies

Source: https://docs.haystack.deepset.ai/docs/markdowntodocument

Installs the necessary packages for the MarkdownToDocument component. This includes markdown-it-py and mdit_plain.
```shell
pip install markdown-it-py mdit_plain
```

--------------------------------

### Initialize and Run ExtractiveReader (Python)

Source: https://docs.haystack.deepset.ai/docs/extractivereader

Demonstrates how to initialize the ExtractiveReader component and use it to run a query against a list of documents. The `warm_up()` method prepares the model, and the `run()` method processes the query and documents, returning potential answers.

```python
from haystack import Document
from haystack.components.readers import ExtractiveReader

docs = [
    Document(content="Paris is the capital of France."),
    Document(content="Berlin is the capital of Germany."),
]

reader = ExtractiveReader()
reader.warm_up()

reader.run(query="What is the capital of France?", documents=docs, top_k=2)
```

--------------------------------

### Run SentenceTransformersDocumentEmbedder and print embeddings

Source: https://docs.haystack.deepset.ai/docs/sentencetransformersdocumentembedder

Provides a basic usage example of the SentenceTransformersDocumentEmbedder. It initializes the embedder with default settings, warms it up, runs it on a sample document, and prints the resulting embedding vector.

```python
from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder

doc = Document(content="I love pizza!")

doc_embedder = SentenceTransformersDocumentEmbedder()
doc_embedder.warm_up()

result = doc_embedder.run([doc])
print(result['documents'][0].embedding)

# [-0.07804739475250244, 0.1498992145061493, ...]
```

--------------------------------

### Install google-vertex-haystack

Source: https://docs.haystack.deepset.ai/docs/vertexaiimagegenerator

Installs the necessary package for using the VertexAIImageGenerator. This is a prerequisite for utilizing the component.
```shell
pip install google-vertex-haystack
```

--------------------------------

### Install azure-ai-search-haystack

Source: https://docs.haystack.deepset.ai/docs/azureaisearchdocumentstore

Installs the necessary Python package for the Azure AI Search Haystack integration. This is a prerequisite for using the AzureAISearchDocumentStore and its related components.

```bash
pip install azure-ai-search-haystack
```

--------------------------------

### Install Hayhooks using pip

Source: https://docs.haystack.deepset.ai/docs/hayhooks

This command installs the Hayhooks package, which includes both the server and client components necessary for deploying and managing Haystack pipelines as HTTP endpoints.

```shell
pip install hayhooks
```

--------------------------------

### Haystack Indexing Pipeline Migration (Python)

Source: https://docs.haystack.deepset.ai/docs/migration

Demonstrates the migration of an indexing pipeline from Haystack 1.x to Haystack 2.x. This includes setting up components for file type routing, text conversion, document cleaning, splitting, and writing to a document store. It shows the evolution of pipeline construction and component connection.
```python
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes.file_classifier import FileTypeClassifier
from haystack.nodes.file_converter import TextConverter
from haystack.nodes.preprocessor import PreProcessor
from haystack.pipelines import Pipeline

# Initialize a DocumentStore
document_store = InMemoryDocumentStore()

# Indexing Pipeline
indexing_pipeline = Pipeline()

# Makes sure the file is a TXT file (FileTypeClassifier node)
classifier = FileTypeClassifier()
indexing_pipeline.add_node(classifier, name="Classifier", inputs=["File"])

# Converts a file into text and performs basic cleaning (TextConverter node)
text_converter = TextConverter(remove_numeric_tables=True)
indexing_pipeline.add_node(text_converter, name="Text_converter", inputs=["Classifier.output_1"])

# Pre-processes the text by performing splits and adding metadata to the text (Preprocessor node)
preprocessor = PreProcessor(
    clean_whitespace=True,
    clean_empty_lines=True,
    split_length=100,
    split_overlap=50,
    split_respect_sentence_boundary=True,
)
indexing_pipeline.add_node(preprocessor, name="Preprocessor", inputs=["Text_converter"])

# - Writes the resulting documents into the document store
indexing_pipeline.add_node(document_store, name="Document_Store", inputs=["Preprocessor"])

# Then we run it with the documents and their metadata as input
result = indexing_pipeline.run(file_paths=file_paths, meta=files_metadata)
```

```python
from haystack import Pipeline
from haystack.components.routers import FileTypeRouter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.converters import TextFileToDocument
from haystack.components.preprocessors import DocumentCleaner, DocumentSplitter
from haystack.components.writers import DocumentWriter

# Initialize a DocumentStore
document_store = InMemoryDocumentStore()

# Indexing Pipeline
indexing_pipeline = Pipeline()

# Makes sure the file is a TXT file (FileTypeRouter component)
classifier = FileTypeRouter(mime_types=["text/plain"])
indexing_pipeline.add_component("file_type_router", classifier)

# Converts a file into a Document (TextFileToDocument component)
text_converter = TextFileToDocument()
indexing_pipeline.add_component("text_converter", text_converter)

# Performs basic cleaning (DocumentCleaner component)
cleaner = DocumentCleaner(
    remove_empty_lines=True,
    remove_extra_whitespaces=True,
)
indexing_pipeline.add_component("cleaner", cleaner)

# Pre-processes the text by performing splits and adding metadata to the text (DocumentSplitter component)
preprocessor = DocumentSplitter(
    split_by="passage",
    split_length=100,
    split_overlap=50
)
indexing_pipeline.add_component("preprocessor", preprocessor)

# - Writes the resulting documents into the document store
indexing_pipeline.add_component("writer", DocumentWriter(document_store))

# Connect all the components
indexing_pipeline.connect("file_type_router.text/plain", "text_converter")
indexing_pipeline.connect("text_converter", "cleaner")
indexing_pipeline.connect("cleaner", "preprocessor")
indexing_pipeline.connect("preprocessor", "writer")

# Then we run it with the documents and their metadata as input
result = indexing_pipeline.run({"file_type_router": {"sources": file_paths}})
```

--------------------------------

### Install sentence-transformers and Initialize Document Embedder

Source: https://docs.haystack.deepset.ai/docs/llamacppchatgenerator

Installs the sentence-transformers library using pip and initializes a SentenceTransformersDocumentEmbedder for generating document embeddings. This is a prerequisite for indexing documents.

```shell
pip install sentence-transformers
```

```python
from haystack.components.embedders import SentenceTransformersDocumentEmbedder

doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
```

--------------------------------

### Install openapi-llm dependency

Source: https://docs.haystack.deepset.ai/docs/openapiconnector

Installs the necessary 'openapi-llm' dependency for the OpenAPIConnector.
This is a prerequisite for using the component.

```shell
pip install openapi-llm
```

--------------------------------

### Install Dependencies for ComponentTool

Source: https://docs.haystack.deepset.ai/docs/componenttool

Installs the necessary Python packages, 'docstring-parser' and 'jsonschema', which are required for using the ComponentTool.

```shell
pip install docstring-parser jsonschema
```

--------------------------------

### Install llama-cpp-python with cuBLAS backend

Source: https://docs.haystack.deepset.ai/docs/llamacppgenerator

These commands demonstrate how to install llama-cpp-python with CUDA support (cuBLAS) for GPU acceleration, followed by the llama-cpp-haystack package.

```bash
export GGML_CUDA=1
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
pip install llama-cpp-haystack
```

--------------------------------

### Install GitHub Integration for Haystack

Source: https://docs.haystack.deepset.ai/docs/githubprcreatortool

This command installs the necessary GitHub integration package for Haystack. It is a prerequisite for using the GitHubPRCreatorTool.

```shell
pip install github-haystack
```