### Install Dependencies for Astra DB Quickstart

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/astradb.ipynb

Installs the necessary Python packages, specifically `datasets` and `pypdf`, required for the Astra DB quickstart example. Ensure you have recent versions for optimal performance.

```python
!pip install --quiet datasets pypdf
```

--------------------------------

### Install RAGStack AI

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/llama-astra.ipynb

Installs the ragstack-ai library using pip. This is the initial setup step for the project.

```python
! pip install ragstack-ai
```

--------------------------------

### Retrieve Examples via Semantic Similarity

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/langchain_dynamic_fewshot_SQL.ipynb

Demonstrates how to query the example selector to retrieve semantically relevant SQL examples for a specific user prompt.

```python
some_examples = example_selector.select_examples(
    {"input": "Who are the students who have both cat and dog pets."}
)
for example in some_examples:
    print(example)
```

--------------------------------

### Install RAGStack and Datasets

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/langchain_dynamic_fewshot_SQL.ipynb

Installs the RAGStack library and the datasets library, which are necessary for dynamic few-shot prompting and data handling. This is a prerequisite for running the subsequent code examples.

```python
!pip install --quiet ragstack-ai datasets
```

--------------------------------

### Setup RAGStack with Poetry

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/ROOT/pages/dev-environment.adoc

Commands to initialize a project, add dependencies, and install packages using Poetry for deterministic dependency management.

```console
poetry init
poetry install
```

--------------------------------

### Install Dependencies

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/examples/pages/mmr.adoc

Installs the necessary Python packages for RAGStack and environment variable loading. This is a prerequisite for running the example.

```bash
pip install -qU ragstack-ai python-dotenv
```

--------------------------------

### Configure FewShotPromptTemplate for SQL

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/langchain_dynamic_fewshot_SQL.ipynb

Creates a FewShotPromptTemplate that incorporates the semantic example selector. This template formats the retrieved examples into a prompt suitable for SQL generation tasks.

```python
from langchain.agents.agent_toolkits.sql.prompt import SQL_PREFIX
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

few_shot_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=PromptTemplate.from_template(
        "User input: {input}\nSQL query: {query}"
    ),
    prefix=SQL_PREFIX
    + "\n\nHere are some examples of user inputs and their corresponding SQL queries:",
    suffix="",
    input_variables=["input", "dialect", "top_k"],
)
```

--------------------------------

### Install RagStack and Configure Environment

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/FLARE.ipynb

Installs the necessary ragstack-ai package and prompts the user for Astra DB and OpenAI credentials to set up the environment variables.

```python
! pip install ragstack-ai

import os
from getpass import getpass

os.environ["ASTRA_DB_API_ENDPOINT"] = input("Enter your Astra DB API Endpoint: ")
os.environ["ASTRA_DB_APPLICATION_TOKEN"] = getpass("Enter your Astra DB Token: ")
os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI Key: ")
```

--------------------------------

### Install Dependencies with Pip

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/QA_with_cassio.ipynb

Installs the necessary Python libraries, ragstack-ai and pypdf, for the project. The '-qU' flags ensure quiet installation and upgrade.

```python
# install required dependencies
! pip install -qU ragstack-ai pypdf
```

--------------------------------

### Implement Hybrid GraphStore

Source: https://context7.com/datastax/ragstack-ai/llms.txt

Provides an example of setting up a GraphStore that combines vector similarity search with graph traversal, including a custom wrapper for OpenAI embeddings.

```python
# The next two import paths are assumptions: the original snippet uses
# OpenAIEmbeddings and SetupMode without importing them.
from langchain_openai import OpenAIEmbeddings
from ragstack_knowledge_store import GraphStore
from ragstack_knowledge_store.graph_store import SetupMode


class OpenAIEmbeddingModel:
    """Adapts OpenAIEmbeddings to the embed_texts/embed_query interface."""

    def __init__(self):
        self.embeddings = OpenAIEmbeddings()

    def embed_texts(self, texts):
        return self.embeddings.embed_documents(texts)

    def embed_query(self, text):
        return self.embeddings.embed_query(text)


graph_store = GraphStore(
    embedding=OpenAIEmbeddingModel(),
    node_table="knowledge_nodes",
    setup_mode=SetupMode.SYNC,
    metadata_indexing="all",
)
```

--------------------------------

### Traverse Knowledge Graph from a Starting Node

Source: https://github.com/datastax/ragstack-ai/blob/main/libs/knowledge-graph/notebooks/notebook.ipynb

Shows how to traverse the knowledge graph starting from a specific node using the `as_runnable` method of the `GraphStore`. This allows for exploring relationships and connected entities within the graph up to a specified number of steps.

```python
from ragstack_knowledge_graph.traverse import Node

# Assuming graph_store and graph_documents are already defined
# The result shows relationships up to 2 steps away from 'Marie Curie'
result = graph_store.as_runnable(steps=2).invoke(Node("Marie Curie", "Person"))
print(result)
```

--------------------------------

### Install Ragstack AI Knowledge Graph Library (Python)

Source: https://github.com/datastax/ragstack-ai/blob/main/libs/knowledge-graph/notebooks/notebook.ipynb

Installs the ragstack-ai-knowledge-graph library and its dependencies. This is a prerequisite for using the knowledge graph functionalities.

```python
# (Required in Colab) Install the knowledge graph library from the repository.
# This will also install the dependencies.
%pip install ragstack-ai-knowledge-graph
```

--------------------------------

### Install RAGStack and Datasets Library

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/ROOT/pages/quickstart.adoc

Installs the RAGStack AI library and the HuggingFace datasets library using pip. This is a prerequisite for using RAGStack functionalities.

```bash
pip3 install ragstack-ai datasets
```

--------------------------------

### Install ragstack-ai Package

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/langchain_evaluation.ipynb

Installs the ragstack-ai package with quiet output. This is a prerequisite for using the library's functionalities.

```python
! pip install -q ragstack-ai
```

--------------------------------

### Build RAGStack from source

Source: https://github.com/datastax/ragstack-ai/blob/main/README.adoc

Commands to clone the repository, install the Poetry dependency manager, resolve project dependencies, and build the package distribution locally.

```bash
git clone https://github.com/datastax/ragstack-ai
# Poetry must run from the project directory created by the clone
cd ragstack-ai
pip install poetry
poetry install
poetry build
```

--------------------------------

### Load Sample Dataset

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/nvidia.ipynb

Loads the `datastax/philosopher-quotes` sample dataset from the Hugging Face Hub using the `datasets` library, then prints an example entry from its `train` split.

```python
from datasets import load_dataset

# Load a sample dataset
philo_dataset = load_dataset("datastax/philosopher-quotes")["train"]
print("An example entry:")
print(philo_dataset[16])
```

--------------------------------

### Run Application

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/ROOT/pages/migration.adoc

Command to execute the migration script after setting up the environment.

```bash
python3 llama-migration.py
```

--------------------------------

### Initialize Environment and Dependencies

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/examples/pages/rag-with-cassio.adoc

Imports required modules and sets up environment variables for Astra DB and OpenAI connectivity.

```python
import os
from dotenv import load_dotenv
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra.query import SimpleStatement
from langchain_openai import OpenAIEmbeddings
from langchain.vectorstores import Cassandra
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from langchain_community.document_loaders import TextLoader
from langchain_community.document_loaders import PyPDFLoader
from langchain.chat_models import ChatOpenAI

# load_dotenv is imported but never called in the original snippet;
# calling it here reads the variables below from a local .env file, if present.
load_dotenv()

ASTRA_DB_SECURE_BUNDLE_PATH = os.getenv("ASTRA_DB_SECURE_BUNDLE_PATH")
ASTRA_DB_APPLICATION_TOKEN = os.getenv("ASTRA_DB_APPLICATION_TOKEN")
ASTRA_DB_APPLICATION_TOKEN_BASED_USERNAME = "token"
ASTRA_DB_KEYSPACE = os.getenv("ASTRA_DB_NAMESPACE")
ASTRA_DB_TABLE_NAME = os.getenv("ASTRA_DB_COLLECTION")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
```

--------------------------------

### Initialize LlamaIndex and Astra DB Pipeline

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/examples/pages/llama-parse-astra.adoc

Imports necessary modules, configures global settings, downloads a sample PDF, and initializes the vector store for indexing.

```python
import os
import requests
from dotenv import load_dotenv
from llama_parse import LlamaParse
from llama_index.vector_stores.astra_db import AstraDBVectorStore
from llama_index.core.node_parser import SimpleNodeParser
from llama_index.core import VectorStoreIndex, StorageContext, Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

load_dotenv()

Settings.llm = OpenAI(model="gpt-4", temperature=0.1)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small", embed_batch_size=100)

# Download PDF
url = "https://arxiv.org/pdf/1706.03762.pdf"
file_path = "./attention.pdf"
response = requests.get(url, timeout=30)
if response.status_code == 200:
    with open(file_path, "wb") as file:
        file.write(response.content)

# Parse and Index
documents = LlamaParse(result_type="text").load_data(file_path)
astra_db_store = AstraDBVectorStore(
    token=os.getenv("ASTRA_DB_APPLICATION_TOKEN"),
    api_endpoint=os.getenv("ASTRA_DB_API_ENDPOINT"),
    collection_name="astra_v_table_llamaparse",
    embedding_dimension=1536,
)
node_parser = SimpleNodeParser()
nodes = node_parser.get_nodes_from_documents(documents)
storage_context = StorageContext.from_defaults(vector_store=astra_db_store)
index = VectorStoreIndex(nodes=nodes, storage_context=storage_context)
query_engine = index.as_query_engine(similarity_top_k=15)
```

--------------------------------

### Install Dependencies for Ragstack-AI

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/nvidia.ipynb

Installs the necessary Python packages for building a RAG pipeline, including ragstack-ai, langchain-nvidia-ai-endpoints, and datasets. The '-qU' flags ensure quiet installation and upgrade.

```python
! pip install -qU ragstack-ai langchain-nvidia-ai-endpoints datasets
```

--------------------------------

### Install RAGStack Packages

Source: https://context7.com/datastax/ragstack-ai/llms.txt

Installs RAGStack packages using pip for various use cases, including full installation, LangChain integration with ColBERT, ColBERT-only, and LlamaIndex integration.

```bash
# Full RAGStack installation
pip install ragstack-ai

# LangChain integration with ColBERT support
pip install "ragstack-ai-langchain[colbert]"

# ColBERT-only installation
pip install ragstack-ai-colbert

# LlamaIndex integration
pip install ragstack-ai-llamaindex
```

--------------------------------

### Install RAGStack-ColBERT

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/RAGStackColBERT.ipynb

Installs the RAGStack-ColBERT library, which is necessary for using ColBERT embeddings with RAGStack.

```python
!pip install ragstack-ai-colbert
```

--------------------------------

### Create Cassandra Vector Store with CassIO

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/astradb.ipynb

Demonstrates how to create a vector store with LangChain's `Cassandra` class, which is backed by CassIO. It requires an embedding object and a table name; with `session=None` and `keyspace=None` the store falls back to the connection configured through `cassio.init()`.

```python
# The original snippet imported Cassandra from cassio.vector; the class with
# this constructor signature is LangChain's CassIO-backed vector store.
from langchain_community.vectorstores import Cassandra

# Assuming 'embe' is a pre-defined embedding object
vstore = Cassandra(
    embedding=embe,
    table_name="cassandra_vector_demo",
    session=None,
    keyspace=None,
)
```

--------------------------------

### Verify LangChain Version after RAGStack Installation

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/ROOT/pages/migration.adoc

Sample output from listing the installed Python packages (for example, with `pip list`). After installing `ragstack-ai`, this listing helps verify that the `langchain` package version has been updated to the one provided by RAGStack (e.g., `0.0.349`).

```console
Package             Version
------------------- ------------
aiohttp             3.9.1
aiosignal           1.3.1
annotated-types     0.6.0
anyio               4.1.0
astrapy             0.6.2
attrs               23.1.0
backoff             2.2.1
beautifulsoup4      4.12.2
cassandra-driver    3.28.0
cassio              0.1.3
certifi             2023.11.17
chardet             5.2.0
charset-normalizer  3.3.2
click               8.1.7
dataclasses-json    0.6.3
datasets            2.15.0
Deprecated          1.2.14
dill                0.3.7
distro              1.8.0
emoji               2.9.0
filelock            3.13.1
filetype            1.2.0
frozenlist          1.4.0
fsspec              2023.10.0
geomet              0.2.1.post1
greenlet            3.0.2
h11                 0.14.0
h2                  4.1.0
hpack               4.0.0
httpcore            1.0.2
httpx               0.25.2
huggingface-hub     0.19.4
hyperframe          6.0.1
idna                3.6
joblib              1.3.2
jsonpatch           1.33
jsonpointer         2.4
langchain           0.0.349
langchain-community 0.0.1
langchain-core      0.0.13
langdetect          1.0.9
langsmith           0.0.69
llama-index         0.9.14
lxml                4.9.3
marshmallow         3.20.1
multidict           6.0.4
multiprocess        0.70.15
mypy-extensions     1.0.0
nest-asyncio        1.5.8
nltk                3.8.1
numpy               1.26.2
openai              1.3.8
packaging           23.2
```

--------------------------------

### Install RAGStack Dependencies

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/quickstart.ipynb

Installs the necessary Python packages for the RAGStack environment and dataset management.

```python
! pip install -q ragstack-ai datasets
```

--------------------------------

### Install Dependencies for RAGStack

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/langchain-unstructured-astra.ipynb

Installs the required ragstack-ai package to enable integration with Unstructured and AstraDB.

```python
! pip install --quiet ragstack-ai
```

--------------------------------

### Construct RetrievalQAWithSourcesChain

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/qa-maximal-marginal-relevance.ipynb

Demonstrates how to build a RetrievalQAWithSourcesChain to retrieve answers along with their source metadata using either similarity or MMR search strategies.

```python
retriever = cassandra_vstore.as_retriever(search_type="mmr", search_kwargs={"k": 2})
chain = RetrievalQAWithSourcesChain.from_chain_type(llm, retriever=retriever)
response = chain({chain.question_key: QUESTION})
print(f' ANSWER : {response["answer"].strip()}')
print(f' SOURCES: {response["sources"].strip()}')
```

--------------------------------

### Install RAGStack Dependencies

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/examples/pages/llama-astra.adoc

Installs the necessary Python packages for RAGStack and environment variable management.

```bash
pip install ragstack-ai python-dotenv
```

--------------------------------

### Execute Basic Index Queries

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/qa-maximal-marginal-relevance.ipynb

Shows how to perform standard similarity queries and queries with source attribution using LangChain index abstractions.

```python
# (implicitly) by similarity
print(index.query(QUESTION, llm=llm))

# Query with sources
response_sources = index.query_with_sources(QUESTION, llm=llm)
print(f' ANSWER : {response_sources["answer"].strip()}')
print(f' SOURCES: {response_sources["sources"].strip()}')
```

--------------------------------

### Initialize SQLite Database with Spider Schema

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/langchain_dynamic_fewshot_SQL.ipynb

Creates a local SQLite database using LangChain and populates it with the Spider 'Pets' schema and sample data.

```python
from langchain_community.utilities import SQLDatabase

db = SQLDatabase.from_uri("sqlite:///spider.db")

PETS_SQL = """PRAGMA foreign_keys=OFF;\nBEGIN TRANSACTION;\nCREATE TABLE Student (StuID INTEGER PRIMARY KEY, LName VARCHAR(12), Fname VARCHAR(12), Age INTEGER, Sex VARCHAR(1), Major INTEGER, Advisor INTEGER, city_code VARCHAR(3));\nINSERT INTO Student VALUES(1001,'Smith','Linda',18,'F',600,1121,'BAL');\nCREATE TABLE Has_Pet (StuID INTEGER, PetID INTEGER, FOREIGN KEY(PetID) REFERENCES Pets(PetID), FOREIGN KEY(StuID) REFERENCES Student(StuID));\nCREATE TABLE Pets (PetID INTEGER PRIMARY KEY, PetType VARCHAR(20), pet_age INTEGER, weight REAL);\nCOMMIT;"""

with db._engine.begin() as conn:
    conn.connection.executescript(PETS_SQL)
```

--------------------------------

### Install RAGStack ColBERT Dependencies

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/examples/pages/colbert.adoc

Installs the required Python packages for using ColBERT with RAGStack.

```bash
pip install ragstack-ai-colbert python-dotenv
```

--------------------------------

### Initialize Product Catalog DataFrame

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/examples/pages/langchain_multimodal_gemini.adoc

Creates a pandas DataFrame containing product metadata to be loaded into a vector store.

```python
import pandas as pd

d = {
    "name": [
        "Saucer",
        "Saucer Ceramic",
        "Milk Jug Assembly",
        "Handle Steam Wand Kit (New Version From 0735 PDC)",
        "Spout Juice Small (From 0637 to 1041 PDC)",
        "Cleaning Steam Wand",
        "Jug Frothing",
        "Spoon Tamping 50mm",
        "Collar Grouphead 50mm",
        "Filter 2 Cup Dual Wall 50mm",
        "Filter 1 Cup 50mm",
        "Water Tank Assembly",
        "Portafilter Assembly 50mm",
        "Milk Jug Assembly",
        "Filter 2 Cup 50mm",
    ]
}

# The snippet as extracted stops at the dictionary; building the DataFrame
# from it is the step the description refers to.
df = pd.DataFrame(data=d)
```

--------------------------------

### Install Dependencies with Pip

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/RAG_with_cassio.ipynb

Installs the necessary Python packages for ragstack-ai, datasets, and google-cloud-aiplatform. This is a prerequisite for running the notebook.

```python
# install required dependencies
! pip install -qU ragstack-ai datasets google-cloud-aiplatform
```

--------------------------------

### Query Engine Basic Query Example

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/llama-parse-astra.ipynb

This Python code illustrates a basic query to the query engine. It shows how to define a query string and then execute it using the query engine. The output demonstrates how the system handles queries when context is insufficient.

```python
# Query fails to be answered due to lack of context in Astra DB
query = "What is the color of the sky?"
response_1 = query_engine.query(query)
print("***********New LlamaParse+ Basic Query Engine***********")
print(response_1)
```

--------------------------------

### Install Development Dependencies and Run Tests (Shell)

Source: https://github.com/datastax/ragstack-ai/blob/main/libs/knowledge-store/README.md

This snippet provides shell commands for setting up the development environment and running tests for the RAGStack project. It uses Poetry for dependency management.

```shell
poetry install --with=dev

# Run Tests
poetry run pytest
```

--------------------------------

### Install ragstack-ai dependency

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/llama-parse-astra.ipynb

Installs the necessary ragstack-ai library using pip. This is a prerequisite for using the llama-parse functionality.

```python
# First, install the required dependencies
!pip install --quiet ragstack-ai
```

--------------------------------

### Install RAGStack Dependencies

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/examples/pages/qa-with-cassio.adoc

Installs the necessary Python libraries for RAGStack, including OpenAI, PyPDF, and dotenv support.

```bash
pip install "ragstack-ai" "openai" "pypdf" "python-dotenv"
```

--------------------------------

### Initialize ColBERT and Astra DB Connection

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/examples/pages/colbert.adoc

Imports necessary modules, loads environment variables, and initializes the CassandraDatabase, ColbertEmbeddingModel, and ColbertVectorStore.

```python
import os
import logging
import nest_asyncio
from dotenv import load_dotenv
from ragstack_colbert import CassandraDatabase, ColbertEmbeddingModel, ColbertVectorStore

load_dotenv()

keyspace = "default_keyspace"
database_id = os.getenv("ASTRA_DB_ID")
astra_token = os.getenv("ASTRA_DB_APPLICATION_TOKEN")

database = CassandraDatabase.from_astra(
    astra_token=astra_token,
    database_id=database_id,
    keyspace=keyspace,
)

embedding_model = ColbertEmbeddingModel()

vector_store = ColbertVectorStore(
    database=database,
    embedding_model=embedding_model,
)
```

--------------------------------

### Query Vector Store Index

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/examples/pages/qa-with-cassio.adoc

Queries the vector store index directly with a question. The index retrieves the stored passages most relevant to the prompt and returns an answer based on them.

```python
prompt = "Who is Luchesi?"
index.query(question=prompt)
```

--------------------------------

### Install RAGStack Package

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/ragstack-ts/pages/quickstart.adoc

CLI commands to install the RAGStack package into your project using the preferred package manager. The commands invoke the `@datastax/ragstack-ai-cli` package, matching the RAGStack CLI entry from the migration page.

```bash
npx @datastax/ragstack-ai-cli install --use-npm
```

```bash
npx @datastax/ragstack-ai-cli install --use-yarn
```

--------------------------------

### Install ColBERT packages for RAGStack

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/colbert/pages/index.adoc

Commands to install the core ColBERT package or integrate it with LangChain and LlamaIndex frameworks.

```bash
pip install ragstack-ai-colbert
pip install "ragstack-ai-langchain[colbert]"
pip install "ragstack-ai-llamaindex[colbert]"
```

--------------------------------

### Install Dependencies for Multi-modal RAG

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/examples/pages/langchain_multimodal_gemini.adoc

Installs the necessary Python packages for Google Cloud AI platform and RagStack-AI integration.

```bash
pip install google-cloud-aiplatform ragstack-ai --upgrade
```

--------------------------------

### Initialize Astra DB Vector Store and Perform RAG Queries

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/examples/partials/mmr-example.adoc

This script demonstrates how to connect to Astra DB using LangChain, index text data with associated metadata, and execute retrieval-augmented generation (RAG) queries using both similarity search and Maximal Marginal Relevance (MMR) search strategies.

```python
import os
from dotenv import load_dotenv
from langchain.chains.qa_with_sources.retrieval import RetrievalQAWithSourcesChain
from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from langchain_astradb import AstraDBVectorStore

load_dotenv()

llm = OpenAI(temperature=0)
myEmbedding = OpenAIEmbeddings()

myAstraDBVStore = AstraDBVectorStore(
    embedding=myEmbedding,
    api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    namespace=os.environ.get("ASTRA_DB_KEYSPACE"),
    collection_name="mmr_test",
)

texts = [
    "The frogs and the toads were meeting in the night for a party under the moon.",
    "There was a party under the moon, that all toads, with the frogs, decided to throw that night.",
    "And the frogs and the toads said: \"Let us have a party tonight, as the moon is shining\".",
    "I remember that night... toads, along with frogs, were all busy planning a moonlit celebration.",
    "For the party, frogs and toads set a rule: everyone was to wear a purple hat.",
]
metadatas = [
    {"source": "Barney's story"},
    {"source": "Barney's story"},
    {"source": "Barney's story"},
    {"source": "Barney's story"},
    {"source": "The chronicles"},
]
ids = myAstraDBVStore.add_texts(texts, metadatas=metadatas)

retrieverMMR = myAstraDBVStore.as_retriever(search_type="mmr", search_kwargs={"k": 2})
chainMMRSrc = RetrievalQAWithSourcesChain.from_chain_type(llm, retriever=retrieverMMR)
response = chainMMRSrc.invoke({chainMMRSrc.question_key: "Tell me about the party that night."})
print(response)
```

--------------------------------

### Install RAGStack Dependencies

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/examples/pages/dse-69.adoc

Installs the required Python libraries including ragstack-ai-langchain, python-dotenv, and langchainhub to enable RAG functionality.

```bash
pip install ragstack-ai-langchain python-dotenv langchainhub
```

--------------------------------

### Configure Environment Variables

Source: https://github.com/datastax/ragstack-ai/blob/main/libs/knowledge-store/notebooks/pdf_keybert.ipynb

Sets up necessary credentials for OpenAI and Astra DB. Users can choose between interactive input via getpass or loading from a .env file.

```python
import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter OpenAI API Key: ")
os.environ["ASTRA_DB_DATABASE_ID"] = input("Enter Astra DB Database ID: ")
os.environ["ASTRA_DB_APPLICATION_TOKEN"] = getpass.getpass("Enter Astra DB Application Token: ")

keyspace = input("Enter Astra DB Keyspace (Empty for default): ")
if keyspace:
    os.environ["ASTRA_DB_KEYSPACE"] = keyspace
else:
    os.environ.pop("ASTRA_DB_KEYSPACE", None)
```

```python
import dotenv

dotenv.load_dotenv()
```

--------------------------------

### Install RAGStack CLI

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/ragstack-ts/pages/migration.adoc

Installs the RAGStack CLI into your project. This command modifies your package.json, installs the core @datastax/ragstack-ai package, and refreshes local dependencies. It automatically detects your package manager (npm or yarn) but can be forced using --use-npm or --use-yarn flags.

```bash
npx @datastax/ragstack-ai-cli install
```

--------------------------------

### Initialize CassIO with Astra DB Credentials

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/QA_with_cassio.ipynb

This snippet initializes the CassIO library with the necessary Astra DB credentials, which are expected to be set as environment variables.

```python
import os

import cassio

cassio.init(
    database_id=os.environ["ASTRA_DB_ID"],
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
)
```

--------------------------------

### Install RAG Pipeline Dependencies

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/examples/pages/nvidia_embeddings.adoc

Installs the necessary Python packages including ragstack-ai, langchain-nvidia-ai-endpoints, and datasets to enable RAG functionality.

```bash
pip install -qU ragstack-ai langchain-nvidia-ai-endpoints datasets
```

--------------------------------

### Install RAGStack Dependencies

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/examples/pages/rag-with-cassio.adoc

Installs the Python libraries required for the RAG pipeline: ragstack-ai, openai, pypdf, python-dotenv, datasets, pandas, and google-cloud-aiplatform.

```bash
pip install "ragstack-ai" "openai" "pypdf" "python-dotenv" "datasets" "pandas" "google-cloud-aiplatform"
```

--------------------------------

### Invoke RAG Pipeline Queries

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/langchain-unstructured-astra.ipynb

Demonstrates how to invoke the constructed RAG pipeline with different types of questions. Includes examples for querying specific information, retrieving table data, and testing the LLM's ability to identify when it lacks context.

```python
# Query for specific text information
chain.invoke("What does reducing the attention key size do?")

# Query for a value from a table
chain.invoke(
    "For the transformer to English constituency results, "
    "what was the 'WSJ 23 F1' value for 'Dyer et al. (2016) (5]'?"
)

# Query that should fail to be answered due to lack of context in Astra DB
chain.invoke("When was George Washington born?")
```

--------------------------------

### Start TruLens Dashboard

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/evaluation/langchain_trulens_full.ipynb

Initializes and runs the TruLens dashboard UI. Note that it may require a second attempt to start successfully.

```python
tru.run_dashboard()
```

--------------------------------

### Initialize RAGStack Project

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/ragstack-ts/pages/quickstart.adoc

Commands to initialize a new project using either NPM or Yarn.

```bash
npm init
```

```bash
yarn init
```

--------------------------------

### Initialize Cassandra Graph Store

Source: https://github.com/datastax/ragstack-ai/blob/main/libs/knowledge-store/notebooks/astra_support.ipynb

Initializes the CassIO library and the CassandraGraphStore. Provides an optional utility to clear existing tables before setting up the store.

```python
SITE_PREFIX = "astra_docs"

answer = input("Drop Tables? [(Y)es/(N)o]")
if answer.lower() in ["y", "yes"]:
    import cassio

    cassio.init(auto=True)
    from cassio.config import check_resolve_keyspace, check_resolve_session

    session = check_resolve_session()
    keyspace = check_resolve_keyspace()
    session.execute(f"DROP TABLE IF EXISTS {keyspace}.{SITE_PREFIX}_nodes")
    session.execute(f"DROP TABLE IF EXISTS {keyspace}.{SITE_PREFIX}_targets")

import cassio
from langchain_openai import OpenAIEmbeddings
from ragstack_langchain.graph_store import CassandraGraphStore

cassio.init(auto=True)
embeddings = OpenAIEmbeddings()
graph_store = CassandraGraphStore(
    embeddings,
    node_table=f"{SITE_PREFIX}_nodes",
)
```

--------------------------------

### Load Llama Dataset and Documents

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/llama-astra.ipynb

Downloads a sample dataset from Llama Hub and loads documents from a local directory into memory. It also prints basic information about the loaded documents.

```python
from llama_index.core.llama_dataset import download_llama_dataset

!mkdir -p 'data'
dataset = download_llama_dataset("PaulGrahamEssayDataset", "./data")
```

```python
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.astra_db import AstraDBVectorStore

documents = SimpleDirectoryReader("./data/source_files").load_data()
print(f"Total documents: {len(documents)}")
print(f"First document, id: {documents[0].doc_id}")
print(f"First document, hash: {documents[0].hash}")
print(
    "First document, text"
    f" ({len(documents[0].text)} characters):\n"
    f"{ '=' * 20}\n"
    f"{documents[0].text[:360]} ..."
)
```

--------------------------------

### Install RAGStack and NeMo Guardrails dependencies

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/nemo_guardrails.ipynb

Installs the necessary Python packages including ragstack-ai, asyncio, and nemoguardrails to enable guardrail functionality.

```python
! pip install -qU ragstack-ai asyncio nemoguardrails
```

--------------------------------

### Initialize RAGStack Environment and Dependencies

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/examples/pages/qa-with-cassio.adoc

Imports required LangChain and Cassandra modules and initializes environment variables for database connectivity and OpenAI API access.

```python
import os
from dotenv import load_dotenv
from langchain_openai import OpenAIEmbeddings
from langchain.vectorstores import Cassandra
from langchain_community.document_loaders import TextLoader
from langchain_community.document_loaders import PyPDFLoader
from langchain.chat_models import ChatOpenAI
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

# load_dotenv is imported but never called in the original snippet;
# calling it here reads the variables below from a local .env file, if present.
load_dotenv()

ASTRA_DB_SECURE_BUNDLE_PATH = os.getenv("ASTRA_DB_SECURE_BUNDLE_PATH")
ASTRA_DB_APPLICATION_TOKEN = os.getenv("ASTRA_DB_APPLICATION_TOKEN")
ASTRA_DB_KEYSPACE = os.getenv("ASTRA_DB_NAMESPACE")
ASTRA_DB_TABLE_NAME = os.getenv("ASTRA_DB_COLLECTION")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
```

--------------------------------

### Install ragstack-ai-langchain and Dependencies

Source: https://github.com/datastax/ragstack-ai/blob/main/libs/knowledge-store/notebooks/astra_support.ipynb

Installs the ragstack-ai-langchain library with knowledge store capabilities and other necessary packages like beautifulsoup4, markdownify, and python-dotenv using pip.

```python
%pip install -q \
    ragstack-ai-langchain[knowledge-store]==1.3.0 \
    beautifulsoup4 markdownify python-dotenv
```

--------------------------------

### Execute RAG Queries

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/ROOT/pages/migration.adoc

Demonstrates how to run queries using the execute_query function. It includes a basic query execution and an advanced version using MMR (Maximal Marginal Relevance) mode for better retrieval results.

```python
query_string_1 = "Why did the author choose to work on AI?"

execute_query(query_string_1)
execute_query(query_string_1, mode="mmr", mmr_prefetch_factor=4)
```

--------------------------------

### Install RAGStack AI and LangChain OpenAI Packages

Source: https://github.com/datastax/ragstack-ai/blob/main/docs/modules/examples/pages/langchain-evaluation.adoc

Installs the 'ragstack-ai' package for RAG pipeline building and the 'langchain[openai]' package which includes LangSmith integration. The extra is quoted so that shells such as zsh do not treat the square brackets as a glob pattern.

```shell
pip install ragstack-ai "langchain[openai]"
```

--------------------------------

### Initialize Combined Multi-Query and Parent Document Retriever

Source: https://github.com/datastax/ragstack-ai/blob/main/examples/notebooks/advancedRAG.ipynb

This snippet demonstrates how to instantiate a MultiQueryRetriever using a parent document retriever and an LLM. It then sets up a retrieval chain using LangChain Expression Language (LCEL) to process user questions.

```python
multi_parent_retriever = MultiQueryRetriever.from_llm(
    retriever=parent_retriever, llm=model
)

multi_parent_chain = (
    {"context": multi_parent_retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)
```
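
--------------------------------

### Sketch: Merging Results Across Query Variants

A minimal, library-free sketch of the idea behind the multi-query retriever above: run several phrasings of one question through the same retriever and keep the unique union of the results. It is not the LangChain implementation. The corpus, `keyword_retriever`, and `multi_query_retrieve` are hypothetical names introduced only for illustration, and the query variants are hard-coded here, whereas the real retriever asks an LLM to generate them.

```python
from typing import Callable, Dict, List

# Toy corpus (illustrative data only): document id -> text.
CORPUS: Dict[str, str] = {
    "doc1": "Vector search finds semantically similar chunks.",
    "doc2": "Parent documents give the LLM more surrounding context.",
    "doc3": "Query rewriting produces several phrasings of one question.",
}


def keyword_retriever(query: str) -> List[str]:
    """Hypothetical retriever: ids of documents sharing a word with the query."""
    words = set(query.lower().split())
    return [
        doc_id
        for doc_id, text in CORPUS.items()
        if words & set(text.lower().rstrip(".").split())
    ]


def multi_query_retrieve(
    queries: List[str], retriever: Callable[[str], List[str]]
) -> List[str]:
    """Run every query variant and return the de-duplicated union, first seen first."""
    seen = set()
    merged = []
    for query in queries:
        for doc_id in retriever(query):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


# Hard-coded stand-ins for the LLM-generated rephrasings of one question.
variants = [
    "how does vector search work",
    "why use parent documents",
    "vector search context",
]
merged_ids = multi_query_retrieve(variants, keyword_retriever)
print(merged_ids)
```

The third variant matches both documents already found by the first two, so the merge step adds nothing new for it; that de-duplication is what keeps the context passed to the prompt from repeating the same passage.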