### Check LibreOffice Installation

Source: https://github.com/hkuds/rag-anything/blob/main/README.md

Check that LibreOffice is installed by running the office document test script.

```bash
python examples/office_document_test.py --check-libreoffice --file dummy
```

--------------------------------

### Install RAG-Anything from Source with uv

Source: https://github.com/hkuds/rag-anything/blob/main/README.md

Install uv, clone the repository, and then sync dependencies. Use `--extra` or `--all-extras` for optional features. Setting `UV_HTTP_TIMEOUT` raises uv's HTTP timeout if `uv sync` times out on a slow network. Commands can be run directly with `uv run`.

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

```bash
git clone https://github.com/HKUDS/RAG-Anything.git
```

```bash
cd RAG-Anything
```

```bash
uv sync
```

```bash
UV_HTTP_TIMEOUT=120 uv sync
```

```bash
uv run python examples/raganything_example.py --help
```

```bash
uv sync --extra image --extra text
```

```bash
uv sync --all-extras
```

--------------------------------

### Run End-to-End Document Processing Example

Source: https://github.com/hkuds/rag-anything/blob/main/README.md

Execute the main example script for end-to-end document processing with the MinerU parser. Requires an API key and a document path.

```bash
python examples/raganything_example.py path/to/document.pdf --api-key YOUR_API_KEY --parser mineru
```

--------------------------------

### Verify MinerU Installation

Source: https://github.com/hkuds/rag-anything/blob/main/README.md

Check the installation and configuration of MinerU. The first command verifies the command-line tool; the second checks programmatically from Python.

```bash
mineru --version
```

```python
from raganything import RAGAnything; rag = RAGAnything(); print('✅ MinerU installed properly' if rag.check_parser_installation() else '❌ MinerU installation issue')
```

--------------------------------

### Install RAG-Anything from PyPI

Source: https://github.com/hkuds/rag-anything/blob/main/README.md

Install the RAG-Anything package using pip.
Use the `[all]` extra for all optional features, or specify individual features like `[image]` or `[text]`.

```bash
pip install raganything
```

```bash
pip install 'raganything[all]'
```

```bash
pip install 'raganything[image]'
```

```bash
pip install 'raganything[text]'
```

```bash
pip install 'raganything[image,text]'
```

--------------------------------

### Run Direct Multimodal Content Processing Example

Source: https://github.com/hkuds/rag-anything/blob/main/README.md

Execute the script for direct multimodal content processing. Requires an API key.

```bash
python examples/modalprocessors_example.py --api-key YOUR_API_KEY
```

--------------------------------

### Check ReportLab Installation

Source: https://github.com/hkuds/rag-anything/blob/main/README.md

Check that ReportLab is installed by running the text format test script.

```bash
python examples/text_format_test.py --check-reportlab --file dummy
```

--------------------------------

### Comprehensive Context Configuration (Python)

Source: https://github.com/hkuds/rag-anything/blob/main/docs/context_aware_processing.md

An example of configuring RAGAnything for comprehensive context analysis. This setup uses a larger context window (`context_window=2`), page-based context mode (`context_mode="page"`), and allows up to 3000 context tokens. It includes both headers and captions and considers 'text', 'image', and 'table' content types.

```python
from raganything import RAGAnythingConfig

config = RAGAnythingConfig(
    context_window=2,
    context_mode="page",
    max_context_tokens=3000,
    include_headers=True,
    include_captions=True,
    context_filter_content_types=["text", "image", "table"]
)
```

--------------------------------

### Configure RAGAnything Pipeline

Source: https://context7.com/hkuds/rag-anything/llms.txt

Instantiate and configure the RAGAnything pipeline using RAGAnythingConfig. Every option can be set directly; options left unset are read from environment variables by default.
This example demonstrates setting storage paths, parser selection, modality toggles, batch processing options, context-aware settings, and path handling.

```python
from raganything import RAGAnythingConfig

config = RAGAnythingConfig(
    # Storage
    working_dir="./rag_storage",            # WORKING_DIR env var
    parser_output_dir="./output",           # OUTPUT_DIR env var

    # Parser selection
    parser="mineru",                        # PARSER env var: "mineru" | "docling" | "paddleocr"
    parse_method="auto",                    # PARSE_METHOD env var: "auto" | "ocr" | "txt"
    display_content_stats=True,             # DISPLAY_CONTENT_STATS env var

    # Modality toggles
    enable_image_processing=True,           # ENABLE_IMAGE_PROCESSING env var
    enable_table_processing=True,           # ENABLE_TABLE_PROCESSING env var
    enable_equation_processing=True,        # ENABLE_EQUATION_PROCESSING env var

    # Batch processing
    max_concurrent_files=1,                 # MAX_CONCURRENT_FILES env var
    recursive_folder_processing=True,       # RECURSIVE_FOLDER_PROCESSING env var
    supported_file_extensions=[
        ".pdf", ".docx", ".pptx", ".jpg", ".png", ".md"],

    # Context-aware multimodal analysis
    context_window=2,                       # CONTEXT_WINDOW env var (pages/chunks around item)
    context_mode="page",                    # CONTEXT_MODE env var: "page" | "chunk"
    max_context_tokens=3000,                # MAX_CONTEXT_TOKENS env var
    include_headers=True,                   # INCLUDE_HEADERS env var
    include_captions=True,                  # INCLUDE_CAPTIONS env var
    context_filter_content_types=["text"],  # CONTEXT_FILTER_CONTENT_TYPES env var
    content_format="minerU",                # CONTENT_FORMAT env var

    # Path handling
    use_full_path=False,                    # USE_FULL_PATH env var: store basename vs. full path
)

print(config.parser)          # "mineru"
print(config.context_window)  # 2
```

--------------------------------

### Check PIL/Pillow Installation

Source: https://github.com/hkuds/rag-anything/blob/main/README.md

Check that PIL/Pillow is installed by running the image format test script.
```bash
python examples/image_format_test.py --check-pillow --file dummy
```

--------------------------------

### Install PaddleOCR Parser Extras

Source: https://github.com/hkuds/rag-anything/blob/main/README.md

Install the PaddleOCR parser extras for RAG-Anything using pip or uv. Note that paddlepaddle itself must also be installed separately.

```bash
pip install -e ".[paddleocr]"
# or
uv sync --extra paddleocr
```

--------------------------------

### Install RAG-Anything Batch Dependencies

Source: https://github.com/hkuds/rag-anything/blob/main/docs/batch_processing.md

Commands to install the core RAG-Anything package along with the dependencies needed for batch processing and OCR support. The extras are quoted so that shells such as zsh do not interpret the square brackets.

```bash
pip install 'raganything[all]'
pip install tqdm
pip install 'raganything[paddleocr]'
```

--------------------------------

### Install Pandoc Backend Dependencies

Source: https://github.com/hkuds/rag-anything/blob/main/docs/enhanced_markdown.md

Installs Pandoc and wkhtmltopdf, which are optional dependencies for the Pandoc backend. Instructions are provided for Ubuntu/Debian, macOS using Homebrew, and Conda environments.

```bash
# Ubuntu/Debian:
sudo apt-get install pandoc wkhtmltopdf

# macOS:
brew install pandoc wkhtmltopdf

# Or using conda:
conda install -c conda-forge pandoc wkhtmltopdf
```

--------------------------------

### Run Office Document Parsing Test Example

Source: https://github.com/hkuds/rag-anything/blob/main/README.md

Execute the test script for parsing office documents using the MinerU parser. This test does not require an API key.

```bash
python examples/office_document_test.py --file path/to/document.docx
```

--------------------------------

### MinerU Content Format Example (JSON)

Source: https://github.com/hkuds/rag-anything/blob/main/docs/context_aware_processing.md

An example illustrating the MinerU format for content sources, which is a list of dictionaries.
Each dictionary represents a content item and can include types like 'text' or 'image', along with relevant metadata such as text content, page index, image paths, and captions.

```json
[
  {
    "type": "text",
    "text": "Document content here...",
    "text_level": 1,
    "page_idx": 0
  },
  {
    "type": "image",
    "img_path": "images/figure1.jpg",
    "image_caption": ["Figure 1: Architecture"],
    "image_footnote": [],
    "page_idx": 1
  }
]
```

--------------------------------

### Run Image Format Parsing Test Example

Source: https://github.com/hkuds/rag-anything/blob/main/README.md

Execute the test script for parsing image formats using the MinerU parser. This test does not require an API key.

```bash
python examples/image_format_test.py --file path/to/image.bmp
```

--------------------------------

### Install WeasyPrint System Dependencies

Source: https://github.com/hkuds/rag-anything/blob/main/docs/enhanced_markdown.md

Installs the necessary system-level dependencies for the WeasyPrint PDF generation backend on Ubuntu/Debian systems. This includes build tools, Python development headers, and Cairo/Pango libraries.

```bash
sudo apt-get install -y build-essential python3-dev python3-pip \
    python3-setuptools python3-wheel python3-cffi libcairo2 \
    libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 \
    libffi-dev shared-mime-info
```

--------------------------------

### Install System Dependencies for PDF Conversion

Source: https://github.com/hkuds/rag-anything/blob/main/docs/enhanced_markdown.md

Provides shell commands to install necessary system-level dependencies for WeasyPrint and Pandoc on Ubuntu/Debian systems.
```bash
# Ubuntu/Debian: Install system dependencies
sudo apt-get update
sudo apt-get install -y build-essential python3-dev libcairo2 \
    libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 \
    libffi-dev shared-mime-info

# Then reinstall WeasyPrint
pip install --force-reinstall weasyprint
```

```bash
# Check if Pandoc is installed
pandoc --version

# Install Pandoc (Ubuntu/Debian)
sudo apt-get install pandoc wkhtmltopdf
```

--------------------------------

### Run Text Format Parsing Test Example

Source: https://github.com/hkuds/rag-anything/blob/main/README.md

Execute the test script for parsing text formats using the MinerU parser. This test does not require an API key.

```bash
python examples/text_format_test.py --file path/to/document.md
```

--------------------------------

### Install RAG-Anything with Enhanced Markdown Dependencies

Source: https://github.com/hkuds/rag-anything/blob/main/docs/enhanced_markdown.md

Installs the RAG-Anything package with all optional dependencies, then the packages required for enhanced markdown conversion: markdown, weasyprint, and pygments. The extra is quoted so that shells such as zsh do not interpret the square brackets.

```bash
pip install 'raganything[all]'
pip install markdown weasyprint pygments
```

--------------------------------

### Configure BatchParser Settings

Source: https://github.com/hkuds/rag-anything/blob/main/docs/batch_processing.md

Examples for configuring the BatchParser to handle memory constraints, timeouts, and debugging requirements.
```python
import logging

from raganything.batch_parser import BatchParser

# Memory optimization: fewer workers means fewer files parsed at once
batch_parser = BatchParser(max_workers=2)

# Timeout adjustment
batch_parser = BatchParser(timeout_per_file=600)

# Skip installation check
batch_parser = BatchParser(skip_installation_check=True)

# Debug logging
logging.basicConfig(level=logging.DEBUG)
batch_parser = BatchParser(parser_type="mineru", max_workers=2)
```

--------------------------------

### Execute RAG-Anything Application

Source: https://github.com/hkuds/rag-anything/blob/main/docs/offline_setup.md

Command to execute the RAG-Anything example script once the offline cache is properly configured.

```bash
uv run examples/raganything_example.py requirements.txt
```

--------------------------------

### Command Line Batch Interface

Source: https://github.com/hkuds/rag-anything/blob/main/docs/batch_processing.md

Examples of using the CLI to trigger batch processing, specify parsers, and perform dry runs.

```bash
python -m raganything.batch_parser examples/sample_docs/ --output ./output --workers 4
python -m raganything.batch_parser examples/sample_docs/ --parser paddleocr --method ocr
python -m raganything.batch_parser examples/sample_docs/ --output ./output --dry-run
```

--------------------------------

### Header Formatting Example (Markdown)

Source: https://github.com/hkuds/rag-anything/blob/main/docs/context_aware_processing.md

Illustrates how headers are formatted when the `include_headers=True` configuration option is enabled. Headers are presented using markdown-style prefixes to denote their level, such as '# Level 1 Header', '## Level 2 Header', and '### Level 3 Header'.

```markdown
# Level 1 Header
## Level 2 Header
### Level 3 Header
```

--------------------------------

### Process Directories and Filter Files

Source: https://github.com/hkuds/rag-anything/blob/main/docs/batch_processing.md

Practical examples of processing entire directories recursively and filtering file lists based on supported extensions.
```python
from pathlib import Path

from raganything.batch_parser import BatchParser

# Process directory
batch_parser = BatchParser(max_workers=4)
directory_path = Path("./documents")
result = batch_parser.process_batch(
    file_paths=[str(directory_path)],
    output_dir="./processed",
    recursive=True
)

# Filter files
all_files = ["doc1.pdf", "image.png", "spreadsheet.xlsx"]
supported_files = batch_parser.filter_supported_files(all_files)
result = batch_parser.process_batch(
    file_paths=supported_files,
    output_dir="./output"
)
```

--------------------------------

### Caption Integration Example (Text)

Source: https://github.com/hkuds/rag-anything/blob/main/docs/context_aware_processing.md

Shows the format for integrating image and table captions when `include_captions=True`. Captions are enclosed in square brackets, prefixed with 'Image:' or 'Table:', followed by the caption text.

```text
[Image: Figure 1 caption text]
[Table: Table 1 caption text]
```

--------------------------------

### Chunk-Based Analysis Configuration (Python)

Source: https://github.com/hkuds/rag-anything/blob/main/docs/context_aware_processing.md

An example demonstrating configuration for chunk-based context analysis. This configuration uses a context window of 5 items (`context_window=5`), specifies chunk-based context mode (`context_mode="chunk"`), and sets a maximum of 2000 context tokens. It excludes headers and captions and filters context to only include 'text' content types.

```python
from raganything import RAGAnythingConfig

config = RAGAnythingConfig(
    context_window=5,
    context_mode="chunk",
    max_context_tokens=2000,
    include_headers=False,
    include_captions=False,
    context_filter_content_types=["text"]
)
```

--------------------------------

### Initialize RAGAnything with Custom LLM Functions

Source: https://context7.com/hkuds/rag-anything/llms.txt

Demonstrates initializing RAGAnything with custom language model and vision model functions, along with an embedding function. Supports passing arbitrary `lightrag_kwargs` for fine-tuning LightRAG behavior.
```python
import asyncio
from functools import partial

from raganything import RAGAnything, RAGAnythingConfig
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.utils import EmbeddingFunc

API_KEY = "sk-..."
BASE_URL = "https://api.openai.com/v1"


def llm_func(prompt, system_prompt=None, history_messages=[], **kwargs):
    return openai_complete_if_cache(
        "gpt-4o-mini", prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        api_key=API_KEY, base_url=BASE_URL, **kwargs,
    )


def vision_func(prompt, system_prompt=None, history_messages=[],
                image_data=None, messages=None, **kwargs):
    # Pre-built multimodal messages (VLM-enhanced query): pass them through
    if messages:
        return openai_complete_if_cache(
            "gpt-4o", "", messages=messages,
            api_key=API_KEY, base_url=BASE_URL, **kwargs,
        )
    # Single image: add the system entry only when a system prompt is given,
    # so the messages list never contains None
    if image_data:
        image_messages = []
        if system_prompt:
            image_messages.append({"role": "system", "content": system_prompt})
        image_messages.append({"role": "user", "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}},
        ]})
        return openai_complete_if_cache(
            "gpt-4o", "", messages=image_messages,
            api_key=API_KEY, base_url=BASE_URL, **kwargs,
        )
    # Pure text: fall back to the text model
    return llm_func(prompt, system_prompt, history_messages, **kwargs)


embed_func = EmbeddingFunc(
    embedding_dim=3072,
    max_token_size=8192,
    func=partial(openai_embed.func, model="text-embedding-3-large",
                 api_key=API_KEY, base_url=BASE_URL),
)

config = RAGAnythingConfig(working_dir="./rag_storage", parser="mineru")

rag = RAGAnything(
    config=config,
    llm_model_func=llm_func,
    vision_model_func=vision_func,  # enables VLM-enhanced query automatically
    embedding_func=embed_func,
    # Pass any LightRAG kwargs:
    lightrag_kwargs={"max_parallel_insert": 4, "top_k": 60},
)
# No need to call initialize; it is done lazily on first use.
```

--------------------------------

### Initialize LightRAG with Environment Variables

Source: https://github.com/hkuds/rag-anything/blob/main/docs/offline_setup.md

Python implementation showing how to load environment variables before importing LightRAG to ensure the tiktoken cache path is correctly registered.

```python
import os
import sys
from pathlib import Path

from dotenv import load_dotenv

# Add project root directory to Python path
sys.path.insert(0, str(Path(__file__).parent.parent))

# Load environment variables FIRST - before any imports that use tiktoken
load_dotenv(dotenv_path=".env", override=False)

# Now import LightRAG
from lightrag import LightRAG
from lightrag.utils import logger
```

--------------------------------

### Initialize and Process Documents with RAGAnything

Source: https://github.com/hkuds/rag-anything/blob/main/docs/enhanced_markdown.md

Demonstrates how to initialize the RAGAnything system with specific configuration settings and trigger a complete document processing workflow. Model and embedding functions are omitted here; the initialization examples in this document show how to supply them.

```python
import asyncio

from raganything import RAGAnything, RAGAnythingConfig


async def main():
    # Initialize system with a RAGAnythingConfig object
    rag = RAGAnything(config=RAGAnythingConfig(
        working_dir="./storage",
        enable_image_processing=True,
    ))

    # Process document (await is only valid inside an async function)
    await rag.process_document_complete("document.pdf")


asyncio.run(main())
```

--------------------------------

### RAGAnything Initialization

Source: https://context7.com/hkuds/rag-anything/llms.txt

Demonstrates how to initialize the RAGAnything class with custom language model functions, vision model functions, and embedding functions. It also shows how to pass additional keyword arguments to the underlying LightRAG instance.

```APIDOC
## RAGAnything Initialization

### Description
Initialize the RAGAnything class with custom language model functions, vision model functions, and embedding functions. Supports bring-your-own model functions and arbitrary `lightrag_kwargs` forwarded to LightRAG.

### Parameters
- **config** (RAGAnythingConfig) - Configuration for RAGAnything, including working directory and parser.
- **llm_model_func** (callable) - A function to handle language model completions.
- **vision_model_func** (callable, optional) - A function to handle vision model completions. Enables VLM-enhanced query automatically if provided.
- **embedding_func** (EmbeddingFunc) - An instance of EmbeddingFunc for generating embeddings.
- **lightrag_kwargs** (dict, optional) - Arbitrary keyword arguments to be passed to the underlying LightRAG instance.

### Example
The example code is identical to the block under "Initialize RAGAnything with Custom LLM Functions".
```

--------------------------------

### Load and Use Existing LightRAG Instance with RAGAnything

Source: https://github.com/hkuds/rag-anything/blob/main/README.md

This snippet shows how to load a previously saved LightRAG instance and initialize RAGAnything with it. It sets up API keys, checks for an existing instance, defines custom model functions for text and vision, and then performs a query and processes a new document.

```python
import asyncio
import os
from functools import partial

from raganything import RAGAnything, RAGAnythingConfig
from lightrag import LightRAG
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.kg.shared_storage import initialize_pipeline_status
from lightrag.utils import EmbeddingFunc


async def load_existing_lightrag():
    # Set up API configuration
    api_key = "your-api-key"
    base_url = "your-base-url"  # Optional

    # First, create or load existing LightRAG instance
    lightrag_working_dir = "./existing_lightrag_storage"

    # Check if previous LightRAG instance exists
    if os.path.exists(lightrag_working_dir) and os.listdir(lightrag_working_dir):
        print("✅ Found existing LightRAG instance, loading...")
    else:
        print("❌ No existing LightRAG instance found, will create new one")

    # Create/load LightRAG instance with your configuration
    lightrag_instance = LightRAG(
        working_dir=lightrag_working_dir,
        llm_model_func=lambda prompt, system_prompt=None, history_messages=[], **kwargs: openai_complete_if_cache(
            "gpt-4o-mini",
            prompt,
            system_prompt=system_prompt,
            history_messages=history_messages,
            api_key=api_key,
            base_url=base_url,
            **kwargs,
        ),
        embedding_func=EmbeddingFunc(
            embedding_dim=3072,
            max_token_size=8192,
            func=partial(
                openai_embed.func,
                model="text-embedding-3-large",
                api_key=api_key,
                base_url=base_url,
            ),
        )
    )

    # Initialize storage (this will load existing data if available)
    await lightrag_instance.initialize_storages()
    await initialize_pipeline_status()

    # Define vision model function for image processing
    def vision_model_func(
        prompt, system_prompt=None, history_messages=[], image_data=None, messages=None, **kwargs
    ):
        # If messages format is provided (for multimodal VLM enhanced query), use it directly
        if messages:
            return openai_complete_if_cache(
                "gpt-4o",
                "",
                system_prompt=None,
                history_messages=[],
                messages=messages,
                api_key=api_key,
                base_url=base_url,
                **kwargs,
            )
        # Traditional single image format
        elif image_data:
            # Add the system entry only when a system prompt is given,
            # so the messages list never contains None
            image_messages = []
            if system_prompt:
                image_messages.append({"role": "system", "content": system_prompt})
            image_messages.append({
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{image_data}"
                        },
                    },
                ],
            })
            return openai_complete_if_cache(
                "gpt-4o",
                "",
                system_prompt=None,
                history_messages=[],
                messages=image_messages,
                api_key=api_key,
                base_url=base_url,
                **kwargs,
            )
        # Pure text format
        else:
            return lightrag_instance.llm_model_func(prompt, system_prompt, history_messages, **kwargs)

    # Now use existing LightRAG instance to initialize RAGAnything
    rag = RAGAnything(
        lightrag=lightrag_instance,  # Pass existing LightRAG instance
        vision_model_func=vision_model_func,
        # Note: working_dir, llm_model_func, embedding_func, etc. are inherited from lightrag_instance
    )

    # Query existing knowledge base
    result = await rag.aquery(
        "What data has been processed in this LightRAG instance?",
        mode="hybrid"
    )
    print("Query result:", result)

    # Add new multimodal document to existing LightRAG instance
    await rag.process_document_complete(
        file_path="path/to/new/multimodal_document.pdf",
        output_dir="./output"
    )


if __name__ == "__main__":
    asyncio.run(load_existing_lightrag())
```

--------------------------------

### End-to-End Document Processing with RAGAnything

Source: https://github.com/hkuds/rag-anything/blob/main/README.md

This script demonstrates the complete RAGAnything workflow, from initialization to document processing and querying. It sets up API keys, configures the RAGAnything instance with specific parsing and processing options, and then performs both text and multimodal queries.

```python
import asyncio
from functools import partial

from raganything import RAGAnything, RAGAnythingConfig
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.utils import EmbeddingFunc


async def main():
    # Set up API configuration
    api_key = "your-api-key"
    base_url = "your-base-url"  # Optional

    # Create RAGAnything configuration
    config = RAGAnythingConfig(
        working_dir="./rag_storage",
        parser="mineru",  # Parser selection: mineru, docling, or paddleocr
        parse_method="auto",  # Parse method: auto, ocr, or txt
        enable_image_processing=True,
        enable_table_processing=True,
        enable_equation_processing=True,
    )

    # Define LLM model function
    def llm_model_func(prompt, system_prompt=None, history_messages=[], **kwargs):
        return openai_complete_if_cache(
            "gpt-4o-mini",
            prompt,
            system_prompt=system_prompt,
            history_messages=history_messages,
            api_key=api_key,
            base_url=base_url,
            **kwargs,
        )

    # Define vision model function for image processing
    def vision_model_func(
        prompt, system_prompt=None, history_messages=[], image_data=None, messages=None, **kwargs
    ):
        # If messages format is provided (for multimodal VLM enhanced query), use it directly
        if messages:
            return openai_complete_if_cache(
                "gpt-4o",
                "",
                system_prompt=None,
                history_messages=[],
                messages=messages,
                api_key=api_key,
                base_url=base_url,
                **kwargs,
            )
        # Traditional single image format
        elif image_data:
            # Add the system entry only when a system prompt is given,
            # so the messages list never contains None
            image_messages = []
            if system_prompt:
                image_messages.append({"role": "system", "content": system_prompt})
            image_messages.append({
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{image_data}"
                        },
                    },
                ],
            })
            return openai_complete_if_cache(
                "gpt-4o",
                "",
                system_prompt=None,
                history_messages=[],
                messages=image_messages,
                api_key=api_key,
                base_url=base_url,
                **kwargs,
            )
        # Pure text format
        else:
            return llm_model_func(prompt, system_prompt, history_messages, **kwargs)

    # Define embedding function
    embedding_func = EmbeddingFunc(
        embedding_dim=3072,
        max_token_size=8192,
        func=partial(
            openai_embed.func,
            model="text-embedding-3-large",
            api_key=api_key,
            base_url=base_url,
        ),
    )

    # Initialize RAGAnything
    rag = RAGAnything(
        config=config,
        llm_model_func=llm_model_func,
        vision_model_func=vision_model_func,
        embedding_func=embedding_func,
    )

    # Process a document
    await rag.process_document_complete(
        file_path="path/to/your/document.pdf",
        output_dir="./output",
        parse_method="auto"
    )

    # Query the processed content
    # Pure text query - for basic knowledge base search
    text_result = await rag.aquery(
        "What are the main findings shown in the figures and tables?",
        mode="hybrid"
    )
    print("Text query result:", text_result)

    # Multimodal query with specific multimodal content
    multimodal_result = await rag.aquery_with_multimodal(
        "Explain this formula and its relevance to the document content",
        multimodal_content=[{
            "type": "equation",
            # Raw string so that "\f" in "\frac" is not read as a form feed
            "latex": r"P(d|q) = \frac{P(q|d) \cdot P(d)}{P(q)}",
            "equation_caption": "Document relevance probability"
        }],
        mode="hybrid"
    )
    print("Multimodal query result:", multimodal_result)


if __name__ == "__main__":
    asyncio.run(main())
```
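
When LaTeX is passed in the `latex` field of an equation item, the way the Python string literal is written matters. The short sketch below is illustrative only (the variable names `plain`, `raw`, and `equation_item` are not part of RAG-Anything); it shows that without the `r` prefix Python reads `\f` as a form feed escape, so `\frac` loses its backslash before the text ever reaches the model.

```python
# Without the r prefix, "\f" is the form feed escape sequence,
# so the backslash of "\frac" is consumed by Python.
plain = "\frac{a}{b}"
raw = r"\frac{a}{b}"

print(repr(plain))  # begins with the form feed character, not a backslash
print(repr(raw))    # begins with a literal backslash

# An equation item written with a raw string keeps every LaTeX command intact.
equation_item = {
    "type": "equation",
    "latex": r"P(d|q) = \frac{P(q|d) \cdot P(d)}{P(q)}",
    "equation_caption": "Document relevance probability",
}
```

Writing every LaTeX literal as a raw string is the simplest convention; doubling each backslash works too but is easier to get wrong.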
--------------------------------

### Insert Content List and Query

Source: https://github.com/hkuds/rag-anything/blob/main/README.md

Demonstrates inserting a content list with text and table data, then querying the inserted content. If the list contains image items, make sure their paths are absolute.

```python
import asyncio

from raganything import RAGAnything


async def insert_content_list_example():
    # Initialize RAGAnything; pass config, llm_model_func, vision_model_func,
    # and embedding_func as in the end-to-end example
    rag = RAGAnything()

    # Insert a content list with text and table
    content_list = [
        {
            "type": "text",
            "text": "This is the main research paper content.",
            "page_idx": 0
        },
        {
            "type": "table",
            "table_body": "| Section | Summary |\n|---------|---------|\n| Intro | Background |\n| Methods | Approach |",
            "table_caption": ["Research Sections"],
            "page_idx": 1
        }
    ]

    await rag.insert_content_list(
        content_list=content_list,
        file_path="research_paper.pdf",
        split_by_character_only=False,  # Optional text splitting mode
        doc_id=None,  # Optional custom document ID (will be auto-generated if not provided)
        display_stats=True  # Show content statistics
    )

    # Query the inserted content
    result = await rag.aquery(
        "What are the key findings and performance metrics mentioned in the research?",
        mode="hybrid"
    )
    print("Query result:", result)

    # You can also insert multiple content lists with different document IDs
    another_content_list = [
        {
            "type": "text",
            "text": "This is content from another document.",
            "page_idx": 0  # Page number where this content appears
        },
        {
            "type": "table",
            "table_body": "| Feature | Value |\n|---------|-------|\n| Speed | Fast |\n| Accuracy | High |",
            "table_caption": ["Feature Comparison"],
            "page_idx": 1  # Page number where this table appears
        }
    ]

    await rag.insert_content_list(
        content_list=another_content_list,
        file_path="another_document.pdf",
        doc_id="custom-doc-id-123"  # Custom document ID
    )


if __name__ == "__main__":
    asyncio.run(insert_content_list_example())
```

--------------------------------

### Initialize and Use BatchParser

Source: https://github.com/hkuds/rag-anything/blob/main/docs/batch_processing.md

Summarizes the core BatchParser interface, including initialization, filtering supported files, and executing batch processing tasks synchronously or asynchronously. The `...` in the signatures marks parameters elided by the source.

```python
class BatchParser:
    def __init__(self, parser_type: str = "mineru", max_workers: int = 4, ...):
        """Initialize batch parser"""

    def get_supported_extensions(self) -> List[str]:
        """Get list of supported file extensions"""

    def filter_supported_files(self, file_paths: List[str], recursive: bool = True) -> List[str]:
        """Filter files to only supported types"""

    def process_batch(self, file_paths: List[str], output_dir: str, ...) -> BatchProcessingResult:
        """Process files in batch"""

    async def process_batch_async(self, file_paths: List[str], output_dir: str, ...) -> BatchProcessingResult:
        """Process files in batch asynchronously"""
```

--------------------------------

### Check Available Backends and Select Specific Backend

Source: https://github.com/hkuds/rag-anything/blob/main/docs/enhanced_markdown.md

Demonstrates how to check the availability of different PDF conversion backends (WeasyPrint, Pandoc) using `get_backend_info()` and how to explicitly select a backend during the conversion process.

```python
# Import path assumed from the module name used by docs/enhanced_markdown.md
from raganything.enhanced_markdown import EnhancedMarkdownConverter

# Check available backends
converter = EnhancedMarkdownConverter()
backend_info = converter.get_backend_info()

print("Available backends:")
for backend, available in backend_info["available_backends"].items():
    status = "✅" if available else "❌"
    print(f"  {status} {backend}")

print(f"Recommended backend: {backend_info['recommended_backend']}")

# Use specific backend
converter.convert_file_to_pdf(
    input_path="document.md",
    output_path="document.pdf",
    method="weasyprint"  # or "pandoc", "pandoc_system", "auto"
)
```

--------------------------------

### Execute Batch Processing

Source: https://github.com/hkuds/rag-anything/blob/main/docs/batch_processing.md

Demonstrates how to initialize the BatchParser and process a list of files or directories with configurable concurrency and progress tracking.
```python
from raganything.batch_parser import BatchParser

batch_parser = BatchParser(
    parser_type="mineru",
    max_workers=4,
    show_progress=True,
    timeout_per_file=300,
    skip_installation_check=False
)

result = batch_parser.process_batch(
    file_paths=["doc1.pdf", "doc2.docx", "folder/"],
    output_dir="./batch_output",
    parse_method="auto",
    recursive=True
)

print(result.summary())
```

--------------------------------

### Process Image, Table, and Equation Modalities

Source: https://context7.com/hkuds/rag-anything/llms.txt

Demonstrates processing of image, table, and equation content using their respective modal processors. The snippet expects an initialized LightRAG instance (`lightrag_instance`), the model functions `llm_func` and `vision_func`, and a parsed content list (`full_content_list`) to be defined beforehand.

```python
import asyncio

from lightrag import LightRAG
from raganything.modalprocessors import (
    ContextConfig,
    ContextExtractor,
    EquationModalProcessor,
    ImageModalProcessor,
    TableModalProcessor,
)

# lightrag_instance, llm_func, vision_func, and full_content_list are not
# defined in this snippet; create them as in the initialization examples.


async def direct_modal_processing(lightrag_instance: LightRAG):
    ctx_config = ContextConfig(
        context_window=1,
        context_mode="page",
        max_context_tokens=2000,
        include_headers=True,
    )
    context_extractor = ContextExtractor(config=ctx_config, tokenizer=lightrag_instance.tokenizer)

    # --- Image ---
    img_processor = ImageModalProcessor(
        lightrag=lightrag_instance,
        modal_caption_func=vision_func,  # must accept image_data= kwarg
        context_extractor=context_extractor,
    )
    img_processor.set_content_source(full_content_list, content_format="minerU")

    caption, entity_info, _ = await img_processor.process_multimodal_content(
        modal_content={
            "img_path": "/abs/path/to/figure.png",
            "image_caption": ["Figure 2: Ablation Study"],
            "image_footnote": [],
            "page_idx": 5,
        },
        content_type="image",
        file_path="paper.pdf",
        entity_name="Ablation Study Figure",
        item_info={"page_idx": 5, "index": 12, "type": "image"},
    )
    print("Image entity:", entity_info["entity_name"])

    # --- Table ---
    tbl_processor = TableModalProcessor(
        lightrag=lightrag_instance,
        modal_caption_func=llm_func,
        context_extractor=context_extractor,
    )
    caption, entity_info, _ = await tbl_processor.process_multimodal_content(
        modal_content={
            "table_body": "| Method | F1 |\n|--------|----|\n| Ours | 0.94 |",
            "table_caption": ["Table 2: Results"],
            "page_idx": 7,
        },
        content_type="table",
        file_path="paper.pdf",
    )
    print("Table entity:", entity_info["entity_name"])

    # --- Equation ---
    eq_processor = EquationModalProcessor(
        lightrag=lightrag_instance,
        modal_caption_func=llm_func,
    )
    caption, entity_info, _ = await eq_processor.process_multimodal_content(
        modal_content={"latex": r"\mathcal{L} = -\sum y \log \hat{y}", "page_idx": 3},
        content_type="equation",
        file_path="paper.pdf",
    )
    print("Equation entity:", entity_info["entity_name"])


asyncio.run(direct_modal_processing(lightrag_instance))
```

--------------------------------

### High-Precision Context Configuration (Python)

Source: https://github.com/hkuds/rag-anything/blob/main/docs/context_aware_processing.md

An example of configuring RAGAnything for high-precision context analysis. This configuration uses a small context window (`context_window=1`), page-based context mode (`context_mode="page"`), and limits context tokens to 1000. It includes headers but excludes captions and filters context to only include 'text' content types.

```python
from raganything import RAGAnythingConfig

config = RAGAnythingConfig(
    context_window=1,
    context_mode="page",
    max_context_tokens=1000,
    include_headers=True,
    include_captions=False,
    context_filter_content_types=["text"]
)
```

--------------------------------

### Multimodal Query with RAGAnything.aquery_with_multimodal()

Source: https://context7.com/hkuds/rag-anything/llms.txt

Enrich queries with specific multimodal content like images, tables, or equations. The content is analyzed and appended to the query before retrieval. Requires `asyncio`.
```python
import asyncio

async def multimodal_query():
    # Query with an image
    result = await rag.aquery_with_multimodal(
        query="How does this architecture compare to baselines in the document?",
        multimodal_content=[
            {
                "type": "image",
                "img_path": "/abs/path/to/architecture.png",
                "image_caption": ["Figure 3: Proposed Architecture"],
            }
        ],
        mode="hybrid",
    )
    print("Image query result:", result)

    # Query with a table
    result = await rag.aquery_with_multimodal(
        query="Which method in the document achieves similar accuracy?",
        multimodal_content=[
            {
                "type": "table",
                "table_data": "Method,Accuracy\nOurs,95.2%\nBaseline,87.3%",
                "table_caption": "Performance comparison",
            }
        ],
        mode="mix",
    )
    print("Table query result:", result)

    # Query with a LaTeX equation
    result = await rag.aquery_with_multimodal(
        query="Where is this formula used in the document?",
        multimodal_content=[
            {
                "type": "equation",
                "latex": r"P(d|q) = \frac{P(q|d) \cdot P(d)}{P(q)}",
                "equation_caption": "Bayesian relevance formula",
            }
        ],
        mode="hybrid",
    )
    print("Equation query result:", result)

asyncio.run(multimodal_query())
```

--------------------------------

### Directly Insert Pre-parsed Content List into RAGAnything

Source: https://github.com/hkuds/rag-anything/blob/main/README.md

Use this method when you have a content list already structured, such as from external parsers. Ensure image paths are absolute. The `file_path` serves as a reference for citations.
```python
import asyncio
from functools import partial
from raganything import RAGAnything, RAGAnythingConfig
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.utils import EmbeddingFunc

async def insert_content_list_example():
    # Set up API configuration
    api_key = "your-api-key"
    base_url = "your-base-url"  # Optional

    # Create RAGAnything configuration
    config = RAGAnythingConfig(
        working_dir="./rag_storage",
        enable_image_processing=True,
        enable_table_processing=True,
        enable_equation_processing=True,
    )

    # Define model functions (None defaults avoid shared mutable default lists)
    def llm_model_func(prompt, system_prompt=None, history_messages=None, **kwargs):
        return openai_complete_if_cache(
            "gpt-4o-mini",
            prompt,
            system_prompt=system_prompt,
            history_messages=history_messages or [],
            api_key=api_key,
            base_url=base_url,
            **kwargs,
        )

    def vision_model_func(
        prompt, system_prompt=None, history_messages=None, image_data=None, messages=None, **kwargs
    ):
        # If messages format is provided (for multimodal VLM enhanced query), use it directly
        if messages:
            return openai_complete_if_cache(
                "gpt-4o",
                "",
                system_prompt=None,
                history_messages=[],
                messages=messages,
                api_key=api_key,
                base_url=base_url,
                **kwargs,
            )
        # Traditional single image format
        elif image_data:
            # Add the system message only when one is given, so the list never contains None
            image_messages = []
            if system_prompt:
                image_messages.append({"role": "system", "content": system_prompt})
            image_messages.append(
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompt},
                        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}},
                    ],
                }
            )
            return openai_complete_if_cache(
                "gpt-4o",
                "",
                system_prompt=None,
                history_messages=[],
                messages=image_messages,
                api_key=api_key,
                base_url=base_url,
                **kwargs,
            )
        # Pure text format
        else:
            return llm_model_func(prompt, system_prompt, history_messages, **kwargs)

    embedding_func = EmbeddingFunc(
        embedding_dim=3072,
        max_token_size=8192,
        func=partial(
            openai_embed.func,
            model="text-embedding-3-large",
            api_key=api_key,
            base_url=base_url,
        ),
    )

    # Initialize RAGAnything
    rag = RAGAnything(
        config=config,
        llm_model_func=llm_model_func,
        vision_model_func=vision_model_func,
        embedding_func=embedding_func,
    )

    # Example: Pre-parsed content list from external source
    content_list = [
        {
            "type": "text",
            "text": "This is the introduction section of our research paper.",
            "page_idx": 0  # Page number where this content appears
        },
        {
            "type": "image",
            "img_path": "/absolute/path/to/figure1.jpg",  # IMPORTANT: Use absolute path
            "image_caption": ["Figure 1: System Architecture"],
            "image_footnote": ["Source: Authors' original design"],
            "page_idx": 1  # Page number where this image appears
        },
        {
            "type": "table",
            "table_body": "| Method | Accuracy | F1-Score |\n|--------|----------|----------|\n| Ours | 95.2% | 0.94 |\n| Baseline | 87.3% | 0.85 |",
            "table_caption": ["Table 1: Performance Comparison"],
            "table_footnote": ["Results on test dataset"],
            "page_idx": 2  # Page number where this table appears
        },
        {
            "type": "equation",
            # Raw string so that \f in \frac is not read as a form-feed escape
            "latex": r"P(d|q) = \frac{P(q|d) \cdot P(d)}{P(q)}",
            "text": "Document relevance probability formula",
            "page_idx": 3  # Page number where this equation appears
        },
        {
            "type": "text",
            "text": "In conclusion, our method demonstrates superior performance across all metrics.",
            "page_idx": 4  # Page number where this content appears
        }
    ]

    # Insert the content list directly
    await rag.insert_content_list(
        content_list=content_list,
        file_path="research_paper.pdf",  # Reference file name for citation
        split_by_character=None,  # Optional text splitting
    )

if __name__ == "__main__":
    asyncio.run(insert_content_list_example())
```

--------------------------------

### Configure RAGAnything Context Settings

Source: https://github.com/hkuds/rag-anything/blob/main/docs/context_aware_processing.md

Demonstrates how to define context extraction parameters using Python configuration objects and environment variables. These settings control window size, token limits, and content filtering.
```python
from typing import List

context_window: int = 1
context_mode: str = "page"
max_context_tokens: int = 2000
include_headers: bool = True
include_captions: bool = True
context_filter_content_types: List[str] = ["text"]
content_format: str = "minerU"
```

```bash
CONTEXT_WINDOW=2
CONTEXT_MODE=page
MAX_CONTEXT_TOKENS=3000
INCLUDE_HEADERS=true
INCLUDE_CAPTIONS=true
CONTEXT_FILTER_CONTENT_TYPES=text,image
CONTENT_FORMAT=minerU
```

--------------------------------

### Initialize and Use RAGAnything with Context

Source: https://github.com/hkuds/rag-anything/blob/main/docs/context_aware_processing.md

Shows how to instantiate RAGAnything with a custom configuration and process a document. It also covers runtime updates to context settings and manual content source configuration.

```python
from raganything import RAGAnything, RAGAnythingConfig

config = RAGAnythingConfig(
    context_window=2,
    context_mode="page",
    max_context_tokens=3000,
    include_headers=True,
    include_captions=True,
    context_filter_content_types=["text", "image"],
    content_format="minerU"
)

# your_llm_function and your_embedding_function are placeholders for your own model functions
rag_anything = RAGAnything(
    config=config,
    llm_model_func=your_llm_function,
    embedding_func=your_embedding_function
)

# Automatic processing (must be awaited inside an async function)
await rag_anything.process_document_complete("document.pdf")

# Manual updates (content_list is a previously parsed content list)
rag_anything.set_content_source_for_context(content_list, "minerU")
rag_anything.update_context_config(context_window=1, max_context_tokens=1500, include_captions=False)
```

--------------------------------

### Setting Content Source and Processing Multimodal Content (Python)

Source: https://github.com/hkuds/rag-anything/blob/main/docs/context_aware_processing.md

Demonstrates how to set a content source from a list of content items and then process multimodal content, such as an image, with associated metadata. The call specifies the content type, file path, entity name, and item information used for context-aware processing.
```python
processor.set_content_source(content_list, "minerU")

item_info = {
    "page_idx": 2,
    "index": 5,
    "type": "image"
}

result = await processor.process_multimodal_content(
    modal_content=image_data,
    content_type="image",
    file_path="document.pdf",
    entity_name="Architecture Diagram",
    item_info=item_info
)
```

--------------------------------

### Direct Modal Processor Integration

Source: https://github.com/hkuds/rag-anything/blob/main/docs/context_aware_processing.md

Illustrates how to manually initialize a ContextExtractor and inject it into a modal processor for targeted multimodal analysis.

```python
from raganything.modalprocessors import (
    ContextExtractor,
    ContextConfig,
    ImageModalProcessor
)

config = ContextConfig(
    context_window=1,
    context_mode="page",
    max_context_tokens=2000,
    include_headers=True,
    include_captions=True,
    filter_content_types=["text"]
)

context_extractor = ContextExtractor(config)
processor = ImageModalProcessor(lightrag, caption_func, context_extractor)
```

--------------------------------

### Configure Environment Variables for RAG-Anything

Source: https://github.com/hkuds/rag-anything/blob/main/README.md

Set environment variables in a .env file for RAG-Anything configuration. Includes API keys, output directory, and parser settings.

```bash
OPENAI_API_KEY=your_openai_api_key
OPENAI_BASE_URL=your_base_url  # Optional
OUTPUT_DIR=./output  # Default output directory for parsed documents
PARSER=mineru  # Parser selection: mineru, docling, or paddleocr
PARSE_METHOD=auto  # Parse method: auto, ocr, or txt
```
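
The parser-related variables above are plain key/value pairs, so they can be sanity-checked with the standard library before a run. The following is a minimal sketch, not RAG-Anything's own loader: the `ParserSettings` dataclass and the `load_parser_settings` helper are hypothetical names introduced for illustration, and the fallback values (`./output`, `mineru`, `auto`) are simply copied from the example above rather than taken from the library.

```python
import os
from dataclasses import dataclass

# Allowed values, copied from the comments in the .env example above.
VALID_PARSERS = {"mineru", "docling", "paddleocr"}
VALID_PARSE_METHODS = {"auto", "ocr", "txt"}


@dataclass
class ParserSettings:
    """Hypothetical container for the parser settings in the .env example."""
    output_dir: str
    parser: str
    parse_method: str


def load_parser_settings(env=None) -> ParserSettings:
    """Read and validate parser settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    settings = ParserSettings(
        output_dir=env.get("OUTPUT_DIR", "./output"),
        parser=env.get("PARSER", "mineru").lower(),
        parse_method=env.get("PARSE_METHOD", "auto").lower(),
    )
    # Fail early on values the example does not list as supported.
    if settings.parser not in VALID_PARSERS:
        raise ValueError(f"Unsupported PARSER: {settings.parser!r}")
    if settings.parse_method not in VALID_PARSE_METHODS:
        raise ValueError(f"Unsupported PARSE_METHOD: {settings.parse_method!r}")
    return settings
```

Calling `load_parser_settings()` with no arguments reads the current process environment; passing a dict instead makes the check easy to exercise in tests without touching real environment variables.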