### Getting Started Options

Source: https://github.com/capevace/data-wizard-docs/blob/main/introduction.mdx

Choose between Data Wizard Cloud for an easy, hosted experience or self-hosting with a Docker Container for full control and integration capabilities.

```html
The easiest way to use Data Wizard! A hosted, ready-to-use version that requires no installation.

For full control and integration capabilities. Install and run Data Wizard locally or on your own infrastructure using Docker. Ideal for developers.
```

--------------------------------

### Deploy Data Wizard with Docker Compose

Source: https://github.com/capevace/data-wizard-docs/blob/main/quick-start.mdx

Defines and deploys the Data Wizard service using Docker Compose, configuring ports, volumes, and environment variables for a complete setup.

```yaml
version: '3.8'

services:
  data-wizard:
    container_name: data-wizard
    image: mateffy/data-wizard:latest
    ports:
      - "9090:80"
      - "4430:443"
      - "4430:443/udp"
    volumes:
      - data_wizard_storage:/app/storage
      - data_wizard_sqlite_data:/app/database
      - data_wizard_caddy_data:/data
      - data_wizard_caddy_config:/config
    environment:
      - APP_KEY=base64:[REPLACE_WITH_KEY]

volumes:
  data_wizard_storage:
  data_wizard_sqlite_data:
  data_wizard_caddy_data:
  data_wizard_caddy_config:
```

--------------------------------

### Install Mintlify CLI

Source: https://github.com/capevace/data-wizard-docs/blob/main/README.md

Installs the Mintlify Command Line Interface globally using npm. This tool is essential for previewing documentation changes locally.

```bash
npm i -g mintlify
```

--------------------------------

### Run Data Wizard with Docker

Source: https://github.com/capevace/data-wizard-docs/blob/main/quick-start.mdx

Launches the Data Wizard Docker container, mapping necessary ports and volumes for persistent storage and setting the essential APP_KEY environment variable.
```bash
docker run \
  --name data-wizard \
  -p 9090:80 \
  -p 4430:443 \
  -p 4430:443/udp \
  -v data_wizard_storage:/app/storage \
  -v data_wizard_sqlite_data:/app/database \
  -v data_wizard_caddy_data:/data \
  -v data_wizard_caddy_config:/config \
  -e APP_KEY=base64:[REPLACE_WITH_KEY] \
  mateffy/data-wizard:latest
```

--------------------------------

### Start Local Development Server

Source: https://github.com/capevace/data-wizard-docs/blob/main/README.md

Starts the Mintlify local development server. This command should be run from the root directory of your documentation project, where the 'mint.json' file is located.

```bash
mintlify dev
```

--------------------------------

### Generate APP_KEY

Source: https://github.com/capevace/data-wizard-docs/blob/main/quick-start.mdx

Generates a random base64 encoded APP_KEY required for Data Wizard's security. Ensure the `-base64` flag is used.

```bash
openssl rand -base64 32
```

--------------------------------

### More Snippet Example

Source: https://github.com/capevace/data-wizard-docs/blob/main/examples/real-estate-properties-from-exposes.mdx

This snippet demonstrates the inclusion of external markdown content, likely for displaying additional information or examples.

```markdown
import More from '/snippets/more.mdx'
```

--------------------------------

### Product Data Extraction from Brochures Example

Source: https://github.com/capevace/data-wizard-docs/blob/main/introduction.mdx

Shows how to extract product names and prices from online brochures. This example is valuable for market research, competitor analysis, and catalog management.

```mdx
import More from '/snippets/more.mdx';
```

--------------------------------

### Example Product Brochure Data Output

Source: https://github.com/capevace/data-wizard-docs/blob/main/examples/products-from-brochures.mdx

An example of the JSON output generated by the product brochure extractor, showing extracted product names and prices.
```json [ { "name": "Bottle of red wine", "original_price": 14.99, "discounted_price": 9.99 }, { "name": "Bottle of white wine", "original_price": 12.99, "discounted_price": 6.99 } ] ``` -------------------------------- ### Real Estate Expose Data Extraction Example Source: https://github.com/capevace/data-wizard-docs/blob/main/introduction.mdx Provides an example of extracting structured data from real estate exposes. This includes property details, pricing information, and location data, useful for real estate data management. ```mdx import More from '/snippets/more.mdx'; ``` -------------------------------- ### More LLM Information Source: https://github.com/capevace/data-wizard-docs/blob/main/configure-llm.mdx This snippet likely includes additional details or examples related to choosing an LLM, possibly from an external markdown file. ```react import More from '/snippets/more.mdx'; ``` -------------------------------- ### Implement Strategy Logic Source: https://github.com/capevace/data-wizard-docs/blob/main/custom-strategies.mdx Provides examples of implementing custom extraction logic within the `run()` method of a custom strategy. The first example shows a basic implementation, while the second demonstrates validating invoice data by checking the total against the sum of line items. 
```php
use Mateffy\Magic\Extraction\Strategies\Extractor;

class MyCustomStrategy extends Extractor
{
    public function run(array $artifacts): array
    {
        // Implement your strategy here
        return []; // placeholder so the stub satisfies the array return type
    }
}
```

```php
// Assumption: ParallelStrategy lives in the same namespace as Extractor.
use Mateffy\Magic\Extraction\Strategies\ParallelStrategy;
use Mateffy\Magic\Exceptions\JsonSchemaValidationError;

class ValidatedInvoiceStrategy extends ParallelStrategy
{
    public function run(array $artifacts): array
    {
        $data = parent::run($artifacts);

        // Validate an invoice total:
        $total = 0.0;

        foreach ($data['line_items'] as $item) {
            $total += $item['amount'];
        }

        // Compare with a small tolerance: a strict !== check would also
        // fail on int/float type differences and on rounding noise.
        if (abs($total - $data['total']) > 0.005) {
            throw new JsonSchemaValidationError(
                'Invoice total does not match the sum of line items'
            );
        }

        // The returned data has passed the total check.
        return $data;
    }
}
```

--------------------------------

### Learn how to extract some data

Source: https://github.com/capevace/data-wizard-docs/blob/main/snippets/more.mdx

Step by step guide to extract data from documents using Data Wizard.

```markdown
Step by step guide to extract data from documents using Data Wizard.
```

--------------------------------

### Example contents.json Structure

Source: https://github.com/capevace/data-wizard-docs/blob/main/preprocessing.mdx

Illustrates the structure of the `contents.json` file, detailing how document content is organized into Slices, including text, images, and page information with their respective properties.

```json
[
  {
    "page": 1,
    "type": "text",
    "text": "This is the text on the first page of the document. Lorem ipsum dolor sit amet..."
}, { "page": 1, "type": "image", "mimetype": "image/jpeg", "path": "images/image1.jpg", "x": 455.0, "y": 28.55999755859375, "width": 88.32000732421875, "height": 688.3200073242188 }, { "page": 1, "type": "page-image", "mimetype": "image/jpeg", "path": "pages/page1.jpg" }, { "page": 1, "type": "page-image-marked", "mimetype": "image/jpeg", "path": "pages_marked/page1.jpg" } ] ``` -------------------------------- ### Start Extraction Source: https://github.com/capevace/data-wizard-docs/blob/main/extracting-data.mdx Initiate data extraction from a bucket or an extractor. View progress and final data in raw JSON or GUI format. ```markdown ## Run inside Data Wizard All of the data that Data Wizard generates is viewable on the run page. This includes in-progress data as it's being generated, as well as the final extracted data. You can view the data both as raw JSON or in the GUI derived from the JSON schema.

Launch from Extractor

![Launch from Extractor](./images/screenshots/extractors/start.png) ![Launch from Bucket](./images/screenshots/buckets/start.png)
You can use the built-in UI to create and configure your extractor.

![View Data in GUI](./images/screenshots/run/run-gui.png)
![View Data as JSON](./images/screenshots/run/run-json.png)

You can inspect each step of the extraction process to see how the AI is interpreting your instructions and what data it is returning.

![Inspect extraction steps](./images/screenshots/run/run-chat-1.png)
![Inspect extraction steps](./images/screenshots/run/run-chat-2.png)
``` -------------------------------- ### Customer Feedback JSON Output Example Source: https://github.com/capevace/data-wizard-docs/blob/main/examples/customer-feedback-to-data.mdx An example of the structured JSON data produced by the customer feedback extractor. This format captures key details like submission date, customer information, feedback text, rating, and suggestions. ```json { "formType": "Customer Feedback", "submissionDate": "2024-09-08", "customerName": "Jane Doe", "email": "jane.doe@example.com", "feedbackText": "The service was excellent, and the staff were very friendly. I especially appreciated the quick response time to my inquiry.", "rating": 5, "suggestions": "Perhaps offer more variety in your product catalog." } ``` -------------------------------- ### Customer Feedback to JSON Conversion Example Source: https://github.com/capevace/data-wizard-docs/blob/main/introduction.mdx Illustrates transforming customer feedback, whether handwritten or printed, into structured JSON format. This facilitates easier analysis and aids in service improvement initiatives. ```mdx import More from '/snippets/more.mdx'; ``` -------------------------------- ### More Snippet Source: https://github.com/capevace/data-wizard-docs/blob/main/extractors.mdx This snippet likely contains additional details or examples related to the documentation. ```markdown import More from '/snippets/more.mdx'; ``` -------------------------------- ### Tax Form Data Example Source: https://github.com/capevace/data-wizard-docs/blob/main/examples/data-from-paper-tax-forms.mdx Example of structured JSON output for processed tax form data, including taxpayer information, financial figures, and due dates. 
```json { "formType": "Tax Form 1040", "taxYear": 2023, "taxpayerID": "12-3456789", "filingStatus": "Single", "income": 75000.00, "deductions": 12000.00, "taxLiability": 15000.00, "paymentDueDate": "2024-04-15" } ``` -------------------------------- ### Run Data Wizard with Docker Source: https://github.com/capevace/data-wizard-docs/blob/main/deployment.mdx Starts the Data Wizard Docker container, mapping necessary ports and volumes for persistent storage and configuration. It includes options for HTTP/HTTPS access, data persistence, and setting the essential APP_KEY environment variable. ```bash docker run \ --name data-wizard \ -p 9090:80 \ -p 4430:443 \ -p 4430:443/udp \ -v data_wizard_storage:/app/storage \ -v data_wizard_sqlite_data:/app/database \ -v data_wizard_caddy_data:/data \ -v data_wizard_caddy_config:/config \ -e APP_KEY=[REPLACE_WITH_APP_KEY] \ mateffy/data-wizard:latest ``` -------------------------------- ### Example Real Estate Data Output Source: https://github.com/capevace/data-wizard-docs/blob/main/examples/real-estate-properties-from-exposes.mdx This JSON structure represents the expected output after extracting data from a real estate exposé. It includes property details, unit information, and artifact IDs for images and floorplans. 
```json { "name": "Modern Apartment in City Center", "address": "12 Example Street", "description_text": "Spacious apartment with modern amenities in a vibrant city center location.", "units": [ { "usages": [ "living" ], "label": "Apartment 1", "floor": "2nd Floor", "rent_per_m2": 15.50, "images": [ "artifact:images/image1.png", "artifact:images/image2.png" ], "floorplans": [ "artifact:images/image7.png" ] }, { "usages": [ "living" ], "label": "Apartment 2", "floor": "3rd Floor", "rent_per_m2": 16.00, "images": [], "floorplans": [] } ], "images": [ "artifact:images/image9.png" ], "floorplans": [] } ``` -------------------------------- ### Invoice Data Extraction Example Source: https://github.com/capevace/data-wizard-docs/blob/main/introduction.mdx Demonstrates extracting structured data from scanned invoices. This includes key information such as invoice numbers, dates, line items, and total amounts. It's useful for automating invoice processing. ```mdx import More from '/snippets/more.mdx'; ``` -------------------------------- ### Tax Forms Data Extraction Example Source: https://github.com/capevace/data-wizard-docs/blob/main/introduction.mdx Details the process of extracting structured data from paper tax forms. It covers personal information, income details, deductions, and credits, streamlining tax document processing. ```mdx import More from '/snippets/more.mdx'; ``` -------------------------------- ### Example JSON Schema for Product Extraction Source: https://github.com/capevace/data-wizard-docs/blob/main/extractors.mdx This JSON schema defines the structure for extracting product data from supermarket brochures. It includes properties for product name, original price, and discounted price, with validation rules and UI hints. 
```APIDOC { "type": "object", "required": ["products"], "properties": { "products": { "type": "array", "magic_ui": "table", "items": { "type": "object", "required": ["name", "original_price"], "properties": { "name": { "type": "string", "maxLength": 255, "description": "The name of the product." }, "original_price": { "type": "number", "minimum": 0, "multipleOf": 0.01, "description": "The original price of the product." }, "discounted_price": { "type": ["number", "null"], "minimum": 0, "multipleOf": 0.01, "description": "The discounted price for all customers. Prices only applying to customers with a membership card should not be included here." } } } } } } ``` -------------------------------- ### Re-install Dependencies Source: https://github.com/capevace/data-wizard-docs/blob/main/README.md Re-installs project dependencies, often used to resolve issues when 'mintlify dev' is not running correctly. ```bash mintlify install ``` -------------------------------- ### Get Buckets API Source: https://github.com/capevace/data-wizard-docs/blob/main/endpoint/get.mdx Retrieves a list of buckets from the system. This endpoint is used to fetch all available buckets, which can be used for further operations like data extraction or management. ```APIDOC GET /api/buckets Description: Retrieves a list of all available buckets. Parameters: None Responses: 200 OK: Description: A list of buckets. Content: application/json: Schema: type: array items: type: object properties: id: type: string description: The unique identifier for the bucket. name: type: string description: The name of the bucket. createdAt: type: string format: date-time description: The timestamp when the bucket was created. 500 Internal Server Error: Description: An unexpected error occurred on the server. 
```

--------------------------------

### Data Wizard Extraction Workflow

Source: https://github.com/capevace/data-wizard-docs/blob/main/extracting-data.mdx

This snippet outlines the key steps involved in setting up and running a data extraction task using Data Wizard. It details the process from creating an extractor to running it within an application.

```markdown
## Prepare your extraction task

Before we can extract some data, you'll need to tell the wizard what data you want to extract and how to extract it. You can just describe the shape of data you want to extract, and an AI will generate an initial draft for you.

![Create extractor](./images/screenshots/setup/quick-create-extractor.png)

Edit the generated schema to your liking and add other instructions for the AI to follow. Read more in the [Extractors](./extractors) section. ![Define JSON Schema](./images/screenshots/setup/edit-extractor.png) Extractors are the core configuration objects in Data Wizard.

You can select from a large number of LLMs thanks to the [LLM Magic](https://github.com/Capevace/llm-magic) PHP package. You will need to add your API keys in the LLM settings before you can use them in an extractor. Find out more in the [LLM Provider Configuration](./configure-llm) section. ![Select model](./images/screenshots/setup/select-model.png) Configure your Large Language Model (LLM) API provider in Data Wizard to connect to leading LLMs like OpenAI, Anthropic, Google AI, Mistral AI, and more.

There are multiple [built-in strategies](./strategies) to choose from, or you can create your own custom strategy. ![Select strategy](./images/screenshots/setup/select-strategy.png) Learn about the built-in and custom extraction strategies available in Data Wizard.

After you have configured your extractor, you can run it to extract data from your documents. You can either use the built-in UI to do this, or you can integrate the feature into an existing application using the iFrame and HTTP API. Via Data Wizard's backend UI Via the embedded iFrame UI
``` -------------------------------- ### Data Extraction Workflow Source: https://github.com/capevace/data-wizard-docs/blob/main/introduction.mdx Illustrates the general workflow of Data Wizard, showing how input data, LLM configuration, and output format interact to produce extracted and validated JSON data. ```mermaid graph TB Input[Input data] -- PDF / Word files --> Extraction[Data Wizard] LLM[LLM Config] -- Prompt & Strategy --> Extraction Output[Output format] -- JSON Schema --> Extraction Extraction --> Results[Extracted and validated JSON data] ``` -------------------------------- ### LLM Provider Configuration Source: https://github.com/capevace/data-wizard-docs/blob/main/snippets/more.mdx Set up your Large Language Model API keys. ```markdown Set up your Large Language Model API keys. ``` -------------------------------- ### GraphQL API Endpoint and Example Query Source: https://github.com/capevace/data-wizard-docs/blob/main/apis.mdx Data Wizard exposes a GraphQL endpoint for flexible data querying. This section provides the endpoint URL and an example of a GraphQL query to retrieve saved extractors. ```graphql https://YOUR_DATA_WIZARD_URL/api/graphql query { savedExtractors { collection { id label } } } ``` -------------------------------- ### List All Extractions Source: https://github.com/capevace/data-wizard-docs/blob/main/endpoint/list.mdx Retrieves a list of all available extractions via the API. This endpoint is used to get an overview of all extraction jobs or configurations within the system. ```APIDOC GET /api/v1/extractions Description: Lists all extractions. Response: 200 OK: content: application/json: schema: type: array items: type: object properties: id: type: string description: Unique identifier for the extraction. name: type: string description: Name of the extraction. status: type: string description: Current status of the extraction (e.g., 'completed', 'running', 'failed'). 
createdAt: type: string format: date-time description: Timestamp when the extraction was created. updatedAt: type: string format: date-time description: Timestamp when the extraction was last updated. ``` -------------------------------- ### Strategies Source: https://github.com/capevace/data-wizard-docs/blob/main/snippets/more.mdx Understand different data processing strategies. ```markdown Understand different data processing strategies. ``` -------------------------------- ### Data Wizard Integration into Application Source: https://github.com/capevace/data-wizard-docs/blob/main/introduction.mdx Details how Data Wizard can be integrated into an application, showing the flow from a software UI embedding Data Wizard's UI, through file processing and LLM extraction, to final data delivery. ```mermaid graph TB A[Example software UI] -- Embeds an iFrame --> B[Data Wizard Embedded UI] B -- Upload files --> F[Data Wizard Core] F -- Extract text and images from files --> G[Artifacts] G --> H[Extraction Strategy] H -.-> I[LLM] I -.-> H H -- JSON data --> F F <-."Streaming results into\nautomatic UI".-> B B -- Final data via download,\nJavaScript API or webhook --> A ``` -------------------------------- ### Simple Invoice JSON Output Source: https://github.com/capevace/data-wizard-docs/blob/main/examples/paper-invoice-to-structured-data.mdx Example JSON output from the invoice extractor, detailing invoice number, dates, seller and buyer information, line items with quantities and prices, total amounts, and payment details. 
```json { "invoiceNumber": "INV-2022-001", "issueDate": "2022-01-01", "currency": "EUR", "seller": { "name": "ACME Inc.", "address": "123 Main St.", "postalCode": "12345", "city": "Springfield", "country": "US", "vatNumber": "US123456789" }, "buyer": { "customerNumber": "CUST-123", "name": "Buyer Corp.", "address": "456 Elm St.", "postalCode": "54321", "city": "Shelbyville", "country": "US" }, "lineItems": [ { "position": 1, "description": "Product A", "unitPrice": 100.0, "quantity": 2, "vatRate": 19.0, "netAmount": 200.0 }, { "position": 2, "description": "Product B", "unitPrice": 50.0, "quantity": 3, "vatRate": 19.0, "netAmount": 150.0 } ], "totalAmounts": { "netTotal": 350.0, "taxTotal": 66.5, "grossTotal": 416.5, "dueTotal": 416.5 }, "paymentDetails": { "paymentTerms": "Net 30 days", "paymentMethod": "SEPA_TRANSFER", "iban": "DE89370400440532013000" } } ``` -------------------------------- ### Invoice Data Structure Example Source: https://github.com/capevace/data-wizard-docs/blob/main/examples/paper-invoice-to-structured-data.mdx This JSON snippet illustrates the expected structure for invoice data, including details about the seller, buyer, line items, and total amounts. It serves as a schema for data extraction and validation. 
```json { "invoiceNumber": { "type": "string", "description": "Unique invoice identifier" }, "issueDate": { "type": "string", "description": "Date the invoice was issued" }, "currency": { "type": "string", "description": "Currency code (e.g., EUR, USD)" }, "seller": { "type": "object", "description": "Information about the seller", "properties": { "name": { "type": "string", "description": "Seller's name" }, "address": { "type": "string", "description": "Seller's address" } }, "required": [ "name", "address" ] }, "buyer": { "type": "object", "description": "Information about the buyer", "properties": { "name": { "type": "string", "description": "Buyer's name" }, "address": { "type": "string", "description": "Buyer's address" } }, "required": [ "name", "address" ] }, "lineItems": { "type": "array", "description": "List of items or services on the invoice", "items": { "type": "object", "properties": { "description": { "type": "string", "description": "Description of the item/service" }, "quantity": { "type": "number", "description": "Quantity of the item/service" }, "unitPrice": { "type": "number", "description": "Price per unit" }, "totalPrice": { "type": "number", "description": "Total price for the line item" } }, "required": [ "description", "quantity", "unitPrice", "totalPrice" ] } }, "totalAmounts": { "type": "object", "description": "Summary of all amounts", "properties": { "netTotal": { "type": "number", "description": "Total amount before tax" }, "taxTotal": { "type": "number", "description": "Total tax amount" }, "grossTotal": { "type": "number", "description": "Total amount including tax" }, "dueTotal": { "type": "number", "description": "Total amount due" } }, "required": [ "netTotal", "taxTotal", "grossTotal", "dueTotal" ] }, "paymentDetails": { "type": "object", "description": "Payment information", "properties": { "paymentTerms": { "type": "string", "description": "Payment terms" }, "paymentMethod": { "type": "string", "description": "Payment method", 
"enum": [ "SEPA_TRANSFER", "CREDIT_CARD", "PAYPAL" ] }, "iban": { "type": "string", "description": "IBAN for bank transfer" } }, "required": [ "paymentTerms", "paymentMethod" ] } } ``` -------------------------------- ### Add Smart Import Feature to SaaS Source: https://github.com/capevace/data-wizard-docs/blob/main/introduction.mdx Enable a 'smart import' feature in your SaaS application by embedding Data Wizard via iFrames or its REST/GraphQL API. Users can upload documents, and extracted data is streamed back in real-time. ```html You can offer a "smart import" feature in your SaaS application, allowing users to upload documents and automatically populate your application with the extracted data.
**Use Case:** You are a SaaS provider for CRM, accounting, or inventory management software and want to offer your users a "smart import" feature.
**Solution:** Embed Data Wizard directly into your SaaS application using iFrames or the REST/GraphQL API. Provide a seamless user experience by integrating data extraction directly into your workflow. Users can upload documents within your application, and Data Wizard will stream the extracted data back in real-time.
```

--------------------------------

### Create Custom Strategy

Source: https://github.com/capevace/data-wizard-docs/blob/main/custom-strategies.mdx

Demonstrates how to create a custom strategy by implementing the `Strategy` interface or extending the `Extractor` class. It also shows how to extend an existing strategy or create a completely custom one.

```php
use Mateffy\Magic\Extraction\Strategies\Strategy;
use Mateffy\Magic\Extraction\Strategies\Extractor;

// Create a custom strategy
class MyCustomStrategy extends Extractor {}

// Extend an existing strategy
class MyCustomizedStrategy extends SequentialStrategy {}

// Or completely custom by doing everything yourself
class MyCompletelyCustomStrategy implements Strategy {}
```

--------------------------------

### Customized iFrame Embedding

Source: https://github.com/capevace/data-wizard-docs/blob/main/integrate.mdx

This example demonstrates how to embed the Data Wizard iFrame with custom dimensions and styling. It includes width, height, frameborder, and inline styles for better integration into the host application's layout.

```html
``` -------------------------------- ### Competitor Analysis and Market Research Source: https://github.com/capevace/data-wizard-docs/blob/main/introduction.mdx Gather product and pricing information from competitor brochures, websites, or advertisements for market research. Data Wizard automates the extraction of this data for efficient analysis. ```html You can gather product and pricing information from competitor brochures, websites, or advertisements for market research and competitive analysis.
**Use Case:** You need to gather product and pricing information from competitor brochures, websites, or advertisements for market research and competitive analysis.
**Solution:** Use Data Wizard to automatically extract product details, pricing, and other relevant information from publicly available documents. Gain valuable market insights quickly and efficiently, without manual data scraping and entry.
```

--------------------------------

### Run in Own Application

Source: https://github.com/capevace/data-wizard-docs/blob/main/extracting-data.mdx

Instructions and code for running extractions within your own application using Data Wizard.

```markdown
## Run inside your own application

import More from '/snippets/more.mdx';
```

--------------------------------

### Update and Restart Data Wizard Docker Container

Source: https://github.com/capevace/data-wizard-docs/blob/main/deployment.mdx

Commands to update the Data Wizard Docker image to the latest version, stop the current container, remove it, and then start a new container with the updated image. It's recommended to back up data before performing updates.

```bash
# Pull the latest image, then replace the running container.
docker pull mateffy/data-wizard:latest
docker stop data-wizard
docker rm data-wizard
docker run --name data-wizard -p 9090:80 -p 4430:443 -p 4430:443/udp -v data_wizard_storage:/app/storage -v data_wizard_sqlite_data:/app/database -v data_wizard_caddy_data:/data -v data_wizard_caddy_config:/config -e APP_KEY=[REPLACE_WITH_APP_KEY] mateffy/data-wizard:latest
```

--------------------------------

### Programmatic Data Extraction Workflow Overview

Source: https://github.com/capevace/data-wizard-docs/blob/main/apis.mdx

This Mermaid diagram illustrates the programmatic data extraction workflow using the HTTP or GraphQL API. It covers file upload, extraction runs, and receiving notifications.
```mermaid
graph TB
    subgraph User-Driven File Upload
        A[Create a Bucket POST /api/buckets] --> B[User Uploads Files via Embeddable URL];
        B --> D[Redirect to Extractor URL or Embed iFrame];
    end

    subgraph Programmatic File Upload
        C[Create a Bucket POST /api/buckets] --> C1[Upload Files];
    end

    D --> E[Start Extraction Run];
    C1 --> E;
    E --> F[Webhook Notifications];
    E --> G[Poll API for Updates];
    F --> H[Receive Data];
    G --> H;
```

--------------------------------

### Configure LLM API Keys via Environment Variables (Docker)

Source: https://github.com/capevace/data-wizard-docs/blob/main/configure-llm.mdx

This snippet demonstrates how to configure LLM API keys for Data Wizard when running in a Docker container using environment variables. It shows examples for both a direct Docker run command and a Docker Compose file. Ensure you replace placeholder values with your actual API keys and application key.

```bash
docker run \
  -p 9090:80 \
  -e OPENAI_API_TOKEN= \
  mateffy/data-wizard:latest
```

```yaml
services:
  data-wizard:
    image: mateffy/data-wizard:latest
    ports:
      - ...
    volumes:
      - ...
    environment:
      - APP_KEY=
      - OPENAI_API_TOKEN=
```

--------------------------------

### iFrame Theme Customization

Source: https://github.com/capevace/data-wizard-docs/blob/main/integrate.mdx

Demonstrates how to customize the Data Wizard iFrame's theme using URL parameters and JavaScript postMessage API.
```javascript
// Set initial theme via URL parameter
// Example:

// Dynamically change theme using postMessage
const wizardFrame = document.getElementById('data-wizard-iframe'); // Assuming you have an iframe with this ID

if (wizardFrame) {
  wizardFrame.contentWindow.postMessage({
    event: 'set_theme',
    theme: 'dark'
  }, '*');
}
```

--------------------------------

### AI-Powered Data Extraction for SaaS Platforms

Source: https://github.com/capevace/data-wizard-docs/blob/main/introduction.mdx

Utilize Data Wizard as the core data extraction engine for platforms requiring robust and adaptable data extraction. Its modular architecture and LLM abstraction layer allow for easy switching between LLM providers and customization.

```html
You can use Data Wizard as the core data extraction engine for your platform, supporting a wide range of document types and extraction tasks.
**Use Case:** You are building a document processing or data analysis platform and need robust, adaptable data extraction capabilities.
**Solution:** Use Data Wizard as the core data extraction engine for your platform using the REST/GraphQL API. Its modular architecture and LLM abstraction layer allow you to easily switch between different LLM providers, customize extraction strategies, and adapt to evolving LLM technologies.
```

--------------------------------

### Custom Strategies Link

Source: https://github.com/capevace/data-wizard-docs/blob/main/strategies.mdx

Provides a link to learn how to build custom strategies for more control over the extraction process.

```markdown
You can create custom strategies to tailor the extraction process to your specific needs. Custom strategies allow you to define how the document is processed, how the LLM is interacted with, and how the results are merged.
```

--------------------------------

### Register Custom Strategy

Source: https://github.com/capevace/data-wizard-docs/blob/main/custom-strategies.mdx

Shows how to register a custom strategy with Data Wizard by calling `Magic::registerStrategy()` in the `boot()` method of your service provider. This makes the custom strategy available in the UI.

```php
use Illuminate\Support\ServiceProvider;
use Mateffy\Magic\Magic;

class AppServiceProvider extends ServiceProvider
{
    public function boot()
    {
        // Register the strategy under the key shown in the UI.
        Magic::registerStrategy('my-custom-strategy', MyCustomStrategy::class);
    }
}
```
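
--------------------------------

### Sketch: Checking Invoice Totals on the Client

The total check described for `ValidatedInvoiceStrategy` can also be tried on extracted JSON outside PHP, for example after receiving data from the iFrame or a webhook. The following is a minimal sketch, not part of Data Wizard or LLM Magic: the function name `validateInvoiceTotals` and the `tolerance` parameter are hypothetical, and the field names are assumed to match the "Simple Invoice JSON Output" example.

```javascript
// Hypothetical helper for illustration; not part of Data Wizard or LLM Magic.
// Field names (lineItems, netAmount, totalAmounts) are assumed to follow
// the "Simple Invoice JSON Output" example.
function validateInvoiceTotals(invoice, tolerance = 0.005) {
  const errors = [];
  const { netTotal, taxTotal, grossTotal } = invoice.totalAmounts;

  // Sum the net amounts of all line items.
  const netSum = invoice.lineItems.reduce((sum, item) => sum + item.netAmount, 0);

  // Compare with a tolerance because amounts are floating-point numbers.
  if (Math.abs(netSum - netTotal) > tolerance) {
    errors.push(`netTotal ${netTotal} does not match line item sum ${netSum}`);
  }

  // The gross total should equal net total plus tax.
  if (Math.abs(netTotal + taxTotal - grossTotal) > tolerance) {
    errors.push(`grossTotal ${grossTotal} does not equal netTotal + taxTotal`);
  }

  return errors;
}

// Trimmed copy of the example invoice output.
const sampleInvoice = {
  lineItems: [
    { description: "Product A", netAmount: 200.0 },
    { description: "Product B", netAmount: 150.0 },
  ],
  totalAmounts: { netTotal: 350.0, taxTotal: 66.5, grossTotal: 416.5 },
};

console.log(validateInvoiceTotals(sampleInvoice)); // prints []
```

Returning a list of messages instead of throwing lets the caller decide whether a mismatch should block the import or only be flagged for review.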