### Getting Started Options
Source: https://github.com/capevace/data-wizard-docs/blob/main/introduction.mdx
Choose between Data Wizard Cloud for an easy, hosted experience, or self-hosting with a Docker container for full control and integration capabilities.
```text
Data Wizard Cloud: The easiest way to use Data Wizard! A hosted, ready-to-use version that requires no installation.
Self-hosted (Docker): For full control and integration capabilities. Install and run Data Wizard locally or on your own infrastructure using Docker. Ideal for developers.
```
--------------------------------
### Deploy Data Wizard with Docker Compose
Source: https://github.com/capevace/data-wizard-docs/blob/main/quick-start.mdx
Defines and deploys the Data Wizard service using Docker Compose, configuring ports, volumes, and environment variables for a complete setup.
```yaml
version: '3.8'

services:
  data-wizard:
    container_name: data-wizard
    image: mateffy/data-wizard:latest
    ports:
      - "9090:80"
      - "4430:443"
      - "4430:443/udp"
    volumes:
      - data_wizard_storage:/app/storage
      - data_wizard_sqlite_data:/app/database
      - data_wizard_caddy_data:/data
      - data_wizard_caddy_config:/config
    environment:
      - APP_KEY=base64:[REPLACE_WITH_KEY]

volumes:
  data_wizard_storage:
  data_wizard_sqlite_data:
  data_wizard_caddy_data:
  data_wizard_caddy_config:
```
--------------------------------
### Install Mintlify CLI
Source: https://github.com/capevace/data-wizard-docs/blob/main/README.md
Installs the Mintlify Command Line Interface globally using npm. This tool is essential for previewing documentation changes locally.
```bash
npm i -g mintlify
```
--------------------------------
### Run Data Wizard with Docker
Source: https://github.com/capevace/data-wizard-docs/blob/main/quick-start.mdx
Launches the Data Wizard Docker container, mapping necessary ports and volumes for persistent storage and setting the essential APP_KEY environment variable.
```bash
docker run \
--name data-wizard \
-p 9090:80 \
-p 4430:443 \
-p 4430:443/udp \
-v data_wizard_storage:/app/storage \
-v data_wizard_sqlite_data:/app/database \
-v data_wizard_caddy_data:/data \
-v data_wizard_caddy_config:/config \
-e APP_KEY=base64:[REPLACE_WITH_KEY] \
mateffy/data-wizard:latest
```
--------------------------------
### Start Local Development Server
Source: https://github.com/capevace/data-wizard-docs/blob/main/README.md
Starts the Mintlify local development server. This command should be run from the root directory of your documentation project, where the 'mint.json' file is located.
```bash
mintlify dev
```
--------------------------------
### Generate APP_KEY
Source: https://github.com/capevace/data-wizard-docs/blob/main/quick-start.mdx
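Generates a random, base64-encoded value for Data Wizard's `APP_KEY`, which the application requires for security. Ensure the `-base64` flag is used. When setting the variable, prefix the output with `base64:`, as in `APP_KEY=base64:[REPLACE_WITH_KEY]`.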
```bash
openssl rand -base64 32
```
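If you prefer Node.js over OpenSSL, a sketch like the following produces an equivalent key: 32 random bytes, base64-encoded. It prepends `base64:` because the Docker examples set `APP_KEY=base64:[REPLACE_WITH_KEY]`. The helper name `generateAppKey` is hypothetical.

```javascript
const crypto = require('crypto');

// Equivalent of `openssl rand -base64 32`, plus the "base64:" prefix
// used in the Docker examples. generateAppKey is a hypothetical helper name.
function generateAppKey() {
  return 'base64:' + crypto.randomBytes(32).toString('base64');
}

console.log(`APP_KEY=${generateAppKey()}`);
```

Run it with `node` and paste the printed line into your `docker run -e` flag or your Compose `environment` list.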
--------------------------------
### More Snippet Example
Source: https://github.com/capevace/data-wizard-docs/blob/main/examples/real-estate-properties-from-exposes.mdx
Imports the shared `/snippets/more.mdx` snippet, which renders links to related guides such as extracting data, LLM provider configuration, and strategies.
```markdown
import More from '/snippets/more.mdx'
```
--------------------------------
### Product Data Extraction from Brochures Example
Source: https://github.com/capevace/data-wizard-docs/blob/main/introduction.mdx
Shows how to extract product names and prices from online brochures. This example is valuable for market research, competitor analysis, and catalog management.
```mdx
import More from '/snippets/more.mdx';
```
--------------------------------
### Example Product Brochure Data Output
Source: https://github.com/capevace/data-wizard-docs/blob/main/examples/products-from-brochures.mdx
An example of the JSON output generated by the product brochure extractor, showing extracted product names and prices.
```json
[
{
"name": "Bottle of red wine",
"original_price": 14.99,
"discounted_price": 9.99
},
{
"name": "Bottle of white wine",
"original_price": 12.99,
"discounted_price": 6.99
}
]
```
--------------------------------
### Real Estate Expose Data Extraction Example
Source: https://github.com/capevace/data-wizard-docs/blob/main/introduction.mdx
Provides an example of extracting structured data from real estate exposes. This includes property details, pricing information, and location data, useful for real estate data management.
```mdx
import More from '/snippets/more.mdx';
```
--------------------------------
### More LLM Information
Source: https://github.com/capevace/data-wizard-docs/blob/main/configure-llm.mdx
Imports the shared `/snippets/more.mdx` snippet, which renders links to related guides such as LLM provider configuration and strategies.
```mdx
import More from '/snippets/more.mdx';
```
--------------------------------
### Implement Strategy Logic
Source: https://github.com/capevace/data-wizard-docs/blob/main/custom-strategies.mdx
Provides examples of implementing custom extraction logic within the `run()` method of a custom strategy. The first example shows a basic implementation, while the second demonstrates validating invoice data by checking the total against the sum of line items.
```php
use Mateffy\Magic\Extraction\Strategies\Extractor;

class MyCustomStrategy extends Extractor
{
    public function run(array $artifacts): array
    {
        // Implement your strategy here and return the extracted data.
        $data = [];

        return $data;
    }
}
```
```php
use Mateffy\Magic\Exceptions\JsonSchemaValidationError;
// Namespace assumed to match Extractor's.
use Mateffy\Magic\Extraction\Strategies\ParallelStrategy;

class ValidatedInvoiceStrategy extends ParallelStrategy
{
    public function run(array $artifacts): array
    {
        $data = parent::run($artifacts);

        // Validate the invoice total against the sum of its line items.
        $total = 0.0;
        foreach ($data['line_items'] as $item) {
            $total += $item['amount'];
        }

        // Compare with a small tolerance instead of !== so that float
        // rounding (or an int vs. float total) does not raise false errors.
        if (abs($total - $data['total']) > 0.005) {
            throw new JsonSchemaValidationError(
                'Invoice total does not match the sum of line items'
            );
        }

        // At this point the total is known to match the line items.
        return $data;
    }
}
```
--------------------------------
### Learn how to extract some data
Source: https://github.com/capevace/data-wizard-docs/blob/main/snippets/more.mdx
Step by step guide to extract data from documents using Data Wizard.
```markdown
Step by step guide to extract data from documents using Data Wizard.
```
--------------------------------
### Example contents.json Structure
Source: https://github.com/capevace/data-wizard-docs/blob/main/preprocessing.mdx
Illustrates the structure of the `contents.json` file, detailing how document content is organized into Slices, including text, images, and page information with their respective properties.
```json
[
{
"page": 1,
"type": "text",
"text": "This is the text on the first page of the document. Lorem ipsum dolor sit amet..."
},
{
"page": 1,
"type": "image",
"mimetype": "image/jpeg",
"path": "images/image1.jpg",
"x": 455.0,
"y": 28.55999755859375,
"width": 88.32000732421875,
"height": 688.3200073242188
},
{
"page": 1,
"type": "page-image",
"mimetype": "image/jpeg",
"path": "pages/page1.jpg"
},
{
"page": 1,
"type": "page-image-marked",
"mimetype": "image/jpeg",
"path": "pages_marked/page1.jpg"
}
]
```
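To consume `contents.json` programmatically, a minimal sketch could group the slices by page. It handles only the four slice types shown in the example and ignores any others. `groupSlicesByPage` is a hypothetical helper, not part of Data Wizard.

```javascript
// Groups the slices of a contents.json array by page number.
// Handles the slice types from the example: text, image, page-image, page-image-marked.
function groupSlicesByPage(slices) {
  const pages = new Map();
  for (const slice of slices) {
    if (!pages.has(slice.page)) {
      pages.set(slice.page, { text: [], images: [], pageImage: null, markedPageImage: null });
    }
    const page = pages.get(slice.page);
    switch (slice.type) {
      case 'text':
        page.text.push(slice.text);
        break;
      case 'image':
        page.images.push(slice.path);
        break;
      case 'page-image':
        page.pageImage = slice.path;
        break;
      case 'page-image-marked':
        page.markedPageImage = slice.path;
        break;
      default:
        // Unknown slice types are ignored in this sketch.
        break;
    }
  }
  return pages;
}
```

For example, with Node's `fs` module: `groupSlicesByPage(JSON.parse(fs.readFileSync('contents.json', 'utf8')))`.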
--------------------------------
### Start Extraction
Source: https://github.com/capevace/data-wizard-docs/blob/main/extracting-data.mdx
Initiate data extraction from a bucket or an extractor. View progress and final data in raw JSON or GUI format.
```markdown
## Run inside Data Wizard
All of the data that Data Wizard generates is viewable on the run page. This includes in-progress data as it's being generated, as well as the final extracted data.
You can view the data either as raw JSON or in the GUI derived from the JSON schema.

You can use the built-in UI to create and configure your extractor.

You can inspect each step of the extraction process to see how the AI is interpreting your instructions and what data it is returning.
```
--------------------------------
### Customer Feedback JSON Output Example
Source: https://github.com/capevace/data-wizard-docs/blob/main/examples/customer-feedback-to-data.mdx
An example of the structured JSON data produced by the customer feedback extractor. This format captures key details like submission date, customer information, feedback text, rating, and suggestions.
```json
{
"formType": "Customer Feedback",
"submissionDate": "2024-09-08",
"customerName": "Jane Doe",
"email": "jane.doe@example.com",
"feedbackText": "The service was excellent, and the staff were very friendly. I especially appreciated the quick response time to my inquiry.",
"rating": 5,
"suggestions": "Perhaps offer more variety in your product catalog."
}
```
--------------------------------
### Customer Feedback to JSON Conversion Example
Source: https://github.com/capevace/data-wizard-docs/blob/main/introduction.mdx
Illustrates transforming customer feedback, whether handwritten or printed, into structured JSON format. This facilitates easier analysis and aids in service improvement initiatives.
```mdx
import More from '/snippets/more.mdx';
```
--------------------------------
### More Snippet
Source: https://github.com/capevace/data-wizard-docs/blob/main/extractors.mdx
Imports the shared `/snippets/more.mdx` snippet, which renders links to related guides.
```markdown
import More from '/snippets/more.mdx';
```
--------------------------------
### Tax Form Data Example
Source: https://github.com/capevace/data-wizard-docs/blob/main/examples/data-from-paper-tax-forms.mdx
Example of structured JSON output for processed tax form data, including taxpayer information, financial figures, and due dates.
```json
{
"formType": "Tax Form 1040",
"taxYear": 2023,
"taxpayerID": "12-3456789",
"filingStatus": "Single",
"income": 75000.00,
"deductions": 12000.00,
"taxLiability": 15000.00,
"paymentDueDate": "2024-04-15"
}
```
--------------------------------
### Run Data Wizard with Docker
Source: https://github.com/capevace/data-wizard-docs/blob/main/deployment.mdx
Starts the Data Wizard Docker container, mapping necessary ports and volumes for persistent storage and configuration. It includes options for HTTP/HTTPS access, data persistence, and setting the essential APP_KEY environment variable.
```bash
docker run \
--name data-wizard \
-p 9090:80 \
-p 4430:443 \
-p 4430:443/udp \
-v data_wizard_storage:/app/storage \
-v data_wizard_sqlite_data:/app/database \
-v data_wizard_caddy_data:/data \
-v data_wizard_caddy_config:/config \
-e APP_KEY=[REPLACE_WITH_APP_KEY] \
mateffy/data-wizard:latest
```
--------------------------------
### Example Real Estate Data Output
Source: https://github.com/capevace/data-wizard-docs/blob/main/examples/real-estate-properties-from-exposes.mdx
This JSON structure represents the expected output after extracting data from a real estate exposé. It includes property details, unit information, and artifact IDs for images and floorplans.
```json
{
"name": "Modern Apartment in City Center",
"address": "12 Example Street",
"description_text": "Spacious apartment with modern amenities in a vibrant city center location.",
"units": [
{
"usages": [
"living"
],
"label": "Apartment 1",
"floor": "2nd Floor",
"rent_per_m2": 15.50,
"images": [
"artifact:images/image1.png",
"artifact:images/image2.png"
],
"floorplans": [
"artifact:images/image7.png"
]
},
{
"usages": [
"living"
],
"label": "Apartment 2",
"floor": "3rd Floor",
"rent_per_m2": 16.00,
"images": [],
"floorplans": []
}
],
"images": [
"artifact:images/image9.png"
],
"floorplans": []
}
```
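The image and floorplan fields hold artifact references of the form `artifact:<path>`. The sketch below collects them into a de-duplicated list of paths. It assumes that the part after the prefix is a path inside the document's artifact folder, by analogy with the `path` values in `contents.json`. `collectArtifactPaths` is a hypothetical helper.

```javascript
const ARTIFACT_PREFIX = 'artifact:';

// Collects artifact references from a property and its units, strips the
// prefix, and removes duplicates (the first occurrence keeps its position).
function collectArtifactPaths(property) {
  const refs = [
    ...(property.images ?? []),
    ...(property.floorplans ?? []),
    ...(property.units ?? []).flatMap((unit) => [
      ...(unit.images ?? []),
      ...(unit.floorplans ?? []),
    ]),
  ];
  return [...new Set(refs)]
    .filter((ref) => ref.startsWith(ARTIFACT_PREFIX))
    .map((ref) => ref.slice(ARTIFACT_PREFIX.length));
}
```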
--------------------------------
### Invoice Data Extraction Example
Source: https://github.com/capevace/data-wizard-docs/blob/main/introduction.mdx
Demonstrates extracting structured data from scanned invoices. This includes key information such as invoice numbers, dates, line items, and total amounts. It's useful for automating invoice processing.
```mdx
import More from '/snippets/more.mdx';
```
--------------------------------
### Tax Forms Data Extraction Example
Source: https://github.com/capevace/data-wizard-docs/blob/main/introduction.mdx
Details the process of extracting structured data from paper tax forms. It covers personal information, income details, deductions, and credits, streamlining tax document processing.
```mdx
import More from '/snippets/more.mdx';
```
--------------------------------
### Example JSON Schema for Product Extraction
Source: https://github.com/capevace/data-wizard-docs/blob/main/extractors.mdx
This JSON schema defines the structure for extracting product data from supermarket brochures. It includes properties for product name, original price, and discounted price, with validation rules and UI hints.
```json
{
"type": "object",
"required": ["products"],
"properties": {
"products": {
"type": "array",
"magic_ui": "table",
"items": {
"type": "object",
"required": ["name", "original_price"],
"properties": {
"name": {
"type": "string",
"maxLength": 255,
"description": "The name of the product."
},
"original_price": {
"type": "number",
"minimum": 0,
"multipleOf": 0.01,
"description": "The original price of the product."
},
"discounted_price": {
"type": ["number", "null"],
"minimum": 0,
"multipleOf": 0.01,
"description": "The discounted price for all customers. Prices only applying to customers with a membership card should not be included here."
}
}
}
}
}
}
```
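The following hand-written check shows what these constraints mean in practice. It is a sketch, not a substitute for a full JSON Schema validator. It assumes the data is wrapped in the `products` object that the schema requires; the brochure output example above is shown as a bare array instead. `validateProducts` is a hypothetical helper.

```javascript
// Hand-written check of the product schema constraints:
// required, type, maxLength, minimum and multipleOf.

// multipleOf 0.01, with a tolerance because prices are binary floats.
function hasAtMostTwoDecimals(value) {
  return Math.abs(Math.round(value * 100) - value * 100) < 1e-6;
}

function isPrice(value) {
  return typeof value === 'number' && Number.isFinite(value) && value >= 0 && hasAtMostTwoDecimals(value);
}

// Returns a list of error messages; an empty list means the data passed.
function validateProducts(data) {
  if (!data || !Array.isArray(data.products)) {
    return ['products: required array is missing'];
  }
  const errors = [];
  data.products.forEach((product, i) => {
    // JSON Schema's maxLength counts code points; .length counts UTF-16 units.
    if (typeof product.name !== 'string' || product.name.length > 255) {
      errors.push(`products[${i}].name: must be a string of at most 255 characters`);
    }
    if (!isPrice(product.original_price)) {
      errors.push(`products[${i}].original_price: must be a number >= 0 with at most two decimals`);
    }
    // discounted_price is optional and may be null.
    const discounted = product.discounted_price;
    if (discounted !== undefined && discounted !== null && !isPrice(discounted)) {
      errors.push(`products[${i}].discounted_price: must be null or a number >= 0 with at most two decimals`);
    }
  });
  return errors;
}
```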
--------------------------------
### Re-install Dependencies
Source: https://github.com/capevace/data-wizard-docs/blob/main/README.md
Re-installs project dependencies, often used to resolve issues when 'mintlify dev' is not running correctly.
```bash
mintlify install
```
--------------------------------
### Get Buckets API
Source: https://github.com/capevace/data-wizard-docs/blob/main/endpoint/get.mdx
Retrieves a list of buckets from the system. This endpoint is used to fetch all available buckets, which can be used for further operations like data extraction or management.
```APIDOC
GET /api/buckets
Description:
Retrieves a list of all available buckets.
Parameters:
None
Responses:
200 OK:
Description: A list of buckets.
Content:
application/json:
Schema:
type: array
items:
type: object
properties:
id:
type: string
description: The unique identifier for the bucket.
name:
type: string
description: The name of the bucket.
createdAt:
type: string
format: date-time
description: The timestamp when the bucket was created.
500 Internal Server Error:
Description: An unexpected error occurred on the server.
```
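A minimal JavaScript client for this endpoint could look like the sketch below, which returns the buckets newest first. The endpoint description above does not cover authentication, so the optional bearer token is an assumption. `listBuckets` is a hypothetical helper, and `fetchImpl` defaults to the global `fetch` available in browsers and Node.js 18+.

```javascript
// Hypothetical helper for GET /api/buckets; returns buckets newest first.
// Bearer-token auth is an assumption; pass null to send no Authorization header.
async function listBuckets(baseUrl, token = null, fetchImpl = fetch) {
  const headers = { Accept: 'application/json' };
  if (token) headers.Authorization = `Bearer ${token}`;

  const response = await fetchImpl(`${baseUrl}/api/buckets`, { headers });
  if (!response.ok) {
    // 500 is the only error status documented above.
    throw new Error(`GET /api/buckets failed with HTTP ${response.status}`);
  }
  const buckets = await response.json();
  // createdAt is an ISO 8601 date-time string, which Date.parse understands.
  return buckets.sort((a, b) => Date.parse(b.createdAt) - Date.parse(a.createdAt));
}
```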
--------------------------------
### Data Wizard Extraction Workflow
Source: https://github.com/capevace/data-wizard-docs/blob/main/extracting-data.mdx
This snippet outlines the key steps involved in setting up and running a data extraction task using Data Wizard. It details the process from creating an extractor to running it within an application.
```markdown
## Prepare your extraction task
Before we can extract some data, you'll need to tell the wizard what data you want to extract and how to extract it.
You can just describe the shape of data you want to extract, and an AI will generate an initial draft for you.

Edit the generated schema to your liking and add other instructions for the AI to follow. Read more in the [Extractors](./extractors) section.

Extractors are the core configuration objects in Data Wizard.
You can select from a large number of LLMs thanks to the [LLM Magic](https://github.com/Capevace/llm-magic) PHP package.
You will need to add your API keys in the LLM settings before you can use them in an extractor. Find out more in the [LLM Provider Configuration](./configure-llm) section.

Configure your Large Language Model (LLM) API provider in Data Wizard to connect to leading LLMs like OpenAI, Anthropic, Google AI, Mistral AI, and more.
There are multiple [built-in strategies](./strategies) to choose from, or you can create your own custom strategy.

Learn about the built-in and custom extraction strategies available in Data Wizard.
After you have configured your extractor, you can run it to extract data from your documents.
You can either use the built-in UI to do this, or you can integrate the feature into an existing application using the iFrame and HTTP API.
- Via Data Wizard's backend UI
- Via the embedded iFrame UI
```
--------------------------------
### Data Extraction Workflow
Source: https://github.com/capevace/data-wizard-docs/blob/main/introduction.mdx
Illustrates the general workflow of Data Wizard, showing how input data, LLM configuration, and output format interact to produce extracted and validated JSON data.
```mermaid
graph TB
Input[Input data] -- PDF / Word files --> Extraction[Data Wizard]
LLM[LLM Config] -- Prompt & Strategy --> Extraction
Output[Output format] -- JSON Schema --> Extraction
Extraction --> Results[Extracted and validated JSON data]
```
--------------------------------
### LLM Provider Configuration
Source: https://github.com/capevace/data-wizard-docs/blob/main/snippets/more.mdx
Set up your Large Language Model API keys.
```markdown
Set up your Large Language Model API keys.
```
--------------------------------
### GraphQL API Endpoint and Example Query
Source: https://github.com/capevace/data-wizard-docs/blob/main/apis.mdx
Data Wizard exposes a GraphQL endpoint for flexible data querying. This section provides the endpoint URL and an example of a GraphQL query to retrieve saved extractors.
```graphql
# Endpoint: https://YOUR_DATA_WIZARD_URL/api/graphql
query {
savedExtractors {
collection {
id
label
}
}
}
```
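From JavaScript, you could send the query above as a common GraphQL-over-HTTP request: a `POST` with a JSON body containing `query`. This is a sketch under stated assumptions. The transport follows general GraphQL conventions rather than anything documented here, and authentication (if your instance requires it) is assumed to use a bearer token. `fetchSavedExtractors` is a hypothetical helper name.

```javascript
// Query from the example above.
const SAVED_EXTRACTORS_QUERY = `
  query {
    savedExtractors {
      collection {
        id
        label
      }
    }
  }
`;

// Hypothetical helper. Assumes POST + JSON body (common GraphQL-over-HTTP
// convention) and optional bearer-token auth; verify both against your instance.
async function fetchSavedExtractors(baseUrl, token = null, fetchImpl = fetch) {
  const headers = { 'Content-Type': 'application/json', Accept: 'application/json' };
  if (token) headers.Authorization = `Bearer ${token}`;

  const response = await fetchImpl(`${baseUrl}/api/graphql`, {
    method: 'POST',
    headers,
    body: JSON.stringify({ query: SAVED_EXTRACTORS_QUERY }),
  });
  if (!response.ok) {
    throw new Error(`GraphQL request failed with HTTP ${response.status}`);
  }

  // GraphQL reports query errors in an "errors" array, often alongside HTTP 200.
  const { data, errors } = await response.json();
  if (errors && errors.length > 0) {
    throw new Error(errors.map((e) => e.message).join('; '));
  }
  return data.savedExtractors.collection;
}
```

Usage: `const extractors = await fetchSavedExtractors('https://YOUR_DATA_WIZARD_URL');`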
--------------------------------
### List All Extractions
Source: https://github.com/capevace/data-wizard-docs/blob/main/endpoint/list.mdx
Retrieves a list of all available extractions via the API. This endpoint is used to get an overview of all extraction jobs or configurations within the system.
```APIDOC
GET /api/v1/extractions
Description:
Lists all extractions.
Response:
200 OK:
content:
application/json:
schema:
type: array
items:
type: object
properties:
id:
type: string
description: Unique identifier for the extraction.
name:
type: string
description: Name of the extraction.
status:
type: string
description: Current status of the extraction (e.g., 'completed', 'running', 'failed').
createdAt:
type: string
format: date-time
description: Timestamp when the extraction was created.
updatedAt:
type: string
format: date-time
description: Timestamp when the extraction was last updated.
```
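To wait for a run to finish by polling, one option is to re-request this list until the extraction reaches a final status. The following is a minimal sketch with several assumptions. The extraction is assumed to appear in the list. `'completed'` and `'failed'` (taken from the status description above) are treated as final, and any other status as still in progress. No auth header is sent. `waitForExtraction` is a hypothetical helper.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: polls GET /api/v1/extractions until the extraction
// with the given id is 'completed' or 'failed', or maxAttempts is reached.
async function waitForExtraction(baseUrl, id, { fetchImpl = fetch, intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await fetchImpl(`${baseUrl}/api/v1/extractions`, {
      headers: { Accept: 'application/json' },
    });
    if (!response.ok) {
      throw new Error(`GET /api/v1/extractions failed with HTTP ${response.status}`);
    }
    const extraction = (await response.json()).find((e) => e.id === id);
    if (!extraction) {
      throw new Error(`Extraction ${id} not found`);
    }
    if (extraction.status === 'completed' || extraction.status === 'failed') {
      return extraction;
    }
    // Any other status (e.g. 'running') means: wait and check again.
    if (attempt < maxAttempts) {
      await sleep(intervalMs);
    }
  }
  throw new Error(`Extraction ${id} did not finish after ${maxAttempts} checks`);
}
```

Where webhook notifications are available, they avoid these repeated requests.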
--------------------------------
### Strategies
Source: https://github.com/capevace/data-wizard-docs/blob/main/snippets/more.mdx
Understand different data processing strategies.
```markdown
Understand different data processing strategies.
```
--------------------------------
### Data Wizard Integration into Application
Source: https://github.com/capevace/data-wizard-docs/blob/main/introduction.mdx
Details how Data Wizard can be integrated into an application, showing the flow from a software UI embedding Data Wizard's UI, through file processing and LLM extraction, to final data delivery.
```mermaid
graph TB
A[Example software UI] -- Embeds an iFrame --> B[Data Wizard Embedded UI]
B -- Upload files --> F[Data Wizard Core]
F -- Extract text and images from files --> G[Artifacts]
G --> H[Extraction Strategy]
H -.-> I[LLM]
I -.-> H
H -- JSON data --> F
F <-."Streaming results into\nautomatic UI".-> B
B -- Final data via download,\nJavaScript API or webhook --> A
```
--------------------------------
### Simple Invoice JSON Output
Source: https://github.com/capevace/data-wizard-docs/blob/main/examples/paper-invoice-to-structured-data.mdx
Example JSON output from the invoice extractor, detailing invoice number, dates, seller and buyer information, line items with quantities and prices, total amounts, and payment details.
```json
{
"invoiceNumber": "INV-2022-001",
"issueDate": "2022-01-01",
"currency": "EUR",
"seller": {
"name": "ACME Inc.",
"address": "123 Main St.",
"postalCode": "12345",
"city": "Springfield",
"country": "US",
"vatNumber": "US123456789"
},
"buyer": {
"customerNumber": "CUST-123",
"name": "Buyer Corp.",
"address": "456 Elm St.",
"postalCode": "54321",
"city": "Shelbyville",
"country": "US"
},
"lineItems": [
{
"position": 1,
"description": "Product A",
"unitPrice": 100.0,
"quantity": 2,
"vatRate": 19.0,
"netAmount": 200.0
},
{
"position": 2,
"description": "Product B",
"unitPrice": 50.0,
"quantity": 3,
"vatRate": 19.0,
"netAmount": 150.0
}
],
"totalAmounts": {
"netTotal": 350.0,
"taxTotal": 66.5,
"grossTotal": 416.5,
"dueTotal": 416.5
},
"paymentDetails": {
"paymentTerms": "Net 30 days",
"paymentMethod": "SEPA_TRANSFER",
"iban": "DE89370400440532013000"
}
}
```
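You can cross-check the totals in this output in the same spirit as the `ValidatedInvoiceStrategy` example. The sketch below assumes `vatRate` is a percentage: 19.0 means 19%, which matches 350.0 × 0.19 = 66.5 here. It uses a half-cent tolerance, which is an arbitrary choice; invoices that round VAT per line may need a larger one. `checkInvoiceTotals` is a hypothetical helper.

```javascript
// Half a cent; an assumed tolerance to absorb floating-point error.
const TOLERANCE = 0.005;

const nearlyEqual = (a, b) => Math.abs(a - b) <= TOLERANCE;

// Returns a list of problems; an empty list means the totals add up.
function checkInvoiceTotals(invoice) {
  const problems = [];
  let net = 0;
  let tax = 0;

  for (const item of invoice.lineItems) {
    if (!nearlyEqual(item.unitPrice * item.quantity, item.netAmount)) {
      problems.push(`line ${item.position}: unitPrice * quantity does not equal netAmount`);
    }
    net += item.netAmount;
    // Assumes vatRate is a percentage, e.g. 19.0 for 19%.
    tax += item.netAmount * (item.vatRate / 100);
  }

  const totals = invoice.totalAmounts;
  if (!nearlyEqual(net, totals.netTotal)) {
    problems.push('netTotal does not equal the sum of line netAmounts');
  }
  if (!nearlyEqual(tax, totals.taxTotal)) {
    problems.push('taxTotal does not equal the VAT computed from the line items');
  }
  if (!nearlyEqual(totals.netTotal + totals.taxTotal, totals.grossTotal)) {
    problems.push('grossTotal does not equal netTotal + taxTotal');
  }
  return problems;
}
```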
--------------------------------
### Invoice Data Structure Example
Source: https://github.com/capevace/data-wizard-docs/blob/main/examples/paper-invoice-to-structured-data.mdx
This JSON snippet shows the property definitions (the contents of the schema's `properties` object) for invoice data, covering the seller, buyer, line items, total amounts, and payment details. These definitions drive data extraction and validation.
```json
{
"invoiceNumber": {
"type": "string",
"description": "Unique invoice identifier"
},
"issueDate": {
"type": "string",
"description": "Date the invoice was issued"
},
"currency": {
"type": "string",
"description": "Currency code (e.g., EUR, USD)"
},
"seller": {
"type": "object",
"description": "Information about the seller",
"properties": {
"name": {
"type": "string",
"description": "Seller's name"
},
"address": {
"type": "string",
"description": "Seller's address"
}
},
"required": [
"name",
"address"
]
},
"buyer": {
"type": "object",
"description": "Information about the buyer",
"properties": {
"name": {
"type": "string",
"description": "Buyer's name"
},
"address": {
"type": "string",
"description": "Buyer's address"
}
},
"required": [
"name",
"address"
]
},
"lineItems": {
"type": "array",
"description": "List of items or services on the invoice",
"items": {
"type": "object",
"properties": {
"description": {
"type": "string",
"description": "Description of the item/service"
},
"quantity": {
"type": "number",
"description": "Quantity of the item/service"
},
"unitPrice": {
"type": "number",
"description": "Price per unit"
},
"totalPrice": {
"type": "number",
"description": "Total price for the line item"
}
},
"required": [
"description",
"quantity",
"unitPrice",
"totalPrice"
]
}
},
"totalAmounts": {
"type": "object",
"description": "Summary of all amounts",
"properties": {
"netTotal": {
"type": "number",
"description": "Total amount before tax"
},
"taxTotal": {
"type": "number",
"description": "Total tax amount"
},
"grossTotal": {
"type": "number",
"description": "Total amount including tax"
},
"dueTotal": {
"type": "number",
"description": "Total amount due"
}
},
"required": [
"netTotal",
"taxTotal",
"grossTotal",
"dueTotal"
]
},
"paymentDetails": {
"type": "object",
"description": "Payment information",
"properties": {
"paymentTerms": {
"type": "string",
"description": "Payment terms"
},
"paymentMethod": {
"type": "string",
"description": "Payment method",
"enum": [
"SEPA_TRANSFER",
"CREDIT_CARD",
"PAYPAL"
]
},
"iban": {
"type": "string",
"description": "IBAN for bank transfer"
}
},
"required": [
"paymentTerms",
"paymentMethod"
]
}
}
```
--------------------------------
### Add Smart Import Feature to SaaS
Source: https://github.com/capevace/data-wizard-docs/blob/main/introduction.mdx
Enable a 'smart import' feature in your SaaS application by embedding Data Wizard via iFrames or its REST/GraphQL API. Users can upload documents, and extracted data is streamed back in real-time.
```markdown
You can offer a "smart import" feature in your SaaS application, allowing users to upload documents and automatically populate your application with the extracted data.
**Use Case:** You are a SaaS provider for CRM, accounting, or inventory management software and want to offer your users a "smart import" feature.
**Solution:** Embed Data Wizard directly into your SaaS application using iFrames or the REST/GraphQL API. Provide a seamless user experience by integrating data extraction directly into your workflow. Users can upload documents within your application, and Data Wizard will stream the extracted data back in real-time.
```
--------------------------------
### Create Custom Strategy
Source: https://github.com/capevace/data-wizard-docs/blob/main/custom-strategies.mdx
Demonstrates how to create a custom strategy by implementing the `Strategy` interface or extending the `Extractor` class. It also shows how to extend an existing strategy or create a completely custom one.
```php
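use Mateffy\Magic\Extraction\Strategies\Strategy;
use Mateffy\Magic\Extraction\Strategies\Extractor;
// Namespace assumed to match Extractor's.
use Mateffy\Magic\Extraction\Strategies\SequentialStrategy;

// Create a custom strategy (override run() with your extraction logic)
class MyCustomStrategy extends Extractor {}

// Extend an existing strategy
class MyCustomizedStrategy extends SequentialStrategy {}

// Or completely custom by doing everything yourself
// (the class must implement every method the Strategy interface declares)
class MyCompletelyCustomStrategy implements Strategy {}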
```
--------------------------------
### Customized iFrame Embedding
Source: https://github.com/capevace/data-wizard-docs/blob/main/integrate.mdx
This example demonstrates how to embed the Data Wizard iFrame with custom dimensions and styling. It includes width, height, frameborder, and inline styles for better integration into the host application's layout.
```html
```
--------------------------------
### Competitor Analysis and Market Research
Source: https://github.com/capevace/data-wizard-docs/blob/main/introduction.mdx
Gather product and pricing information from competitor brochures, websites, or advertisements for market research. Data Wizard automates the extraction of this data for efficient analysis.
```markdown
You can gather product and pricing information from competitor brochures, websites, or advertisements for market research and competitive analysis.
**Use Case:** You need to gather product and pricing information from competitor brochures, websites, or advertisements for market research and competitive analysis.
**Solution:** Use Data Wizard to automatically extract product details, pricing, and other relevant information from publicly available documents. Gain valuable market insights quickly and efficiently, without manual data scraping and entry.
```
--------------------------------
### Run in Own Application
Source: https://github.com/capevace/data-wizard-docs/blob/main/extracting-data.mdx
Instructions and code for running extractions within your own application using Data Wizard.
```markdown
## Run inside your own application
import More from '/snippets/more.mdx';
```
--------------------------------
### Update and Restart Data Wizard Docker Container
Source: https://github.com/capevace/data-wizard-docs/blob/main/deployment.mdx
Commands to update the Data Wizard Docker image to the latest version, stop the current container, remove it, and then start a new container with the updated image. It's recommended to back up data before performing updates.
```bash
docker pull mateffy/data-wizard:latest
docker stop data-wizard
docker rm data-wizard
docker run --name data-wizard -p 9090:80 -p 4430:443 -p 4430:443/udp -v data_wizard_storage:/app/storage -v data_wizard_sqlite_data:/app/database -v data_wizard_caddy_data:/data -v data_wizard_caddy_config:/config -e APP_KEY=[REPLACE_WITH_APP_KEY] mateffy/data-wizard:latest
```
--------------------------------
### Programmatic Data Extraction Workflow Overview
Source: https://github.com/capevace/data-wizard-docs/blob/main/apis.mdx
This Mermaid diagram illustrates the programmatic data extraction workflow using the HTTP or GraphQL API. It covers file upload, extraction runs, and receiving notifications.
```mermaid
graph TB
subgraph User-Driven File Upload
A[Create a Bucket POST /api/buckets] --> B[User Uploads Files via Embeddable URL];
B --> D[Redirect to Extractor URL or Embed iFrame];
end
subgraph Programmatic File Upload
C[Create a Bucket POST /api/buckets] --> C1[Upload Files];
end
D --> E[Start Extraction Run];
C1 --> E;
E --> F[Webhook Notifications];
E --> G[Poll API for Updates];
F --> H[Receive Data];
G --> H;
```
--------------------------------
### Configure LLM API Keys via Environment Variables (Docker)
Source: https://github.com/capevace/data-wizard-docs/blob/main/configure-llm.mdx
This snippet demonstrates how to configure LLM API keys for Data Wizard when running in a Docker container using environment variables. It shows examples for both a direct Docker run command and a Docker Compose file. Ensure you replace placeholder values with your actual API keys and application key.
```bash
docker run \
-p 9090:80 \
-e APP_KEY=base64:[REPLACE_WITH_KEY] \
-e OPENAI_API_TOKEN=[REPLACE_WITH_OPENAI_API_TOKEN] \
mateffy/data-wizard:latest
```
```yaml
services:
  data-wizard:
    image: mateffy/data-wizard:latest
    ports:
      - ...
    volumes:
      - ...
    environment:
      - APP_KEY=base64:[REPLACE_WITH_KEY]
      - OPENAI_API_TOKEN=[REPLACE_WITH_OPENAI_API_TOKEN]
```
--------------------------------
### iFrame Theme Customization
Source: https://github.com/capevace/data-wizard-docs/blob/main/integrate.mdx
Demonstrates how to customize the Data Wizard iFrame's theme using URL parameters and JavaScript postMessage API.
```javascript
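// Set the initial theme via a URL parameter on the iframe's src.

// Change the theme at runtime using postMessage.
// Assumes the iframe has the ID 'data-wizard-iframe'.
const wizardFrame = document.getElementById('data-wizard-iframe');
if (wizardFrame && wizardFrame.contentWindow && wizardFrame.src) {
  // Target the iframe's own origin instead of '*' so the message
  // is only delivered to the Data Wizard page.
  const targetOrigin = new URL(wizardFrame.src).origin;
  wizardFrame.contentWindow.postMessage({
    event: 'set_theme',
    theme: 'dark'
  }, targetOrigin);
}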
```
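To keep the embedded wizard in sync with the visitor's operating-system color scheme, you could forward changes through the same `set_theme` message. This is a sketch: `sendWizardTheme` is a hypothetical helper, and the `'light'` value is an assumption (only `'dark'` appears above). The wiring at the bottom runs only in a browser.

```javascript
// Hypothetical helper: posts the documented 'set_theme' message.
// 'light' is an assumed value; only 'dark' is shown in the docs.
function sendWizardTheme(frameWindow, targetOrigin, theme) {
  if (theme !== 'light' && theme !== 'dark') {
    throw new Error(`Unsupported theme: ${theme}`);
  }
  frameWindow.postMessage({ event: 'set_theme', theme }, targetOrigin);
}

// Browser-only wiring: follow the OS color scheme (skipped outside a browser).
if (typeof window !== 'undefined' && typeof window.matchMedia === 'function') {
  const frame = document.getElementById('data-wizard-iframe');
  const darkQuery = window.matchMedia('(prefers-color-scheme: dark)');
  const applyTheme = () => {
    if (frame && frame.contentWindow && frame.src) {
      const theme = darkQuery.matches ? 'dark' : 'light';
      sendWizardTheme(frame.contentWindow, new URL(frame.src).origin, theme);
    }
  };
  // A message sent before the wizard has loaded is dropped, so also send on load.
  if (frame) frame.addEventListener('load', applyTheme);
  applyTheme();
  darkQuery.addEventListener('change', applyTheme);
}
```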
--------------------------------
### AI-Powered Data Extraction for SaaS Platforms
Source: https://github.com/capevace/data-wizard-docs/blob/main/introduction.mdx
Utilize Data Wizard as the core data extraction engine for platforms requiring robust and adaptable data extraction. Its modular architecture and LLM abstraction layer allow for easy switching between LLM providers and customization.
```markdown
You can use Data Wizard as the core data extraction engine for your platform, supporting a wide range of document types and extraction tasks.
**Use Case:** You are building a document processing or data analysis platform and need robust, adaptable data extraction capabilities.
**Solution:** Use Data Wizard as the core data extraction engine for your platform using the REST/GraphQL API. Its modular architecture and LLM abstraction layer allow you to easily switch between different LLM providers, customize extraction strategies, and adapt to evolving LLM technologies.
```
--------------------------------
### Custom Strategies Link
Source: https://github.com/capevace/data-wizard-docs/blob/main/strategies.mdx
Provides a link to learn how to build custom strategies for more control over the extraction process.
```markdown
You can create custom strategies to tailor the extraction process to your specific needs. Custom strategies allow you to define how the document is processed, how the LLM is interacted with, and how the results are merged.
```
--------------------------------
### Register Custom Strategy
Source: https://github.com/capevace/data-wizard-docs/blob/main/custom-strategies.mdx
Shows how to register a custom strategy with the Data Wizard by calling `Magic::registerStrategy()` in the `boot()` method of your service provider. This makes the custom strategy available in the UI.
```php
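use Illuminate\Support\ServiceProvider;
use Mateffy\Magic\Magic;
// Hypothetical namespace for your own strategy class.
use App\Strategies\MyCustomStrategy;

class AppServiceProvider extends ServiceProvider
{
    public function boot(): void
    {
        // Make the strategy available under the key 'my-custom-strategy'.
        Magic::registerStrategy('my-custom-strategy', MyCustomStrategy::class);
    }
}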
```