### Quick Start: Scrape a Single Page

Source: https://docs.llmcrawl.dev/sdk-javascript

A basic example demonstrating how to initialize the LLMCrawl client and scrape a single web page, logging the extracted markdown content.

```typescript
import { LLMCrawl } from "@llmcrawl/llmcrawl-js";
const client = new LLMCrawl({
  apiKey: "your-api-key-here",
});
// Scrape a single page
const result = await client.scrape("https://example.com");
console.log(result.data?.markdown);
```

--------------------------------

### Install LLMCrawl SDK

Source: https://docs.llmcrawl.dev/sdk-javascript

Instructions for installing the LLMCrawl JavaScript SDK using npm, yarn, or pnpm.

```bash
npm install @llmcrawl/llmcrawl-js
```

```bash
yarn add @llmcrawl/llmcrawl-js
```

```bash
pnpm add @llmcrawl/llmcrawl-js
```

--------------------------------

### Install LLMCrawl JavaScript SDK

Source: https://docs.llmcrawl.dev/index

Installs the official LLMCrawl JavaScript/TypeScript SDK using npm. This SDK allows for easy integration with Node.js, React, Next.js, and browser applications.

```bash
npm install @llmcrawl/llmcrawl-js
```

--------------------------------

### Initiate Crawl Request

Source: https://docs.llmcrawl.dev/tags/Crawling

Demonstrates how to send a POST request to the /v1/crawl endpoint to start a web crawl. Includes configuration for paths, depth, limits, scraping options, and webhooks.

```cURL
curl https://api.llmcrawl.dev/v1/crawl \
  --request POST \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
  "includePaths": ["/blog/*", "/articles/*", "/docs/*"],
  "excludePaths": ["/admin/*", "/private/*", "/api/*"],
  "maxDepth": 3,
  "limit": 500,
  "allowBackwardLinks": false,
  "allowExternalLinks": false,
  "ignoreSitemap": true,
  "url": "string",
  "origin": "api",
  "scrapeOptions": {
    "formats": ["markdown", "rawHtml"],
    "headers": {
      "additionalProperties": "string"
    },
    "includeTags": ["h1", "h2", "p", "article"],
    "excludeTags": ["nav", "footer", "script", "style"],
    "waitFor": 3000,
    "extract": {
      "mode": "string",
      "schema": {
        "type": "object",
        "properties": {
          "title": {
            "type": "string"
          },
          "price": {
            "type": "number"
          },
          "description": {
            "type": "string"
          }
        },
        "required": ["title", "price"]
      },
      "systemPrompt": "Based on the information on the page, extract all the information from the schema. Try to extract all the fields even those that might not be marked as required.",
      "prompt": "Extract the main article title and author from this page"
    }
  },
  "webhookUrls": ["https://your-webhook.com/crawl-status"],
  "webhookMetadata": {
    "crawlId": "crawl_123",
    "userId": "user_456"
  }
}'
```

```JavaScript
fetch('https://api.llmcrawl.dev/v1/crawl', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    includePaths: ['/blog/*', '/articles/*', '/docs/*'],
    excludePaths: ['/admin/*', '/private/*', '/api/*'],
    maxDepth: 3,
    limit: 500,
    allowBackwardLinks: false,
    allowExternalLinks: false,
    ignoreSitemap: true,
    url: 'string',
    origin: 'api',
    scrapeOptions: {
      formats: ['markdown', 'rawHtml'],
      headers: {
        additionalProperties: 'string'
      },
      includeTags: ['h1', 'h2', 'p', 'article'],
      excludeTags: ['nav', 'footer', 'script', 'style'],
      waitFor: 3000,
      extract: {
        mode: 'string',
        schema: {
          type: 'object',
          properties: {
            title: {
              type: 'string'
            },
            price: {
              type: 'number'
            },
            description: {
              type: 'string'
            }
          },
          required: ['title', 'price']
        },
        systemPrompt: 'Based on the information on the page, extract all the information from the schema. Try to extract all the fields even those that might not be marked as required.',
        prompt: 'Extract the main article title and author from this page'
      }
    },
    webhookUrls: ['https://your-webhook.com/crawl-status'],
    webhookMetadata: {
      crawlId: 'crawl_123',
      userId: 'user_456'
    }
  })
})
```

```PHP
$ch = curl_init("https://api.llmcrawl.dev/v1/crawl");
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, ['Authorization: Bearer YOUR_API_KEY', 'Content-Type: application/json']);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode([
  'includePaths' => ['/blog/*', '/articles/*', '/docs/*'],
  'excludePaths' => ['/admin/*', '/private/*', '/api/*'],
  'maxDepth' => 3,
  'limit' => 500,
  'allowBackwardLinks' => false,
  'allowExternalLinks' => false,
  'ignoreSitemap' => true,
  'url' => 'string',
  'origin' => 'api',
  'scrapeOptions' => [
    'formats' => ['markdown', 'rawHtml'],
    'headers' => [
      'additionalProperties' => 'string'
    ],
    'includeTags' => ['h1', 'h2', 'p', 'article'],
    'excludeTags' => ['nav', 'footer', 'script', 'style'],
    'waitFor' => 3000,
    'extract' => [
      'mode' => 'string',
      'schema' => [
        'type' => 'object',
        'properties' => [
          'title' => [
            'type' => 'string'
          ],
          'price' => [
            'type' => 'number'
          ],
          'description' => [
            'type' => 'string'
          ]
        ],
        'required' => ['title', 'price']
      ],
      'systemPrompt' => 'Based on the information on the page, extract all the information from the schema. Try to extract all the fields even those that might not be marked as required.',
      'prompt' => 'Extract the main article title and author from this page'
    ]
  ],
  'webhookUrls' => ['https://your-webhook.com/crawl-status'],
  'webhookMetadata' => [
    'crawlId' => 'crawl_123',
    'userId' => 'user_456'
  ]
]));
curl_exec($ch);
curl_close($ch);
```
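
The same request can be assembled in TypeScript before it is sent. The sketch below is illustrative rather than official: `CrawlRequest` and `buildCrawlRequest` are hypothetical names, the interface covers only a subset of the body shown in the samples, path patterns are passed as flat string arrays, and the `Bearer` prefix is an assumption based on the bearer authorization scheme.

```typescript
// Hypothetical shape covering a subset of the /v1/crawl request body.
interface CrawlRequest {
  url: string;
  includePaths?: string[];
  excludePaths?: string[];
  maxDepth?: number;
  limit?: number;
  scrapeOptions?: {
    formats?: string[];
    waitFor?: number;
  };
  webhookUrls?: string[];
}

// Returns the arguments for fetch() without sending anything, so the
// payload can be inspected or logged before the crawl is started.
function buildCrawlRequest(apiKey: string, body: CrawlRequest) {
  return {
    endpoint: "https://api.llmcrawl.dev/v1/crawl",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`, // assumed bearer-token format
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

const request = buildCrawlRequest("your-api-key", {
  url: "https://example.com",
  includePaths: ["/blog/*", "/docs/*"],
  excludePaths: ["/admin/*"],
  maxDepth: 3,
  limit: 500,
  scrapeOptions: { formats: ["markdown"], waitFor: 3000 },
});
// To send it: await fetch(request.endpoint, request.init);
```

Keeping construction separate from sending makes it easy to check the serialized body before spending crawl credits.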

--------------------------------

### Knowledge Base Creation

Source: https://docs.llmcrawl.dev/examples

Builds a searchable knowledge base from website content by crawling specified paths and extracting structured data. It includes logic to poll crawl status and process results into a defined interface.

```typescript
interface KnowledgeEntry {
  id: string;
  title: string;
  content: string;
  url: string;
  section: string;
  keywords: string[];
}
async function buildKnowledgeBase(baseUrl: string): Promise<KnowledgeEntry[]> {
  const crawl = await client.crawl(baseUrl, {
    limit: 1000,
    includePaths: ["/docs/*", "/guides/*", "/tutorials/*"],
    scrapeOptions: {
      formats: ["markdown"],
      extract: {
        schema: {
          type: "object",
          properties: {
            title: { type: "string" },
            section: { type: "string" },
            content: { type: "string" },
            keywords: { type: "array", items: { type: "string" } },
          },
        },
      },
    },
  });
  if (!crawl.success) return [];
  // Wait for completion
  let status = await client.getCrawlStatus(crawl.id);
  while (status.success && status.status === "scraping") {
    await new Promise((resolve) => setTimeout(resolve, 5000));
    status = await client.getCrawlStatus(crawl.id);
  }
  if (!status.success || status.status !== "completed") return [];
  // Process results
  const knowledgeBase: KnowledgeEntry[] = status.data
    .filter((page) => page.extract && page.markdown)
    .map((page, index) => {
      const extracted = JSON.parse(page.extract);
      return {
        id: `kb_${index + 1}`,
        title: extracted.title || page.metadata?.title || "Untitled",
        content: page.markdown,
        url: page.metadata?.url || "",
        section: extracted.section || "General",
        keywords: extracted.keywords || [],
      };
    });
  return knowledgeBase;
}
// Usage
const kb = await buildKnowledgeBase("https://docs.myapp.com");
console.log(`Knowledge base created with ${kb.length} entries`);
```
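
Once the entries exist, a simple lookup makes the knowledge base searchable. The following is a minimal sketch under stated assumptions: `searchKnowledgeBase` is a hypothetical helper, not part of the SDK, and the scoring weights are arbitrary choices for illustration.

```typescript
// Mirrors the KnowledgeEntry interface from the example above.
interface KnowledgeEntry {
  id: string;
  title: string;
  content: string;
  url: string;
  section: string;
  keywords: string[];
}

// Naive keyword search: an entry matches only when every query term appears
// in its title, keywords, or content. Title and keyword hits are weighted
// above content hits.
function searchKnowledgeBase(
  entries: KnowledgeEntry[],
  query: string
): KnowledgeEntry[] {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return [];

  const scored = entries.map((entry) => {
    const title = entry.title.toLowerCase();
    const keywords = entry.keywords.map((k) => k.toLowerCase());
    const content = entry.content.toLowerCase();
    let score = 0;
    for (const term of terms) {
      const inTitle = title.includes(term);
      const inKeywords = keywords.some((k) => k.includes(term));
      const inContent = content.includes(term);
      // A single missing term disqualifies the entry.
      if (!inTitle && !inKeywords && !inContent) return { entry, score: 0 };
      score += (inTitle ? 3 : 0) + (inKeywords ? 2 : 0) + (inContent ? 1 : 0);
    }
    return { entry, score };
  });

  return scored
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((s) => s.entry);
}
```

For larger collections a dedicated search index would be a better fit; substring matching is only meant to show how the extracted fields can be used.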

--------------------------------

### API Documentation Scraping

Source: https://docs.llmcrawl.dev/examples

Extracts structured API documentation from a given URL using a defined schema. It specifies inclusion paths and scraping options, including the format and extraction schema.

```typescript
const apiDocSchema = {
  type: "object",
  properties: {
    title: { type: "string" },
    description: { type: "string" },
    endpoint: { type: "string" },
    method: { type: "string" },
    parameters: {
      type: "array",
      items: {
        type: "object",
        properties: {
          name: { type: "string" },
          type: { type: "string" },
          required: { type: "boolean" },
          description: { type: "string" },
        },
      },
    },
    responseExample: { type: "string" },
    codeExamples: {
      type: "array",
      items: {
        type: "object",
        properties: {
          language: { type: "string" },
          code: { type: "string" },
        },
      },
    },
  },
};
const docsCrawl = await client.crawl("https://docs.api.example.com", {
  limit: 500,
  includePaths: ["/reference/*", "/endpoints/*"],
  scrapeOptions: {
    formats: ["markdown"],
    extract: { schema: apiDocSchema },
  },
});
```
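
After the crawl completes, each page's extracted object can be reduced to a one-line summary. This is a sketch only: `ApiDoc` and `summarizeEndpoint` are hypothetical names, and every field is treated as optional because the schema above marks nothing as required.

```typescript
// Loose shape of one extracted page, following apiDocSchema.
interface ApiParameter {
  name?: string;
  type?: string;
  required?: boolean;
  description?: string;
}

interface ApiDoc {
  title?: string;
  endpoint?: string;
  method?: string;
  parameters?: ApiParameter[];
}

// Builds a summary such as "POST /v1/crawl (required: url)".
function summarizeEndpoint(doc: ApiDoc): string {
  const method = (doc.method ?? "?").toUpperCase();
  const endpoint = doc.endpoint ?? "(unknown endpoint)";
  const required = (doc.parameters ?? [])
    .filter((p) => p.required === true && typeof p.name === "string")
    .map((p) => p.name as string);
  return required.length > 0
    ? `${method} ${endpoint} (required: ${required.join(", ")})`
    : `${method} ${endpoint}`;
}
```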

--------------------------------

### LLMCrawl API Reference - Scraping, Crawling, and Mapping

Source: https://docs.llmcrawl.dev/examples

Provides an overview of the LLMCrawl API endpoints for scraping, crawling, and mapping operations. It details the HTTP methods, paths, and associated operations available for interacting with the LLMCrawl service.

```APIDOC
Scraping:
  POST /v1/scrape
    - Initiates a web scraping task.
    - Related documentation: https://docs.llmcrawl.dev/operations/postV1Scrape.html

Crawling:
  POST /v1/crawl
    - Starts a web crawling task.
    - Related documentation: https://docs.llmcrawl.dev/operations/postV1Crawl.html

  GET /v1/crawl/{id}
    - Retrieves the status or results of a specific crawling task.
    - Parameters:
      - id: The unique identifier of the crawl task.
    - Related documentation: https://docs.llmcrawl.dev/operations/getV1CrawlById.html

  DELETE /v1/crawl/{id}/cancel
    - Cancels an ongoing crawling task.
    - Parameters:
      - id: The unique identifier of the crawl task to cancel.
    - Related documentation: https://docs.llmcrawl.dev/operations/deleteV1CrawlByIdCancel.html

Mapping:
  POST /v1/map
    - Runs a mapping operation; see the related documentation for the request and response format.
    - Related documentation: https://docs.llmcrawl.dev/operations/postV1Map.html
```

--------------------------------

### Price Monitoring

Source: https://docs.llmcrawl.dev/examples

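Monitors product prices across multiple websites by scraping each product page in turn. It defines an interface for product data and uses a schema to extract name, price, and currency (the currency is extracted but not stored in this example). Failed scrapes are logged and skipped, and a fixed one-second delay between requests serves as simple rate limiting.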

```typescript
interface Product {
  name: string;
  price: number;
  url: string;
  timestamp: Date;
}
async function monitorPrices(productUrls: string[]): Promise<Product[]> {
  const priceSchema = {
    type: "object",
    properties: {
      name: { type: "string" },
      price: { type: "number" },
      currency: { type: "string" },
    },
    required: ["name", "price"],
  };
  const products: Product[] = [];
  for (const url of productUrls) {
    try {
      const result = await client.scrape(url, {
        extract: { schema: priceSchema },
      });
      if (result.success && result.data.extract) {
        const data = JSON.parse(result.data.extract);
        products.push({
          name: data.name,
          price: data.price,
          url,
          timestamp: new Date(),
        });
      }
    } catch (error) {
      console.error(`Failed to scrape ${url}:`, error);
    }
    // Rate limiting
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }
  return products;
}
// Usage
const productUrls = [
  "https://store1.example.com/product/123",
  "https://store2.example.com/item/456",
  "https://store3.example.com/product/789",
];
const prices = await monitorPrices(productUrls);
console.log("Current prices:", prices);
```
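
Monitoring implies comparing one run against the next. A possible way to do that is sketched below; `PriceChange` and `detectPriceChanges` are hypothetical names, and matching products by URL is an assumption about how snapshots are stored.

```typescript
// Same shape as the Product interface in the example above.
interface Product {
  name: string;
  price: number;
  url: string;
  timestamp: Date;
}

interface PriceChange {
  url: string;
  name: string;
  oldPrice: number;
  newPrice: number;
  percentChange: number;
}

// Compares two snapshots keyed by URL and reports products whose price moved.
// Products that appear in only one snapshot are ignored.
function detectPriceChanges(
  previous: Product[],
  current: Product[]
): PriceChange[] {
  const previousByUrl = new Map(previous.map((p) => [p.url, p] as const));
  const changes: PriceChange[] = [];
  for (const product of current) {
    const earlier = previousByUrl.get(product.url);
    if (!earlier || earlier.price === product.price) continue;
    changes.push({
      url: product.url,
      name: product.name,
      oldPrice: earlier.price,
      newPrice: product.price,
      // Guard against division by zero when the earlier price was 0.
      percentChange:
        earlier.price === 0
          ? 0
          : ((product.price - earlier.price) / earlier.price) * 100,
    });
  }
  return changes;
}
```

The output of two consecutive `monitorPrices` runs can be passed in directly as `previous` and `current`.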

--------------------------------

### Install JavaScript/TypeScript SDK

Source: https://docs.llmcrawl.dev/sdks

Installs the official LLMCrawl JavaScript/TypeScript SDK using npm. This package provides full API coverage and TypeScript support for Node.js and browser environments.

```bash
npm install @llmcrawl/llmcrawl-js
```

--------------------------------

### Extract Product Information with TypeScript

Source: https://docs.llmcrawl.dev/examples

Demonstrates how to extract structured product data from an e-commerce website using the LLMCrawl JavaScript/TypeScript SDK. It defines a JSON schema for product details and uses the `scrape` method to extract and parse the information.

```typescript
import { LLMCrawl } from "@llmcrawl/llmcrawl-js";
const client = new LLMCrawl({ apiKey: "your-api-key" });
const productSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    price: { type: "number" },
    originalPrice: { type: "number" },
    discount: { type: "number" },
    rating: { type: "number" },
    reviewCount: { type: "number" },
    inStock: { type: "boolean" },
    description: { type: "string" },
    specifications: {
      type: "object",
      properties: {
        brand: { type: "string" },
        model: { type: "string" },
        color: { type: "string" },
        size: { type: "string" },
      },
    },
    images: {
      type: "array",
      items: { type: "string" },
    },
  },
  required: ["name", "price", "inStock"],
};
const result = await client.scrape("https://store.example.com/product/123", {
  formats: ["markdown"],
  extract: { schema: productSchema },
});
if (result.success && result.data?.extract) {
  const product = JSON.parse(result.data.extract);
  console.log(`Product: ${product.name}`);
  console.log(`Price: $${product.price}`);
  console.log(`In Stock: ${product.inStock}`);
}
```

--------------------------------

### Blog Content Crawling

Source: https://docs.llmcrawl.dev/examples

Crawls blog sites to extract content, allowing specification of crawl limits, included/excluded paths, and scraping options like formats and extraction schemas. It also demonstrates monitoring crawl progress and processing the crawled data.

```typescript
const blogCrawl = await client.crawl("https://blog.example.com", {
  limit: 200,
  includePaths: ["/posts/*", "/articles/*"],
  excludePaths: ["/admin/*", "/author/*"],
  scrapeOptions: {
    formats: ["markdown"],
    extract: {
      schema: {
        type: "object",
        properties: {
          title: { type: "string" },
          author: { type: "string" },
          publishDate: { type: "string" },
          content: { type: "string" },
          tags: { type: "array", items: { type: "string" } },
          summary: { type: "string" },
        },
      },
    },
  },
});
// Monitor progress
if (blogCrawl.success) {
  let status = await client.getCrawlStatus(blogCrawl.id);
  while (status.success && status.status === "scraping") {
    console.log(`Progress: ${status.completed}/${status.total} articles`);
    await new Promise((resolve) => setTimeout(resolve, 10000));
    status = await client.getCrawlStatus(blogCrawl.id);
  }
  if (status.success && status.status === "completed") {
    console.log(`Successfully crawled ${status.data.length} articles`);
    // Process articles
    const articles = status.data
      .filter((page) => page.extract)
      .map((page) => JSON.parse(page.extract));
    // Create content database
    const contentDB = articles.map((article, index) => ({
      id: index + 1,
      title: article.title,
      author: article.author,
      publishDate: article.publishDate,
      // content is not a required field, so guard against it being missing
      wordCount: (article.content ?? "").split(/\s+/).filter(Boolean).length,
      tags: article.tags || [],
    }));
    console.log("Content database created:", contentDB.length, "entries");
  }
}
```
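
The extracted `tags` arrays lend themselves to a quick frequency count across the crawled posts. The sketch below assumes the article objects follow the schema above; `countTags` is a hypothetical helper, and normalizing tags to lowercase is a design choice rather than SDK behavior.

```typescript
// Only the field this helper needs; other schema fields are omitted.
interface Article {
  title?: string;
  tags?: string[];
}

// Counts how often each tag appears, ignoring case and surrounding spaces.
function countTags(articles: Article[]): [string, number][] {
  const counts = new Map<string, number>();
  for (const article of articles) {
    for (const tag of article.tags ?? []) {
      const key = tag.trim().toLowerCase();
      if (key) counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  // Most frequent first; ties are broken alphabetically for stable output.
  return Array.from(counts.entries()).sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0])
  );
}
```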

--------------------------------

### Market Analysis with LLMCrawl

Source: https://docs.llmcrawl.dev/examples

Analyzes property market trends by scraping property data from a list of URLs. It calculates metrics like average price, median price, and price range. Requires a 'client' object with a 'scrape' method and a 'propertySchema'.

```typescript
async function analyzeMarket(searchUrls: string[]) {
  const properties = [];
  for (const url of searchUrls) {
    const result = await client.scrape(url, {
      extract: { schema: propertySchema },
    });
    if (result.success && result.data.extract) {
      const property = JSON.parse(result.data.extract);
      properties.push(property);
    }
    await new Promise((resolve) => setTimeout(resolve, 2000));
  }
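  // Calculate market metrics; sort a copy so the input order is left untouched
  const prices = properties.map((p) => p.price).filter((p) => p > 0);
  const sorted = [...prices].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Fall back to 0 when no valid prices were found (avoids NaN and Infinity)
  const avgPrice = sorted.length
    ? sorted.reduce((a, b) => a + b, 0) / sorted.length
    : 0;
  // For an even count, the median is the mean of the two middle values
  const medianPrice =
    sorted.length === 0
      ? 0
      : sorted.length % 2 === 0
        ? (sorted[mid - 1] + sorted[mid]) / 2
        : sorted[mid];
  return {
    totalProperties: properties.length,
    averagePrice: avgPrice,
    medianPrice: medianPrice,
    priceRange: {
      min: sorted.length ? sorted[0] : 0,
      max: sorted.length ? sorted[sorted.length - 1] : 0,
    },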
    propertyTypes: [...new Set(properties.map((p) => p.propertyType))],
  };
}
```

--------------------------------

### Stock and Financial Data Extraction Schema

Source: https://docs.llmcrawl.dev/examples

Defines a JSON schema for extracting stock and financial information, such as current price, change percentage, market cap, and P/E ratio. This schema is used with LLMCrawl's 'scrape' method to get structured financial data from company pages.

```typescript
const financialSchema = {
  type: "object",
  properties: {
    symbol: { type: "string" },
    companyName: { type: "string" },
    currentPrice: { type: "number" },
    change: { type: "number" },
    changePercent: { type: "number" },
    volume: { type: "number" },
    marketCap: { type: "string" },
    peRatio: { type: "number" },
    dividendYield: { type: "number" },
    earningsDate: { type: "string" },
  },
};
const stockData = await client.scrape(
  "https://finance.example.com/stock/AAPL",
  {
    extract: { schema: financialSchema },
  }
);
```
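
Because `marketCap` is declared as a string in the schema, it usually needs converting before any numeric comparison. The sketch below assumes values shaped like `"2.5T"`, `"$410B"`, or `"1,200"`; that format is a guess about typical finance pages, and `parseMarketCap` is a hypothetical helper.

```typescript
// Multipliers for the common K/M/B/T suffixes.
const SUFFIX_MULTIPLIERS: Record<string, number> = {
  K: 1e3,
  M: 1e6,
  B: 1e9,
  T: 1e12,
};

// Converts an abbreviated market-cap string to a number, or returns null
// when the value does not match the assumed format.
function parseMarketCap(value: string): number | null {
  const cleaned = value.trim().replace(/[$,\s]/g, "");
  const match = cleaned.match(/^(\d+(?:\.\d+)?)([KMBT])?$/i);
  if (!match) return null;
  const amount = parseFloat(match[1]);
  const suffix = match[2]?.toUpperCase();
  return suffix ? amount * SUFFIX_MULTIPLIERS[suffix] : amount;
}
```

Returning `null` instead of throwing lets callers decide how to treat pages where the model extracted free text such as "n/a".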

--------------------------------

### Authentication and Client Initialization

Source: https://docs.llmcrawl.dev/sdk-javascript

Demonstrates how to initialize the LLMCrawl client with an API key and an optional custom base URL for API requests.

```typescript
import { LLMCrawl } from "@llmcrawl/llmcrawl-js";
const client = new LLMCrawl({
  apiKey: "your-api-key",
  baseUrl: "https://api.llmcrawl.dev", // Optional custom base URL
});
```
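
In practice the key is better read from the environment than hard-coded. A minimal sketch follows, assuming the variable names `LLMCRAWL_API_KEY` and `LLMCRAWL_BASE_URL`; both names, the `ClientConfig` interface, and the `configFromEnv` helper are hypothetical and not defined by the SDK.

```typescript
// Matches the two constructor options shown above.
interface ClientConfig {
  apiKey: string;
  baseUrl?: string;
}

// Builds the constructor options from an environment-like object.
// Throws early so a missing key fails at startup, not on the first request.
function configFromEnv(env: Record<string, string | undefined>): ClientConfig {
  const apiKey = env["LLMCRAWL_API_KEY"]?.trim();
  if (!apiKey) {
    throw new Error("LLMCRAWL_API_KEY is not set");
  }
  const baseUrl = env["LLMCRAWL_BASE_URL"]?.trim();
  // Only include baseUrl when one was provided.
  return baseUrl ? { apiKey, baseUrl } : { apiKey };
}

// Usage in Node.js: const client = new LLMCrawl(configFromEnv(process.env));
```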

--------------------------------

### Initialize LLMCrawl Client and Scrape

Source: https://docs.llmcrawl.dev/sdks

Demonstrates how to initialize the LLMCrawl client with an API key and perform a basic scrape operation for a given URL. The result contains extracted data, including markdown.

```typescript
import { LLMCrawl } from "@llmcrawl/llmcrawl-js";
const client = new LLMCrawl({ apiKey: "your-api-key" });
const result = await client.scrape("https://example.com");
```

--------------------------------

### Data Validation and Cleaning with Ajv

Source: https://docs.llmcrawl.dev/examples

This example demonstrates how to validate and clean data against a JSON schema using the Ajv library. It includes functions to compile a schema, validate data, and attempt to clean data based on validation errors.

```typescript
import Ajv from "ajv";
const ajv = new Ajv();
function validateAndCleanData(data: any, schema: any) {
  const validate = ajv.compile(schema);
  const valid = validate(data);
  if (!valid) {
    console.warn("Validation errors:", validate.errors);
    // Attempt to clean/fix data
    return cleanData(data, validate.errors);
  }
  return data;
}
function cleanData(data: any, errors: any[]) {
  // Implement data cleaning logic based on validation errors
  const cleaned = { ...data };
  errors.forEach((error) => {
    if (error.keyword === "type" && error.params.type === "number") {
      const path = error.instancePath.replace("/", "");
      if (typeof cleaned[path] === "string") {
        const num = parseFloat(cleaned[path].replace(/[^0-9.-]/g, ""));
        if (!isNaN(num)) cleaned[path] = num;
      }
    }
  });
  return cleaned;
}
```

--------------------------------

### LLMCrawl API: Get Crawl Job Status

Source: https://docs.llmcrawl.dev/operations/getV1CrawlById

Retrieves the status of a specific crawl job using its unique ID. Requires Bearer token authorization. The response includes job progress, completion status, and data extracted from the crawl.

```APIDOC
GET /v1/crawl/{id}
  Get crawl job status
  Authorizations:
    bearerAuth: Type: HTTP (bearer)
  Parameters:
    Path Parameters:
      id*: Type: string, Required
  Responses:
    200:
      Successful response
      Content-Type: application/json
      Schema: JSON
        {
          "success": true,
          "status": "string",
          "completed": 0,
          "total": 0,
          "expiresAt": "string",
          "next": "string",
          "data": [
            {
              "markdown": "string",
              "extract": "string",
              "html": "string",
              "rawHtml": "string",
              "links": [
                "string"
              ],
              "screenshot": "string",
              "metadata": {
                "additionalProperties": {}
              }
            }
          ]
        }
    404:
      Not Found
    500:
      Internal Server Error
  Samples:
    cURL:
      curl 'https://api.llmcrawl.dev/v1/crawl/{id}' \
        --header 'Authorization: Bearer YOUR_API_KEY'
    JavaScript:
      fetch('https://api.llmcrawl.dev/v1/crawl/{id}', {
        headers: {
          Authorization: 'Bearer YOUR_API_KEY'
        }
      })
    PHP:
      $ch = curl_init("https://api.llmcrawl.dev/v1/crawl/{id}");
      curl_setopt($ch, CURLOPT_HTTPHEADER, ['Authorization: Bearer YOUR_API_KEY']);
      curl_exec($ch);
      curl_close($ch);
    Python:
      requests.get("https://api.llmcrawl.dev/v1/crawl/{id}",
          headers={
            "Authorization": "Bearer YOUR_API_KEY"
          }
      )
```

--------------------------------

### Cancel Crawl Samples

Source: https://docs.llmcrawl.dev/tags/Crawling

Provides code samples for canceling a crawl using different programming languages and tools. These examples demonstrate how to make the DELETE request to the API endpoint.

```cURL
curl 'https://api.llmcrawl.dev/v1/crawl/{id}/cancel' \
  --request DELETE \
  --header 'Authorization: Bearer YOUR_API_KEY'
```

```JavaScript
fetch('https://api.llmcrawl.dev/v1/crawl/{id}/cancel', {
  method: 'DELETE',
  headers: {
    Authorization: 'Bearer YOUR_API_KEY'
  }
})
```

```PHP
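$ch = curl_init("https://api.llmcrawl.dev/v1/crawl/{id}/cancel");
// Without CURLOPT_CUSTOMREQUEST, cURL would send a GET instead of a DELETE
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'DELETE');
curl_setopt($ch, CURLOPT_HTTPHEADER, ['Authorization: Bearer YOUR_API_KEY']);
curl_exec($ch);
curl_close($ch);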
```

```Python
import requests

requests.delete(
    "https://api.llmcrawl.dev/v1/crawl/{id}/cancel",
    headers={
      "Authorization": "Bearer YOUR_API_KEY"
    }
)
```

--------------------------------

### Property Listing Extraction

Source: https://docs.llmcrawl.dev/examples

Extracts detailed property information from a real estate listing URL using a predefined JSON schema. This allows for structured data retrieval of listings.

```typescript
const propertySchema = {
  type: "object",
  properties: {
    title: { type: "string" },
    price: { type: "number" },
    address: { type: "string" },
    bedrooms: { type: "number" },
    bathrooms: { type: "number" },
    sqft: { type: "number" },
    propertyType: { type: "string" },
    description: { type: "string" },
    features: { type: "array", items: { type: "string" } },
    images: { type: "array", items: { type: "string" } },
    agent: {
      type: "object",
      properties: {
        name: { type: "string" },
        phone: { type: "string" },
        email: { type: "string" },
      },
    },
  },
  required: ["title", "price", "address"],
};
const property = await client.scrape("https://realty.example.com/listing/123", {
  formats: ["markdown"],
  extract: { schema: propertySchema },
});
```

--------------------------------

### Initialize LLMCrawl Client and Scrape URL (JavaScript)

Source: https://docs.llmcrawl.dev/index

Initializes the LLMCrawl client with an API key and demonstrates how to scrape a single URL. The markdown content is accessed from the result's data property.

```typescript
import { LLMCrawl } from "@llmcrawl/llmcrawl-js";
const client = new LLMCrawl({
  apiKey: "your-api-key-here",
});
// Scrape a single page
const result = await client.scrape("https://example.com");
console.log(result.data?.markdown);
```

--------------------------------

### Scrape Options

Source: https://docs.llmcrawl.dev/sdk-javascript

Configuration options for the `scrape()` method, detailing available parameters for output formats, headers, timeouts, page loading delays, AI extraction schemas, and webhook configurations.

```APIDOC
Scrape Options:
  `formats`: `string[]` | Output formats: `'markdown'`, `'html'`, `'rawHtml'`, `'links'`, `'screenshot'`, `'screenshot@fullPage'`
  `headers`: `object` | Custom HTTP headers
  `includeTags`: `string[]` | HTML tags to include in output
  `excludeTags`: `string[]` | HTML tags to exclude from output
  `timeout`: `number` | Request timeout in milliseconds (1000-90000)
  `waitFor`: `number` | Delay before capturing content (0-60000ms)
  `extract`: `ExtractOptions` | AI extraction configuration
  `webhookUrls`: `string[]` | URLs to send results to
  `metadata`: `object` | Additional metadata to include
```
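
The documented ranges can be checked on the client before a request is made. The sketch below covers only `formats`, `timeout`, and `waitFor`; `validateScrapeOptions` is a hypothetical helper, and whether the API itself rejects out-of-range values is not stated in the table.

```typescript
// Subset of the options listed above; ranges are taken from that table.
interface ScrapeOptions {
  formats?: string[];
  timeout?: number; // milliseconds, 1000-90000
  waitFor?: number; // milliseconds, 0-60000
}

const ALLOWED_FORMATS = [
  "markdown",
  "html",
  "rawHtml",
  "links",
  "screenshot",
  "screenshot@fullPage",
];

// Returns a list of problems; an empty list means the options fall inside
// the documented values and ranges.
function validateScrapeOptions(options: ScrapeOptions): string[] {
  const problems: string[] = [];
  for (const format of options.formats ?? []) {
    if (ALLOWED_FORMATS.indexOf(format) === -1) {
      problems.push(`unknown format: ${format}`);
    }
  }
  if (
    options.timeout !== undefined &&
    (options.timeout < 1000 || options.timeout > 90000)
  ) {
    problems.push("timeout must be between 1000 and 90000 ms");
  }
  if (
    options.waitFor !== undefined &&
    (options.waitFor < 0 || options.waitFor > 60000)
  ) {
    problems.push("waitFor must be between 0 and 60000 ms");
  }
  return problems;
}
```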

--------------------------------

### Get Crawl Job Status

Source: https://docs.llmcrawl.dev/api-reference

Retrieves the status and results of a specific web crawl job using its unique ID. Includes details on completion progress, data extracted, and expiration time.

```APIDOC
GET /v1/crawl/{id}

Get crawl job status.

**Authorizations:**
- bearerAuth (HTTP Bearer Token)

**Parameters:**
- **id** (string, required): The unique identifier of the crawl job.

**Responses:**
- **200 OK**: Successful response.
  Content-Type: application/json
  Schema:
  {
    "success": true,
    "status": "string",
    "completed": 0,
    "total": 0,
    "expiresAt": "string",
    "next": "string",
    "data": [
      {
        "markdown": "string",
        "extract": "string",
        "html": "string",
        "rawHtml": "string",
        "links": ["string"],
        "screenshot": "string",
        "metadata": {
          "additionalProperties": {}
        }
      }
    ]
  }
- **404 Not Found**: If the crawl job ID does not exist.
- **500 Internal Server Error**: If there was an error on the server.
```

```cURL
curl 'https://api.llmcrawl.dev/v1/crawl/{id}' \
  --header 'Authorization: Bearer YOUR_API_KEY'
```

```JavaScript
fetch('https://api.llmcrawl.dev/v1/crawl/{id}', {
  headers: {
    Authorization: 'Bearer YOUR_API_KEY'
  }
})
```

```PHP
$ch = curl_init("https://api.llmcrawl.dev/v1/crawl/{id}");
curl_setopt($ch, CURLOPT_HTTPHEADER, ['Authorization: Bearer YOUR_API_KEY']);
curl_exec($ch);
curl_close($ch);
```

```Python
import requests

response = requests.get("https://api.llmcrawl.dev/v1/crawl/{id}",
    headers={
      "Authorization": "Bearer YOUR_API_KEY"
    }
)
print(response.json())
```
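
The `completed`, `total`, and `next` fields are enough to derive a progress summary from the response. The sketch below is a minimal illustration: `describeProgress` is a hypothetical helper, the `"scraping"` status value comes from the SDK polling examples, and treating a non-empty `next` as "more results available" is an assumption.

```typescript
// Fields used from the 200 response shown above.
interface CrawlStatusResponse {
  success: boolean;
  status: string;
  completed: number;
  total: number;
  next?: string;
}

interface CrawlProgress {
  percent: number;
  inProgress: boolean;
  hasMorePages: boolean;
}

function describeProgress(response: CrawlStatusResponse): CrawlProgress {
  // Avoid dividing by zero before the crawler has discovered any pages.
  const percent =
    response.total > 0
      ? Math.min(100, Math.round((response.completed / response.total) * 100))
      : 0;
  return {
    percent,
    // "scraping" is the in-progress status used in the SDK polling examples.
    inProgress: response.success && response.status === "scraping",
    // Assumption: a non-empty `next` value points at further results.
    hasMorePages: typeof response.next === "string" && response.next.length > 0,
  };
}
```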

--------------------------------

### LLMCrawl Crawling API Operations

Source: https://docs.llmcrawl.dev/operations/postV1Crawl

This section details the API endpoints for managing website crawls. It includes starting a new crawl, checking the status of an ongoing crawl, and canceling a crawl.

```APIDOC
POST /v1/crawl
  Crawl a website.
  Authorizations:
    bearerAuth (Type: HTTP (bearer))
  Request Body (application/json):
    Schema: JSON
    Properties:
      includePaths: array (string)
      excludePaths: array (string)
      maxDepth: integer
      limit: integer
      allowBackwardLinks: boolean
      allowExternalLinks: boolean
      ignoreSitemap: boolean
      url: string (required)
      origin: string
      scrapeOptions: object
        formats: array (string)
        headers: object (additionalProperties: string)
        includeTags: array (string)
        excludeTags: array (string)
        waitFor: integer
        extract: object
          mode: string
          schema: object
            type: object
            properties: {
              title: { type: string },
              price: { type: number },
              description: { type: string }
            }
            required: [title, price]
          systemPrompt: string
          prompt: string
      webhookUrls: array (string)
      webhookMetadata: object (crawlId: string, userId: string)
  Responses:
    200: Successful response (Content-Type: application/json)
      Schema: JSON
      Body: {
        success: boolean,
        id: string,
        url: string
      }
    400, 403, 500: Error responses
```

```APIDOC
GET /v1/crawl/{id}
  Retrieve the status of a specific crawl.
  Parameters:
    id: string (path parameter, required) - The ID of the crawl to retrieve.
  Responses:
    200: Successful response (Content-Type: application/json)
      Schema: JSON
      Body: {
        success: boolean,
        status: string,
        completed: integer,
        total: integer,
        expiresAt: string,
        next: string,
        data: array (object)
      }
    404: Crawl not found.
```

```APIDOC
DELETE /v1/crawl/{id}/cancel
  Cancel an ongoing crawl.
  Parameters:
    id: string (path parameter, required) - The ID of the crawl to cancel.
  Responses:
    200: Successful response (Content-Type: application/json)
      Schema: JSON
      Body: {
        success: boolean,
        message: string
      }
    404: Crawl not found or already completed/canceled.
```

--------------------------------

### Data Processing with Pandas (Python)

Source: https://docs.llmcrawl.dev/sdk-python

Shows how to integrate LLMCrawl scraping with Pandas for efficient data processing. This example defines a helper function to scrape and extract data, then process it in a structured manner.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed

import pandas as pd

def scrape_and_extract(url: str, schema: dict) -> dict:
    """Scrape a URL and extract structured data"""
    result = client.scrape(url, extract={"schema": schema})
    if result["success"]:
        return {
            "url": url,
            "success": True,
            "data": json.loads(result["data"]["extract"])
        }
    return {"url": url, "success": False, "error": result.get("error")}

# Example usage with Pandas (assuming client is initialized)
# urls_to_scrape = ["url1", "url2", ...]
# extraction_schema = {...}
# 
# with ThreadPoolExecutor(max_workers=5) as executor:
#     futures = [executor.submit(scrape_and_extract, url, extraction_schema) for url in urls_to_scrape]
#     results = []
#     for future in as_completed(futures):
#         results.append(future.result())
# 
# df = pd.DataFrame(results)
# print(df)
```

--------------------------------

### LLMCrawl Python SDK Preview API

Source: https://docs.llmcrawl.dev/sdk-python

This snippet demonstrates the expected API design for the LLMCrawl Python SDK. It shows how to initialize the client, scrape a single page, and perform AI-powered data extraction using Pydantic models. The SDK supports both synchronous and asynchronous operations.

```python
import asyncio

from llmcrawl import LLMCrawl
from pydantic import BaseModel

client = LLMCrawl(api_key="your-api-key")

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool

async def main() -> None:
    # Scrape a single page
    result = await client.scrape("https://example.com")
    print(result.data.markdown)
    # AI-powered extraction with Pydantic
    result = await client.scrape(
        "https://store.example.com/product/123",
        extract_model=Product,
    )
    product = result.data.extract  # Type: Product
    print(product)

# `await` is only valid inside a coroutine, so run the calls through asyncio
asyncio.run(main())

--------------------------------

### Get Crawl Job Status (API)

Source: https://docs.llmcrawl.dev/tags/Crawling

Retrieves the status of a specific crawl job using its ID. This endpoint returns information about the crawl's progress, completion status, and any extracted data.

```APIDOC
GET /v1/crawl/{id}

Get crawl job status

Authorizations:
  bearerAuth
  Type: HTTP (bearer)

Parameters:
  Path Parameters:
    id*:
      Type: string
      Required

Responses:
  200:
    Successful response
    Content-Type: application/json
    Schema: JSON
    {
      "success": true,
      "status": "string",
      "completed": 0,
      "total": 0,
      "expiresAt": "string",
      "next": "string",
      "data": [
        {
          "markdown": "string",
          "extract": "string",
          "html": "string",
          "rawHtml": "string",
          "links": [
            "string"
          ],
          "screenshot": "string",
          "metadata": {
            "additionalProperties": {}
          }
        }
      ]
    }

Samples:
  cURL:
    curl 'https://api.llmcrawl.dev/v1/crawl/{id}' \
      --header 'Authorization: Bearer <token>'
  JavaScript:
    fetch('https://api.llmcrawl.dev/v1/crawl/{id}', {
      headers: {
        Authorization: 'Bearer <token>'
      }
    })
  PHP:
    $ch = curl_init("https://api.llmcrawl.dev/v1/crawl/{id}");
    curl_setopt($ch, CURLOPT_HTTPHEADER, ['Authorization: Bearer <token>']);
    curl_exec($ch);
    curl_close($ch);
  Python:
    import requests

    requests.get("https://api.llmcrawl.dev/v1/crawl/{id}",
        headers={
          "Authorization": "Bearer <token>"
        }
    )
```
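
Because the endpoint reports progress through `status`, `completed`, and `total`, a client will typically call it repeatedly until the job finishes. The following is a minimal sketch of such a polling loop, not part of the SDK. The names `waitForCrawl`, `progressPercent`, and `FetchStatus` are hypothetical, and the terminal status value `"completed"` is an assumption, since the schema only documents `status` as a string.

```typescript
// Subset of the documented 200 response that this sketch reads.
interface CrawlStatus {
  success: boolean;
  status: string;
  completed: number;
  total: number;
  next?: string;
  data: Array<{ markdown?: string }>;
}

// Hypothetical transport: given a URL, return the parsed JSON body.
// A real one would call fetch() with the Authorization header.
type FetchStatus = (url: string) => Promise<CrawlStatus>;

const API_BASE = "https://api.llmcrawl.dev/v1/crawl";

// Poll GET /v1/crawl/{id} until the job reports the assumed terminal status.
async function waitForCrawl(
  fetchStatus: FetchStatus,
  id: string,
  intervalMs = 2000,
  maxAttempts = 30,
): Promise<CrawlStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus(`${API_BASE}/${encodeURIComponent(id)}`);
    if (!status.success) {
      throw new Error(`Status request failed for crawl ${id}`);
    }
    // ASSUMPTION: "completed" marks a finished job; adjust to the real values.
    if (status.status === "completed") {
      return status;
    }
    // Wait before the next check.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Crawl ${id} did not finish after ${maxAttempts} checks`);
}

// Percentage of pages processed so far, derived from `completed` and `total`.
function progressPercent(status: CrawlStatus): number {
  return status.total > 0
    ? Math.round((status.completed / status.total) * 100)
    : 0;
}
```

Passing the transport in as a function keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses.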

--------------------------------

### LLMCrawl Client Initialization: v0.x vs v1.0.0

Source: https://docs.llmcrawl.dev/sdk-javascript

Shows how client initialization changed between v0.x and v1.0.0 of the SDK. Migrating means switching from the default import to a named import and passing the API key in an options object instead of as a bare string.

```javascript
// Before (v0.x):
import LLMCrawl from "@llmcrawl/llmcrawl-js";
const client = new LLMCrawl("your-api-key");
```

```javascript
// After (v1.0.0):
import { LLMCrawl } from "@llmcrawl/llmcrawl-js";
const client = new LLMCrawl({
  apiKey: "your-api-key",
});
```

--------------------------------

### Node.js Scraping and File Saving

Source: https://docs.llmcrawl.dev/sdks

A Node.js example that scrapes a website with the LLMCrawl SDK and saves the extracted markdown to a local file. The file is written only when the scrape reports success.

```typescript
import { LLMCrawl } from "@llmcrawl/llmcrawl-js";
import fs from "fs/promises";

const client = new LLMCrawl({ apiKey: process.env.LLMCRAWL_API_KEY });
// Scrape and save to file
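const result = await client.scrape("https://docs.example.com");
// Write only when the scrape succeeded and markdown was actually returned
const markdown = result.data?.markdown;
if (result.success && markdown) {
  await fs.writeFile("content.md", markdown);
}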
```

--------------------------------

### News Article Extraction

Source: https://docs.llmcrawl.dev/examples

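Extracts structured data from a news article using a JSON schema. The schema describes properties such as headline, author, content, and tags, of which only `headline` and `content` are required. The extracted data comes back as a JSON string, so the example parses it with `JSON.parse` before reading fields.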

```typescript
const articleSchema = {
  type: "object",
  properties: {
    headline: { type: "string" },
    subheadline: { type: "string" },
    author: { type: "string" },
    publishDate: { type: "string" },
    content: { type: "string" },
    tags: { type: "array", items: { type: "string" } },
    category: { type: "string" },
    readTime: { type: "number" },
    relatedArticles: {
      type: "array",
      items: {
        type: "object",
        properties: {
          title: { type: "string" },
          url: { type: "string" },
        },
      },
    },
  },
  required: ["headline", "content"],
};
// Assumes `client` is an already initialized LLMCrawl instance
const newsResult = await client.scrape("https://news.example.com/article/123", {
  formats: ["markdown"],
  extract: { schema: articleSchema },
});
if (newsResult.success) {
  const article = JSON.parse(newsResult.data.extract);
  console.log(`Article: ${article.headline}`);
  console.log(`Author: ${article.author}`);
  console.log(`Published: ${article.publishDate}`);
}
```
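
Since the schema marks `headline` and `content` as required, it can be useful to confirm they are present after parsing rather than trusting the extract blindly. The following is a minimal sketch of such a check. The `Article` interface and the `parseArticle` helper are hypothetical names introduced here for illustration, not part of the SDK.

```typescript
// Fields mirror articleSchema; only the schema's `required` entries are mandatory.
interface Article {
  headline: string;
  content: string;
  subheadline?: string;
  author?: string;
  publishDate?: string;
  tags?: string[];
}

// Hypothetical helper: parse the extract string and verify the required fields.
function parseArticle(rawExtract: string): Article {
  const parsed: unknown = JSON.parse(rawExtract);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Extract is not a JSON object");
  }
  const candidate = parsed as Record<string, unknown>;
  for (const field of ["headline", "content"]) {
    if (typeof candidate[field] !== "string") {
      throw new Error(`Extract is missing required string field "${field}"`);
    }
  }
  // Optional fields are passed through without further validation.
  return candidate as unknown as Article;
}
```

In the example above, `parseArticle(newsResult.data.extract)` could stand in for the bare `JSON.parse` call, so a malformed extract fails loudly instead of printing `undefined`.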

--------------------------------

### Node.js Environment Usage

Source: https://docs.llmcrawl.dev/sdk-javascript

Shows how to initialize and use the LLMCrawl SDK within a Node.js environment. It includes importing the client, setting the API key from environment variables, and writing scraped content to a file.

```typescript
import { LLMCrawl } from "@llmcrawl/llmcrawl-js";
import fs from "fs/promises";

const client = new LLMCrawl({
  apiKey: process.env.LLMCRAWL_API_KEY,
});

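const result = await client.scrape("https://example.com");
// Write only when the scrape succeeded and markdown was actually returned
const markdown = result.data?.markdown;
if (result.success && markdown) {
  await fs.writeFile("output.md", markdown);
}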
```
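
Reading the key from `process.env` yields `undefined` when the variable is not set, which would otherwise only surface later as a failed request. One way to fail early is a small guard like the sketch below. The `requireEnv` helper is a hypothetical name introduced here, not part of the SDK, and it takes the environment as an argument so it can be checked without touching the real process environment.

```typescript
// Hypothetical helper: return a required variable or fail fast with a clear message.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  // Treat unset, empty, and whitespace-only values as missing.
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable ${name}`);
  }
  return value;
}
```

With that in place, the client could be constructed as `new LLMCrawl({ apiKey: requireEnv("LLMCRAWL_API_KEY", process.env) })`.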