### Plugin Registration Example

Source: https://github.com/phialsbasement/librecrawl/blob/main/README.md

An example of how to register a new plugin with LibreCrawl, including essential configuration like ID, name, and tab settings.

```APIDOC
## POST /api/plugins/register

### Description
Registers a new custom plugin with LibreCrawl.

### Method
POST

### Endpoint
/api/plugins/register

### Request Body
- **id** (string) - Required - Unique ID for the plugin.
- **name** (string) - Required - Display name of the plugin.
- **tab** (object) - Required - Configuration for the plugin's tab in the UI.
  - **label** (string) - Required - Text for the tab button.
  - **icon** (string) - Optional - Emoji or icon for the tab.
  - **position** (number) - Optional - Position of the tab.

### Request Example
```json
{
  "id": "my-plugin",
  "name": "My Plugin",
  "tab": {
    "label": "My Tab",
    "icon": "🔥"
  }
}
```

### Response
#### Success Response (200)
- **message** (string) - Confirmation message.

#### Response Example
```json
{
  "message": "Plugin registered successfully."
}
```
```

--------------------------------

### Install Python Dependencies

Source: https://github.com/phialsbasement/librecrawl/blob/main/README.md

Commands to install required Python packages and Playwright browser binaries.

```bash
pip install -r requirements.txt
```

```bash
playwright install chromium
```

--------------------------------

### Deploy with Docker

Source: https://github.com/phialsbasement/librecrawl/blob/main/README.md

Commands to clone the repository and start the application using Docker Compose.

```bash
# Clone the repository
git clone https://github.com/PhialsBasement/LibreCrawl.git
cd LibreCrawl

# Copy environment file
cp .env.example .env

# Start LibreCrawl
docker-compose up -d

# Open browser to http://localhost:5000
```

--------------------------------

### Run Application via Python

Source: https://github.com/phialsbasement/librecrawl/blob/main/README.md

Commands to start the application in standard or local mode.

```bash
# Standard mode (with authentication and tier system)
python main.py

# Local mode (all users get admin tier, no rate limits)
python main.py --local
# or
python main.py -l
```

--------------------------------

### Run Automatic Startup Scripts

Source: https://github.com/phialsbasement/librecrawl/blob/main/README.md

Use these scripts to automatically check for dependencies, install requirements, and launch the application.

```batch
start-librecrawl.bat
```

```bash
chmod +x start-librecrawl.sh
./start-librecrawl.sh
```

--------------------------------

### Start a new crawl

Source: https://context7.com/phialsbasement/librecrawl/llms.txt

Initiates a crawl for a specified URL. Requires a JSON payload containing the target URL.

```bash
# Start a crawl
curl -X POST http://localhost:5000/api/start_crawl \
  -H "Content-Type: application/json" \
  -H "Cookie: session=your_session_cookie" \
  -d '{"url": "https://example.com"}'

# Response
{
  "success": true,
  "message": "Crawl started successfully",
  "crawl_id": 123
}
```

--------------------------------

### GET /api/get_settings

Source: https://context7.com/phialsbasement/librecrawl/llms.txt

Retrieves the current crawler configuration settings.

```APIDOC
## GET /api/get_settings

### Description
Fetches the current configuration settings for the crawler.

### Method
GET

### Endpoint
http://localhost:5000/api/get_settings

### Response
#### Success Response (200)
- **success** (boolean) - Status of the request
- **settings** (object) - The current crawler configuration

#### Response Example
{
  "success": true,
  "settings": {
    "maxDepth": 3,
    "maxUrls": 5000000,
    "crawlDelay": 1,
    "enableJavaScript": false,
    "userAgent": "LibreCrawl/1.0 (Web Crawler)",
    "respectRobotsTxt": true,
    "discoverSitemaps": true
  }
}
```

--------------------------------

### Configure Crawler Settings

Source: https://context7.com/phialsbasement/librecrawl/llms.txt

Initialize the SettingsManager for a specific user and tier to get and update crawler settings. Settings are filtered by tier permissions. Obtain crawler-ready configuration and reset settings to defaults.

```python
from src.settings_manager import SettingsManager

settings_manager = SettingsManager(
    session_id='unique-session',
    user_id=1,
    tier='admin'  # guest, user, extra, admin
)

settings = settings_manager.get_settings()
print(f"Max Depth: {settings['maxDepth']}")
print(f"Max URLs: {settings['maxUrls']}")
print(f"JavaScript Enabled: {settings['enableJavaScript']}")

success, message = settings_manager.save_settings({
    'maxDepth': 5,
    'maxUrls': 10000,
    'crawlDelay': 0.5,
    'enableJavaScript': True,
    'jsWaitTime': 3,
    'jsTimeout': 30,
    'userAgent': 'MyCustomBot/1.0',
    'includePatterns': '/blog/*\n/products/*',
    'excludePatterns': '/admin/*',
    'issueExclusionPatterns': '/login*\n/checkout/*'
})

crawler_config = settings_manager.get_crawler_config()

settings_manager.reset_settings()
```

--------------------------------

### GET /api/user/info

Source: https://context7.com/phialsbasement/librecrawl/llms.txt

Retrieves information about the currently authenticated user.

```APIDOC
## GET /api/user/info

### Description
Returns details about the current user, including crawl limits.

### Method
GET

### Endpoint
http://localhost:5000/api/user/info

### Response
#### Success Response (200)
- **success** (boolean) - Status of the request
- **user** (object) - User details including id, username, tier, and crawl stats

#### Response Example
{
  "success": true,
  "user": {
    "id": 1,
    "username": "newuser",
    "tier": "user",
    "crawls_today": 2,
    "crawls_remaining": -1
  }
}
```

--------------------------------

### GET /api/visualization_data

Source: https://context7.com/phialsbasement/librecrawl/llms.txt

Retrieves graph data for site structure visualization.

```APIDOC
## GET /api/visualization_data

### Description
Fetches nodes and edges representing the crawled site structure.

### Method
GET

### Endpoint
http://localhost:5000/api/visualization_data

### Response
#### Success Response (200)
- **success** (boolean) - Status of the request
- **nodes** (array) - List of site nodes
- **edges** (array) - List of connections between nodes
- **total_pages** (integer) - Total pages found

#### Response Example
{
  "success": true,
  "nodes": [{"data": {"id": "node-0", "url": "https://example.com", "status_code": 200, "title": "Home"}}],
  "edges": [{"data": {"id": "edge-0-1", "source": "node-0", "target": "node-1"}}],
  "total_pages": 500
}
```

--------------------------------

### GET /links

Source: https://context7.com/phialsbasement/librecrawl/llms.txt

Retrieves all discovered links from the link manager, including source, target, anchor text, internal status, and placement.

```APIDOC
## GET /links

### Description
Retrieves all discovered links from the link manager.

### Method
GET

### Response
#### Success Response (200)
- **source_url** (string) - The URL where the link was found
- **target_url** (string) - The destination URL
- **anchor_text** (string) - The text content of the link
- **is_internal** (boolean) - Whether the link is internal to the domain
- **placement** (string) - The location of the link (e.g., navigation, body, footer)
```

--------------------------------

### Calculate Crawl Duration

Source: https://github.com/phialsbasement/librecrawl/blob/main/web/templates/dashboard.html

Calculates the duration of a crawl given its start and end times. Formats the output into hours, minutes, and seconds.

```javascript
function calculateDuration(start, end) {
  const startTime = new Date(start);
  const endTime = new Date(end);
  const diff = endTime - startTime;
  const hours = Math.floor(diff / 3600000);
  const minutes = Math.floor((diff % 3600000) / 60000);
  const seconds = Math.floor((diff % 60000) / 1000);
  if (hours > 0) return `${hours}h ${minutes}m`;
  if (minutes > 0) return `${minutes}m ${seconds}s`;
  return `${seconds}s`;
}
```

--------------------------------

### Get crawl status

Source: https://context7.com/phialsbasement/librecrawl/llms.txt

Retrieves real-time progress and statistics. Supports incremental updates using query parameters to filter by index.

```bash
# Get full status
curl http://localhost:5000/api/crawl_status \
  -H "Cookie: session=your_session_cookie"

# Incremental update (fetch only new data since index)
curl "http://localhost:5000/api/crawl_status?url_since=50&link_since=100&issue_since=10" \
  -H "Cookie: session=your_session_cookie"

# Response
{
  "status": "running",
  "stats": {
    "discovered": 150,
    "crawled": 75,
    "depth": 3,
    "speed": 2.5
  },
  "urls": [...],
  "links": [...],
  "issues": [...],
  "progress": 50.0,
  "is_running_pagespeed": false,
  "memory": {"current_mb": 256, "peak_mb": 300}
}
```

--------------------------------

### Configure Application Running Modes

Source: https://context7.com/phialsbasement/librecrawl/llms.txt

Execute main.py with various flags to toggle authentication, registration, guest access, and demo constraints.

```bash
# Standard mode (with authentication)
python main.py

# Local mode (no auth, all users get admin tier)
python main.py --local
# or
python main.py -l

# Disable new registrations
python main.py --disable-register
# or
python main.py -dr

# Disable guest access
python main.py --disable-guest
# or
python main.py -dg

# Demo mode (1.5GB memory limit per user)
python main.py --demo
# or
python main.py -dm

# Combined flags
python main.py --local --disable-guest --demo
```

--------------------------------

### Deploy via Docker

Source: https://context7.com/phialsbasement/librecrawl/llms.txt

Prepare the environment configuration file before launching the containerized service.

```bash
# Docker deployment
cp .env.example .env

# Edit .env for production
# LOCAL_MODE=false
# HOST_BINDING=0.0.0.0
# REGISTRATION_DISABLED=false

docker-compose up -d
```

--------------------------------

### Configure Environment Variables

Source: https://github.com/phialsbasement/librecrawl/blob/main/README.md

Settings for the .env file to toggle between local mode and production deployment.

```bash
# .env file
LOCAL_MODE=true
HOST_BINDING=127.0.0.1
REGISTRATION_DISABLED=false
```

```bash
# .env file
LOCAL_MODE=false
HOST_BINDING=0.0.0.0
REGISTRATION_DISABLED=false
```

--------------------------------

### Manage User Authentication via API

Source: https://context7.com/phialsbasement/librecrawl/llms.txt

Perform user registration, login, and account information retrieval.

```bash
curl -X POST http://localhost:5000/api/register \
  -H "Content-Type: application/json" \
  -d '{
    "username": "newuser",
    "email": "user@example.com",
    "password": "securepassword123"
  }'
```

```bash
curl -X POST http://localhost:5000/api/login \
  -H "Content-Type: application/json" \
  -d '{
    "username": "newuser",
    "password": "securepassword123"
  }'
```

```bash
curl -X POST http://localhost:5000/api/guest-login
```

```bash
curl http://localhost:5000/api/user/info \
  -H "Cookie: session=your_session_cookie"
```

--------------------------------

### GET /api/crawl_status

Source: https://context7.com/phialsbasement/librecrawl/llms.txt

Returns real-time crawl progress including discovered URLs, links, issues, and statistics.

```APIDOC
## GET /api/crawl_status

### Description
Returns real-time crawl progress including discovered URLs, links, issues, and statistics. Supports incremental polling to fetch only new data.

### Method
GET

### Endpoint
/api/crawl_status

### Parameters
#### Query Parameters
- **url_since** (integer) - Optional - Fetch URLs since this index
- **link_since** (integer) - Optional - Fetch links since this index
- **issue_since** (integer) - Optional - Fetch issues since this index

### Response
#### Success Response (200)
- **status** (string) - Current status of the crawl
- **stats** (object) - Crawl statistics
- **progress** (float) - Percentage completion

#### Response Example
{
  "status": "running",
  "stats": {
    "discovered": 150,
    "crawled": 75,
    "depth": 3,
    "speed": 2.5
  },
  "progress": 50.0
}
```

--------------------------------

### Initial Load and Interval Refresh

Source: https://github.com/phialsbasement/librecrawl/blob/main/web/templates/dashboard.html

Initializes the application by loading statistics and the crawl list on page load. Sets up an interval to periodically refresh both stats and the crawl list every 30 seconds.

```javascript
loadStats();
loadCrawls();
setInterval(() => {
  loadStats();
  loadCrawls();
}, 30000);
```

--------------------------------

### POST /api/start_crawl

Source: https://context7.com/phialsbasement/librecrawl/llms.txt

Initiates a new crawl from a given URL, respecting robots.txt and discovering sitemaps.

```APIDOC
## POST /api/start_crawl

### Description
Initiates a new crawl from a given URL. The crawler automatically discovers sitemaps, respects robots.txt, and extracts comprehensive SEO data from each page.

### Method
POST

### Endpoint
/api/start_crawl

### Request Body
- **url** (string) - Required - The URL to start the crawl from.

### Request Example
{
  "url": "https://example.com"
}

### Response
#### Success Response (200)
- **success** (boolean) - Status of the request
- **message** (string) - Confirmation message
- **crawl_id** (integer) - Unique identifier for the crawl

#### Response Example
{
  "success": true,
  "message": "Crawl started successfully",
  "crawl_id": 123
}
```

--------------------------------

### Styling Guidelines

Source: https://github.com/phialsbasement/librecrawl/blob/main/README.md

Information on using CSS classes provided by LibreCrawl for consistent UI styling within plugins.

```APIDOC
## Plugin Styling

### Description
LibreCrawl provides a set of CSS classes to ensure your plugin's UI matches the application's design. Apply these classes to your HTML elements.

### Available CSS Classes
- **`.plugin-content`**: The main container for your plugin's UI. Apply padding and scrolling styles here.
- **`.plugin-header`**: Use for header sections within your plugin's content.
- **`.data-table`**: Automatically styles HTML tables to match LibreCrawl's appearance.
- **`.stat-card`**: Styles elements used for displaying statistics.
- **`.score-good`**, **`.score-needs-improvement`**, **`.score-poor`**: Classes for indicating different levels of quality or status.

### Scrolling Implementation
To ensure proper scrolling within the tab pane, wrap your plugin's content in a div with the following styles:

```html
<div class="plugin-content" style="padding: 20px; overflow-y: auto; max-height: calc(100vh - 280px);">
  <!-- Your plugin's HTML content here -->
</div>
```

The `max-height` calculation ensures the content area respects the overall UI layout and provides a scrollable experience when content exceeds the available space.
```

--------------------------------

### POST /crawls

Source: https://context7.com/phialsbasement/librecrawl/llms.txt

Creates a new crawl record in the database with initial configuration.

```APIDOC
## POST /crawls

### Description
Creates a new crawl record in the database with initial configuration.

### Method
POST

### Request Body
- **user_id** (integer) - Required - ID of the user
- **session_id** (string) - Required - Unique session identifier
- **base_url** (string) - Required - Starting URL
- **base_domain** (string) - Required - Base domain
- **config_snapshot** (object) - Required - Snapshot of the configuration

### Response
#### Success Response (200)
- **crawl_id** (integer) - The ID of the created crawl
```

--------------------------------

### Extract SEO Data from HTML

Source: https://context7.com/phialsbasement/librecrawl/llms.txt

Use SEOExtractor to parse HTML and extract various SEO-related data points. Ensure BeautifulSoup is installed and the HTML content is properly parsed.

```python
from bs4 import BeautifulSoup
from src.core.seo_extractor import SEOExtractor

html = """
<html lang="en">
<head>
    <title>Example Page Title</title>
    <meta name="description" content="This is a meta description">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <link rel="canonical" href="https://example.com/page">
    <meta property="og:title" content="OG Title">
    <script type="application/ld+json">{"@type": "WebPage"}</script>
</head>
<body>
    <h1>Main Heading</h1>
    <h2>Subheading 1</h2>
    <img src="/image.jpg" alt="Alt text">
</body>
</html>
"""

soup = BeautifulSoup(html, 'html.parser')
extractor = SEOExtractor()

# Initialize result structure
result = {
    'url': 'https://example.com/page',
    'meta_tags': {}, 'og_tags': {}, 'twitter_tags': {},
    'json_ld': [], 'images': [], 'hreflang': [], 'schema_org': [],
    'analytics': {'google_analytics': False, 'gtag': False, 'ga4_id': '', 'gtm_id': ''},
    'internal_links': 0, 'external_links': 0
}

# Extract all SEO data
extractor.extract_basic_seo_data(soup, result)
extractor.extract_meta_tags(soup, result)
extractor.extract_opengraph_tags(soup, result)
extractor.extract_twitter_tags(soup, result)
extractor.extract_json_ld(soup, result)
extractor.extract_images(soup, result['url'], result)
extractor.extract_link_counts(soup, result, 'example.com')

print(f"Title: {result['title']}")  # "Example Page Title"
print(f"H1: {result['h1']}")  # "Main Heading"
print(f"Canonical: {result['canonical_url']}")  # "https://example.com/page"
print(f"Language: {result['lang']}")  # "en"

```

--------------------------------

### Guest Login Functionality with JavaScript

Source: https://github.com/phialsbasement/librecrawl/blob/main/web/templates/login.html

Handles the guest login button click event. It disables the button, shows a loading state, and makes a POST request to the /api/guest-login endpoint. It provides user feedback via alerts and redirects on successful guest entry.

```javascript
// Guest login
const guestBtn = document.getElementById('guestBtn');
if (guestBtn) guestBtn.addEventListener('click', async () => {
  hideAlert();
  guestBtn.disabled = true;
  guestBtn.textContent = 'Entering as guest...';

  try {
    const response = await fetch('/api/guest-login', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json'
      }
    });
    const data = await response.json();

    if (data.success) {
      showAlert('Entering as guest...', 'success');
      setTimeout(() => {
        window.location.href = '/';
      }, 500);
    } else {
      showAlert(data.message || 'Failed to enter as guest', 'error');
      guestBtn.disabled = false;
      guestBtn.textContent = 'Continue as Guest (3 crawls/24h)';
    }
  } catch (error) {
    showAlert('An error occurred. Please try again.', 'error');
    guestBtn.disabled = false;
    guestBtn.textContent = 'Continue as Guest (3 crawls/24h)';
  }
});
```

--------------------------------

### Retrieve Visualization Data via API

Source: https://context7.com/phialsbasement/librecrawl/llms.txt

Fetch graph data representing the site structure for visualization purposes.

```bash
curl http://localhost:5000/api/visualization_data \
  -H "Cookie: session=your_session_cookie"
```

--------------------------------

### Configuration Settings

Source: https://github.com/phialsbasement/librecrawl/blob/main/README.md

Details on the various settings that can be configured within LibreCrawl's UI.

```APIDOC
## LibreCrawl Configuration Settings

### Description
LibreCrawl allows extensive customization through its Settings interface, covering crawler behavior, request handling, rendering, filtering, and more.

### Configurable Areas
- **Crawler Settings**: Control crawl depth (up to 5 million URLs), delays between requests, and handling of external links.
- **Request Settings**: Configure the User-Agent string, request timeouts, proxy settings, and adherence to `robots.txt`.
- **JavaScript Rendering**: Adjust settings for the browser engine, wait times after page load, and viewport size for rendering dynamic content.
- **Filters**: Define patterns for URL inclusion or exclusion, and specify file types to ignore during crawling.
- **Export Options**: Select the desired formats (CSV, JSON, XML) and specify which data fields to include in exports.
- **Custom CSS**: Personalize the LibreCrawl user interface by applying custom CSS rules.
- **Issue Exclusion**: Set up patterns to exclude specific SEO issues from being reported.

### PageSpeed Analysis
To enhance rate limits for PageSpeed analysis (from limited to 25,000 requests per day), add a Google API key in the `Settings > Requests` section.
```

--------------------------------

### POST /api/login

Source: https://context7.com/phialsbasement/librecrawl/llms.txt

Authenticates a user and returns a session.

```APIDOC
## POST /api/login

### Description
Authenticates a user with username and password.

### Method
POST

### Endpoint
http://localhost:5000/api/login

### Request Body
- **username** (string) - Required - Username
- **password** (string) - Required - Password

### Request Example
{
  "username": "newuser",
  "password": "securepassword123"
}
```

--------------------------------

### POST /api/register

Source: https://context7.com/phialsbasement/librecrawl/llms.txt

Registers a new user account.

```APIDOC
## POST /api/register

### Description
Creates a new user account.

### Method
POST

### Endpoint
http://localhost:5000/api/register

### Request Body
- **username** (string) - Required - Unique username
- **email** (string) - Required - User email address
- **password** (string) - Required - User password

### Request Example
{
  "username": "newuser",
  "email": "user@example.com",
  "password": "securepassword123"
}
```

--------------------------------

### Running Modes

Source: https://github.com/phialsbasement/librecrawl/blob/main/README.md

Explanation of the different modes LibreCrawl can run in: Standard and Local.

```APIDOC
## LibreCrawl Running Modes

### Description
LibreCrawl offers two primary running modes, each with different access control and limitations, suitable for various use cases.

### Standard Mode
- **Default mode**.
- Features a full authentication system (login/register).
- Implements tier-based access control (Guest, User, Extra, Admin).
- **Guest users** are limited to 3 crawls per 24 hours, enforced via IP.
- Suitable for public demonstrations or shared environments.

### Local Mode (`--local` or `-l` flag)
- **Activated by** running LibreCrawl with the `--local` or `-l` command-line flag.
- **All users** are automatically granted admin-tier access.
- **No rate limits** or tier-based restrictions are enforced.
- Ideal for personal use, local development, and self-hosted instances where strict access control is not required.
- **Recommended** for testing and development purposes.
```

--------------------------------

### Lifecycle Hooks and Utilities

Source: https://github.com/phialsbasement/librecrawl/blob/main/README.md

Overview of the available lifecycle hooks and utility functions for plugin development.

```APIDOC
## Plugin Lifecycle Hooks and Utilities

### Description
Details the functions that LibreCrawl calls at various points in its operation, and the utility functions available to plugins.

### Lifecycle Hooks
- **`onLoad()`**: Called once when the plugin is initially loaded by LibreCrawl.
- **`onTabActivate(container, data)`**: Triggered when the user navigates to the plugin's tab. Receives the tab's DOM container and crawl data.
- **`onTabDeactivate()`**: Called when the user switches away from the plugin's tab.
- **`onDataUpdate(data)`**: Executed during live crawls whenever new crawl data becomes available. Useful for real-time UI updates.
- **`onCrawlComplete(data)`**: Called after a crawl has finished, providing the final crawl data.

### Utilities
Access built-in helper functions through `this.utils`:
- **`this.utils.showNotification(message, type)`**: Displays a notification to the user. `type` can be 'success', 'error', or 'info'.
- **`this.utils.formatUrl(url)`**: Formats a given URL according to LibreCrawl's standards.
- **`this.utils.escapeHtml(text)`**: Escapes HTML special characters in a string to prevent XSS attacks.
```

--------------------------------

### Plugin Configuration Reference

Source: https://github.com/phialsbasement/librecrawl/blob/main/README.md

Details on the configuration object required for registering a LibreCrawl plugin.

```APIDOC
## Plugin Configuration Object

### Description
Defines the structure and properties for configuring a LibreCrawl plugin.

### Fields
- **id** (string) - Required - Unique identifier for the plugin. Used for internal referencing and tab identification.
- **name** (string) - Required - The display name of the plugin, shown in the UI.
- **version** (string) - Optional - The version number of the plugin.
- **author** (string) - Optional - The name of the plugin's author.
- **description** (string) - Optional - A brief description of the plugin's functionality.
- **tab** (object) - Required - Configuration for how the plugin appears as a tab.
  - **label** (string) - Required - The text displayed on the tab button.
  - **icon** (string) - Optional - An emoji or icon to display on the tab button.
  - **position** (number) - Optional - Specifies the order of the tab. Defaults to appending to the end.
```

--------------------------------

### Manage Crawler Settings via API

Source: https://context7.com/phialsbasement/librecrawl/llms.txt

Retrieve or update crawler configuration settings using the get_settings and save_settings endpoints.

```bash
curl http://localhost:5000/api/get_settings \
  -H "Cookie: session=your_session_cookie"
```

```bash
curl -X POST http://localhost:5000/api/save_settings \
  -H "Content-Type: application/json" \
  -H "Cookie: session=your_session_cookie" \
  -d '{
    "maxDepth": 5,
    "maxUrls": 10000,
    "crawlDelay": 0.5,
    "enableJavaScript": true,
    "jsWaitTime": 3,
    "includePatterns": "/blog/*\n/products/*",
    "excludePatterns": "/admin/*\n/cart/*"
  }'
```

--------------------------------

### Multi-tenancy and Session Management

Source: https://github.com/phialsbasement/librecrawl/blob/main/README.md

Explanation of how LibreCrawl handles multiple users and manages their sessions.

```APIDOC
## LibreCrawl Multi-tenancy and Sessions

### Description
LibreCrawl is designed to support multiple concurrent users, providing isolated environments and managing session data effectively.

### Key Features
- **Isolated Instances**: Each browser session operates with its own independent crawler instance and crawl data.
- **Persistent Settings**: User-specific settings are stored in the browser's `localStorage`, ensuring persistence across restarts.
- **Per-Browser Themes**: Custom CSS themes applied by users are specific to their browser.
- **Session Expiration**: User sessions automatically expire after 1 hour of inactivity.
- **Data Isolation**: Crawl data is kept separate between different users/sessions.
```

--------------------------------

### Register a Custom Analysis Plugin

Source: https://context7.com/phialsbasement/librecrawl/llms.txt

Use LibreCrawlPlugin.register to define a new UI tab with custom analysis logic. Ensure the file is saved in web/static/plugins/.

```javascript
// Save as web/static/plugins/my-plugin.js

LibreCrawlPlugin.register({
    // Required configuration
    id: 'word-count-analyzer',
    name: 'Word Count Analyzer',

    tab: {
        label: 'Word Count',
        icon: '📊',
        position: 'end'
    },

    // Optional metadata
    version: '1.0.0',
    author: 'Your Name',

    // Called when plugin loads
    onLoad() {
        console.log('Word Count Analyzer loaded');
        this.thresholds = { thin: 300, good: 1000 };
    },

    // Called when tab is activated
    onTabActivate(container, data) {
        const analysis = this.analyzeWordCounts(data.urls);

        container.innerHTML = `
            <div class="plugin-content" style="padding: 20px; overflow-y: auto; max-height: calc(100vh - 280px);">
                <h2>Word Count Analysis</h2>
                <div class="stat-card">
                    <p>Total Pages: ${data.urls.length}</p>
                    <p>Thin Content (< 300 words): ${analysis.thin}</p>
                    <p>Good Content (300-1000 words): ${analysis.moderate}</p>
                    <p>Rich Content (> 1000 words): ${analysis.rich}</p>
                    <p>Average Words: ${analysis.average.toFixed(0)}</p>
                </div>
                <h3>Pages by Word Count</h3>
                <table class="data-table">
                    <thead><tr><th>URL</th><th>Words</th><th>Status</th></tr></thead>
                    <tbody>
                        ${analysis.pages.map(p => `
                            <tr>
                                <td>${this.utils.escapeHtml(p.url)}</td>
                                <td>${p.word_count}</td>
                                <td class="${p.status}">${p.statusLabel}</td>
                            </tr>
                        `).join('')}
                    </tbody>
                </table>
            </div>
        `;
    },

    // Called during live crawls
    onDataUpdate(data) {
        if (this.isActive && this.container) {
            this.onTabActivate(this.container, data);
        }
    },

    // Custom analysis method
    analyzeWordCounts(urls) {
        let thin = 0, moderate = 0, rich = 0, total = 0;
        const pages = [];

        urls.forEach(url => {
            const wc = url.word_count || 0;
            total += wc;

            let status, statusLabel;
            if (wc < this.thresholds.thin) {
                thin++; status = 'score-poor'; statusLabel = 'Thin';
            } else if (wc < this.thresholds.good) {
                moderate++; status = 'score-needs-improvement'; statusLabel = 'Moderate';
            } else {
                rich++; status = 'score-good'; statusLabel = 'Rich';
            }

            pages.push({ url: url.url, word_count: wc, status, statusLabel });
        });

        pages.sort((a, b) => a.word_count - b.word_count);

        return {
            thin, moderate, rich,
            average: urls.length > 0 ? total / urls.length : 0,
            pages: pages.slice(0, 50)
        };
    }
});
```

--------------------------------

### Define Plugin Configuration Object

Source: https://github.com/phialsbasement/librecrawl/blob/main/web/static/plugins/README.md

The configuration object passed to the register method defines metadata and tab appearance.

```javascript
{
  id: string,              // Unique identifier
  name: string,            // Display name
  version: string,         // Optional version
  author: string,          // Optional author
  description: string,     // Optional description

  tab: {
    label: string,         // Tab button text
    icon: string,          // Optional emoji/icon
    position: number       // Optional position (default: append to end)
  }
}
```

--------------------------------

### Login Form Handling with JavaScript

Source: https://github.com/phialsbasement/librecrawl/blob/main/web/templates/login.html

Handles user login submission by preventing default form behavior, showing/hiding alerts, and making an asynchronous POST request to the /api/login endpoint. It updates the UI based on the API response and redirects on success.

```javascript
const form = document.getElementById('loginForm');
const loginBtn = document.getElementById('loginBtn');
const alert = document.getElementById('alert');

function showAlert(message, type) {
  alert.textContent = message;
  alert.className = `alert alert-${type} show`;
}

function hideAlert() {
  alert.className = 'alert';
}

form.addEventListener('submit', async (e) => {
  e.preventDefault();
  hideAlert();
  const username = document.getElementById('username').value;
  const password = document.getElementById('password').value;
  loginBtn.disabled = true;
  loginBtn.textContent = 'Logging in...';

  try {
    const response = await fetch('/api/login', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ username, password })
    });
    const data = await response.json();

    if (data.success) {
      showAlert(data.message, 'success');
      setTimeout(() => {
        window.location.href = '/';
      }, 500);
    } else {
      showAlert(data.message, 'error');
      loginBtn.disabled = false;
      loginBtn.textContent = 'Login';
    }
  } catch (error) {
    showAlert('An error occurred. Please try again.', 'error');
    loginBtn.disabled = false;
    loginBtn.textContent = 'Login';
  }
});
```

--------------------------------

### Manage Link Discovery and Tracking

Source: https://context7.com/phialsbasement/librecrawl/llms.txt

The LinkManager class helps in discovering and tracking URLs. Initialize it with the base domain. Use `collect_all_links` to gather links from HTML content, providing the parsed HTML, source URL, and a list to store results.

```python
from bs4 import BeautifulSoup
from src.core.link_manager import LinkManager

# Initialize with base domain
link_manager = LinkManager('example.com')

html = """
<html>
<body>
    <nav><a href="/products">Products</a></nav>
    <a href="/about">About Us</a>
    <a href="https://external.com">External Link</a>
    <footer><a href="/contact">Contact</a></footer>
</body>
</html>
"""

soup = BeautifulSoup(html, 'html.parser')
source_url = 'https://example.com'
crawl_results = []

# Collect all links for the Links tab
link_manager.collect_all_links(soup, source_url, crawl_results)

```

--------------------------------

### Initialize and Control WebCrawler

Source: https://context7.com/phialsbasement/librecrawl/llms.txt

Configure and manage the lifecycle of a crawl operation using the WebCrawler class.

```python
from src.crawler import WebCrawler

# Create crawler instance
crawler = WebCrawler()

# Configure crawler
crawler.update_config({
    'max_depth': 3,
    'max_urls': 1000,
    'delay': 0.5,
    'user_agent': 'MyBot/1.0',
    'timeout': 10,
    'enable_javascript': False,
    'respect_robots': True,
    'crawl_external': False,
    'include_extensions': ['html', 'htm', 'php'],
    'exclude_patterns': ['/admin/*', '/cart/*']
})

# Start crawl (runs in background thread)
success, message = crawler.start_crawl(
    'https://example.com',
    user_id=1,
    session_id='unique-session-id'
)
print(f"Started: {success}, {message}")

# Get status during crawl
status = crawler.get_status()
print(f"Crawled: {status['stats']['crawled']}/{status['stats']['discovered']}")
print(f"Issues found: {len(status['issues'])}")

# Stop crawl
crawler.stop_crawl()
```

--------------------------------

### Plugin Lifecycle Hooks

Source: https://github.com/phialsbasement/librecrawl/blob/main/web/static/plugins/README.md

Overview of the lifecycle methods available for plugin management during the crawl process.

```APIDOC
## Lifecycle Hooks

### Description
Methods that allow plugins to respond to application events and crawl states.

### Hooks
- **onLoad()** - Called when the plugin is initially loaded.
- **onTabActivate(container, data)** - Called when the user switches to the plugin tab.
- **onTabDeactivate()** - Called when the user switches away from the plugin tab.
- **onDataUpdate(data)** - Called during live crawls when new data is available.
- **onCrawlComplete(data)** - Called when the crawl process finishes.
```

--------------------------------

### Create and Manage Crawls

Source: https://context7.com/phialsbasement/librecrawl/llms.txt

Utilize functions for creating new crawl records, saving batches of crawled data (URLs, links, issues), updating crawl statistics, and setting crawl status. Load existing crawl data and delete crawls.

```python
from src.crawl_db import (
    create_crawl, set_crawl_status, update_crawl_stats,
    save_url_batch, save_links_batch, save_issues_batch,
    get_crawl_by_id, get_user_crawls, load_crawled_urls,
    load_crawl_links, load_crawl_issues, delete_crawl
)

crawl_id = create_crawl(
    user_id=1,
    session_id='unique-session',
    base_url='https://example.com',
    base_domain='example.com',
    config_snapshot={'max_depth': 3, 'max_urls': 1000}
)

urls = [
    {'url': 'https://example.com', 'status_code': 200, 'title': 'Home', ...},
    {'url': 'https://example.com/about', 'status_code': 200, 'title': 'About', ...}
]
save_url_batch(crawl_id, urls)

links = [
    {'source_url': 'https://example.com', 'target_url': 'https://example.com/about',
     'anchor_text': 'About Us', 'is_internal': True}
]
save_links_batch(crawl_id, links)

issues = [
    {'url': 'https://example.com/about', 'type': 'warning',
     'category': 'SEO', 'issue': 'Title Too Short', 'details': '...'}
]
save_issues_batch(crawl_id, issues)

update_crawl_stats(crawl_id, discovered=100, crawled=50, max_depth=3)

set_crawl_status(crawl_id, 'completed')  # running, paused, completed, failed, stopped

crawl = get_crawl_by_id(crawl_id)
urls = load_crawled_urls(crawl_id)
links = load_crawl_links(crawl_id)
issues = load_crawl_issues(crawl_id)

user_crawls = get_user_crawls(user_id=1, limit=50, status_filter='completed')

delete_crawl(crawl_id)
```

--------------------------------

### Load Crawl List and Render Table

Source: https://github.com/phialsbasement/librecrawl/blob/main/web/templates/dashboard.html

Fetches the list of crawls from the API and dynamically generates table rows for display. Handles cases with no crawls or errors during fetching. Includes buttons for actions like resume, load, and delete.

```javascript
async function loadCrawls() {
  try {
    const response = await fetch('/api/crawls/list');
    const data = await response.json();
    const tbody = document.getElementById('crawls-tbody');
    if (data.success && data.crawls.length > 0) {
      tbody.innerHTML = data.crawls.map(crawl => {
        const started = new Date(crawl.started_at).toLocaleString();
        const duration = crawl.completed_at ? calculateDuration(crawl.started_at, crawl.completed_at) : 'In progress';
        return `
          <tr>
            <td>${started}</td>
            <td><a href="${crawl.base_url}" target="_blank" class="url-link">${crawl.base_url}</a></td>
            <td><span class="status-badge status-${crawl.status}">${crawl.status}</span></td>
            <td>${crawl.urls_crawled || 0}</td>
            <td>${duration}</td>
            <td class="actions">
              ${(crawl.status === 'paused' || crawl.status === 'failed') && crawl.urls_crawled > 0 ? `<button class="btn btn-primary btn-small" onclick="resumeCrawl(${crawl.id})">Resume</button>` : ''}
              ${crawl.urls_crawled > 0 ? `<button class="btn btn-secondary btn-small" onclick="viewCrawl(${crawl.id})">Load</button>` : ''}
              <button class="btn btn-danger btn-small" onclick="deleteCrawl(${crawl.id})">Delete</button>
            </td>
          </tr>
        `;
      }).join('');
    } else {
      tbody.innerHTML = `
        <tr>
          <td colspan="6" class="empty-state">
            <h3>No crawls found</h3>
            <p>Start a new crawl from the main application</p>
          </td>
        </tr>
      `;
    }
  } catch (error) {
    console.error('Error loading crawls:', error);
    document.getElementById('crawls-tbody').innerHTML = `
      <tr>
        <td colspan="6" class="empty-state">
          <h3>Error loading crawls</h3>
          <p>${error.message}</p>
        </td>
      </tr>
    `;
  }
}
```

--------------------------------

### POST /settings

Source: https://context7.com/phialsbasement/librecrawl/llms.txt

Updates crawler settings for a specific user, subject to tier-based permission filtering.

```APIDOC
## POST /settings

### Description
Updates crawler settings for a specific user, subject to tier-based permission filtering.

### Method
POST

### Request Body
- **maxDepth** (integer) - Optional - Maximum crawl depth
- **maxUrls** (integer) - Optional - Maximum number of URLs to crawl
- **crawlDelay** (float) - Optional - Delay between requests
- **enableJavaScript** (boolean) - Optional - Whether to enable JS rendering
- **jsWaitTime** (integer) - Optional - Time to wait for JS
- **jsTimeout** (integer) - Optional - Timeout for JS rendering
- **userAgent** (string) - Optional - User agent string
- **includePatterns** (string) - Optional - Patterns to include
- **excludePatterns** (string) - Optional - Patterns to exclude
- **issueExclusionPatterns** (string) - Optional - Patterns to exclude for issues

### Response
#### Success Response (200)
- **success** (boolean) - Whether the update was successful
- **message** (string) - Status message
```

--------------------------------

### Set Plugin Container HTML

Source: https://github.com/phialsbasement/librecrawl/blob/main/web/static/plugins/README.md

Use the provided container element to inject your UI, ensuring proper styling and scrolling behavior.

```javascript
container.innerHTML = `
  <div class="plugin-content" style="padding: 20px; overflow-y: auto; max-height: calc(100vh - 280px);">
    <!-- Your content here -->
  </div>
`;
```

--------------------------------

### Access Plugin Utilities

Source: https://github.com/phialsbasement/librecrawl/blob/main/web/static/plugins/README.md

Built-in helper functions are available via the this.utils context within your plugin.

```javascript
this.utils.showNotification(message, type) // 'success', 'error', 'info'
this.utils.formatUrl(url)
this.utils.escapeHtml(text)
```

--------------------------------

### Load Crawl Action

Source: https://github.com/phialsbasement/librecrawl/blob/main/web/templates/dashboard.html

Sends a POST request to load a specific crawl. Prompts the user for confirmation, warning about potential data loss. On success, it sets session storage items and redirects to the main page.

```javascript
async function viewCrawl(crawlId) {
  if (!confirm('Load this crawl? Any unsaved current data will be lost.')) return;
  try {
    const response = await fetch(`/api/crawls/${crawlId}/load`, {
      method: 'POST'
    });
    const data = await response.json();
    if (data.success) {
      sessionStorage.setItem('force_ui_refresh', 'true');
      sessionStorage.setItem('loaded_urls', data.urls_count);
      sessionStorage.setItem('loaded_links', data.links_count);
      sessionStorage.setItem('loaded_issues', data.issues_count);
      window.location.href = '/';
    } else {
      alert('Error: ' + (data.error || data.message));
    }
  } catch (error) {
    alert('Error loading crawl: ' + error.message);
  }
}
```