### Quick Start ShadowCrawl with Docker

Source: https://context7.com/devshero/shadowcrawl/llms.txt

This bash script is a quick start for the full ShadowCrawl stack using Docker. It clones the repository, changes into its directory, and builds and starts the services in detached mode. It then verifies that the SearXNG and Qdrant services are running.

```bash
# Quick start with Docker (full stack)
git clone https://github.com/DevsHero/shadowcrawl.git
cd shadowcrawl
docker compose -f docker-compose-local.yml up -d --build

# Verify services are running (URL quoted so the shell does not glob the "?")
curl "http://localhost:8890/search?q=test"  # SearXNG
curl http://localhost:6344                  # Qdrant
```

--------------------------------

### Run macOS Preflight Setup for HITL

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/VSCODE_SETUP.md

This command executes a setup script within the 'shadowcrawl' container, specifically for macOS. It's particularly important when setting up the 'non_robot_search' feature for HITL (Human-In-The-Loop) functionality.

```bash
docker compose -f docker-compose-local.yml exec -T shadowcrawl shadowcrawl --setup
```

--------------------------------

### Shadowcrawl Setup Script

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/NON_ROBOT_SEARCH.md

This command changes to the 'mcp-server' directory and runs the Shadowcrawl binary with the '--setup' flag. The flag guides users through macOS preflight checks, particularly for permission-related issues with global keys and kill-switches.

```bash
cd mcp-server
./target/release/shadowcrawl --setup
```

--------------------------------

### Start Qdrant Vector Database using Docker Compose

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/HISTORY_FEATURE.md

This snippet starts the Qdrant vector database using Docker Compose. Qdrant runs in detached mode, making it available to the ShadowCrawl application.

```bash
docker-compose up qdrant -d
```

--------------------------------

### Manage Services with Docker Compose

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/DOCKER_DEPLOYMENT.md

Manages the Shadowcrawl services using docker-compose. This includes starting all services, checking logs, and stopping services. Assumes a 'docker-compose-local.yml' file is present.

```bash
# Start all services (SearXNG + MCP Server)
docker compose -f docker-compose-local.yml up -d --build

# Check logs
docker compose -f docker-compose-local.yml logs -f shadowcrawl

# Stop all services
docker compose -f docker-compose-local.yml down
```

--------------------------------

### Run Docker Stack and Local MCP Server (macOS)

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/NON_ROBOT_SEARCH.md

Starts the necessary Docker dependencies (SearXNG, Qdrant, Browserless) and then runs the Shadowcrawl MCP stdio server locally. This setup allows the HITL tool to launch a native browser.

```bash
docker compose -f docker-compose-local.yml up -d --build
```

```bash
cd mcp-server
SEARXNG_URL=http://localhost:8890 \
QDRANT_URL=http://localhost:6344 \
RUST_LOG=info \
./target/release/shadowcrawl-mcp
```

--------------------------------

### Rust Async/Await Fundamentals

Source: https://github.com/devshero/shadowcrawl/blob/main/sample-results/research_history_json.txt

Demonstrates the basic usage of Rust's async and await keywords for asynchronous programming. It shows how to define asynchronous functions and await their results. This example is fundamental to understanding Rust's concurrency model.

```rust
async fn example() -> i32 {
    let x = async { return 5; };
    x.await
}

// To run this, you would typically use a runtime like tokio:
// #[tokio::main]
// async fn main() {
//     let result = example().await;
//     println!("Result: {}", result);
// }
```

--------------------------------

### Start Qdrant Vector Database Directly with Docker

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/HISTORY_FEATURE.md

This snippet shows how to run the Qdrant vector database directly using a Docker command. It maps ports, mounts a volume for persistent storage, and names the container 'qdrant'.

```bash
docker run -d \
  -p 6333:6333 \
  -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  --name qdrant \
  qdrant/qdrant:latest
```

--------------------------------

### Continue.dev MCP Server Configuration (YAML)

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/IDE_SETUP.md

Configuration for Continue.dev to connect to the Shadowcrawl MCP server using YAML format. This file should be placed in `.continue/mcpServers/shadowcrawl.yaml`. Ensure the `command` and `args` correctly point to your Docker setup and the absolute path to `docker-compose-local.yml`. MCP can only be used in agent mode.

```yaml
name: shadowcrawl
version: 2.0.0-rc
schema: v1
mcpServers:
  - name: shadowcrawl
    command: docker
    args:
      - compose
      - -f
      - /absolute/path/to/search-scrape/docker-compose-local.yml
      - exec
      - -i
      - -T
      - shadowcrawl
      - shadowcrawl-mcp
```

--------------------------------

### Pull Published Docker Images

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/DOCKER_DEPLOYMENT.md

Pulls Docker images from GitHub Container Registry. Examples show how to pull the latest image, a specific version by commit SHA, and a specific branch.

```bash
# Pull the latest image
docker pull ghcr.io/YOUR_USERNAME/shadowcrawl:latest

# Pull specific version by commit SHA
docker pull ghcr.io/YOUR_USERNAME/shadowcrawl:main-abc1234

# Pull specific branch
docker pull ghcr.io/YOUR_USERNAME/shadowcrawl:main
```

--------------------------------

### Build Docker Image Locally

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/DOCKER_DEPLOYMENT.md

Builds the Docker image for the shadowcrawl-mcp service. This command should be run from the 'search-scrape' directory.

```bash
cd search-scrape
docker build -t shadowcrawl-mcp:latest .
```

--------------------------------

### Configure Local MCP Server for ShadowCrawl with HITL

Source: https://github.com/devshero/shadowcrawl/blob/main/README.md

This JSONC configuration example sets up ShadowCrawl as a local MCP server, enabling HITL tools like 'fetch_web_high_fidelity' and 'non_robot_search'. It includes various environment variables for network settings, limits, proxy management, and HITL quality-of-life features.

```jsonc
{
  "servers": {
    "shadowcrawl-local": {
      "type": "stdio",
      "command": "env",
      "args": [
        "RUST_LOG=info",

        // Optional (only if you run the full stack locally):
        "SEARXNG_URL=http://localhost:8890",
        "BROWSERLESS_URL=http://localhost:3010",
        "BROWSERLESS_TOKEN=mcp_stealth_session",
        "QDRANT_URL=http://localhost:6344",

        // Network + limits:
        "HTTP_TIMEOUT_SECS=30",
        "HTTP_CONNECT_TIMEOUT_SECS=10",
        "OUTBOUND_LIMIT=32",
        "MAX_CONTENT_CHARS=10000",
        "MAX_LINKS=100",

        // Optional (proxy manager):
        "IP_LIST_PATH=/YOUR_PATH/shadowcrawl/ip.txt",
        "PROXY_SOURCE_PATH=/YOUR_PATH/shadowcrawl/proxy_source.json",

        // HITL / non_robot_search quality-of-life (uncomment to enable):
        // "SHADOWCRAWL_NON_ROBOT_AUTO_ALLOW=1",
        // "SHADOWCRAWL_RENDER_PROFILE_DIR=/YOUR_PROFILE_DIR",
        // "CHROME_EXECUTABLE=/Applications/Brave Browser.app/Contents/MacOS/Brave Browser",

        // The first argument without "=" is the command that env runs:
        "/YOUR_PATH/shadowcrawl/mcp-server/target/release/shadowcrawl-mcp"
      ]
    }
  }
}
```

--------------------------------

### View Shadowcrawl Container Logs

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/VSCODE_SETUP.md

This command streams the logs from the 'shadowcrawl' container in real-time. It's useful for debugging issues related to the MCP server's operation.

```bash
docker compose -f docker-compose-local.yml logs -f shadowcrawl
```

--------------------------------

### Run SearXNG Rust Test Suite

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/SEARXNG_TUNING.md

Command to execute the Rust test suite for the SearXNG MCP server. This requires navigating to the `mcp-server` directory before running the tests.

```bash
cd mcp-server
cargo test
```

--------------------------------

### Run Shadowcrawl Container with Environment Variables

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/DOCKER_DEPLOYMENT.md

Runs the Shadowcrawl Docker container, demonstrating how to configure it using environment variables. Key variables include SEARXNG_URL, QDRANT_URL, RUST_LOG, and MAX_CONTENT_CHARS.

```bash
docker run -e SEARXNG_URL=http://searxng:8080 \
  -e QDRANT_URL=http://qdrant:6334 \
  -e RUST_LOG=info \
  -e MAX_CONTENT_CHARS=10000 \
  ghcr.io/YOUR_USERNAME/shadowcrawl:latest
```

--------------------------------

### Claude Desktop MCP Server Configuration

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/IDE_SETUP.md

Configuration for Claude Desktop to connect to the Shadowcrawl MCP server. This JSON file should be edited via Claude Desktop's settings. Ensure the `command` and `args` correctly point to your Docker setup and the absolute path to `docker-compose-local.yml`.

```json
{
  "mcpServers": {
    "shadowcrawl": {
      "command": "docker",
      "args": [
        "compose",
        "-f",
        "/absolute/path/to/search-scrape/docker-compose-local.yml",
        "exec",
        "-i",
        "-T",
        "shadowcrawl",
        "shadowcrawl-mcp"
      ]
    }
  }
}
```

--------------------------------

### Bash: Troubleshoot Qdrant Connection Issues

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/HISTORY_FEATURE.md

A collection of bash commands to troubleshoot Qdrant connection problems. It includes checking if Qdrant is running via curl, starting Qdrant using docker-compose, and setting the QDRANT_URL environment variable. These steps help diagnose and resolve the 'Memory feature is not available' error.

```bash
# Check Qdrant is running
curl http://localhost:6333/collections

# If not, start it
docker-compose up qdrant -d

# Set environment variable
export QDRANT_URL=http://localhost:6333
```

--------------------------------

### Configure VS Code MCP Server Settings

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/VSCODE_SETUP.md

This JSON configuration snippet is added to VS Code's workspace settings (settings.json). It defines an MCP server named 'shadowcrawl' that connects to the running Docker container, specifying the command and arguments to execute the 'shadowcrawl-mcp' binary.

```json
{
  "mcp.servers": {
    "shadowcrawl": {
      "command": "docker",
      "args": [
        "compose",
        "-f",
        "/absolute/path/to/search-scrape/docker-compose-local.yml",
        "exec",
        "-i",
        "-T",
        "shadowcrawl",
        "shadowcrawl-mcp"
      ],
      "env": {
        "RUST_LOG": "info"
      }
    }
  }
}
```

--------------------------------

### Test Docker Image Locally

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/DOCKER_DEPLOYMENT.md

Runs the built Docker image locally. Two modes are provided: an HTTP server mode and a standard input/output (stdio) server mode. Environment variables like SEARXNG_URL and RUST_LOG can be configured.

```bash
# Run HTTP server (host 5001 -> container 5000)
docker run --rm \
  -e SEARXNG_URL=http://localhost:8888 \
  -e RUST_LOG=info \
  -p 5001:5000 \
  shadowcrawl-mcp:latest

# Or run MCP stdio server
docker run --rm -it \
  -e SEARXNG_URL=http://localhost:8888 \
  shadowcrawl-mcp:latest \
  shadowcrawl-mcp
```

--------------------------------

### Rollback to a Specific Docker Image Version

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/DOCKER_DEPLOYMENT.md

Demonstrates how to roll back to a previous version of the Shadowcrawl service by pulling and running a specific Docker image tagged with a commit SHA.

```bash
# Pull and run specific version
docker pull ghcr.io/YOUR_USERNAME/shadowcrawl:main-abc1234
docker run ghcr.io/YOUR_USERNAME/shadowcrawl:main-abc1234
```

--------------------------------

### Enable Non-Robot Search in Docker Build

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/IDE_SETUP.md

This command rebuilds the Docker image with the `non_robot_search` feature enabled, which launches a local GUI browser. This requires compiling the feature into the container image. Note that typical Docker deployments may not be able to open the host browser.

```bash
SHADOWCRAWL_CARGO_FEATURES=non_robot_search \
docker compose -f docker-compose-local.yml up -d --build
```

--------------------------------

### Execute Shadowcrawl MCP Server via Docker Exec

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/IDE_SETUP.md

This command executes the `shadowcrawl-mcp` binary within the running Shadowcrawl Docker container, connecting it to stdio. This is the recommended method for most clients. Ensure the Docker stack is running.

```bash
docker compose -f docker-compose-local.yml exec -i -T shadowcrawl shadowcrawl-mcp
```

--------------------------------

### Trigger Versioned Docker Build

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/DOCKER_DEPLOYMENT.md

Creates a Git tag to trigger a versioned Docker build. This process results in images being tagged with the specific version (e.g., v2.0.0-rc) in addition to 'latest'.

```bash
# Create version tag to trigger versioned build
git tag v2.0.0-rc
git push origin v2.0.0-rc

# This creates images with tags like: v2.0.0-rc, latest
```

--------------------------------

### Run SearXNG Release Validation

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/SEARXNG_TUNING.md

These commands are used to perform release validation checks for SearXNG. The first command checks the health endpoint, and the second retrieves and displays the first few lines of the MCP tools output.

```bash
curl -fsS http://localhost:5001/health
curl -fsS http://localhost:5001/mcp/tools | head
```

--------------------------------

### Configure Google Mobile UI in SearXNG

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/SEARXNG_TUNING.md

This snippet shows how to configure the Google engine in SearXNG to use its mobile user interface. Setting `use_mobile_ui: true` can reduce UI/JS friction and potentially decrease blocks, though it may alter snippet richness and result metadata. The alternative, `use_mobile_ui: false`, is also shown.

```yaml
- name: google
  engine: google
  use_mobile_ui: true
```

```yaml
- name: google
  engine: google
  use_mobile_ui: false
```

--------------------------------

### Check Shadowcrawl Health Status

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/VSCODE_SETUP.md

This command sends an HTTP GET request to the health check endpoint of the Shadowcrawl service. It returns a success status if the service is running correctly.

```bash
curl -fsS http://localhost:5001/health
```

--------------------------------

### Rust: Auto-initialize Qdrant Collection on Startup

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/HISTORY_FEATURE.md

This Rust code snippet demonstrates the auto-initialization process for a Qdrant collection on application startup. It connects to Qdrant, checks for collection existence, creates it with specified vector dimensions and distance if it doesn't exist, and loads the fastembed model. This is triggered when the QDRANT_URL environment variable is set.

```rust
// On startup if QDRANT_URL is set:
// - Connect to Qdrant
// - Check if collection exists
// - If not: Create with 384-dim vectors, cosine distance
// - Load fastembed model
```

--------------------------------

### Bash: Verify Qdrant Accessibility and Logs

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/HISTORY_FEATURE.md

This bash script provides commands to verify Qdrant's accessibility and check its logs. It includes checking running containers, viewing Qdrant logs, and ensuring port 6333 is not blocked. These are crucial steps for diagnosing 'Failed to initialize memory' errors.

```bash
# Verify Qdrant is accessible
docker ps | grep qdrant

# Check logs
docker logs qdrant

# Ensure port 6333 is not blocked
netstat -an | grep 6333
```

--------------------------------

### Clone and Launch ShadowCrawl with Docker

Source: https://github.com/devshero/shadowcrawl/blob/main/README.md

This command sequence clones the ShadowCrawl repository and launches the full stack using Docker Compose. This method is fast but does not support the HITL/GUI renderer.

```bash
git clone https://github.com/DevsHero/shadowcrawl.git
cd shadowcrawl
docker compose -f docker-compose-local.yml up -d --build
```

--------------------------------

### Proxy Manager Actions

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/PROXY_MANAGER_IMPLEMENTATION.md

This section details the available actions for the `proxy_manager` MCP tool, including fetching, listing, checking status, switching, and testing proxies.

```APIDOC
## Proxy Manager Actions

### Description
This API exposes actions for the `proxy_manager` tool to manage proxy lists.

### Actions

- **grab**
  - **Description**: Fetches proxies from `proxy_source.json` and optionally writes them into `ip.txt`.
  - **Method**: POST (assumed, as it modifies state)
  - **Endpoint**: `/proxy_manager/grab`
  - **Parameters**:
    - **Query Parameters**:
      - `output_path` (string) - Optional - Path to write the fetched proxies.

- **list**
  - **Description**: Lists proxies currently present in `ip.txt`.
  - **Method**: GET
  - **Endpoint**: `/proxy_manager/list`

- **status**
  - **Description**: Shows the current status of the proxy manager. Requires `ip.txt` to be available.
  - **Method**: GET
  - **Endpoint**: `/proxy_manager/status`

- **switch**
  - **Description**: Selects the best proxy from the registry built from `ip.txt`.
  - **Method**: POST (assumed, as it selects a proxy)
  - **Endpoint**: `/proxy_manager/switch`

- **test**
  - **Description**: Tests a single proxy against a target URL.
  - **Method**: POST
  - **Endpoint**: `/proxy_manager/test`
  - **Parameters**:
    - **Request Body**:
      - `proxy` (string) - Required - The proxy to test (e.g., `http://1.2.3.4:8080`).
      - `target_url` (string) - Required - The URL to test the proxy against.
```

--------------------------------

### Restart SearXNG Stack

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/SEARXNG_TUNING.md

Command to restart the SearXNG Docker stack, applying configuration changes. This command rebuilds the images and restarts the containers in detached mode.

```bash
docker compose -f docker-compose-local.yml up -d --build
```

--------------------------------

### Build Non-Robot Search Binaries (Cargo)

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/NON_ROBOT_SEARCH.md

Builds the `shadowcrawl` and `shadowcrawl-mcp` binaries with the `non_robot_search` Cargo feature enabled. This is necessary to include the Non-Robot Search functionality in your Shadowcrawl installation.

```bash
cd mcp-server
cargo build --release --features non_robot_search --bin shadowcrawl --bin shadowcrawl-mcp
```

--------------------------------

### Build ShadowCrawl MCP Server with HITL Feature

Source: https://github.com/devshero/shadowcrawl/blob/main/README.md

This command builds the ShadowCrawl MCP stdio server with the 'non_robot_search' feature enabled, which is required for the HITL/GUI renderer. The compiled binary will be located in the 'mcp-server/target/release/' directory.

```bash
cd mcp-server
cargo build --release --bin shadowcrawl-mcp --features non_robot_search
```

--------------------------------

### Build and Run Shadowcrawl with Non-Robot Search Feature

Source: https://context7.com/devshero/shadowcrawl/llms.txt

Instructions for building the Shadowcrawl MCP server with the non-robot search feature enabled and running it with specific environment configurations. This includes setting environment variables for SearXNG, Qdrant, and the Chrome executable path.

```bash
# Build with HITL feature enabled (macOS)
cd mcp-server
cargo build --release --bin shadowcrawl-mcp --features non_robot_search

# Run with environment configuration
SEARXNG_URL=http://localhost:8890 \
QDRANT_URL=http://localhost:6344 \
SHADOWCRAWL_NON_ROBOT_AUTO_ALLOW=1 \
CHROME_EXECUTABLE="/Applications/Brave Browser.app/Contents/MacOS/Brave Browser" \
./target/release/shadowcrawl-mcp
```

--------------------------------

### Trigger Docker Build via Commit Message

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/DOCKER_DEPLOYMENT.md

Triggers an automated Docker build and push to GitHub Container Registry by including '[build]' at the end of a commit message. This is a convention for initiating CI/CD pipelines.

```bash
# Commit message ending with [build] to trigger Docker build
git commit -m "Release v2.0.0-rc [build]"
git push

# Repo convention: keep [build] at the end so it's easy to grep in history.
```

--------------------------------

### Rust Async/Await Syntax for Asynchronous Operations

Source: https://github.com/devshero/shadowcrawl/blob/main/sample-results/scrape_url.txt

Demonstrates the basic `async` and `await` keywords in Rust, which are fundamental for writing asynchronous code. These keywords enable functions to pause execution and yield control to the runtime when waiting for I/O operations or other asynchronous tasks to complete.

```rust
async fn my_async_function() -> String {
    // ... asynchronous operations ...
    "Operation complete!".to_string()
}

#[tokio::main]
async fn main() {
    let result = my_async_function().await;
    println!("{}", result);
}
```

--------------------------------

### Skip Docker Build via Commit Message

Source: https://github.com/devshero/shadowcrawl/blob/main/docs/DOCKER_DEPLOYMENT.md

Commits without the '[build]' suffix will skip the automated Docker build process, allowing for regular code updates or documentation changes without triggering a full deployment.

```bash
# Normal commits without [build] will skip Docker build
git commit -m "Update docs"
git push

git commit -m "Fix typo"
git push
```
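
The `[build]` suffix convention above can be checked locally before pushing. The following is a minimal sketch, not the repository's actual CI logic: the function name `should_build` is hypothetical, and it assumes the trigger is a literal `[build]` at the very end of the commit message, as the descriptions above state.

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeeds (exit status 0) only when the given
# commit message ends with the literal text "[build]".
should_build() {
  case "$1" in
    *"[build]") return 0 ;;  # quoted pattern, so the brackets match literally
    *)          return 1 ;;
  esac
}

# Demonstration with the commit messages used in the examples above.
for msg in "Release v2.0.0-rc [build]" "Update docs" "Fix typo"; do
  if should_build "$msg"; then
    echo "build: $msg"
  else
    echo "skip:  $msg"
  fi
done

# To check the most recent commit in a repository (assumes git is installed):
# should_build "$(git log -1 --pretty=%B)" && echo "this commit would trigger a build"
```

Matching with `case` keeps the check free of regular-expression escaping; a message that contains `[build]` anywhere other than the end is treated as a normal commit.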